Aggregation

When you combine data from many different sources or times in order to lower the possibility of a single individual being identified.


Augmentation

When a machine, software, or function extends a person’s abilities or potential while maintaining their agency.


Automation

When a machine, software, or function performs a task without user involvement.

Binary Classification

When an ML model predicts whether an example falls into one of two categories based on a set of features.
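
As a rough sketch, binary classification can be pictured as scoring an example from its features and comparing the score to a threshold. The feature names, weights, bias, and the 0.5 threshold below are made up for illustration and do not come from any real model.

```python
import math

# Hypothetical weights for two illustrative features of a shoe photo.
WEIGHTS = {"has_laces": 2.0, "has_heel": -3.0}
BIAS = -0.5

def predict_is_sneaker(features, threshold=0.5):
    """Return (label, score); label is True when the score clears the threshold."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    score = 1.0 / (1.0 + math.exp(-z))  # logistic function maps z into (0, 1)
    return score >= threshold, score

label, score = predict_is_sneaker({"has_laces": 1, "has_heel": 0})
print(label, round(score, 2))
```

Moving the threshold up or down trades one kind of error for the other, which is why the choice of threshold is usually a product decision as much as a technical one.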


Classification

When a machine learning model identifies an object. In response to an identification question, the simplest classification is “yes” or “no”. For example, if a model is shown a picture of a cat, it could classify it as “Cat” or “Not a cat”. More complex classifications sort items into one of several groups.

Confidence Level, Model Confidence

The confidence level for a model is a statistical measure of how certain a prediction or outcome is.

Context Errors

Situations when the product output doesn’t make sense in the user’s current context. Often, this output is perceived as irrelevant by the user.


Counterfactual

Rationale for why something is classified as not within the given class. Usually takes the form of a statement of how the world would have to be different for a desirable outcome to occur.

Data Cascades

Compounding events that cause negative, downstream effects from data issues, and result in technical debt over time.

Data Collection and Labeling

How product teams get the data they need and apply meaningful labels to it. For example: acquiring millions of images of cats and dogs correctly labeled as “cat” or “dog”.

Data Distribution

Shows the frequency of specific values within a dataset. For example, you could find that your data includes a high number of certain values and lower numbers of others. Many datasets approximate a “normal” distribution, or Gaussian curve, though this should be checked rather than assumed.
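
A quick way to inspect a distribution is to count how often each value occurs. The sketch below uses only the Python standard library; the run distances are invented sample data.

```python
from collections import Counter

def value_frequencies(values):
    """Count how often each distinct value appears, most common first."""
    return Counter(values).most_common()

# Hypothetical run distances in whole kilometers.
distances_km = [5, 5, 10, 5, 21, 10, 5]
for value, count in value_frequencies(distances_km):
    print(f"{value:>3} km  {'#' * count}")
```

A lopsided result like this one is a hint that the model may see far more examples of some values than others.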

Data Examples

Rows in a dataset or specific pieces of data, such as a photo of a shoe or a run route.

Data Features

An individual measurable property or characteristic of an observable entity. Features should be informative, discriminating, and independent.

Data Labels

Human-added descriptions for a piece of data, or example.

Explicit Data Collection

When you request information from users outright, like in feedback forms.

Explicit Feedback

Information solicited from users from within your app. For example: rating systems, review requests, forms, or surveys.

False Negatives

When the ML algorithm classifies an object as not in a certain category when it actually is in that category. For example, if it was searching for sneakers and failed to return several images that do show sneakers.

False Positives

When the machine learning algorithm classifies an object as belonging to a certain category, but it is not in that category. For example, if the algorithm was searching for llamas and incorrectly identified a sneaker as a llama.


Distinct data sources or machine learning calculations that influence a prediction or outcome.

Folk Theories

Invented (and usually false) ideas of how a product works based on existing mental models and assumptions.

General System Explanations

Descriptions of general system functionality, i.e. how and why it uses inputs to generate outputs.


Heuristics-based Systems

Systems based on static if-then functions, or rules based on desired situation-result pairs. If a certain situation arises, the software produces a specific result, every time.
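
For contrast with a learned model, a static if-then rule might look like the sketch below. The elevation thresholds and label names are hypothetical; the point is only that the same input always produces the same output.

```python
def route_difficulty(elevation_gain_m):
    """Rule-based label: fixed thresholds, no learning involved."""
    if elevation_gain_m >= 300:
        return "hilly"
    if elevation_gain_m >= 100:
        return "rolling"
    return "flat"

print(route_difficulty(350))
```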

Implicit Data Collection

When you gather information about users passively, usually through logging behavior.

Implicit Feedback

Information about user behaviors, preferences, and needs that’s gathered from their interactions within your application or product. Often uses logging — records of what people do within your app.

Inter-labeler Reliability

A measure of consensus between different labelers performing the same task. Also known as inter-labeler agreement, or concordance.
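
One simple way to quantify this is percent agreement; Cohen's kappa, a common statistic not named in this glossary, additionally corrects for agreement expected by chance. The sketch below assumes exactly two labelers and uses invented labels.

```python
from collections import Counter

def percent_agreement(labels_a, labels_b):
    """Fraction of items on which the two labelers chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

def cohens_kappa(labels_a, labels_b):
    """Agreement corrected for chance; undefined when chance agreement is 1."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two labelers for the same four photos.
labeler_1 = ["cat", "cat", "dog", "dog"]
labeler_2 = ["cat", "dog", "dog", "dog"]
print(percent_agreement(labeler_1, labeler_2), cohens_kappa(labeler_1, labeler_2))
```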


Labeler

A person who labels the data used to train machine learning algorithms, specifically supervised learning models.

Synonym for rater.


Labels

A label is the description that is either given to a piece of data by a human or derived from user actions. For example, labeling a photo as “sneakers”, or a run route as “hilly”.

ML Model

Mathematical algorithm that learns the statistical relationships among examples to make predictions in the future.

Machine Learning

Techniques and methods to program computers to execute tasks without super-specific rules. ML can help machines recognize patterns and adjust to unique situations.

Machine Learning (ML) Systems

Techniques and methods to develop AI, by getting computers to do something without being programmed with super-specific rules. ML can help machines recognize patterns and adjust to unique situations.

Mental Model

Users’ internal explanations of how something works. They shape how users interact with a product or feature and its perceived value.

N-Best, N-Best Classifications, N-Best Lists

Showing a certain number, “n”, of the top solutions or suggestions, such as the top 5 matches for an image search.
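
A minimal sketch of producing an n-best list from model scores, using the standard library's heapq.nlargest. The labels and scores are invented for illustration.

```python
import heapq

def n_best(scores, n=5):
    """Return the n highest-scoring (label, score) pairs, best first."""
    return heapq.nlargest(n, scores.items(), key=lambda item: item[1])

# Hypothetical classifier scores for one image.
scores = {"sneaker": 0.7, "boot": 0.2, "sandal": 0.06, "llama": 0.04}
print(n_best(scores, n=2))
```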

Network Effect

When a person starts or stops using a product or service because the majority of their network is using it or not.


Overfitting

When a model is optimized for predictive power for a training dataset that is narrower than the ML model’s intended use.

Partial Explanations

Messages that explain one aspect of how the system works. Ideally, this is the most important aspect to the user.


Precision

The proportion of true positives out of everything the model categorized as positive, that is, all the true and false positives.
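
As a worked example, precision is TP / (TP + FP). The function below is a small sketch; returning 0.0 when the model made no positive predictions is an assumption for convenience, not a standard.

```python
def precision(true_positives, false_positives):
    """Share of positive predictions that were actually correct."""
    predicted_positive = true_positives + false_positives
    if predicted_positive == 0:
        return 0.0  # assumption: report 0 when nothing was predicted positive
    return true_positives / predicted_positive

# 8 images correctly returned as sneakers, 2 returned by mistake.
print(precision(true_positives=8, false_positives=2))
```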

Predictive Power

A percentage that refers to an ML model’s ability to correctly predict outcomes given a certain input. A model with a predictive power of 100 gives the correct prediction every time; 0 is purely random.


Probabilistic

Situations where there are multiple possible outcomes, each with a varying degree of certainty that it will occur.

Progressive Disclosures

A UX practice in which more information is revealed in subsequent screens or interactions.

Qualitative Feedback

Non-numeric feedback about how a user feels about a certain experience. Can include measures of satisfaction, happiness, verbal responses or other qualities.

Quantitative Feedback

Feedback that is numeric or converted to a number. Both implicit and explicit feedback mechanisms can be quantitative. This feedback can be fed back into your model for tuning.


Rater

Synonym for labeler.


Recall

The proportion of true positives out of everything that is actually positive, that is, all the true positives and false negatives.
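
As a worked example, recall is TP / (TP + FN). The sketch below mirrors that formula; returning 0.0 when there are no actual positives is an assumption made for convenience.

```python
def recall(true_positives, false_negatives):
    """Share of actual positives that the model managed to find."""
    actual_positive = true_positives + false_negatives
    if actual_positive == 0:
        return 0.0  # assumption: report 0 when there were no positives to find
    return true_positives / actual_positive

# 8 sneaker images found, 8 sneaker images missed.
print(recall(true_positives=8, false_negatives=8))
```

A model can score well on this measure simply by returning almost everything, so it is usually read alongside the share of returned results that were correct.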


Redaction

When some pieces of a dataset or profile are removed to lower the possibility of identifying a single user based on their data profile. You can redact certain features of data to shrink the data profile, or redact examples for a certain amount of time.
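
Removing features from a record can be sketched as below. The record and its field names are hypothetical, and which fields are identifying depends on the product and its data.

```python
def redact(record, fields_to_remove):
    """Return a copy of the record without the listed fields."""
    return {key: value for key, value in record.items()
            if key not in fields_to_remove}

# Hypothetical run record; the field names are made up for illustration.
run = {"user_id": "u123", "home_address": "redact me", "distance_km": 5.2}
print(redact(run, {"user_id", "home_address"}))
```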


Regression

Also known as linear regression algorithms. These try to find the best-fit line for a plot of data points on a graph. As new data points appear over time, the algorithm adjusts the line to fit.
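
Finding a best-fit line can be sketched with ordinary least squares, one common fitting method that this glossary does not specify. The data points are invented, and the function assumes at least two distinct x values.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # fails if every x is identical
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

Refitting on the full set of points whenever new data arrives is the simplest way to let the line adjust over time.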

Reward Function

Mathematical equation that your ML algorithm uses to optimize outputs. The function weighs some results as better than others, and optimizes for certain outcomes.
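
To illustrate how such a function weighs some results above others, the sketch below scores a batch of outcomes with fixed weights. The weights are hypothetical; here a missed sneaker is assumed to cost more than a wrong match.

```python
# Hypothetical weights: higher is better, negative values are penalties.
REWARDS = {
    "true_positive": 1.0,
    "true_negative": 0.5,
    "false_positive": -1.0,
    "false_negative": -2.0,
}

def total_reward(outcome_counts):
    """Weighted sum of outcome counts; an optimizer would try to maximize it."""
    return sum(REWARDS[outcome] * count
               for outcome, count in outcome_counts.items())

print(total_reward({"true_positive": 8, "false_negative": 2}))
```

Changing the weights changes what the system optimizes for, so they encode a product judgment about which mistakes matter most.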

Second-order Effects

When the aggregate of outcomes or behaviors over time produces additional, unexpected outcomes.

Specific Output Explanations

Descriptions of how a system arrives at a specific output based on a certain input.

Supervised Learning

When you “teach” your algorithm on training data. Often this is based on examples manually labeled by humans to show “right” and “wrong” answers.

Test Data

Datasets that you use to test your ML model to make sure its predictions work on data it hasn’t encountered before.

Training Data

Datasets that you use to teach your ML model which outcomes correspond to which inputs.
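
Keeping the two datasets separate can be sketched as a shuffled split. The 80/20 ratio and the fixed seed below are illustrative assumptions, not recommendations.

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and hold out a fraction for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    test_size = int(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]

training, testing = train_test_split(range(10))
print(len(training), len(testing))
```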


Transparency

Providing information about how a product works, including data sources, terms and conditions, privacy, permissions, and rationale behind system output.

True Negatives

When the machine learning algorithm classifies an object as NOT in a certain category and it is indeed not in that specific category. For example, it correctly classifies a llama as “not a sneaker”.

True Positives

When the machine learning algorithm classifies an object in a certain category, and the object is in that category.
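
The four outcomes (true and false positives, true and false negatives) can be tallied by comparing predictions to the actual labels. The sketch below assumes boolean labels where True means "is a sneaker"; the sample data is invented.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Tally the four outcomes for paired actual/predicted boolean labels."""
    counts = Counter()
    for is_actual, is_predicted in zip(actual, predicted):
        if is_predicted:
            counts["true_positive" if is_actual else "false_positive"] += 1
        else:
            counts["false_negative" if is_actual else "true_negative"] += 1
    return counts

# Hypothetical labels for five images.
actual = [True, True, False, False, True]
predicted = [True, False, True, False, True]
print(confusion_counts(actual, predicted))
```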


Tuning

When developers adjust their machine learning algorithm based on feedback or errors to improve accuracy and performance.


Underfitting

When a model has low predictive power across a more varied dataset.