Lexer's Identity Resolution
What is Identity Resolution?
Identity Resolution sets the foundation for a single-customer view. It works by linking customer touchpoints to their customer profile; these connections are aptly named “links”. By mapping different datapoints into one customer profile, segmentation and insights become possible across the entire MarTech stack.
“Identity Resolution” is a data workflow, whereas the “Single-Customer View” is the output of that workflow. It is available in Lexer’s Understand tool and can be activated to different marketing channels.
Problem and pain points
Let’s start with the problem: fragmented data that arises from multiple systems supporting customer experience.
For example, imagine a customer who receives messages from two different marketing campaigns. On Monday the customer receives an email promoting jackets for outdoor activities; on Wednesday they get a notification for streetwear ensembles. These messages could conflict and prevent conversion. How could this happen?
A common source of this discrepancy is that the same customer profile actually lives in two segments. Data from two systems, such as ecommerce and email, form the basis of a customer profile, but without resolution these profiles aren’t unified.
This impacts customer experience and also limits progress in the marketing program because it erodes trust. When trust in data is low, it becomes difficult to commit to plans, the cycle of learning from campaigns slows down, and the whole marketing program can stall.
How it works
Identity Resolution framework
The Identity Resolution workflow has five steps:
- Collect: Lexer connects to customer-facing systems, via integration or solution design, and acquires data into our secure environment.
- Clean: The data is then processed so that each record meets data type specifications, e.g. email and mobile formats.
- Link: Relationships are discovered between profiles and touchpoints.
- Resolve: A profile of the customer is now formed from linked records.
- Deploy: A single-customer view is made available for personalization through attribute calculation, segmentation, and activation.
Concepts
Here’s a set of terms you’ll see a lot when exploring this topic. This glossary pulls together the Lexer definition of these terms.
Identity Resolution
Identity Resolution is the process of linking records of a person to form a “Single-Customer View” (SCV). An accurate SCV is crucial for powerful segmentation and personalized messaging. Broadly, there are two approaches to unification: deterministic and probabilistic.
The deterministic approach trusts the identifiers (such as email and customer_id) to represent a person and link the records.
The probabilistic approach is more complex but can overcome poor data quality to find links between records.
Identity Resolution is otherwise known as unification, deduplication, and stitching. It’s also a more specialized implementation of Record Linkage in that it focuses on people instead of other entities.
Graph
A graph is a useful data structure for representing relationships between records. In this case, we are using a graph to represent the relationships between customers. If you’ve ever played Six Degrees of Kevin Bacon, you’ll understand the concept of a graph. Just replace “movies” starring Kevin Bacon with “customers” buying t-shirts and opening emails and you’ve got a Single-Customer View.
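As a rough illustration of the graph idea, the sketch below treats each link as an edge and groups records into profiles by finding connected components. It is a minimal sketch, not Lexer's implementation: the function name `resolve_profiles` and the node labels such as `shopify:1001` are made up for the example.

```python
from collections import defaultdict

def resolve_profiles(links):
    """Group nodes into profiles by finding connected components of the link graph."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    profiles = []
    for node in sorted(graph):
        if node in seen:
            continue
        # Depth-first walk that collects everything reachable from this node.
        stack = [node]
        component = set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(graph[current] - component)
        seen |= component
        profiles.append(sorted(component))
    return profiles

# Hypothetical links: two system records share one email address, a third does not.
links = [
    ("shopify:1001", "email:ana@example.com"),
    ("klaviyo:77", "email:ana@example.com"),
    ("pos:5", "email:ben@example.com"),
]
profiles = resolve_profiles(links)
print(profiles)
```

Here the two records that share an email address end up in one profile, while the third record forms its own.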
Touch-point
An interaction with a customer such as an in-store purchase, email click or a website login. These are otherwise known as interactions, events, or observations.
Identifier
A datapoint in a profile that directly (deterministically) or indirectly (probabilistically) can be found in another profile or touch-point. A high-confidence identifier will uniquely link to only one customer profile.
Good examples of identifiers are:
- Customer ID assigned by a Point of Sale system or CRM
- System ID assigned by a commerce or marketing application such as Shopify or Klaviyo
- Email Address provided by the customer at the time of signup, although shared email addresses are common
Examples of identifiers that have linkage with less confidence include:
- A mobile phone number
- A web browser or cookie
- A household address
Link
A link represents a relationship between two or more profiles (entities) or touchpoints (events). A deterministic link is a shared datapoint such as a customer’s email address or customer ID.
Deterministic Identity Resolution
The simplest kind of identity resolution, called deterministic or rules-based record linkage, generates links based on the number of individual identifiers that match among the available data sets. Two records are said to match via a deterministic record linkage procedure if all or some identifiers (above a certain threshold) are identical. Deterministic record linkage is a good option when the entities in the data sets are identified by a common identifier, or when there are several representative identifiers (e.g. name, date of birth, and sex, when identifying a person) whose quality of data is relatively high.
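The rule described above can be sketched in a few lines. This is an illustrative example only: the function `deterministic_match`, its `min_matches` parameter, and the sample records are assumptions for the sketch, not part of any Lexer API.

```python
def deterministic_match(record_a, record_b, identifiers, min_matches=1):
    """Return True when at least `min_matches` identifiers are present in both records and identical."""
    matches = 0
    for key in identifiers:
        value_a = record_a.get(key)
        value_b = record_b.get(key)
        # A missing value never counts as agreement.
        if value_a is not None and value_a == value_b:
            matches += 1
    return matches >= min_matches

# Hypothetical records from three systems.
ecommerce = {"email": "ana@example.com", "name": "Ana Diaz", "date_of_birth": "1990-04-12"}
email_platform = {"email": "ana@example.com", "name": "Ana Diaz", "date_of_birth": None}
point_of_sale = {"email": None, "name": "Ana Diaz", "date_of_birth": "1985-01-30"}

# One strong identifier (email) is enough on its own.
same_by_email = deterministic_match(ecommerce, email_platform, ["email"])
# Weaker identifiers must all agree before the records are linked.
same_by_name_and_dob = deterministic_match(
    ecommerce, point_of_sale, ["name", "date_of_birth"], min_matches=2
)
print(same_by_email, same_by_name_and_dob)
```

Requiring several weaker identifiers to agree is one way to express the "above a certain threshold" condition; the right threshold depends on the quality of each data set.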
Probabilistic Identity Resolution
Probabilistic record linkage, sometimes called fuzzy matching, takes a different approach to the record linkage problem by taking into account a wider range of potential identifiers. For each pair of identifiers between records, a model estimates a weight based on how well that identifier indicates a match or a non-match, and these weights are used to calculate the probability that two given records refer to the same entity. Record pairs with probabilities above a certain threshold are considered to be matches, while pairs with probabilities below another threshold are considered to be non-matches. Pairs that fall between these two thresholds are considered to be "possible matches" and can be dealt with accordingly (e.g. human reviewed, linked, or not linked, depending on the requirements). Probabilistic identity resolution is also called probabilistic merging or fuzzy merging.
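One common way to turn identifier agreement into a probability is a Fellegi-Sunter style model, sketched below. The source does not say which model Lexer uses, so treat this purely as an illustration: the field names, the `m` and `u` values, the prior, and both thresholds are assumed numbers chosen for the example.

```python
import math

# Assumed per-identifier agreement probabilities (illustrative values only):
# m = P(values agree | same person), u = P(values agree | different people).
FIELD_PARAMS = {
    "email": (0.95, 0.001),
    "mobile": (0.90, 0.005),
    "postcode": (0.85, 0.05),
}

def match_probability(record_a, record_b, prior=0.01):
    """Combine per-identifier weights into a probability that two records are one person."""
    log_odds = math.log(prior / (1 - prior))
    for field, (m, u) in FIELD_PARAMS.items():
        a, b = record_a.get(field), record_b.get(field)
        if a is None or b is None:
            continue  # a missing value carries no evidence either way
        if a == b:
            log_odds += math.log(m / u)  # agreement weight
        else:
            log_odds += math.log((1 - m) / (1 - u))  # disagreement weight
    return 1 / (1 + math.exp(-log_odds))

def classify(probability, upper=0.9, lower=0.1):
    """Apply the two thresholds: match, non-match, or possible match in between."""
    if probability > upper:
        return "match"
    if probability < lower:
        return "non-match"
    return "possible match"

base = {"email": "ana@example.com", "mobile": "+61412345678", "postcode": "3000"}
all_agree = dict(base)
email_differs = {"email": "ben@example.com", "mobile": None, "postcode": "3000"}
mobile_only = {"email": None, "mobile": "+61412345678", "postcode": "3121"}

for other in (all_agree, email_differs, mobile_only):
    print(classify(match_probability(base, other)))
```

With these assumed parameters, the three pairs fall into the match, non-match, and possible match bands respectively, which shows how the middle band captures pairs that need review.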
Defining Success
Success in Identity Resolution means finding the best links between profiles and interactions for a customer profile.
To reach this objective and get linkage just right, we need to solve for two cases: finding too many links (over unification) or too few links (under unification).
Over unification
Over unification is where profiles are merged that should actually be distinct. In the worst case, the identity graph can collapse, leaving many identities attached to a single profile. Factors leading to over unification include low data quality or spurious input such as false addresses like test@test.com.
To protect against over unification, Lexer implements link occurrence thresholds and term frequency analysis. For data that may not have sufficient linkage quality, Lexer also supports appending attributes to profiles after resolution has occurred.
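The general idea behind an occurrence threshold can be sketched as follows: an identifier value that appears on too many records is treated as untrustworthy and is not used for linking. This is a generic illustration, not a description of Lexer's actual thresholds; `trusted_identifier_values` and `max_occurrences` are hypothetical names and the cutoff of 3 is arbitrary.

```python
from collections import Counter

def trusted_identifier_values(records, field, max_occurrences=3):
    """Return the values of `field` that appear on at most `max_occurrences` records."""
    counts = Counter(r[field] for r in records if r.get(field))
    return {value for value, count in counts.items() if count <= max_occurrences}

# Hypothetical data: a placeholder address shared by five unrelated records.
records = [{"id": i, "email": "test@test.com"} for i in range(1, 6)]
records += [
    {"id": 6, "email": "ana@example.com"},
    {"id": 7, "email": "ana@example.com"},
    {"id": 8, "email": None},
]

trusted = trusted_identifier_values(records, "email")
print(trusted)
```

Only values in `trusted` would go on to create links, so the five records sharing `test@test.com` stay separate instead of collapsing into one profile.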
Under unification
Under unification occurs when valid links do not match due to a mismatch in formatting. Lexer deploys rich data type specifications to clean and normalize data to maximize the linkage opportunity.
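To show why normalization matters for linkage, here is a minimal sketch of cleaning two common identifiers so that differently formatted values compare as equal. The rules are simplified assumptions for illustration, not Lexer's data type specifications: the email pattern is deliberately loose, lowercasing the whole address is a practical convention, and the default country code `61` is an arbitrary choice.

```python
import re

def normalize_email(raw):
    """Trim whitespace and lowercase; return None if the result does not look like an email."""
    if not raw:
        return None
    email = raw.strip().lower()
    # Loose shape check only: something@something.something with no spaces.
    return email if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email) else None

def normalize_mobile(raw, default_country_code="61"):
    """Reduce a mobile number to '+<country code><digits>', or None if it is ambiguous."""
    if not raw:
        return None
    digits = re.sub(r"\D", "", raw)
    if not digits:
        return None
    if raw.strip().startswith("+"):
        return "+" + digits  # already in international form
    if digits.startswith("0"):
        return "+" + default_country_code + digits[1:]  # national form with trunk prefix
    return None  # cannot tell whether a country code is present

print(normalize_email("  Ana@Example.COM "))
print(normalize_mobile("0412 345 678"))
print(normalize_mobile("+61 412-345-678"))
```

After normalization, the two differently formatted mobile numbers above produce the same value, so a deterministic link between their records becomes possible.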
Identity Resolution recap
Lexer's Identity Resolution sets the foundation for a single-customer view and works by linking customer touchpoints to their customer profile. By mapping different datapoints into one customer profile, segmentation and insights become possible across the entire MarTech stack.