Data onboarding process

Lexer offers a wide range of tools to get data into the CDP, from our Shopify integration through to highly customizable APIs for bulk uploads. Understanding the data ingestion process is important for anyone uploading data to the CDP, and even more so when handling complex use cases such as our Bulk Write API. In this article we’ll run through what you need to know about this process, so you can get the most out of our Datasets, Logs and Health checks features when monitoring your data.

For more information about these tools check out the links below: 

Ingestion overview

First things first, you need to understand the jobs that consume your data into the Hub. These jobs are independent and run in a specific order, so knowing that order is really helpful if you need to troubleshoot. The process is staggered, with each job following the one before it.

Let’s take a look at how data is consumed into the Hub. 

The first time this process runs, Lexis are working away behind the scenes. Once this is done, the data source can run automatically without any intervention.
  1. Integration or File Upload sends data: Data is provided via integration or upload.
  2. Dataset load: A dataset loads that file and begins to organize the data.
  3. Visible in datasets: The dataset is then visible in Datasets (but the data hasn’t finished its journey yet).
  4. Dataflow: Your data then goes through multiple processes that organize and enrich it to get it CDP-ready.
  5. Build: The final stages build your data into the functional interface that we call the Hub, letting you interrogate your data.

Your data is now available to use in the Hub.
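
The ordering above can be sketched in code as a way to reason about how far a data source has progressed. This is a minimal illustration, not a Lexer API: `STAGES` and `furthest_stage` are hypothetical names, and the job labels are simply the ones this article says appear in Logs. Stage 3 has no job of its own, so the sketch treats it as reached once the Dataset Load job is present.

```python
from typing import Optional

# (stage name, Logs job labels that mark the stage as done).
# The labels come from this article; the structure is a hypothetical illustration.
STAGES = [
    ("Integration or File Upload", {"File upload", "Integration"}),
    ("Dataset load", {"Dataset Load"}),
    ("Visible in datasets", set()),  # no job of its own; follows the load
    ("Dataflow", {"Dataflow"}),
    ("Build", {"Build index"}),
]


def furthest_stage(seen_jobs: set[str]) -> Optional[str]:
    """Return the last stage reached, walking the stages in order.

    Stops at the first stage whose job has not been seen, because each
    stage follows the ones before it.
    """
    reached = None
    for name, jobs in STAGES:
        if jobs and not (jobs & seen_jobs):
            break  # this stage's job is missing, so later stages cannot count
        reached = name
    return reached


# Example: the upload and load jobs are in Logs, but Dataflow has not run yet.
print(furthest_stage({"File upload", "Dataset Load"}))  # prints "Visible in datasets"
```

If a later job appears without an earlier one (for example a Dataflow job with no Dataset Load), the helper still reports the earlier stage, which is usually the right place to start troubleshooting.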

Let’s dive a little deeper

1. Sending your data

There are all sorts of different types of data that can be consumed by the CDP. These different types serve different purposes and are analyzed in different ways. They fall into three main categories: Customer, Commerce and Marketing data. You can also talk to your Success Manager about Custom data. Within each type, there are record types that further distinguish the data being consumed.

The main categories we use to differentiate this data are shown below: 

This data can be sent in a number of different ways, from simply connecting one of our established integrations (you can find the list of our integrations here), all the way through to our Bulk Write API, which allows you to send through JSON files containing your data. To figure out the best option for you, it's always best to discuss your ingestion strategy with your Success Manager, as different use cases need different strategies. There are some suggestions for troubleshooting at the end of this page.

Once complete, you should see a job reflecting this in Logs: either a File upload or an Integration job, depending on the method you have chosen.

2. Dataset load

At this stage we make sure the data is valid and conforms to our data specification requirements. Our integrations make this super simple because we have a predefined structure in place: you integrate your accounts and let the CDP take care of the data transfer.

For more bespoke solutions where you are sending in files for processing, the files are sent to Lexer’s secure storage. From there, if they are in the correct format, they can be consumed into a dataset in the CDP.

If the data is clean, valid and matches the required format, you should see a Dataset Load job reflecting this in Logs. 

If you notice a problem, reach out to Support as soon as possible to try to identify the issue. 
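
Before sending a file, it can help to catch obvious formatting problems locally. The sketch below is one way to do that in Python, under stated assumptions: it treats the file as newline-delimited JSON (one record per line), and both the `required_keys` argument and the `email` field in the example are hypothetical placeholders. The actual required fields and file layout are defined by the Lexer data specification, so check that before relying on anything like this.

```python
import json


def precheck_records(text: str, required_keys: set[str]) -> list[str]:
    """Return a list of human-readable problems found in the input text.

    Assumes one JSON object per line (an assumption, not the official format).
    An empty list means this local check found no problems.
    """
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as error:
            problems.append(f"line {number}: invalid JSON ({error.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {number}: expected a JSON object")
            continue
        missing = sorted(required_keys - set(record))
        if missing:
            problems.append(f"line {number}: missing {', '.join(missing)}")
    return problems


# Hypothetical sample: one good record, one missing a field, one malformed line.
sample = '{"email": "a@example.com"}\n{"name": "no email"}\nnot json\n'
for problem in precheck_records(sample, {"email"}):
    print(problem)
```

A clean result here does not guarantee the Dataset Load job will succeed; it only rules out the problems this check looks for.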

3. Visible in datasets

At this stage your data can be viewed in Datasets. If the Dataset Load fails, the dataset will be empty because it has not received your data. In that case, check Logs to see what went wrong and go back to step two, or reach out to Support or your Success Manager for help identifying the problem.

Once your data is in Datasets, you have successfully uploaded your data. This doesn’t mean your data is in the CDP yet, but it does mean we have received what you have sent, which is a fantastic starting point. From here your data will be processed further. 

At this stage Health checks will begin to run on your dataset. Health checks can take up to 24 hours to run. 

For more information about Datasets, check out our Dataset management article

4. Dataflow

The Dataflow stage is where enrichment occurs, bringing together all of the disparate data that you’ve sent from your different sources. The data is routed into the different channels that allow the CDP to accurately organize and enrich it for use in the Hub. At this stage the data begins to be transformed. Once all of the data has been distributed, it can pass through to the final stages of the ingestion process.

Once complete, you should see a Dataflow job reflecting this in Logs. 

5. Build - The final job

The final job is where the magic happens. The CDP collates all of your data into your Hub, allowing you to interrogate it and turn it into actionable insights. Part of this job involves unification, where all the different bits of data for each customer are brought together into a single unified profile.

You can find more information about unification here

The build occurs on a schedule, consolidating and finalizing all of the adjustments made throughout the ingestion process. Once these steps are complete, you’ll see a final job in Logs called “Build index”. Once this is visible in Logs, your data is officially in the Hub and ready to use!

Troubleshooting 

Troubleshooting uploads

If you run into any difficulties during data upload, Lexer may be unable to access your data, so it is important to troubleshoot promptly.

If this happens, it is best to be proactive.

  1. Check out the relevant Learn content to help you troubleshoot the ingestion process you are carrying out. 

Some Learn materials that might help: 

  2. If this doesn’t rectify the problem, or you have a bespoke use case that might require custom work, reach out to your Success Manager or Support as soon as possible to try to identify the issue. 

To recap

When your data is ingested, it goes through a five-stage process before you can use it in the Hub. The stages of the process are:

  1. Integration or File Upload: You integrate or upload a file.
  2. Dataset Load: The uploaded data is loaded into a dataset, beginning data organization.
  3. Visible in Datasets: The dataset appears in Datasets, signaling the initial stage of data processing.
  4. Dataflow: Your data goes through multiple processes to organize and enrich it for Hub readiness.
  5. Build: The final stages involve building your data into the functional Hub interface, enabling interrogation.

Your data is now available in the Hub and ready to use.

That’s a wrap!

In this article, we have covered the process all data goes through when it is ingested into the Hub. We looked through how this data is reflected in the Hub as it progresses through ingestion. We’ve linked out to a bunch of Learn articles to help guide you through the ingestion process. If you are still having trouble, please reach out to Support using the link in the bottom right of the page. 

Updated:
April 26, 2025