Dataset management

Lexer’s Datasets gives you the ability to explore data and high-level statistics from each of your data sources including integrated accounts, directly in the Hub.

Each of your datasets contains record types specific to the integrated account that has data flowing into the Hub. These have been transformed and organised into datasets that follow Lexer’s standard schema.

For example, Shopify datasets will contain record types related to Shopify online transactions, including customer records, products, orders, and returns. Klaviyo datasets, by contrast, will contain record types relating to email events, including customer, emails sent, emails clicked, subscribed, etc.

Organizing data into a dataset

In the diagram below, we have used Shopify as an example to illustrate how data makes its way into a dataset from the original Shopify source. You are then able to view this dataset within the Lexer Hub.

  1. Shopify connects with Lexer via an API.
  2. The raw data from Shopify is loaded into an Amazon S3 bucket via a Dataset load.
  3. The data is then transformed into Lexer's data schema via a Dataflow.
  4. It’s then arranged into a dataset that can be accessed in Datasets.

Something doesn't look right?

It's important to know how your data is loaded into the Hub to understand what's expected and where we might need to step in.

  • You might notice a high volume of records processed in the past day or two compared with the prior period. This can happen when all records in your dataset are reloaded and the processed date for all records changes.
  • If some of your data looks like it wasn't uploaded, it might have been pushed to the next day's queue.
  • The "last run date" will be the most recent time the dataset was reprocessed, not the first.
  • It can take around 24 hours for the dataset to become visible in the Hub. If you notice this is taking over 24 hours, please reach out to Lexer Support using the chatbot in the bottom right of the page.
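
The 24-hour guideline above can be expressed as a small check. This is a minimal sketch only: the function name `is_overdue`, the constant `VISIBILITY_WINDOW`, and the idea of noting the upload time yourself are assumptions made for illustration, and none of them are part of the Hub or a Lexer API.

```python
from datetime import datetime, timedelta, timezone

# Guideline from this article: a dataset can take around 24 hours to appear.
VISIBILITY_WINDOW = timedelta(hours=24)


def is_overdue(uploaded_at: datetime, now: datetime) -> bool:
    """Return True if more than 24 hours have passed since the upload."""
    return now - uploaded_at > VISIBILITY_WINDOW


# Hypothetical upload time, recorded by you when you sent the data.
uploaded = datetime(2025, 4, 24, 9, 0, tzinfo=timezone.utc)

# 12 hours later: still inside the expected window.
print(is_overdue(uploaded, datetime(2025, 4, 24, 21, 0, tzinfo=timezone.utc)))  # False

# 25 hours later: time to contact Lexer Support.
print(is_overdue(uploaded, datetime(2025, 4, 25, 10, 0, tzinfo=timezone.utc)))  # True
```

Using timezone-aware datetimes avoids off-by-a-few-hours mistakes when the upload and the check happen in different timezones.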

Datasets and Logs

Logs is another awesome tool that works hand in hand with Datasets. It's really helpful to check Logs for progress if you have uploaded data to the Hub, because jobs in the Hub run in a specific order that you can see in Logs. This order can be a great help in identifying if and where something has gone wrong. To learn more about Logs, check out the article here.

So, how can I use datasets?

Having your datasets easily accessible within the Hub has a number of benefits and uses. We've outlined some common use cases below.

Data QA and validation

Lexer’s Datasets lets you quickly view and validate the health of your integration feed, including data continuity, volume, and freshness of data flowing into your datasets. Drilling down even further, you can click into each record to view the payload for individual customer records.

Lexer tip!
The permission to view datasets is managed by your account administrator. If you'd like to gain access you'll need to be added to the "Access: Datasets" group. You can find details about how to edit user permissions here. Alternatively you can reach out to our Support team using the chat in the bottom right of the screen.

Dataset statistics

Within Datasets you can also access high-level statistics and charts separated into convenient tabs, which allow you to quickly visualize important metrics for each dataset. These will vary depending on the type of records contained within the dataset.

In the examples below we can see two different datasets with different types of event data:

  • Transactional event data: A Shopify dataset that contains ecommerce transactional data with the Ecommerce Purchase tab open.
  • Campaign event data: A Klaviyo dataset that contains campaign event data with the Email Bounce tab open.

As we continue to build out our API capabilities, Datasets will also give you the ability to create your own datasets, which can be written into directly using JSON and CSV uploads!

Need Personally Identifiable Information (PII) to be hidden for some users?

Not to worry, we've got you covered. Users can be given access to the high-level statistics in Datasets without gaining access to any PII. We have added an "Access: Datasets" group to your Hub that provides access to the high-level view of Datasets. Users added to this group can access datasets but cannot see PII. If you want a user to have access to PII, you can add them to another group in your Hub that has PII access.

Finding your datasets in the Hub

All Hubs will be given access to Datasets! Navigating this tool and understanding each different section is easy!

  1. You can view your datasets in the Hub by navigating to Manage > Datasets in the top navigation bar.
  2. All of your datasets will be listed in the left-side panel. You’ll be able to see the name and a brief description of the dataset, the status and time of the last job load.

Click on a dataset in this panel to open the detailed dataset view in the main window. You can also click on the View button in the top right-hand corner of the screen to see more details about your dataset and the jobs that have run.

The Details tab contains the dataset Name, Description, Dataset ID, and Dataset Type. The Dataset ID can be an important requirement when using some of our APIs.

The Jobs tab is especially useful because you’ll be able to see a history of your dataset. This includes: 

  • Status: The status of the last job load. Did it run successfully, did it fail, or is it still pending?
  • Started at: When the job started running (dates and times are displayed in your local timezone).
  • Time taken: How long the job took to run.
  • Record types: Which record types were updated.
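
If you keep your own notes of what the Jobs tab shows, the four fields above map naturally onto a small record. The sketch below is illustrative only: the `JobRow` class, the status labels ("success", "failed", "pending"), and the sample values are all assumptions, and the rows are typed in by hand rather than fetched from the Hub.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class JobRow:
    """One row of job history, mirroring the fields listed above."""
    status: str              # assumed labels: "success", "failed", "pending"
    started_at: datetime     # when the job started running
    time_taken: timedelta    # how long the job took to run
    record_types: list[str]  # which record types were updated


def latest_job(jobs: list[JobRow]) -> JobRow:
    """Return the most recently started job."""
    return max(jobs, key=lambda job: job.started_at)


# Hypothetical values copied by hand from the Jobs tab.
jobs = [
    JobRow("success", datetime(2025, 4, 24, 2, 0), timedelta(minutes=12), ["customer", "order"]),
    JobRow("failed", datetime(2025, 4, 25, 2, 0), timedelta(minutes=3), ["order"]),
]

latest = latest_job(jobs)
print(latest.status)        # failed
print(latest.record_types)  # ['order']
```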

Adding new datasets

Adding new datasets can be a critical first step when using some of our APIs. This process will create a blank dataset, a perfect spot for the data you intend to send via an API!

To add a new dataset: 

  1. Navigate to Manage > Datasets.
  2. Click on New Dataset.
  3. Fill in the Name and Description.
  4. Click Save Dataset.
  5. Now that you have created the dataset, open it up by clicking View.
  6. Record the Dataset ID.

You now have an empty dataset, ready to go!

Clear and Load data

Within the dataset view there are two important buttons in the top right of the page, Clear Data and Load Data.

  • Clear Data: Starts a job that will remove data from the dataset. The dataset itself won't be deleted. This may take up to an hour and can't be undone.
  • Load Data: Begins a job that will load new profile data to your dataset. This may take up to an hour. Once you start the job, you can't re-run it until it has finished.

These actions can't be reversed
Please take care with the clear and load functions. If data is removed in the process, it cannot be restored through the Hub.

The job view

To find out more information about each individual job, you can click on the row it belongs to, which will bring up the Job View panel.

Towards the bottom of the panel you’ll see a section called Stats. The table in this section displays a list of the record types that were updated, including:

  • Record type: The type of data received.
  • Total Records: The sum of New Records + Updated Records.
  • New Records: All new records that will be loaded to the CDE.
  • Updated Records: All existing records that have been updated.
  • Rejected Records: All records that have been rejected and are not a part of the Total Records count.
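
The relationship between these counts can be shown with a short worked example. This is a sketch under stated assumptions: the `RecordTypeStats` class and the sample numbers are hypothetical, and `rejection_rate` is a derived figure added here for illustration, not a value the Job View displays.

```python
from dataclasses import dataclass


@dataclass
class RecordTypeStats:
    """One row of the Stats table for a single record type."""
    record_type: str
    new_records: int
    updated_records: int
    rejected_records: int

    @property
    def total_records(self) -> int:
        # Total Records = New Records + Updated Records (rejected are excluded).
        return self.new_records + self.updated_records

    @property
    def rejection_rate(self) -> float:
        # Illustrative only: share of all received records that were rejected.
        received = self.total_records + self.rejected_records
        return self.rejected_records / received if received else 0.0


# Hypothetical numbers for a single job.
stats = RecordTypeStats("customer", new_records=120, updated_records=30, rejected_records=50)

print(stats.total_records)   # 150
print(stats.rejection_rate)  # 0.25
```

Because rejected records sit outside the Total Records count, a job can show a healthy total while still rejecting a large share of what was sent, so it is worth reading both columns together.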

Record tabs

We can then collapse these panels and move back to the main view where you will see a list of record types for the selected dataset. These form the basis of the dataset you are viewing.

The example below shows a list of Customer records in a Shopify dataset. You can view other record types within the dataset by selecting a different dataset record type from the tabs along the top.

Click on a record to view detailed payload information.

Use the date picker at the top of the page to change the date range of the data you wish to view.

Lexer tip!
When you first access the Dataset Manager, the date range will default to the “Last 7 days”.

Dataset metrics and statistics

Relevant, top-line metrics for the selected record type are presented at the top of the main window, along with a chart that presents a view of high-level metrics relevant to the record type within the dataset across the date range selected.

That's a wrap!

In this article we ran through what Lexer's datasets product is and how to use it for data validation or statistical analysis of your datasets. Datasets has some cool new updates in store so watch this space!

Updated:
April 26, 2025