Data formatting and validation

Whether you are interacting with Lexer’s APIs or sending data via SFTP or CSV upload, well-formatted and validated data helps you get the most out of the CDXP.


Address data

Address data can vary greatly depending on the regions it represents. We define valid address data very broadly so that all valid addresses can be used.

However, good address data can have a significant impact on your team’s use of the CDXP, particularly if your use cases include location-based segmentation and insights. We recommend checking how easy your address data is to work with. For example, are your states in a consistent format, e.g. VIC versus Victoria, or NY versus New York?
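As a rough illustration of the state-format check, the sketch below maps state names and codes onto one consistent abbreviation. The lookup table and the `normalize_state` helper are hypothetical and only cover a few sample values; they are not part of the Lexer platform.

```python
# Hypothetical lookup table; extend it with the regions present in your data.
STATE_ABBREVIATIONS = {
    "victoria": "VIC",
    "new south wales": "NSW",
    "new york": "NY",
    "california": "CA",
}


def normalize_state(value: str) -> str:
    """Return a consistent abbreviation for a state name or code."""
    cleaned = " ".join(value.split())  # trim and collapse internal whitespace
    if cleaned.upper() in STATE_ABBREVIATIONS.values():
        return cleaned.upper()  # already a known code, just fix the case
    # Unrecognised values are returned cleaned but otherwise unchanged.
    return STATE_ABBREVIATIONS.get(cleaned.lower(), cleaned)


print(normalize_state("Victoria"))  # VIC
print(normalize_state(" ny "))      # NY
```

Returning unrecognised values unchanged keeps the sketch non-destructive, so unexpected regions can be reviewed rather than silently dropped.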

Depending on the configuration of your CDXP unification policies, customers’ address data might be used to unify against first-, second-, or third-party data. In doing so, geocoding services may be used to prepare data for unification. You may want to validate your address data with a geocoding service, such as OpenStreetMap, to minimize issues with your customer data.

Currency and financial data

Currency and financial amounts are accepted as decimal numbers without currency symbols as prefixes or suffixes, e.g. “12.99” or “-100.00”.

Where a currency can be provided, it should be an ISO 4217 currency code, e.g. “AUD” or “USD”. The case of the code does not matter.
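A minimal pre-upload check for these two rules might look like the sketch below. The function names are hypothetical, and the currency check only confirms the three-letter shape of a code; it does not verify that the code is actually assigned in ISO 4217.

```python
import re
from decimal import Decimal

# Plain decimal: optional minus sign, digits, optional fractional part.
AMOUNT_PATTERN = re.compile(r"-?[0-9]+(\.[0-9]+)?")
# Shape check only: three ASCII letters, in any case.
CURRENCY_PATTERN = re.compile(r"[A-Za-z]{3}")


def parse_amount(raw: str) -> Decimal:
    """Parse an amount such as "12.99"; reject symbols like "$" or "AUD"."""
    text = raw.strip()
    if not AMOUNT_PATTERN.fullmatch(text):
        raise ValueError(f"not a plain decimal amount: {raw!r}")
    return Decimal(text)


def normalize_currency(code: str) -> str:
    """Upper-case a three-letter currency code such as "aud"."""
    text = code.strip()
    if not CURRENCY_PATTERN.fullmatch(text):
        raise ValueError(f"not a three-letter currency code: {code!r}")
    return text.upper()


print(parse_amount("-100.00"))    # -100.00
print(normalize_currency("aud"))  # AUD
```

`Decimal` is used instead of `float` so that amounts such as 12.99 are not altered by binary rounding.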

For information on our currency conversion capabilities for multi-currency data, please contact us.


Dates and timestamps

All dates and timestamps should be provided as ISO 8601 strings.

If you are providing a time portion, we strongly recommend including the timezone in the string; otherwise, UTC will be assumed.
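To see how a missing timezone changes the interpretation of a timestamp, the sketch below parses ISO 8601 strings with Python's standard `datetime.fromisoformat` and mirrors the UTC default described above. The `parse_timestamp` helper is hypothetical, and `fromisoformat` accepts only a subset of ISO 8601 on Python versions before 3.11.

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 string, treating timezone-less values as UTC."""
    text = value.strip()
    # Python versions before 3.11 do not accept a trailing "Z" for UTC.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)  # raises ValueError if malformed
    if parsed.tzinfo is None:
        # No offset supplied: mirror the documented UTC assumption.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


print(parse_timestamp("2024-02-01T09:30:00+11:00").isoformat())
print(parse_timestamp("2024-02-01T09:30:00").isoformat())
```

The two calls describe moments eleven hours apart, which is why an explicit offset is worth including whenever the source system knows it.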

Email addresses and email hashing

Email addresses are commonly used for unification and activation throughout your CDXP and integrated platforms. While the Lexer CDXP has a loose definition of an email, we recommend checking your email addresses against this regex:


To ensure the highest matching success, we lowercase all email data, and may reject malformed email addresses.

*Note: Downstream platforms such as ESPs and Social Ad Networks can have strict policies on what is considered a valid email. Profiles that don’t meet those policies might be rejected.
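A pre-upload email check could be sketched as follows. The pattern here is a deliberately simple placeholder chosen for illustration; it is an assumption and is not Lexer's recommended regex, and the `normalize_email` helper is hypothetical.

```python
import re
from typing import Optional

# Placeholder pattern (an assumption, NOT Lexer's recommended regex):
# one "@", no whitespace, and at least one "." in the domain part.
SIMPLE_EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def normalize_email(raw: str) -> Optional[str]:
    """Return the trimmed, lowercased email, or None if it looks malformed."""
    candidate = raw.strip().lower()
    if SIMPLE_EMAIL_PATTERN.fullmatch(candidate):
        return candidate
    return None


print(normalize_email("  Jane.Doe@Example.COM "))  # jane.doe@example.com
print(normalize_email("not-an-email"))             # None
```

Swap in the pattern you actually validate against; downstream platforms may apply stricter rules than this loose check.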


We hash email data internally, as many activation platforms only accept hashed email data.

If you choose to provide hashed (SHA-256) email data, you should do the following before hashing:

  • Remove leading and trailing whitespace.
  • Convert the text to lowercase.
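The normalization steps above can be sketched with Python's standard `hashlib` module. The `hash_email` helper is hypothetical, and encoding the digest as a lowercase hexadecimal string is an assumption; confirm the expected encoding before sending hashed data.

```python
import hashlib


def hash_email(raw: str) -> str:
    """Trim and lowercase an email, then return its SHA-256 hex digest."""
    normalized = raw.strip().lower()
    # Assumption: UTF-8 input bytes and a lowercase hex digest as output.
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Differently formatted copies of the same address produce the same hash.
print(hash_email(" Jane@Example.com ") == hash_email("jane@example.com"))  # True
```

Skipping either normalization step yields a different digest for the same address, which would prevent the hashed values from matching.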


Gender

We accept any string value as a representation of gender. This allows you to use any gender values you need; the key is to keep the format consistent.

Consistent gender data can have a significant impact on your team’s use of the CDXP. For example, you may want to clean your data so that each gender is always written the same way, e.g. “Female” rather than a mix of “Female” and “F”.

Phone numbers

Phone numbers, mobile numbers, or cell numbers are commonly used for unification and activation throughout the CDXP and integrated platforms.

You should convert each phone number, mobile number, or cell number to E.164 format. This format represents a phone number as a + sign followed by up to fifteen digits, beginning with the country code (e.g. +12125650000, +442070313000).
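A simple shape check for E.164 numbers might look like the sketch below. Both helper names are hypothetical. The check confirms only the format (a + sign, a non-zero leading digit, at most fifteen digits in total); it does not confirm that the number exists, and converting a national number to E.164 also requires knowing its country code, which this sketch does not attempt.

```python
import re

# "+", a first digit from 1 to 9, then 1 to 14 more digits (15 at most).
E164_PATTERN = re.compile(r"\+[1-9][0-9]{1,14}")


def strip_formatting(number: str) -> str:
    """Remove spaces, hyphens, dots and parentheses from a phone number."""
    return re.sub(r"[\s\-.()]", "", number)


def is_e164(number: str) -> bool:
    """Return True if the string has the E.164 shape."""
    return E164_PATTERN.fullmatch(number) is not None


cleaned = strip_formatting("+1 (212) 565-0000")
print(cleaned, is_e164(cleaned))  # +12125650000 True
print(is_e164("2125650000"))      # False
```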

*Note: Downstream platforms such as SMS platforms and Social Ad Networks can have strict policies on what is considered a valid number. If their required format is not used, they may reject profiles.

Multi-lingual text data

The Lexer platform supports multi-byte data in a UTF-8 format. You will not need to convert or translate data formatted as such, but note that the Lexer platform does not do any language conversion. Data will be presented in the language and format provided.
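Before uploading a file, it can be worth confirming that it really is UTF-8 encoded. The sketch below is one way to do that using only the Python standard library; the `is_valid_utf8` helper is hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("日本語".encode("utf-8")))   # True
print(is_valid_utf8("café".encode("latin-1")))  # False
```

For a file on disk, read it in binary mode (`open(path, "rb")`) and pass the bytes to the helper.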

February 1, 2024