Customer Data Integration

Why Integrate Customer Data?

It is very common that organizations interface with customers through multiple platforms/channels (stores, direct/online, contact centers, special events like sweepstakes etc.) and each platform may have its own operational systems (OLTPs). Stores/retail businesses generally uses ‘Point of Sales’ applications where as direct/online brands use e-commerce platforms like ATG, DAX, Ecommetry etc. On the other side, special events like sweepstakes may collect customer information using paper sign-ups which may get entered in some sort of spreadsheet for later use.  So organizations usually collect a lot of information about customers like personalization information (name, address, phone etc.), contact preferences (mail opt-in/out, email opt-in/out, text opt-in/out, call opt-in/out etc.), interests (products that customers like) through most of these platforms during account setup, order checkout process, or newsletter sign-up. So the information is scattered across multiple heterogeneous systems and there will be lot of redundancy as same person may have customer records with different database keys across these systems.

Business users, especially marketing users, always want to know answers for the questions below in order to come up with best marketing strategies to keep business growing , attract more customers, retain customers, serve customers the best possible way and reduce customer attrition.

  1. How many unique customer are we dealing with?
  2. What is the customer golden record if same customer happens to exist across multiple platforms/source systems with different database keys?
  3. Where do we have latest customer contact preferences/subscription information?
  4. Where do we have latest customer interests?
  5. What is the last product customer purchased through any channel?
  6. Is the customer who signed-up through paper sign-up truly a new customer or an existing customer?
  7. How do we identify first time buyers in order to send them ‘thank you’ coupon?
  8. How do we identify if a customer is at risk or attrited (not bought anything quite for some time)?

To answer all these questions, we can definitely say that we need to bring customers information and their related activities like sales,interest selections, and online browsing behavior from multiple heterogeneous source systems into one place and combine them to provide a unified view of this data to business users.  This can’t be done just by a well designed data warehouse with great details and summaries. It requires a lot of data consolidation effort, that’s where customer data integration comes into picture.

What is Customer Data Integration?

Customer Data Integration is the process of cleansing, standardizing, validating, matching and merging customer data that is scattered across multiple heterogeneous systems and create one consolidated record for each customer with the most trusted, valid and latest golden informationof the customer which in turn can be used across the organization with no confusion.

  • Cleansing & Standardization

We see lot of junk characters, extra spaces, typo, unknowns, and ‘NA’ values etc in OLTP systems. No matter whether it is intentional or mistake, we got to clean these values and standardize before we integrate customer data in order to have 99.999% perfect data integration. I would like to present the below example to explain it in a better way. If a customer ‘Bob Smith’ has below two records in operational systems, we will not be able to consolidate them without cleansing and standardization.

Records in OLTP systems:

CUST ACCT # Name Address Ph# Email


Bob Smith 1350 Maricopa St, Torrance, CA, 91386 818-234-5678


Robert  D Smith 1350 Mareecoopa St, Torance, CA 818-234-5678

Look at these two records carefully. If we do string comparison, we will not able to match any attribute except phone number. People who work in the same office often share the same phone number, so these two customers obviously will be considered as two distinct customers if we don’t have proper customer data integration system in place. But if we give second look to this data, we can notice that both records belong to a customer ‘Bob Smith’. Just his first name is mentioned as Robert (Bob is short name for Robert) in second record, and his middle initial is missing in first record He has used his unique email on each record/account, but his mailing address and phone number  are matching (except some typo). With proper data integration systems (popularly known as MDM-Master Data Management) with sophisticated address and name dictionaries in place, these two records can be cleansed, standardized and converted as specified below.

Records after Cleansing & Standardization:

CUST# Name Address Ph# Email


Bob Smith 1350 Maricopa St, Torrance, CA, 91386-1257 818-234-5678


Bob Smith 1350 Maricopa St, Torrance, CA, 91386-1257 818-234-5678
  • Match and Merge

This is the process of matching and merging the cleansed/standardized data and creating one consolidated golden record for each customer. If we look at above example, we can easily say that both these records belong to same customer ‘Bob Smith’ and any MDM system can programmatically match and merge these two records and create one consolidated record. But question arises when it comes to email address, what email address wins? This is where the trust scores and decay play major role. Basically MDM experts conduct series of meetings with business teams and subject matter experts to identify best source system for each attribute and assign trust scores and decay accordingly. Usually customers are required to provide an email address when they buy something online where as it is not mandatory to provide an email when buying something at store, so always e-commerce systems will be given more trust scores over the ‘point of sales’ systems. And also the decay period will be defined for each system (Typically will be set as 24 months) that tells that a record from a particular source system will be given less priority over others if record is not updated in source system for defined decay period.

Recency (source record update timestamp) plays a major role in the merging process to retain the latest information on golden records. One wonders what happens to another e-mail, as another e-mail would also be retained in MDM as a secondary email for that customer. This process of assigning scores and decay holds true for each attribute like customer primary genre, contact preferences, phone, mailing address etc. And each consolidated record will be assigned with a unique party_id and a cross reference will be maintained between source system IDs and party_id which can be used to tie sales and other customer activities with the party (consolidated customer record) in Data Warehousing.

We have quite a few MDM vendors available in market. IBM initiate is a very popular and proven system in Master Data Management. Though we didn’t cover areas other than customers in this blog, we can do much more in MDM like product integration (Popularly known as PIM- Product Information System) etc.

Benefits of Customer Data Integration

  1. Helps to build better customer analytics
  2. Helps to indentify ‘At Risk’ & ‘Attrited’ customers and build best marketing strategies to retain customers
  3. Helps to indentify frequent buyers and serve them by building trend based and targeted marketing campaigns
  4. Helps to identify first time buyers and new registrants/email sign-ups and send them welcome and thank you coupons
  5. Helps to identify the last product customer bought and recommend customer most relevant ‘Next Products To Buy’ with marketing offers/promotions
  6. Helps to build single truth of customer contact preferences and avoid spam complaints or black listing issues with email marketing that may cause because of customer redundancy in source systems.
  7. Saves lot of money and effort in marketing by preventing sending duplicate emails to same customer/party
  8. In one word, it helps to build better and most efficient Customer Relationship Management system.