
Managing Data Load and Migration for a Master Data Management (MDM) Project

January 17th, 2023

Written by Marc A. Paolo, Sr. Director, Client Success HIPAA Privacy and Compliance Officer

Data load and migration are an integral and central part of any Master Data Management (MDM) implementation.

Most MDM projects involve two types of data loads:

  • Initial Data Load (IDL)
  • Ongoing Data Loads (ODL)

Data loads occur in various frequencies, which can include:

  • One-time – often an IDL
  • Batch
  • Real-time

The method of the data load must also be considered. Possible methods include (a minimal load sketch follows the list):

  • Direct SQL load (available in some, but not all, MDM systems)
  • Flat file upload facilitated by the MDM system’s built-in tools
  • Load via API
  • Load by data transport, middleware, or ETL tool
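
As one illustration, the sketch below pushes rows from a flat file to a hypothetical REST endpoint. The URL, token, and payload shape are assumptions for this example, not any particular MDM vendor's API.

    # Minimal sketch: loading records from a flat file into an MDM hub via a
    # hypothetical REST API. Endpoint, authentication, and payload shape are
    # illustrative assumptions only.
    import csv
    import requests

    MDM_ENDPOINT = "https://mdm.example.com/api/v1/parties"   # placeholder URL
    API_TOKEN = "..."                                          # supplied securely in practice

    def load_flat_file(path: str) -> None:
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                resp = requests.post(
                    MDM_ENDPOINT,
                    json=row,
                    headers={"Authorization": f"Bearer {API_TOKEN}"},
                    timeout=30,
                )
                resp.raise_for_status()   # surface failed records immediately

    load_flat_file("customers.csv")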

Once the source data has been identified, often with the assistance of client Subject Matter Experts (SMEs), we begin with a data modeling and mapping exercise to ensure the source data will fit the target system. In many cases, it is necessary to understand data volumes, the amount of duplication, invalid data in fields, missing data in required fields, and other data anomalies and characteristics; this profiling informs the modeling and mapping results and is also used in other aspects of the MDM implementation, such as developing match rules and understanding enterprise reference data.
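
For illustration, a minimal profiling pass in Python/pandas might look like the following; the file and column names are assumptions and would be adapted to the actual source extract.

    # Minimal profiling sketch. Column names ("customer_id", "email") are
    # hypothetical examples of a key field and a required field.
    import pandas as pd

    src = pd.read_csv("source_extract.csv", dtype=str)

    profile = {
        "record_count": len(src),
        "duplicate_ids": int(src["customer_id"].duplicated().sum()),
        "missing_required": src[["customer_id", "email"]].isna().sum().to_dict(),
        "invalid_emails": int((~src["email"].str.contains("@", na=False)).sum()),
    }
    print(profile)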

Through the initial modeling and mapping exercise, we also take care to understand the level of confidentiality of the data. If Protected Health Information (PHI) or Personally Identifiable Information (PII) is involved, we ensure that only authorized personnel handle the data and that the data is handled only on client systems. Our consultants are trained to prevent data breaches during data loads.

Mapping considerations often require the assistance of client SMEs; in most cases, the data models of the source and target systems do not match, and judgment and experience are required to ensure all data from the source has an appropriate landing place in the target system. In many cases, data transformations are also required, and these must be understood and designed into the load process.
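
A simple way to picture this is a source-to-target field map with a transformation function per field, as in the sketch below; the field names and formats are assumptions for illustration only.

    # Illustrative source-to-target mapping with per-field transformations.
    # Field names and the date format are assumptions for this example.
    import pandas as pd

    FIELD_MAP = {
        "CUST_NM":  ("full_name",  str.strip),
        "CUST_DOB": ("birth_date", lambda v: pd.to_datetime(v, format="%m/%d/%Y").date().isoformat()),
        "ST_CD":    ("state_code", str.upper),
    }

    def transform(source_row: dict) -> dict:
        target = {}
        for src_field, (tgt_field, fn) in FIELD_MAP.items():
            value = source_row.get(src_field)
            target[tgt_field] = fn(value) if value is not None else None
        return target

    print(transform({"CUST_NM": " Jane Doe ", "CUST_DOB": "01/17/1980", "ST_CD": "ma"}))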

Once modeling and mapping are complete, we conduct a small sample load using a fractional subset of the initial data load set. This sample load is used to validate the assumptions and design decisions made in the modeling and mapping exercise and, where applicable, to confirm connectivity between the source and target.
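
For example, a reproducible fractional sample can be drawn from the full extract along these lines; the 1% fraction and file names are arbitrary assumptions.

    # Sketch: draw a small, reproducible sample from the full IDL extract
    # for the trial load. The fraction and seed are illustrative choices.
    import pandas as pd

    full_set = pd.read_csv("idl_full_extract.csv", dtype=str)
    sample = full_set.sample(frac=0.01, random_state=42)
    sample.to_csv("idl_sample_load.csv", index=False)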

The following are examples of the many things our data teams will validate upon load (a validation sketch follows the list):

  • Record counts must match, and failed records must be accounted for, with reasons for failure understood.
  • Data types must be preserved or converted as expected.
  • Lists of values must load or map correctly.
  • Data integrity must be preserved on a field-by-field basis.
  • Field lengths must be validated to avoid truncating data.
  • Fields must be mapped correctly between the source and target.
  • There must be no corruption due to incompatible character sets or other issues introduced by the transport method.
  • NULL vs. blank (empty string) values must be handled correctly.
  • Date/Time fields must be loaded correctly – either with or without conversion, as needed.
  • Failures (and unintended successes) due to required fields and other validation rules must be examined.
  • Time stamps and other audit information (such as “created by” fields) must be preserved or recorded correctly.
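
As referenced above, a post-load validation sketch might compare the source extract with an extract pulled back from the target. The file names, the "customer_id" key, and the 100-character length limit below are assumptions for illustration.

    # Post-load validation sketch comparing a source extract to a target
    # extract. keep_default_na=False preserves the NULL vs. blank distinction.
    import pandas as pd

    source = pd.read_csv("source_extract.csv", dtype=str, keep_default_na=False)
    target = pd.read_csv("target_extract.csv", dtype=str, keep_default_na=False)

    checks = {
        # Record counts must match; failed records are investigated separately.
        "count_match": len(source) == len(target),
        # Every source key must land in the target.
        "missing_keys": sorted(set(source["customer_id"]) - set(target["customer_id"])),
        # Field lengths must not exceed the assumed target column size.
        "truncation_risk": int((source["full_name"].str.len() > 100).sum()),
        # Blank values in the target that should perhaps have been NULL.
        "blank_vs_null": int((target["email"] == "").sum()),
    }
    print(checks)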

As a rule, Fresh Gravity will automate this process whenever possible and permissible.

Once the sample data load has been validated, the full data set is loaded; the same items are examined as with the sample load, and any issues arising from data volume are also evaluated.
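
One common way to keep volume-related issues manageable is to submit the full set in chunks, along the lines of this sketch; the chunk size and the loader stub are assumptions, standing in for whichever load method was chosen.

    # Sketch: load the full extract in chunks so volume issues (memory
    # pressure, throttling, long transactions) surface in a controlled way.
    import pandas as pd

    def load_chunk(df: pd.DataFrame) -> None:
        # Placeholder for the chosen load method (API, ETL tool, direct SQL, ...)
        pass

    for i, chunk in enumerate(pd.read_csv("idl_full_extract.csv", dtype=str, chunksize=50_000)):
        load_chunk(chunk)
        print(f"chunk {i}: {len(chunk)} records submitted")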

The process is similar for Ongoing Data Loads (ODL); however, additional considerations come into play (see the sketch after this list), such as:

  • Desired behavior when a record is updated in full or in part
  • Desired behavior when new records are inserted
  • The source of the ongoing updates is often different from the source of the initial data load, and these differences must be handled carefully
  • Desired behavior when records are hard deleted or marked for deletion/inactivation in the source system
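
The sketch below illustrates one way such delta behavior might be expressed; the key name, the "DELETED" status value, and the helper functions are assumptions, not a prescribed design.

    # Sketch: apply an ongoing (delta) feed by inserting new records, updating
    # existing ones, and soft-deleting records the source has flagged for
    # deletion or inactivation.
    def mark_inactive(key: str) -> None:
        pass  # soft delete / end-date the record in the target, per design

    def update_record(key: str, rec: dict) -> None:
        pass  # apply a full or partial update, per the agreed behavior

    def insert_record(rec: dict) -> None:
        pass  # create a brand-new record in the target

    def apply_delta(delta_records: list, existing_keys: set) -> None:
        for rec in delta_records:
            key = rec["customer_id"]
            if rec.get("status") == "DELETED":
                mark_inactive(key)
            elif key in existing_keys:
                update_record(key, rec)
            else:
                insert_record(rec)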

If you need help with a Data Load for your MDM project, please write to us at marc.paolo@freshgravity.com or sudarsana.roychoudhury@freshgravity.com.
