Data Management, Industry-agnostic

Managing Data Load and Migration for a Master Data Management (MDM) Project

January 17th, 2023 WRITTEN BY Marc A.Paolo Sr Director, Client Success Tags: , , ,

Written By Marc A.Paolo Sr Director, Client Success

Data load and migration are an integral and central part of any Master Data Management (MDM) implementation.

Most MDM projects involve two types of data loads:

  • Initial Data Load (IDL)
  • Ongoing Data Loads (ODL)

Data loads occur in various frequencies, which can include:

  • One-time – often an IDL
  • Batch
  • Real time

The method of the data load must also be considered.  Possible methods include:

  • Direct SQL load (available in some, but not all, MDM systems)
  • Flat file upload facilitated by MDM system’s built-in tools
  • Load via API
  • Load by data transport, middleware, or ETL tool.

Once the source data has been identified, often with the assistance of client Subject Matter Experts (SMEs), we begin with a data modelling and mapping exercise to ensure the source data will fit the target system.  In many cases, is necessary to understand data volumes, amount of duplication, invalid data in fields, missing data in required fields, and other data anomalies and characteristics; this informs the modelling and mapping results and is used in other aspects of MDM implementation, such as for developing match rules and understanding enterprise reference data.

Through the initial modelling and mapping exercise, we also take care to understand the level of confidentiality of the data.  If PHI and PII are involved, then we ensure only authorized personnel are allowed to handle the data, and the data must only be handled on client systems.  Our consultants are trained to prevent data breaches during data loads.

Mapping considerations often require the assistance of client subject matter experts; in many (most) cases, the data models from the source and target systems do not match, and judgment and experience are required to ensure all data from the source has an appropriate landing place in the target system.  In many cases, data transformations are also required, and these must be understood and designed into the load process.

Once modelling and mapping are complete, we conduct a small initial data load using a fractional subset of the initial data load set. The initial data load is used to validate the assumptions and design decisions made in the modelling and mapping exercise. The sample data load also is used to confirm connectivity between the source and target, where applicable.

The following are examples of the many things our data teams will validate upon load:

  • Record counts must match, and failed records must be accounted for, with reasons for failure understood.
  • Data types must be preserved or converted as expected.
  • Lists of values must load or map correctly.
  • Data integrity must be preserved on a field-by-field basis.
  • Field lengths must be validated to avoid truncating data.
  • Fields must be mapped correctly between the source and target.
  • There must be no corruption due to incompatible character sets or other issues introduced by the transport method.
  • NULL vs. blank characters must be handled correctly.
  • Date/Time fields must be loaded correctly – either with or without conversion, as needed.
  • Examination of failures (or unintended successes) due to required fields and other validation rules.
  • Time stamps and other audit information (such as “created by” fields) must be preserved or recorded correctly.

As a rule, Fresh Gravity will automate this process whenever possible and permissible.

Upon validating the sample data load worked correctly, a full data set will be loaded, and the same items will be examined as with the sample load, plus any issues due to volume will also be evaluated.

The process is similar for Ongoing Data Loads (ODL), however, additional considerations come into play, such as:

  • Desired behaviour when a record is updated in full or in part.
  • Desired behaviour when new records are inserted.
  • Source of the ongoing updates can often be different than the source of the initial data load, and these differences must be handled carefully.
  • Desired behaviour when records are hard deleted or marked for deletion/inactivation in the source system.

If you need help with a Data Load for your MDM project, please write to us at marc.paolo@freshgravity.com or sudarsana.roychoudhury@freshgravity.com.

Share this

LinkedIn
Share
Copy link

Explore More Blogs

Microsoft Fabric – A Unified View of the Modern Data Landscape

Written By Siddharth Mohanty, Sr. Manager, Data Management Stepping into The Future With AI  The future is AI.   From easy-to-use copilot experiences to custom generative AI solutions, every organization today is exploring how they can best utilize AI. However, as businesses get ready for an AI-powered future, they will also require clean data to power […]

Enhance Your Organization’s Productivity with Data and Technology

Written By Neha Sharma, Sr. Manager, Data Management In today’s fast-paced and dynamic business landscape, staying ahead of the curve requires more than just traditional methods. Organizations must adapt to the digital age by leveraging the power of data and technology to enhance productivity and drive growth. Whether you’re a small startup or a multinational […]

Implementing CI/CD in Microsoft Fabric: A Comprehensive Guide

Written By Ashutosh Yesekar, Consultant, Data Management In the rapidly evolving world of data analytics and business intelligence, organizations are increasingly turning to integrated platforms that streamline their processes. Microsoft Fabric stands out as a unified analytics solution that combines the capabilities of Power BI, Azure Synapse, and Azure Data Factory into one cohesive environment.   […]

Social media & sharing icons powered by UltimatelySocial