How to approach an end-to-end data migration process?
In a world of exponentially growing data, system migration has become almost a daily routine for companies, and banking is no exception.
Fast and proven data migration methodologies and tools are indispensable for success.
A proper validation framework is also essential to make the whole process transparent and audit-proof, and – although it might not be obvious at first glance – to improve its effectiveness.
Everybody knows how to migrate. It is just ETL, right? You have the data in the source systems; you extract it, do some mapping on an interim database, transform the data, upload the result into the target systems, and you are done.
Well, this is the basic approach for sure, but if you consider yourself a serious data migration professional, there is definitely more to take care of when implementing the end-to-end migration process, whether it is manual or automated.
The most common requirements are correctness and completeness, two key success factors that seem quite obvious to achieve. Yet to meet them and to design a proper validation framework, we first have to understand their exact meaning and the actual requirements behind them.
Head of Migration Services
Árpád has 15+ years of experience in the Financial Sector, being one of the most experienced banking migration experts in the region, having led and participated in 10+ successful Core Banking Migration projects so far.
As an expert and QA of major MINDSPIRE projects, he oversees the full migration cycle, including conceptual planning, mapping of source and target systems, overseeing transformation, dress rehearsal and go-live activities.
In software technology and programming, the general meaning of correctness is that all results are compatible with the specification, while completeness ensures that all results required by the specification are actually produced. In other words, correctness means that the relations defined by the program are subsets of the specified ones, and completeness means inclusion in the opposite direction.¹ This may sound quite academic, but it provides a precise definitional basis. In banking system migrations, though, both correctness and – even more so – completeness need to be defined on separate levels: not only the technical level, but the business and accounting levels as well.
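The two inclusion directions can be made concrete with a toy example. This is only an illustration; the record keys and values below are invented:

```python
# Correctness and completeness as set inclusions (all values are hypothetical).
# "specified": the records the mapping specification says must exist in the target.
# "produced":  the records the migration actually created.
specified = {("ACC-001", "EUR"), ("ACC-002", "USD"), ("ACC-003", "GBP")}
produced  = {("ACC-001", "EUR"), ("ACC-002", "USD"), ("ACC-004", "CHF")}

incorrect = produced - specified   # records that violate the specification
missing   = specified - produced   # required records that were not produced

correct  = not incorrect           # correctness:  produced  is a subset of specified
complete = not missing             # completeness: specified is a subset of produced

print(incorrect)  # {('ACC-004', 'CHF')}
print(missing)    # {('ACC-003', 'GBP')}
```

Here the migration is neither correct (it produced an unspecified record) nor complete (it lost a specified one), even though the record counts match – which is exactly why counting alone is never enough.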
Apart from the main scope of validation – whether it is about correctness, completeness or both – there are other important factors to consider when designing validations and applying the right solution.
Validations can differ in their integration with the technical process: built-in validation and post-process validation with an end-to-end approach are very different in their objectives, toolsets and, most of all, their execution responsibilities.
Another factor is comprehensiveness: whether the full data set or just samples are subject to validation. Considering all these aspects, two main types of migration validation can be distinguished: verification and reconciliation. Used properly, both can improve the overall success rate as well as effective resource utilisation in error handling and change management processes.
Migration validation framework
Verification – the way to intercept errors
No matter what your migration approach is and what process you build to transfer data from source to target, there will be a series of data manipulation steps in the data flow: restructuring, exclusion, transformation, conversion. In each individual step, data is received from the previous one, potentially modified, and then passed on to the subsequent one. At every step of this flow there is a risk of introducing an error that ultimately breaks the consistency or integrity of the whole data set. Bad data quality, an inaccurate specification, a wrong implementation, or a combination of these can be the cause.
One of the key principles of designing an effective and supportive validation process is to detect migration errors as early as possible in the data flow and to exclude corrupted data from further processing.
This helps you identify the root cause of an error more easily and saves valuable testing resources, allowing subsequent validations to focus on data considered correct and complete by all previous processing. Step-by-step verifications built into the process are the most effective way to trace the data here and to validate technical and business correctness in each processing step.
The goal of verification is to recognize any deviation from the expected values, which might be a technical discrepancy (format, length, type, missing data etc.) or a contextual one (extreme value, unmatching LoV, inconsistent or missing value, etc.). A deviation can be the consequence of bad data quality originating in the source systems, or of failure(s) in previous process steps causing data error or data loss. Separate verifications are to be designed for correctness validation within an individual process step, and for completeness validation between two subsequent process steps. Whatever the technical implementation is, the results of these validation points have to be reported and summarised in a full-view running statistic. All the above is very similar to what BI people call data lineage, with the difference that for migration the focus is not on provenance but on the flow of data, and on the detailed, record-level status of data processing.
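As a rough sketch of what such a built-in verification point could look like – the validation rules, field names and list of values below are all hypothetical, not taken from any specific tool – correctness is checked inside the step, while completeness is checked between the step's input and output:

```python
from collections import Counter

def verify_step(records):
    """Correctness check inside one processing step (rules are illustrative)."""
    passed, failed = [], []
    for rec in records:
        errors = []
        if not rec.get("account_id"):
            errors.append("missing account_id")      # technical: missing mandatory data
        if rec.get("currency") not in {"EUR", "USD", "GBP"}:
            errors.append("currency not in LoV")     # contextual: unmatching LoV
        if errors:
            failed.append({**rec, "errors": errors})
        else:
            passed.append(rec)
    return passed, failed

def check_completeness(step_input, passed, failed):
    """Completeness between two subsequent steps: every input record must be
    accounted for, either passed on or explicitly rejected."""
    return len(step_input) == len(passed) + len(failed)

source = [
    {"account_id": "A1", "currency": "EUR"},
    {"account_id": "A2", "currency": "XXX"},   # fails the LoV rule
    {"account_id": "",   "currency": "USD"},   # fails the mandatory-field rule
]
passed, failed = verify_step(source)
assert check_completeness(source, passed, failed)

# Record-level running statistic for the verification report
stats = Counter(passed=len(passed), failed=len(failed))
print(dict(stats))  # {'passed': 1, 'failed': 2}
```

Only the records in `passed` travel on to the next step; the rejected ones stay visible in the running statistic, which is what makes root-cause analysis traceable step by step.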
Purpose-built migration tools usually have an out-of-the-box function to build these verification reports, and provide advanced features to verify correctness and/or completeness of extract, transformation or loading migration data.
Nevertheless, verifications built into the process cannot cope with external factors that are outside the technical process but still impact its result. Unfortunately, several such external factors exist, and they can easily ruin a seemingly successful migration. No errors detected by the step-by-step verification? One hundred percent of the data successfully uploaded to the target system? You still cannot be sure… Just refer back to our initial definition of correctness and completeness: “(…) results are compatible with the specification”. And what if the specification itself is wrong?
A specific, yet essential, method of validation is the left-to-right or screen-to-screen verification. Its aim is to compare the initial data as it is in the source system with the data available in the target system, and to identify any discrepancy by analysing a preselected set of sample entities. As this comparison is independent of the technical process, and should be based not on the technical specification but on business test cases, screen-to-screen verification is an effective way to find hidden errors undetectable by other audits, and should not be excluded from your migration validation strategy. However, as this kind of comparison cannot cover the full set of migrated data, one more step is required to ensure the intended level of correctness and completeness.
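In essence the check is a field-by-field comparison over the sample. A minimal sketch, assuming the screen contents of both systems have been captured into dictionaries (the sample IDs, field names and values below are invented, and in practice the capture might be an API call, an export or a manual screenshot transcription):

```python
# Hypothetical screen-to-screen verification on a preselected sample of entities.
sample_ids = ["A1", "A2"]

source_view = {  # data as displayed by the source system
    "A1": {"holder": "J. Smith", "balance": "100.00", "currency": "EUR"},
    "A2": {"holder": "M. Doe",   "balance": "50.00",  "currency": "USD"},
}
target_view = {  # data as displayed by the target system
    "A1": {"holder": "J. Smith", "balance": "100.00", "currency": "EUR"},
    "A2": {"holder": "M. Doe",   "balance": "55.00",  "currency": "USD"},
}

discrepancies = [
    (acc, field, src_val, target_view[acc][field])
    for acc in sample_ids
    for field, src_val in source_view[acc].items()
    if target_view[acc][field] != src_val
]
print(discrepancies)  # [('A2', 'balance', '50.00', '55.00')]
```

The value of the method lies in what the dictionaries represent: because both sides are taken from the systems' own screens rather than from the migration's technical artifacts, a wrong specification shows up here even when every built-in verification passed.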
Migration validation process
Reconciliation – your last line of defence
Logically, the final step of validation is a post-process, end-to-end comparison of the migrated data, detecting any residue of incorrectness or incompleteness. Analogous to screen-to-screen verification, reconciliation should be designed and built completely independently of the migration technical process, and in order to avoid replicating implementation mistakes, it should not depend on any direct input from it. Rather, it should compare the two end states: the source data as it was at the time the migration started with the data in the final, actual data structures of the target system (so avoid using intermediate upload tables or data stages). There is one major difference: reconciliation always covers the full set of migrated data and reports all errors on the analysed record set.
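Over the full record set, the comparison boils down to three questions per critical attribute: what is missing from the target, what appeared in the target without a source counterpart, and what exists on both sides with different values. A minimal sketch (keys and values are invented, and in reality the two states would come from the source extract and the target system's final tables, not from staging):

```python
# Full-set reconciliation of the two end states: key -> one critical attribute.
source_state = {"A1": 100.00, "A2": 50.00, "A3": 75.00}
target_state = {"A1": 100.00, "A2": 50.00, "A4": 75.00}

missing_in_target = source_state.keys() - target_state.keys()  # incompleteness
unexpected        = target_state.keys() - source_state.keys()  # surplus records
mismatched = {k for k in source_state.keys() & target_state.keys()
              if source_state[k] != target_state[k]}           # incorrectness

print(sorted(missing_in_target), sorted(unexpected), sorted(mismatched))
# ['A3'] ['A4'] []
```

Unlike the sampled screen-to-screen check, this runs on every migrated record, so each finding is an actual defect list rather than a statistical indication.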
Comparing all the data end-to-end – well, this could be the magic tool, the validation above all validations… except that building an exhaustive reconciliation covering all business attributes and data fields would require enormous time and resources. Instead, the key is to focus only on the most critical data. Criticality is to be assessed mainly from a business perspective, but – as we are talking about the migration of financial data – accounting reconciliation is a must-have validation as well.
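For the accounting side, one common-sense shape of such a check is to compare per-account totals rather than every attribute of every record. A sketch under that assumption (the GL account numbers and amounts are invented; `Decimal` is used so the totals reconcile exactly, without float rounding):

```python
from collections import defaultdict
from decimal import Decimal

def totals_by_gl(rows):
    """Sum posted amounts per GL account; rows are (gl_account, amount) pairs."""
    sums = defaultdict(Decimal)
    for gl_account, amount in rows:
        sums[gl_account] += Decimal(amount)
    return dict(sums)

source_rows = [("1010", "100.00"), ("1010", "50.00"), ("2020", "-30.00")]
target_rows = [("1010", "150.00"), ("2020", "-30.01")]

src, tgt = totals_by_gl(source_rows), totals_by_gl(target_rows)
diffs = {gl: (src.get(gl, Decimal(0)), tgt.get(gl, Decimal(0)))
         for gl in src.keys() | tgt.keys()
         if src.get(gl, Decimal(0)) != tgt.get(gl, Decimal(0))}
print(diffs)  # {'2020': (Decimal('-30.00'), Decimal('-30.01'))}
```

A one-cent difference per account is exactly the kind of residue an accounting reconciliation exists to surface, and aggregating per account keeps the check cheap even on a full data set.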
Although reconciliation is ultimately a tool for migration acceptance in the go-live procedure, if you set up and run such a comparison in time and on all relevant critical data, error tracking and bug fixing during the migration implementation phase will benefit greatly from it as well.
It improves the chance of detecting errors in the early phases of testing, which is why you should consider reconciliation not merely a post-process report and evidence for go/no-go decisions, but use it actively as an efficiency-improvement measure during migration implementation.
But there is one major risk: designing and setting up a proper reconciliation process is not simple; its complexity is comparable to that of the whole migration process. An imperfect reconciliation demands massive effort for nothing, as it nullifies any potential resource savings.
Ok, so how do you design a proper reconciliation process?… I will cover this in an upcoming post.
¹ Włodzimierz Drabent: Correctness and completeness of logic programs, version of 24-04-2015. Institute of Computer Science, Polish Academy of Sciences, and IDA, Linköping University, Sweden.