Tuesday, March 10, 2015

Dirty Data

Here's the challenge. You introduce a data warehouse into your company's IT landscape. Then comes the aha moment: our data is not clean. Do you reject the data, keep it because it reflects the source system, or take a hybrid approach? The answer is "It depends." Let me explain. One of my clients had an issue where the source system sent dirty data to the data warehouse. No big surprise there, but the data warehouse then delivered that data to a vendor that expected clean data. How do you handle that? This client was in the education sector, so the vendor's system showed the same incomplete student schedules that existed in the source system. The data warehouse must be a reflection of the source system, so you have to notify the source system team that the dirty data exists. We sent numerous files of dirty data back to the source system team. They corrected the data, and the cycle started all over again. Every day was Groundhog Day.
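Here is a minimal sketch of the "keep it, but tell the source system" approach described above. It assumes a hypothetical schedule record with `student_id`, `course_id`, and `period` fields (the post does not describe the real schema). Every row is kept so the warehouse still mirrors the source system, and rows missing a required field are written to a CSV exception file for the source system team.

```python
import csv
import io

# Hypothetical required fields for a student schedule record; the real
# source system's schema is not described in the post.
REQUIRED_FIELDS = ("student_id", "course_id", "period")


def split_schedule_rows(rows):
    """Keep every row (the warehouse mirrors the source system) but
    collect rows missing a required field as exceptions."""
    rows = list(rows)  # materialize in case a generator is passed in
    exceptions = []
    for row in rows:
        missing = [f for f in REQUIRED_FIELDS if not str(row.get(f) or "").strip()]
        if missing:
            exceptions.append({**row, "missing_fields": ";".join(missing)})
    return rows, exceptions


def exceptions_to_csv(exceptions):
    """Render exception rows as CSV text to send back to the source team."""
    buffer = io.StringIO()
    fieldnames = list(REQUIRED_FIELDS) + ["missing_fields"]
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(exceptions)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [
        {"student_id": "1001", "course_id": "MATH7", "period": "2"},
        {"student_id": "1002", "course_id": "", "period": "3"},  # incomplete
    ]
    kept, exceptions = split_schedule_rows(sample)
    print(len(kept), len(exceptions))  # prints "2 1"
    print(exceptions_to_csv(exceptions))
```

The design choice here is that nothing is rejected at load time; the exception file is the notification, and the fix still happens in the source system.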

One fix we put in place was a program to identify dirty data in student assessment results. You would be surprised how many kids have similar names. After the assessment results came back from the vendor, we loaded them and checked for duplicates. Then we notified the data stewards at the affected school and had them match each assessment to the correct student. We called this our Power Match system.
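The internals of Power Match aren't described here, so this is only a sketch of the general idea under stated assumptions: each vendor assessment record carries a student name and school, it is compared against that school's roster using `difflib.SequenceMatcher`, and any record that closely matches more than one enrolled student is queued for that school's data steward. The field names and the 0.85 cutoff are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical similarity cutoff; tune it against real roster data.
SIMILARITY_THRESHOLD = 0.85


def normalize(name):
    """Lower-case and collapse whitespace so 'Smith,  John' ~ 'smith, john'."""
    return " ".join(name.lower().split())


def similar(a, b):
    """True if two names are close enough to be confused with each other."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= SIMILARITY_THRESHOLD


def find_ambiguous_matches(assessments, roster):
    """Return {school: [(assessment_id, [candidate student_ids])]} for
    assessment records that match more than one enrolled student."""
    by_school = defaultdict(list)
    for student in roster:
        by_school[student["school"]].append(student)

    review_queue = defaultdict(list)
    for result in assessments:
        candidates = [
            s["student_id"]
            for s in by_school[result["school"]]
            if similar(result["student_name"], s["name"])
        ]
        if len(candidates) > 1:
            # Ambiguous: a data steward at this school picks the right student.
            review_queue[result["school"]].append((result["assessment_id"], candidates))
    return dict(review_queue)


if __name__ == "__main__":
    roster = [
        {"student_id": "S1", "name": "Jon Smith", "school": "Lincoln"},
        {"student_id": "S2", "name": "John Smith", "school": "Lincoln"},
        {"student_id": "S3", "name": "Maria Lopez", "school": "Lincoln"},
    ]
    assessments = [
        {"assessment_id": "A1", "student_name": "John Smith", "school": "Lincoln"},
        {"assessment_id": "A2", "student_name": "Maria Lopez", "school": "Lincoln"},
    ]
    print(find_ambiguous_matches(assessments, roster))
    # prints {'Lincoln': [('A1', ['S1', 'S2'])]}
```

Grouping the queue by school mirrors the process in the post: each school's steward only sees the ambiguous records for their own students.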

Another fix we implemented was more on the evangelism side. We had to make the business understand that their work affects the systems their schools use. If you enter incorrect data on the front end, you will get dirty data on the back end. Evangelizing that concept to the business was key. Not only do you need to tell the end users, you also need to preach it up through the end users' organizational structure. My client implemented a data scorecard that showed the district the number of errors at each school. They pushed for accountability. You must do the same with your business partners. You need buy-in from all levels of the organization in order to be successful.
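A data scorecard like the one my client used can start as a simple aggregation. This sketch assumes each failed data-quality check is recorded with a hypothetical `school` key and that you know how many records each school loaded; it ranks schools by error rate so the ones needing attention appear first.

```python
from collections import Counter


def build_scorecard(errors, records_per_school):
    """Return rows of (school, error_count, error_rate), worst first.

    errors: iterable of dicts with a 'school' key, one per failed check.
    records_per_school: {school: number of records loaded}.
    """
    counts = Counter(e["school"] for e in errors)
    rows = []
    for school, total in records_per_school.items():
        count = counts.get(school, 0)
        rate = count / total if total else 0.0
        rows.append((school, count, round(rate, 3)))
    # Highest error rate first, so schools needing attention lead the report.
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows


if __name__ == "__main__":
    errors = [{"school": "Lincoln"}] * 3 + [{"school": "Adams"}]
    loaded = {"Lincoln": 100, "Adams": 50, "Grant": 40}
    for row in build_scorecard(errors, loaded):
        print(row)
```

Reporting a rate rather than a raw count is one possible choice; it keeps large schools from looking worse simply because they load more records.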

Whether or not you are in the education sector, maintaining a good relationship with the business is imperative if you want to identify and correct the issues that cause persistent dirty data. As long as source systems allow these problems, our teams will have to correct them at some point, somewhere in our organizations or systems.
