Wash, Rinse, Repeat to Keep Your Data Clean and Manageable

Today’s post in our series on data focuses on the importance of having tools and processes in place to continually identify and correct gaps and flaws so that your data stays accurate. At this point in our series, you know what data you have, where it comes from and where it lives. You can also easily figure out what you don’t have but do need, which means you know which data needs to be corrected and which gaps need to be filled in. Because your data is always changing (new data is entered, existing data is updated), no one’s data is perfect at all times. Data is like a river; it’s always flowing. Just because it was all correct yesterday doesn’t mean it will be correct tomorrow.
Compounding this data fluidity is the environment of the fund office: for many data elements, data collection and data entry end up being manual processes, especially for member information such as birthdates, marital status and life events. Even when people are being careful, manually entered data is likely to have an error rate of 1-3%. While some systems are quite rigorous about validating data before it is entered, others are much less so. It's often a balancing act between imposing restrictions and controls on data entry to optimize inbound data quality and allowing data entry to be fast and easy with few, if any, validations. This trade-off matters because onerous validations often drive creative workarounds. A good example is someone fabricating a marriage date when the real one is unknown, just to get past a validation that requires a date before the member record can be created. Unfortunately, once that has been done, it can be very difficult to find the “fake” dates within the data, which can lead to unexpected problems down the road.
Our approach is a little different and is based on creating a regular and rigorous “exception detection reporting & correction process.” This is a proactive process that should be built into daily or weekly routines, and it all but eliminates the challenge of waiting for a problem to happen and then going back to troubleshoot the data. The core of this approach is to design and regularly run data exception reports AFTER the data is entered (as opposed to a VALIDATION process, which occurs before or during data entry). An example of such a report would be one that surfaces participants who are married but whose marriage date is missing. Another might surface people who are working but don’t have a date of birth (DOB), or whose DOB is unrealistic (e.g., the individual would be 122 years old).
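To make the idea concrete, here is a minimal sketch of an exception report covering the two examples above. It is an illustration only: the `Member` fields, the `exception_report` function and the `MAX_PLAUSIBLE_AGE` cutoff of 110 are assumptions made for this example, not part of any particular fund office system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumed cutoff for a "realistic" age; choose one that suits your population.
MAX_PLAUSIBLE_AGE = 110


@dataclass
class Member:
    # Hypothetical record layout, for illustration only.
    member_id: str
    marital_status: str               # e.g. "married", "single"
    marriage_date: Optional[date]
    is_working: bool
    date_of_birth: Optional[date]


def age_on(dob: date, today: date) -> int:
    """Whole years between dob and today (negative if dob is in the future)."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


def exception_report(members, today):
    """Return (member_id, reason) pairs for records that need review."""
    exceptions = []
    for m in members:
        if m.marital_status == "married" and m.marriage_date is None:
            exceptions.append((m.member_id, "married but marriage date missing"))
        if m.is_working and m.date_of_birth is None:
            exceptions.append((m.member_id, "working but DOB missing"))
        elif m.date_of_birth is not None:
            age = age_on(m.date_of_birth, today)
            if age < 0 or age > MAX_PLAUSIBLE_AGE:
                exceptions.append((m.member_id, f"unrealistic DOB (age {age})"))
    return exceptions


# Made-up sample records; the report runs AFTER these were entered.
members = [
    Member("A1", "married", None, True, date(1980, 5, 17)),
    Member("B2", "single", None, True, None),
    Member("C3", "married", date(2001, 6, 9), True, date(1902, 1, 1)),
    Member("D4", "single", None, False, date(1975, 3, 2)),
]
report = exception_report(members, today=date(2024, 6, 1))
for member_id, reason in report:
    print(member_id, reason)
```

Note that nothing here blocks data entry. The record with the missing marriage date is accepted as-is and then surfaced for follow-up, so nobody has a reason to invent a date just to get past a required field.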
It's important to remember that even if your data is determined to be 99% good, if you have 1,000 people you still have roughly 10 errors, which can be significant when it comes to providing individuals their benefits in a timely and accurate manner. Hence, the process is never finished: you’re always creating errors, surfacing errors and resolving errors. It is a mistake to think that data entry, and therefore data, is always perfect, but if you have a way to continually polish it, it will always shine.
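The arithmetic behind that figure is easy to rerun for your own population. The short sketch below assumes one record per person and at most one error per record, which is a simplification; the `expected_errors` helper is a name invented for this example.

```python
def expected_errors(record_count: int, error_rate: float) -> int:
    """Expected number of flawed records, rounded to the nearest whole record."""
    return round(record_count * error_rate)


# A 1% rate corresponds to "99% good" data; 2% and 3% cover the rest of
# the typical range for manual data entry.
for rate in (0.01, 0.02, 0.03):
    print(f"{rate:.0%} of 1,000 records: about {expected_errors(1000, rate)} errors")
```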
10-Step Data Quality Program
- You know where your data comes from in terms of systems and sources
- You are aware of conflicts and inconsistencies between your data sources
- You have an approach for resolving any conflicts between data sources
- You capture data once, and use it in multiple places
- You have documented what data is critical for implementing your business rules, and you have approaches for filling in any missing data
- You have tools and processes for identifying and correcting flaws in your data
- Your data exists in a format that makes it easy to access using readily available tools
- You are not dependent on a software vendor for access to your data
- Everyone on your team is cognizant of the value of “good data” and the long-term costs of “sloppy data”
- You leverage your data to support operations AND to support long-term decisions







