Do you know what percentage of your company’s revenue is impacted by poor-quality data? According to the 2023 State of Data Quality Survey, for the majority of organizations, it’s at least 25%. This is hardly surprising, given the role that high-quality data plays in making the right business decisions. But quality is only half of the story: data integrity matters just as much.
What are data quality and data integrity, and how do they contribute to an organization’s high performance? Let’s dive in.
Data quality vs data integrity – are they the same?
While data integrity and data quality bear certain resemblances (and tend to be used interchangeably), they’re not synonymous.
What is data quality?
Data quality revolves mainly around data’s fitness or readiness for use: its accuracy, relevance, completeness, and consistency. The more insightful the data, the greater its decision-making power. To make sure the data you use is of high quality, it needs to go through a number of processes, including data cleaning, standardization, and governance.
Top five criteria for high-quality data
How can you tell if your data is of the highest quality? While each organization might have its own guidelines, here are five characteristics that apply almost universally.
- Validity: Data conforms to the rules the organization defines for it, such as allowed values, ranges, and formats. These rules exist to improve data usability and legibility, and incoming values must be checked against them to confirm they’re valid.
- Up-to-date: Data cannot be obsolete. For example, records relating to time-sensitive business areas like churn need to come from the current year, or even the current quarter. The goal is to ensure that the information still reflects reality and remains insightful.
- Consistency: Data follows clearly defined standards, particularly for syntax and structure. For example, all dates entered into the organization’s systems must use one format, such as the US standard, MM/DD/YYYY.
- Completeness: To enable comprehensive data analyses, records must also be complete. They shouldn’t lack context or contain informational gaps, as these weaken the data’s decision-making power.
- Uniqueness: Each dataset is free of duplicated, redundant information, and serves a specific purpose.
This raises a question: if your organization meets, for instance, four of the five criteria above, does it have good-quality data? Not quite. If any one of them isn’t met, it can undermine the trustworthiness of all your other data and block your data-driven decision-making.
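To make the five criteria concrete, here is a minimal sketch that checks a small customer table against each of them and reports every rule a row breaks. The field names (`customer_id`, `email`, `last_active`), the basic email pattern, and the 90-day freshness window are all hypothetical rules chosen for illustration. Your own standards would replace them.

```python
import re
from datetime import date, datetime

# Hypothetical rules for a customer table; adapt them to your own standards.
REQUIRED_FIELDS = ("customer_id", "email", "last_active")
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # a basic shape check, not full RFC validation
DATE_FORMAT = "%m/%d/%Y"  # the single date layout every system must use
MAX_AGE_DAYS = 90         # records older than this count as out of date


def check_quality(records, today):
    """Return (row index, criterion, field) tuples for every rule a record breaks."""
    issues = []
    seen_ids = set()
    for i, record in enumerate(records):
        # Completeness: every required field must be present and non-empty.
        missing = [f for f in REQUIRED_FIELDS if not record.get(f)]
        if missing:
            issues.append((i, "completeness", ", ".join(missing)))
            continue  # the remaining checks need these fields

        # Validity: values must satisfy the agreed rule (here, a basic email shape).
        if not EMAIL_PATTERN.fullmatch(record["email"]):
            issues.append((i, "validity", "email"))

        # Consistency: dates must parse in the one agreed layout.
        try:
            last_active = datetime.strptime(record["last_active"], DATE_FORMAT).date()
        except ValueError:
            issues.append((i, "consistency", "last_active"))
            last_active = None

        # Up-to-date: activity must fall within the allowed window.
        if last_active is not None and (today - last_active).days > MAX_AGE_DAYS:
            issues.append((i, "up-to-date", "last_active"))

        # Uniqueness: each customer ID may appear only once.
        if record["customer_id"] in seen_ids:
            issues.append((i, "uniqueness", "customer_id"))
        seen_ids.add(record["customer_id"])
    return issues


rows = [
    {"customer_id": "C1", "email": "ann@example.com", "last_active": "06/01/2024"},
    {"customer_id": "C2", "email": "bo.example.com", "last_active": "06/15/2024"},
    {"customer_id": "C3", "email": "cy@example.com", "last_active": "2024-06-10"},
    {"customer_id": "C4", "email": "di@example.com", "last_active": "01/02/2023"},
    {"customer_id": "C1", "email": "ann@example.com", "last_active": "06/01/2024"},
    {"customer_id": "C6", "email": "", "last_active": "06/01/2024"},
]
for issue in check_quality(rows, today=date(2024, 6, 30)):
    print(issue)
```

Each sample row after the first breaks exactly one criterion. A single pass therefore surfaces all five kinds of problems, which matches the point above: a dataset can satisfy four criteria and still fail on the fifth.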
What is data integrity?
Data integrity, meanwhile, is a broader term, with data quality being one of its components. It’s measured by looking at three key areas: the data’s accuracy, dependability, and consistency. It’s a critical element of data management. The goal is to make sure that all data, whether it stays inside the organization or is shared beyond it, is safe from unauthorized access and free from human mistakes or corruption. With the right data integrity practices in place, organizations can be confident that the data they use in decision-making can be trusted.
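One common technical safeguard against corruption is to store a checksum alongside each record and recompute it whenever the record is read. This is a minimal sketch of that idea using SHA-256 over a canonical JSON form of the record. The record fields are hypothetical. A plain checksum only catches accidental changes, because anyone who can edit the record can also recompute the hash. Detecting deliberate tampering would require a keyed HMAC or a digital signature.

```python
import hashlib
import json


def record_checksum(record: dict) -> str:
    """SHA-256 digest of a record, computed over a canonical JSON form."""
    # sort_keys and fixed separators make the digest independent of key order
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_intact(record: dict, stored_checksum: str) -> bool:
    """True if the record still matches the checksum saved when it was written."""
    return record_checksum(record) == stored_checksum


patient = {"id": 17, "name": "Jane Doe", "dob": "04/12/1985"}  # hypothetical record
saved = record_checksum(patient)  # store this alongside the record

patient_copy = dict(patient)
patient_copy["dob"] = "04/21/1985"  # simulate a transposed digit during copying
print(is_intact(patient, saved))       # True
print(is_intact(patient_copy, saved))  # False
```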
The four pillars of data integrity
Let’s now take a look at four elements or pillars that are needed to maintain data integrity.
- Data integration: Regardless of where your data originates (e.g., a legacy system, a cloud data warehouse, or a relational database), it must be combined smoothly so it’s easy to see and use. In short, all data sources must be integrated with one another.
- Data quality: If you want to use data in your decision-making process, you first have to make sure it’s unique, complete, valid, and consistent. Poor-quality data translates into bad business decisions.
- Location intelligence: You can make your data more useful by enriching it with information about the physical location of the data source, such as an address.
- Data enrichment: By adding information from external data sources to your internal data, you can make it more valuable. This improves the depth of your analysis by giving you a more detailed view of a problem or situation.
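Integration and enrichment often happen in the same pipeline. The sketch below illustrates this under stated assumptions. It merges customer records from two hypothetical sources into one view keyed by customer ID, then adds a region from an external postcode lookup. The merge policy, where the first non-empty value wins and later sources only fill gaps, is one simple choice among many. The source names and lookup table are invented for the example.

```python
def integrate(*sources):
    """Combine customer records from several sources into one view keyed by customer ID.

    Sources are read in order; the first non-empty value seen for a field wins,
    and later sources only fill in fields that are still missing.
    """
    merged = {}
    for source in sources:
        for record in source:
            # IDs may be ints in one system and strings in another, so normalize them
            key = str(record["customer_id"]).strip()
            target = merged.setdefault(key, {"customer_id": key})
            for field, value in record.items():
                if field == "customer_id" or value in (None, ""):
                    continue
                target.setdefault(field, value)
    return merged


def enrich_with_region(merged, postcode_to_region):
    """Add a 'region' field using an external postcode lookup (location enrichment)."""
    for record in merged.values():
        region = postcode_to_region.get(record.get("postcode"))
        if region is not None:
            record["region"] = region
    return merged


legacy_system = [{"customer_id": 42, "name": "Ann Lee", "postcode": ""}]
cloud_warehouse = [{"customer_id": "42", "email": "ann@example.com", "postcode": "18001"}]
external_lookup = {"18001": "Andalusia"}  # hypothetical reference data

customers = enrich_with_region(integrate(legacy_system, cloud_warehouse), external_lookup)
print(customers["42"])
```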
What do data quality and data integrity look like in practice?
Imagine you run a healthcare clinic that asks its patients to provide information such as their:
- Name and surname
- Date of birth (D.O.B.)
- Medical history, like previous surgical procedures, chronic conditions, and allergies
- Credit card details
- Social Security number (SSN)
All of the above is personal data, and some of it, such as medical history, card details, and SSNs, is highly sensitive.
As the party that stores and uses this data, you have a few goals: keeping it up to date, consistent across all databases, and secure. Among other things, you need to create data validation rules (for example, deciding whether SSNs are stored with or without hyphens, or blocking appointments from being scheduled in the past). You must also define the right data clearance levels, i.e., who has access to which types of records, to minimize the risk of mistakes or data breaches.
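The two validation rules and the idea of clearance levels can be sketched in a few lines. This example assumes one possible house rule: accept SSNs with or without hyphens, but always store digits only. It also uses a hypothetical role-to-field mapping for clearance. A real clinic system would enforce access in the database or application layer rather than in a dictionary.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical rule: accept SSNs with or without hyphens, but always store digits only.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}|\d{9}")


def normalize_ssn(raw: str) -> str:
    """Validate an SSN's shape and return it in the single stored form."""
    cleaned = raw.strip()
    if not SSN_PATTERN.fullmatch(cleaned):
        raise ValueError(f"invalid SSN format: {raw!r}")
    return cleaned.replace("-", "")


def validate_appointment(appointment_date: date, today: Optional[date] = None) -> None:
    """Reject appointments scheduled in the past."""
    if today is None:
        today = date.today()
    if appointment_date < today:
        raise ValueError(f"appointment date {appointment_date} is in the past")


# Hypothetical clearance levels: which roles may read which record types.
CLEARANCE = {
    "receptionist": {"name", "date_of_birth"},
    "billing": {"name", "credit_card"},
    "physician": {"name", "date_of_birth", "medical_history"},
}


def can_access(role: str, field: str) -> bool:
    # Unknown roles get an empty set, i.e. no access by default.
    return field in CLEARANCE.get(role, set())


print(normalize_ssn("123-45-6789"))              # 123456789
print(can_access("billing", "medical_history"))  # False
```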
When it comes to data quality, one of the best illustrations is your customers’ physical addresses and contact details. Your records might contain a Mr. Dawson who lives in Granada, Spain, and a Mr. Dawson who lives in Grenada, the Caribbean country. Are they the same person, or two separate customers? This matters not only for getting the delivery address right but also for any customer analysis, as you need to know whether these are two individuals or just one.
A data quality tool can help you find and resolve inconsistencies like this across your dataset. Without one, your data is likely to contain information gaps or errors, which weakens your decision-making.
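The Dawson case shows why duplicate detection usually relies on fuzzy matching plus human review. This is a minimal sketch using the standard library's `difflib.SequenceMatcher`. It flags pairs of customers with the same name and near-identical place names. The record layout and the 0.8 threshold are assumptions for illustration. Dedicated data quality tools use more sophisticated matching.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Ratio between 0 and 1 of how similar two strings are (case-insensitive)."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def flag_possible_duplicates(customers, threshold=0.8):
    """Return ID pairs that share a name and have near-identical places.

    Flagged pairs are candidates for human review, not automatic merges.
    """
    flagged = []
    for a, b in combinations(customers, 2):
        same_name = a["name"].casefold() == b["name"].casefold()
        if same_name and similarity(a["place"], b["place"]) >= threshold:
            flagged.append((a["id"], b["id"]))
    return flagged


customers = [
    {"id": 1, "name": "Mr. Dawson", "place": "Granada"},
    {"id": 2, "name": "Mr. Dawson", "place": "Grenada"},
    {"id": 3, "name": "Ms. Ortiz", "place": "Granada"},
]
print(flag_possible_duplicates(customers))  # [(1, 2)]
```

A similarity score alone can't tell whether "Granada" is a typo for "Grenada" or a genuinely different place. Flagged pairs are therefore best resolved by a person, using extra fields such as country, street address, or email.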
Why is integrity of data important?
Reaching and safeguarding data integrity across your business brings a variety of benefits. Among others, it boosts productivity and profitability by saving your team the time and effort spent searching for data. It also minimizes the risk of making business decisions based on obsolete, corrupted, or incorrect information. Take customer surveys, for example: if one of your staff members makes a mistake while gathering answers, it could undermine the value of the entire survey campaign. In the best case, the data simply can’t be used. In the worst case, it’s misinterpreted and the wrong conclusions spread across the business.
In short, if you can’t guarantee your data’s integrity, you can’t be sure that your data-informed decisions are sound.
This brings us to another important benefit of data integrity: protecting your company from data breaches and, eventually, reputational damage. Bear in mind that any organization collecting information from clients or employees processes so-called Personally Identifiable Information (PII). PII covers a variety of data, from basics like name and surname all the way through to sensitive information like credit card numbers or Social Security numbers.
If one of your team members falls victim to a malware attack, or even makes an accidental mistake in your records, this could wreak havoc on your organization. A data leak can result in hefty penalties and seriously damage the company’s image. For instance, under the EU’s General Data Protection Regulation (GDPR), the most severe violations can be fined up to 4% of your company’s total worldwide annual turnover or 20 million euros, whichever is higher.
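The "whichever is higher" rule is easy to misread, so here is the arithmetic as a tiny function. It computes only the legal maximum for the most severe infringements, based on the preceding financial year's worldwide turnover. Actual fines are set case by case by supervisory authorities and can be far lower. The turnover figures in the example are hypothetical.

```python
def gdpr_max_fine_eur(annual_worldwide_turnover_eur: int) -> int:
    """Upper bound of a fine for the most severe GDPR infringements.

    The cap is 4% of total worldwide annual turnover or EUR 20 million,
    whichever is higher. Actual fines are decided case by case.
    """
    four_percent = annual_worldwide_turnover_eur * 4 // 100  # integer euros
    return max(four_percent, 20_000_000)


print(gdpr_max_fine_eur(100_000_000))    # 20000000 (4% would be only 4 million)
print(gdpr_max_fine_eur(2_000_000_000))  # 80000000
```

In other words, the 20-million-euro floor dominates for any company with turnover below 500 million euros. Above that point, the 4% figure takes over.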
How can you ensure data quality and integrity?
What steps do you need to take to ensure that both your primary and secondary data are of high quality? Let’s go through them.
Engage in data profiling
First, you can turn to data profiling, or data quality assessment, to check the current state of your data. It reveals errors, inaccuracies, missing data, duplicates, and other issues that hurt your data quality. There are many tools you can use for data profiling, including Atlan and OpenRefine. Once you’ve spotted the issues, you can take the right measures to fix them.
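Profiling tools go much further, but the core of a profile is just counting. This sketch uses only the standard library. It reports missing and distinct values per column, plus the number of exact duplicate rows. It assumes flat records whose values are hashable, and it treats empty or whitespace-only strings as missing. Both are assumptions you may want to change.

```python
from collections import Counter


def is_missing(value) -> bool:
    """Treat None and empty/whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def profile(records):
    """Summarize missing and distinct values per column, plus exact duplicate rows."""
    columns = sorted({key for record in records for key in record})
    report = {}
    for column in columns:
        values = [record.get(column) for record in records]  # absent keys count as missing
        present = [v for v in values if not is_missing(v)]
        report[column] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    # Rows are compared as sorted (key, value) tuples so key order doesn't matter.
    row_counts = Counter(tuple(sorted(record.items())) for record in records)
    duplicate_rows = sum(count - 1 for count in row_counts.values())
    return report, duplicate_rows


records = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "Bo Chen", "email": ""},
]
report, duplicates = profile(records)
print(report)      # {'email': {'missing': 1, 'distinct': 1}, 'name': {'missing': 0, 'distinct': 2}}
print(duplicates)  # 1
```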
Run data cleansing
Data cleansing comes after data profiling and helps you clear your records of erroneous, inconsistent, or duplicated data. For duplicates, this means spotting and removing records that were accidentally entered more than once, for example into several databases.
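A cleansing pass typically normalizes values first and only then removes duplicates. Otherwise, "Ann Lee" and "  Ann   Lee " would survive as two customers. This is a minimal sketch of that order of operations. It assumes a simple name/email record and treats email addresses as case-insensitive. Most providers behave that way, though the standard technically allows case-sensitive local parts.

```python
def cleanse(records):
    """Normalize whitespace and email case, then drop duplicates (first occurrence wins)."""
    seen = set()
    cleaned = []
    for record in records:
        name = " ".join(record.get("name", "").split())  # collapse stray spaces
        email = record.get("email", "").strip().lower()   # assumes case-insensitive emails
        key = (name.casefold(), email)
        if key in seen:
            continue  # duplicate of a record we already kept
        seen.add(key)
        cleaned.append({"name": name, "email": email})
    return cleaned


raw = [
    {"name": "  Ann   Lee ", "email": "ANN@Example.com "},
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "Bo Chen", "email": "bo@example.com"},
]
print(cleanse(raw))
```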
Communicate the importance of data integrity
Everyone in your organization benefits from good data, which makes maintaining data integrity a joint responsibility. However, you need to communicate this to your team: educate people about the necessity of protecting data and keeping it clean, complete, and relevant. You should also show them how to recognize and prevent potential threats.
Document and standardize your data
Another necessary step in maintaining data quality and integrity is documenting and standardizing your data. Documentation is about keeping track of all your data sources, collection methods, processes, and any changes to your data.
Standardization means applying the same rules and formats everywhere, such as naming conventions, categories, and units; for example, consistently using ‘error’ instead of mixing ‘error’ and ‘mistake’. If you keep your data standardized and well documented, it’s easier to make sure it’s readable, consistent, and usable. This increases efficiency by keeping everyone on the same page. As a result, new datasets are easier to integrate, and whenever an issue comes up, you can address it quickly.
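Standardization rules like these can live in code, so every pipeline applies them the same way. The sketch below maps synonyms such as "mistake" to one canonical term and converts several accepted date layouts into the single MM/DD/YYYY format. The term mapping and accepted formats are hypothetical house rules.

```python
from datetime import datetime

# Hypothetical house rules: one canonical term per category, one date layout.
CANONICAL_TERMS = {"mistake": "error", "err": "error", "error": "error"}
ACCEPTED_DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%d.%m.%Y")
STANDARD_DATE_FORMAT = "%m/%d/%Y"


def standardize_term(value: str) -> str:
    """Map known synonyms to the canonical term; pass other values through lowercased."""
    key = value.strip().lower()
    return CANONICAL_TERMS.get(key, key)


def standardize_date(value: str) -> str:
    """Parse a date in any accepted layout and return it as MM/DD/YYYY."""
    for fmt in ACCEPTED_DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime(STANDARD_DATE_FORMAT)
        except ValueError:
            continue  # try the next accepted layout
    raise ValueError(f"unrecognized date format: {value!r}")


print(standardize_term("Mistake"))     # error
print(standardize_date("2024-03-07"))  # 03/07/2024
print(standardize_date("07.03.2024"))  # 03/07/2024
```

Each accepted format uses a different separator, so an input can only match one of them. Accepting both MM/DD/YYYY and DD/MM/YYYY with slashes would make dates like 03/07/2024 ambiguous. Documenting the accepted list is therefore part of the standard itself.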
Use data encryption
Data encryption is one of the best ways to prevent data from being read by an unauthorized party. Even if a hacker manages to access your files or “hijacks” your server, they won’t be able to read the data without the decryption key.
Data accuracy results from your data’s quality and integrity
Data quality and integrity are closely linked. You can’t consider a company’s data mature without both, and they are necessary elements of every organization’s data journey. No business is born with good-quality data; to keep it at the highest level, you have to regularly cleanse, standardize, and properly process it. Ultimately, inconsistent datasets lead to bad decisions.
How your team uses data also matters, since data integrity is a joint responsibility. Your team should know not only where to find data, but also what purposes they can use it for and how to handle it in compliance with your data governance policy.