What Is the Difference Between Data Integrity and Security?

This post discusses what data integrity is and how to use it, and highlights how data integrity differs from data security. It then tackles a common myth about encryption and answers the question: can you use encryption for data integrity?

This post was originally published on DataOps Zone.

In this post, I want you to join me on a mission at a fictitious company called Hackme Corporation. Your mission, should you choose to accept it, is to send Hackme’s year-end financial reports to third-party authorities while making sure no one can change the documents along the way. I can assure you that this blog post won’t self-destruct in 10 seconds. However, it will discuss what data integrity is and how to use it, and highlight the key differences between data integrity and data security. Then I’ll share a common myth about encryption and answer the question: can you use encryption for data integrity?

Let’s get started.

Your Mission Needs a Good Plan

How would you approach Hackme’s mission from a data security point of view? Every mission needs a good plan, and yours starts with the CIA principles. Not the Central Intelligence Agency, but rather the CIA principles of data security: confidentiality, integrity, and availability. Let’s go through them next.

The C and the A: Confidentiality and Availability Principles

The data confidentiality principle is all about keeping data private and secret. These days, data is the new digital currency, and nefarious hackers, state-sponsored actors, disgruntled employees, and occasional or recreational hackers are keen to get their hands on it. That data could include the organization’s intellectual property, personal data, health data, payment data, and the list goes on. Getting the confidentiality principle right is not only a business priority but is also mandated by laws and regulations. Data encryption, authentication, and authorization technologies will help you here, so make sure you use them wisely.

Remember, our mission is to send financial reports, and in our case these records are available to the public. This means confidentiality isn’t our main concern. Let’s now turn our attention to availability.

The data availability principle means that data is available and accessible to the organization whenever it’s needed. Availability faces a few threats of its own, like data center outages, hardware failures, DDoS attacks, and crypto-malware attacks. To be successful with data availability, your team needs battle-tested disaster recovery (DR) plans, data backup and restore procedures, redundant hardware, and iron-clad contracts with service providers.

Data availability also isn’t critical in our mission because we only need to send the reports. Let’s now turn our attention to the most critical principle for our mission: the data integrity principle.

The I: Data Integrity Principle

The data integrity principle focuses on the validity, accuracy, and consistency of the data. It’s a set of rules and mechanisms to record and receive data accurately over its whole life cycle. Data integrity is like when you send a parcel of fragile wine glasses to your grandma. To make sure grandma gets wine glasses and not broken glass, you wrap the glasses with paper or some other wrapping material. You can think of the wrapping material as the data integrity principle.

Sounds simple, doesn’t it? Well, not so fast—let’s explore data integrity a bit further.

Let me start by clarifying one thing first. Data “accuracy” in the context of data integrity is not accuracy in the traditional sense. Let me explain this with our financial report as an example. Data integrity does not focus on whether the report’s content is correct. If the income statement doesn’t represent the financial truth of the organization, that’s a data quality issue. Data integrity’s job is to preserve the data, and therefore its quality, exactly as it was recorded throughout the data life cycle. Does that make sense?

To preserve data quality and accuracy, we need to talk about physical and logical integrity. Physical integrity comes into play whenever we store data on, or retrieve data from, any digital storage. It relies on error-detection algorithms, checksums, and similar mechanisms that work transparently in the background.
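
To make the checksum idea a bit more concrete, here’s a minimal sketch using CRC-32 from Python’s standard zlib module. The sample record and the flipped bit are made up for this illustration; real storage systems run checks like this for you behind the scenes.

```python
import zlib

# Hypothetical record, used only for illustration.
stored = b"Hackme Corporation year-end report: revenue 1,000,000"
checksum = zlib.crc32(stored)  # checksum saved alongside the data

# Simulate a single bit flipping on disk or in transit.
corrupted = bytearray(stored)
corrupted[10] ^= 0x01
corrupted = bytes(corrupted)

def is_intact(data: bytes, expected_crc: int) -> bool:
    """Return True when the data still matches its recorded checksum."""
    return zlib.crc32(data) == expected_crc

print(is_intact(stored, checksum))     # True
print(is_intact(corrupted, checksum))  # False
```

A checksum like this catches accidental corruption, but it offers no protection against someone who changes the data on purpose and recomputes the checksum.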

Logical integrity defines logical rules, constraints, and structures for your data. Why do we need logical integrity rules in the first place? Without them, we couldn’t make digital models or define complex relationships between things and data structures. Logical integrity comes in four flavors: entity integrity, referential integrity, domain integrity, and user-defined integrity rules. Let’s discuss these with a classic bank account example.

Logical Integrity Deep Dive

Entity integrity means that each entity is identifiable with a unique key. For example, as a bank customer, you are an entity. The bank has to identify you in its system with a unique key so it won’t mistake you for someone else.

Referential integrity is another form of logical integrity. It ensures that the relationships between entities are clearly defined and stay valid. In our banking example, both you and your account are uniquely identified, but you also belong together. Referential integrity records which bank account belongs to you, and it guarantees that an account always points to a customer who actually exists. Hopefully, that account has a balance with lots of zeroes in it!

Domain integrity encompasses constraints and rules that define which values are valid for an entity’s properties, such as their type, their format, and whether they’re required. In other words, you can’t open a bank account without your name, your address, and so on.

User-defined integrity rules are additional constraints, limits, and rules defined on the basis of business requirements. For example, the bank might cap how far an account can be overdrawn.
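
Here’s a rough sketch of how these four kinds of rules might look in a database, using Python’s built-in sqlite3 module. The table names, columns, the overdraft limit of -500, and the rejected helper are all assumptions made up for this example, not a real bank schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE customer (
    id      INTEGER PRIMARY KEY,          -- entity integrity: unique key
    name    TEXT NOT NULL,                -- domain integrity: name is required
    address TEXT NOT NULL                 -- domain integrity: address is required
);
CREATE TABLE account (
    id          INTEGER PRIMARY KEY,      -- entity integrity: unique key
    customer_id INTEGER NOT NULL
                REFERENCES customer(id),  -- referential integrity
    balance     INTEGER NOT NULL
                CHECK (balance >= -500)   -- user-defined rule: hypothetical overdraft limit
);
""")

conn.execute("INSERT INTO customer (id, name, address) VALUES (1, 'Ada', '1 Main St')")
conn.execute("INSERT INTO account (id, customer_id, balance) VALUES (10, 1, 1000)")

def rejected(sql: str) -> bool:
    """Return True when the database refuses the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Each statement below breaks one rule, so each call prints True.
print(rejected("INSERT INTO customer (id, name, address) VALUES (1, 'Bob', '2 Side St')"))  # duplicate key
print(rejected("INSERT INTO account (id, customer_id, balance) VALUES (11, 99, 0)"))        # unknown customer
print(rejected("INSERT INTO customer (id, name, address) VALUES (2, NULL, '3 Back St')"))   # missing name
print(rejected("INSERT INTO account (id, customer_id, balance) VALUES (12, 1, -9999)"))     # below the limit
```

The point is that the database itself refuses data that breaks the rules, so the application doesn’t have to remember to check every time.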

How Is Data Integrity Different From Data Security?

Before I answer this question, let me clarify one thing first. People usually mean data confidentiality when they talk about data security. In fact, both confidentiality and integrity play a key part in data security. They look similar, but they have different purposes. When you apply the data confidentiality principle, you want to keep the report’s contents secret. When you apply the integrity principle, you don’t want anyone to modify the report without your knowledge. To understand how these two principles differ, let’s take a look at two technologies used to support each principle: data hashing for integrity and encryption for confidentiality.

Get Cracking With Hashing

Hashing algorithms are one of the most fundamental tools in the data integrity toolset. They’re mathematical functions (e.g., MD5, SHA-1, SHA-2, BLAKE2) that you can apply to data to generate hash values, also called hash digests. Keep in mind that MD5 and SHA-1 are no longer considered safe against deliberate tampering, so SHA-2 or BLAKE2 is the better choice for a mission like ours. Just think of these hash digests as digital fingerprints of that data. The nature of a hash algorithm is that even the slightest change in the data will produce a completely different fingerprint. How could this help with your mission?
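
To see the fingerprint idea in action, here’s a small sketch using SHA-256 (a member of the SHA-2 family) from Python’s standard hashlib module. The two sample strings are invented for this example and differ by a single character.

```python
import hashlib

original = b"Net income: 1,000,000"
tampered = b"Net income: 9,000,000"  # one character changed

digest_a = hashlib.sha256(original).hexdigest()
digest_b = hashlib.sha256(tampered).hexdigest()

print(digest_a)
print(digest_b)
print(digest_a == digest_b)  # False
```

If you run it, you should see two 64-character digests that look nothing alike, even though the inputs are almost identical.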

You could use hashing and generate the hash digest of the financial report. After that, all you need to do is send both the report and the digest to the third party. The third party would repeat the same process that you did. They’d generate their own hash digest of the report and compare theirs with yours. If the digests match, the report they received is the one the digest was generated from, and its integrity looks intact.

However, when we use hashing algorithms alone, our mission could be in jeopardy. An attacker could still modify the report, generate their own hash, and send both to the third party. If the forged pair replaces yours, the digests match and nothing looks wrong. If it arrives alongside yours, the third party has two documents and two hashes and can’t tell which pair is trustworthy. An attacker could also modify only your hash and make your genuine report look invalid to the third party. It looks like our mission is difficult but not impossible; let’s move on to data encryption and see how it could help us.
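
The exchange and the attack described above can be sketched in a few lines of Python. The function names fingerprint and third_party_accepts and the sample reports are hypothetical, chosen only for this illustration.

```python
import hashlib

def fingerprint(document: bytes) -> str:
    """Return the SHA-256 digest of the document as a hex string."""
    return hashlib.sha256(document).hexdigest()

def third_party_accepts(document: bytes, digest: str) -> bool:
    """The receiver recomputes the digest and compares it with the one received."""
    return fingerprint(document) == digest

# Honest exchange: Hackme sends the report together with its digest.
report = b"Hackme year-end report: profit 1,000,000"
sent = (report, fingerprint(report))
print(third_party_accepts(*sent))  # True

# Attack: the report AND the digest are replaced in transit.
forged = b"Hackme year-end report: profit 9,000,000"
intercepted = (forged, fingerprint(forged))
print(third_party_accepts(*intercepted))  # True, the forgery goes unnoticed

# Changing only the report, but not the digest, is detected.
print(third_party_accepts(forged, sent[1]))  # False
```

A plain hash involves no secret, so anyone can compute a valid digest for any document. That’s exactly why the forged pair passes the check.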

Could Encryption Save the Day?

Data encryption is one of the most important tools in your data confidentiality toolkit. This topic is complex, challenging, and not for the faint of heart. Books and encyclopedias go into great detail about various encryption algorithms, technologies, and methods, but for now, let’s keep things simple. First, you generate a key and encrypt the document. Then, you send the encrypted document to the third party. The third party decrypts the document with a decryption key and reads it. Does this mean we managed to maintain the integrity of Hackme’s report? If you think data encryption is the answer, please read on.

Can Data Encryption Guarantee Data Integrity?

At first glance, data encryption looked like a great solution, but unfortunately, you can’t rely only on encryption for data integrity. Why? Because an attacker could still modify the encrypted document. Remember, in the digital world everything is zeroes and ones, and the same is true for our encrypted report. With some encryption schemes, the attacker can flip or overwrite those zeroes and ones without ever knowing the key. The third party could then still decrypt the modified document, and they wouldn’t know that it was modified by an attacker.
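
Here’s a sketch of that kind of tampering. It uses a toy stream cipher that I built from SHA-256 purely for this demonstration: keystream, xor_cipher, and the key are hypothetical and must not be used to protect real data. Real stream ciphers and counter-style modes react to flipped bits in the same way when nothing checks integrity.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over the key plus a counter. For illustration only."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XOR-ing data with the keystream (same operation both ways)."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))

key = b"hypothetical shared secret"
plaintext = b"Profit: 1000000"
ciphertext = xor_cipher(key, plaintext)

# The attacker never uses the key. They only guess where the first digit sits
# (looked up from the plaintext here for convenience) and flip bits in the
# ciphertext so that '1' turns into '9' after decryption.
position = plaintext.index(b"1")
tampered = bytearray(ciphertext)
tampered[position] ^= ord("1") ^ ord("9")

print(xor_cipher(key, bytes(tampered)))  # b'Profit: 9000000'
```

The third party decrypts without any error and reads a profit figure that Hackme never wrote.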

Using encryption alone could help you with confidentiality, but you can’t rely on it for data integrity. That’s why most modern cryptographic solutions use a combination of hashing and encryption. The same applies to our mission: to succeed, we have to use both encryption and hashing.
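
One way the combination might look is sketched below: encrypt the report, then compute a keyed hash (HMAC, from Python’s standard hmac module) over the ciphertext, and have the receiver check that tag before decrypting. This is my assumption about a reasonable pairing, not the only option. The toy cipher, the seal and open_sealed helpers, and both keys are hypothetical; a real system would rely on a vetted cryptographic library.

```python
import hashlib
import hmac

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over the key plus a counter. For illustration only."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XOR-ing data with the keystream."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))

def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> tuple[bytes, bytes]:
    """Encrypt, then compute a keyed hash over the ciphertext."""
    ciphertext = xor_cipher(enc_key, plaintext)
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext, tag

def open_sealed(enc_key: bytes, mac_key: bytes, ciphertext: bytes, tag: bytes) -> bytes:
    """Verify the keyed hash first; decrypt only if it matches."""
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("integrity check failed")
    return xor_cipher(enc_key, ciphertext)

enc_key, mac_key = b"hypothetical encryption key", b"hypothetical MAC key"
ciphertext, tag = seal(enc_key, mac_key, b"Profit: 1000000")
print(open_sealed(enc_key, mac_key, ciphertext, tag))  # b'Profit: 1000000'

# Flipping bits in the ciphertext is now detected before decryption.
tampered = bytearray(ciphertext)
tampered[8] ^= ord("1") ^ ord("9")
try:
    open_sealed(enc_key, mac_key, bytes(tampered), tag)
except ValueError as err:
    print(err)  # integrity check failed
```

Because the hash is keyed, an attacker who changes the ciphertext can’t produce a matching tag without the secret, which closes the gap that a plain hash leaves open.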

Data Integrity Mission Epilogue

In summary, we discussed the CIA principles, with a focus on the data integrity principle. Now you understand the key differences between data integrity and data confidentiality. I’ve also busted a common myth and answered the question, “Can data encryption guarantee data integrity?” Your key takeaway is that you can’t rely on encryption or hashing algorithms alone for data integrity. To be successful in our mission, we’d have to use both to send the financial report.

Congratulations! You completed this mission with flying colors. In closing, I’ll say that our mission could still be improved with digital signatures and certificates, but that’s a story for another time. Until then, keep safe and secure.