Data Integrity

Data integrity refers to the accuracy, availability, completeness, consistency, compliance and security of a given data point. Before turning to the current problems, we briefly define these concepts:
Data Integrity Concept: Definition
Completeness: The wholeness of the data, without missing information or gaps.
Accuracy: The correctness of the data, measured by how well it reflects actual states.
Consistency: The standardization of how data is processed, so that all data from the same category is treated the same.
Availability: The ability of an end user to access the data.
Compliance and security: How data is stored and handled, with respect to best practices and regulatory requirements.
In general, blockchains are fully transparent, public systems that are open to everyone with access to a node. Yet data integrity issues persist despite these features. Most of the key problems stem from the lack of standardization across the industry. Protocols and data analysts measure derived variables and parameters differently depending on the data format. As a result, data is processed differently, which, when scaled over large data sets, produces explicit discrepancies in what should be a uniform interpretation.
There is no universal normalization technique that works on every dataset. Deep, careful analysis and pattern tracking are integral to rendering data that’s accurate and actionable. Real-world data includes unwanted, incorrect, and missing data points that have to be handled carefully, as they may introduce bias into the results. That’s why most scenarios require a data engineer or analyst to translate the data and models into a form that others can easily understand. Even simple derived metrics are questioned. For example, in the LobsterDAO(7) Telegram group of developers and DeFi enthusiasts, the question was posed:
"Does anyone know why the TVL [Total Value Locked] numbers on Defi Llama and Defi Pulse show Aave around 7.5B TVL while Aave’s own site says they’re above 10B?"
And the response was necessarily informal:
"Leverage. Similar to Compound last summer, you can lend what you borrow which enables leverage farming."
In this exchange, we see providers employing different methodologies whose results are both labeled Aave TVL. Aave includes cascading loans in its TVL, while Defi Llama(9) and Defi Pulse(8) do not. In traditional finance, we have specific, regulated labels for accounting data. Even when the raw data is uniform, methodological variations require that the derived figures be labeled differently. An example is EBITDA and EBIT. EBIT approximates the income a company generates, while EBITDA, which also excludes depreciation and amortization, provides a snapshot of a company’s overall cash flow. Both are ultimately important when analyzing a company’s financial performance, but each calculation serves a distinct purpose.
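To make the TVL discrepancy concrete, the following is a minimal sketch in Python of how a single leveraged position can yield two different "TVL" figures from the same raw data. The function names (leverage_loop, tvl_gross, tvl_net), the 50% borrow ratio and the dollar amounts are hypothetical and chosen for illustration only. They do not reproduce the actual methodology of Aave, Defi Llama or Defi Pulse.

```python
def leverage_loop(initial_deposit, borrow_ratio, rounds):
    """Simulate a user who repeatedly borrows against a deposit and re-deposits the loan.

    Returns (total_deposited, total_borrowed). All values are hypothetical USD amounts.
    """
    deposited = initial_deposit
    borrowed = 0.0
    last_deposit = initial_deposit
    for _ in range(rounds):
        loan = last_deposit * borrow_ratio  # borrow against the most recent deposit
        borrowed += loan
        deposited += loan                   # the loan is lent back into the pool
        last_deposit = loan
    return deposited, borrowed


def tvl_gross(deposited, borrowed):
    """Assumed methodology A: count every deposit, including re-deposited loans."""
    return deposited


def tvl_net(deposited, borrowed):
    """Assumed methodology B: subtract outstanding borrows so cascaded loans are not counted."""
    return deposited - borrowed


# One user starts with 100, borrows 50% of each deposit, and loops twice.
deposited, borrowed = leverage_loop(100.0, 0.5, 2)
print("gross TVL:", tvl_gross(deposited, borrowed))  # 175.0
print("net TVL:  ", tvl_net(deposited, borrowed))    # 100.0
```

Both figures are derived from the same on-chain activity, yet they differ by 75% in this toy case. Neither is wrong, but without a label stating which methodology was applied, the two numbers cannot be compared.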
To provide industry confidence, the methodological provenance of derived data must be auditable. Black-box or private data interpretation undermines confidence, which is a counterintuitive data issue for publicly available blockchains. Credmark solves this by bringing normalization of data and methodological transparency to the ecosystem at large.