If data is the new oil, then AI-generated data is toxic sludge, contaminating the very wells we drink from. At least from the perspective of consuming this data in BFSI.
In 2018, Nandan Nilekani prophetically outlined banking's paradigm shift from "hard" to "soft" collateral: from underwriting against hard collateral (land, buildings, machinery) to underwriting against soft collateral, i.e. data. This made eminent sense, since data generation was doubling every two years. For example, roughly 13-20 GB of data could be available on a micro-entrepreneur (banking, social, legal, trade, payments, and more) on a per-file basis.
This became a bigger deal when authentic data started rolling in from source systems (GSTN, banking via account aggregators, trade data from e-commerce platforms, etc.). On top of that, there were varying indicators related to own/counterparty checks from social and legal sources. By 2022, the phrase 'data-collateral' had formally entered the BFSI ecosystem's lexicon.
The ecosystem responded brilliantly with multiple initiatives, the most visible among them OCEN and GST Sahay. Then came newer underwriting models, most famously revenue-based-finance underwriting. The underlying alchemy was elegant: transmute digital footprints into credit decisions.
But today, we face an existential crisis. AI-generated data isn't merely muddying the waters; it's poisoning the well. Search engines are getting clogged with AI-generated images. When the Pearl search engine's (https://www.pearl.com/) unique selling proposition becomes "human-verified results," we know we've hit an inflection point. If GSTN data can be manipulated without AI, imagine the possibilities with it. Our data lakes are becoming data swamps. ID data is increasingly open to manipulation and outright fraud.
I believe we need a three-pronged strategy to defend system and process integrity.
The Three-Pillar Defense
Traditional risk frameworks are showing the strain, from ID fraud to document fraud to, now, AI-driven fraud. We need a new architecture:
- Source-System Authentication: Building fortified data corridors from verified origins
- Social Signal Recalibration: Developing immunity to synthetic social proof
- Transaction Intelligence: Creating AI-resistant verification layers for payment flows
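To make the first pillar concrete, here is a minimal Python sketch of source-system authentication: a lender accepts a record only if it carries a valid signature from a verified origin. Everything here is illustrative — the shared key, the `sign_payload`/`verify_payload` helpers, and the GSTN-style record are hypothetical stand-ins for whatever key-exchange and schema a real data corridor would use.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret, provisioned out-of-band with the source system.
SOURCE_SYSTEM_KEY = b"demo-key-provisioned-out-of-band"

def sign_payload(payload: dict, key: bytes) -> str:
    """Source-system side: sign a canonical JSON serialization of the record."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()

def verify_payload(payload: dict, signature: str, key: bytes) -> bool:
    """Lender side: accept the record only if the signature matches."""
    expected = sign_payload(payload, key)
    return hmac.compare_digest(expected, signature)

# A record as it might arrive over a fortified data corridor (illustrative fields).
record = {"gstin": "27AAAAA0000A1Z5", "period": "2025-01", "turnover": 1250000}
sig = sign_payload(record, SOURCE_SYSTEM_KEY)

assert verify_payload(record, sig, SOURCE_SYSTEM_KEY)                         # authentic
assert not verify_payload({**record, "turnover": 9250000}, sig, SOURCE_SYSTEM_KEY)  # manipulated
```

A production corridor would use asymmetric signatures (so the lender never holds the signing key), but the principle is the same: data that cannot prove its origin never enters the underwriting pipeline.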
Tokenization: Beyond the Buzzword
An additional layer of security could come from tokenizing data-collateral. This involves intermediaries certifying the authenticity of data and issuing tokens to represent it. Examples include:
- Attestation tokens (e.g., ERC-721): for fixed data-collateral, such as a single snapshot of verified information or hard assets.
- Dynamic data bonds (e.g., ERC-4626): for evolving data sets that grow or change over time.
Tokenization would enable lenders to trust the data's veracity in the AI era, with intermediaries (or DAOs) acting as certifying agents.
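The attestation-token idea can be sketched off-chain in a few lines: the token ID is simply the content hash of the attested snapshot, so any later tampering with the data breaks the match. This is a hypothetical analogue of an ERC-721-style attestation, not a contract implementation; the `AttestationToken` class, issuer name, and snapshot fields are all invented for illustration.

```python
import hashlib
import json
from dataclasses import dataclass

def _canonical(snapshot: dict) -> bytes:
    """Deterministic serialization, so the same data always hashes the same."""
    return json.dumps(snapshot, sort_keys=True, separators=(",", ":")).encode()

@dataclass(frozen=True)
class AttestationToken:
    """Off-chain stand-in for an ERC-721-style attestation token."""
    token_id: str      # content hash of the attested snapshot
    issuer: str        # certifying intermediary (or DAO)
    attested_at: str

def mint_attestation(snapshot: dict, issuer: str, attested_at: str) -> AttestationToken:
    """Intermediary side: bind a token ID to one fixed data snapshot."""
    return AttestationToken(hashlib.sha256(_canonical(snapshot)).hexdigest(),
                            issuer, attested_at)

def matches(token: AttestationToken, snapshot: dict) -> bool:
    """Lender side: re-hash the presented data and compare against the token."""
    return token.token_id == hashlib.sha256(_canonical(snapshot)).hexdigest()

snapshot = {"borrower": "micro-entrepreneur-42", "stmt_hash": "abc123", "period": "2024-Q4"}
token = mint_attestation(snapshot, issuer="certifier-dao-01", attested_at="2025-01-15")

assert matches(token, snapshot)                                # verified snapshot
assert not matches(token, {**snapshot, "period": "2025-Q1"})   # altered data fails
```

A dynamic data bond would extend this by re-minting (or appending to) the token as the underlying data set evolves, with the certifier's signature over each revision.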
This solution dovetails elegantly with the three-pronged defense proposed above. As an aside, blockchains could finally find the killer app they have been looking for, opening a world for web3 fintechs to grow and mature. And fintechs would have a material opportunity to become the trust layer in the data-driven economy.
This challenge presents a significant opportunity. The global market for data authentication and verification is projected to reach $200 billion by 2026 (Gartner, 2023), with India alone expected to need 100+ specialized data verification providers.
The 2025 Imperative
The window for voluntary industry action is closing fast. Every AI-polluted dataset increases systemic risk. Regulatory intervention isn't a question of if, but when. The choice is clear: self-regulate now or face potentially stringent oversight later.
The future of finance hinges on our ability to distinguish signal from synthetic noise. Hence, we must act decisively. The alternative is watching our data-driven lending infrastructure collapse under the weight of artificial authenticity.