Sector-wise readiness
AI and ML products: can you prove a lawful basis for every record?
A misconception I keep seeing when AI startups read DPDP checklists: "the training data is public, so we are fine."
It is the most expensive assumption in AI right now, and shakier than founders think.
Under DPDP, "publicly available" is narrower than people assume. The carve-out reaches only data the person made public themselves, or made public under a legal duty. Scraped, bought, and web data used to train a model will often fall outside it. Public to see is not the same as exempt to train on.
What applies from the start, whatever your size
- ✅ A lawful basis for every use, including training
- ✅ Notice and a valid basis for processing: consent where it is required, or one of the Act's legitimate uses where that applies. Consent is not the only route.
- ✅ Security on training and inference data
- ✅ Rights handling, including how erasure affects training datasets and downstream systems
- ✅ Grievance mechanism
- ✅ Processor contracts for any vendor touching the data
- ✅ Breach response
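The rights-handling item is the one that tends to need engineering rather than paperwork. A minimal sketch, assuming you keep two hypothetical indexes (neither is a term from the Act): one from a person's ID to the datasets holding their records, one from each dataset to the models trained on it. It only answers "what does this erasure request touch?" What you then do with an affected model is a legal and engineering call the sketch does not make.

```python
from collections import defaultdict

# Hypothetical index: person ID -> datasets that hold their records.
person_to_datasets = defaultdict(set)
# Hypothetical lineage: dataset -> models trained on it.
dataset_to_models = defaultdict(set)


def register(person_id, dataset):
    """Record that a dataset contains data about this person."""
    person_to_datasets[person_id].add(dataset)


def link(dataset, model):
    """Record that a model was trained on this dataset."""
    dataset_to_models[dataset].add(model)


def erasure_impact(person_id):
    """List the datasets and downstream models an erasure request touches."""
    datasets = sorted(person_to_datasets.get(person_id, set()))
    models = sorted(
        {m for d in datasets for m in dataset_to_models.get(d, set())}
    )
    return {"datasets": datasets, "models": models}


register("user-17", "support_chats")
register("user-17", "billing_tickets")
link("support_chats", "intent-classifier-v2")
print(erasure_impact("user-17"))
```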
Three things AI builders get wrong
"Publicly available" is narrower than "reachable." Web scraped personal data will often fall outside the Section 3 carve-out. And if you determine the purpose and means of using it for training, you are likely a Data Fiduciary for that processing. Reuse alone does not convert you. Deciding the why and how does.
Consent provenance has to be provable, dataset by dataset. If you acquire or enrich data, you must show the consent chain behind it. "We bought it from a vendor" is not a basis, and that missing chain is the core exposure auditors look for.
Profiling still needs its own lawful basis. DPDP does not create a separate profiling regime, so profiling is just another processing activity that needs a basis. And behavioural targeting of children is prohibited outright: no consent cures it.
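The consent provenance point can be made concrete with a dataset manifest. This is a rough sketch, not a compliance tool: the field names, the source labels, and the three basis labels are all assumptions made up for illustration, and "claimed_public_exemption" only records that you are relying on the carve-out, not that it holds. The check flags any dataset whose basis is missing, unrecognised, or has no evidence attached.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for illustration; they are not terms defined by the Act.
ACCEPTED_BASES = {"consent", "legitimate_use", "claimed_public_exemption"}


@dataclass
class DatasetRecord:
    name: str
    source: str                  # e.g. "first_party", "vendor", "web_scrape"
    lawful_basis: Optional[str]  # one of ACCEPTED_BASES, or None if unknown
    evidence_ref: Optional[str]  # pointer to consent logs, contract, or memo


def audit(manifest):
    """Return names of datasets with a missing, unrecognised, or unevidenced basis."""
    gaps = []
    for d in manifest:
        if d.lawful_basis not in ACCEPTED_BASES or not d.evidence_ref:
            gaps.append(d.name)
    return gaps


manifest = [
    DatasetRecord("support_chats", "first_party", "consent", "consent-log-2024"),
    DatasetRecord("vendor_leads", "vendor", None, None),
    DatasetRecord("forum_scrape", "web_scrape", "claimed_public_exemption", None),
]
print(audit(manifest))  # ['vendor_leads', 'forum_scrape']
```

The vendor dataset fails because "we bought it" left the basis empty. The scrape fails because the exemption is claimed but nothing on file supports it.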
What may not apply to you yet
- ❌ Significant Data Fiduciary (SDF) obligations. Large model operators are plausible candidates to assess, but SDF status comes only from Government notification, not from model size.
- ❌ Algorithmic risk due diligence, mandatory DPO, localisation. These attach to SDF status once notified.
- ❌ Annual DPIAs and independent audits as standalone duties.
The better question
The better question is not "does AI have a special exemption?"
It is "can I prove a lawful basis for every record in my training set?"
Law creates obligations. Scale and risk influence implementation. But the lawful basis question lands on day one. The model's size does not change it.
Which AI data assumption do you think will age worst? Drop it in the comments.