Trust & Personal Data Minimization - human proofs + lemma.id

Data footprint

What each party actually holds

We're not going to claim the data vanishes. The honest version: your site avoids becoming a KYC operator, users avoid carrying a reusable global ID, and lemma.id keeps only the linkage it needs for issuance, recovery, and revocation. No raw government documents anywhere in that list.

Your site

~80 chars

Verdict + site-private PPID

A proof verdict and an opaque per-site identifier like did:lemma:ppid_…. With lemma.id continuity it means "same lemma.id"; with a human proof, "same verified human." Either way, no ID document, selfie, legal name, or date of birth ever touches your servers.

lemma.id

Minimal

Internal verification linkage

Encrypted hashes derived from IDV outcomes, plus wallet↔person linkage for revocation. No raw documents, face images, or legal name at rest.

Typical IdP

Full profile

Email, name, login graph

Auth0, Google, and Okta typically store email, profile attributes, session history, and a shared sub across every app on that provider.

Direct KYC

Full IDV record

Documents + biometrics

Running Stripe Identity or Onfido yourself means your servers (or your vendor, under your contract) hold document images, selfies, legal name, DOB, and verification reports.

Side-by-side

How lemma.id compares

Dimension	Your site + lemma.id	Auth0 / Google / Okta	Direct KYC on your stack
What your backend stores	`human: true` + site-private PPID	Email, name, global `sub`, tokens	ID images, selfies, name, DOB, reports
Cross-site user linkability	Sites see different PPIDs that can't be linked to each other	Same `sub` across all apps on that IdP	You become the identity store; correlation is your problem
Return-visit verification	Local Ed25519 check in browser, no lemma.id call	Server token validation every session	Re-verify or re-query vendor per check
IDV document retention	lemma.id: derived hashes only; Didit sessions purged after issuance	No government-ID verification (weaker human signal)	Full artifacts retained per vendor policy
Breach blast radius (your servers)	Opaque IDs, useless as identity documents	Email + profile data exposed	Government ID data exposed
Enforcement durability (lemma.id: same identity; human proof: same human)	lemma.id continuity for return visits. Human proof: person-root binding, so bans survive email and SIM rotation	Attacker rotates email or creates new OAuth account	Strong if you keep artifacts; expensive to operate
User erasure	Wallet export + `POST /api/ishuman/erase`	Provider-dependent; often incomplete across apps	Vendor + your DB; multi-system coordination

These are engineering comparisons, not legal guarantees. Check them against your own DPIA, DPAs, and threat model before you rely on them in production.

Honest scope

What “personal data minimizer” means here

First, the caveat you'd want a lawyer to point out anyway: under GDPR, pseudonymous data is still personal data if someone holding the keys can re-identify it. We are not claiming to be outside privacy law. The claim is narrower: lemma.id holds the minimum derived identifiers that enforcement requires, and meets its data-controller obligations for that minimum.

What we process once and don't keep: the document number and date of birth from the IDV provider, used only to derive cryptographic anchors. Legal name and selfie images play no part in that derivation and are never stored.

What we keep: HMAC-derived document and person-root hashes (encrypted at rest), wallet-to-person bindings, per-site PPID mappings for revocation, and verification metadata. Per person, that works out to roughly 200 bytes of derived identifiers. It's a ledger of proofs, not a document vault.
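
To make "derive once, keep only the hash" concrete, here is a minimal sketch of an HMAC-derived anchor. It assumes HMAC-SHA256 with a server-held key. The key name, the normalization, the labels, and the choice of inputs for the person root are all hypothetical and are not lemma.id's actual scheme.

```python
import hashlib
import hmac

# Hypothetical secret for illustration; a real service would keep this in a KMS or HSM.
ANCHOR_KEY = b"example-only-anchor-key"


def derive_anchor(label: str, *fields: str) -> str:
    """Return a hex HMAC-SHA256 anchor over normalized IDV outcome fields."""
    # Normalize so spacing or casing differences do not produce a different anchor.
    parts = [label] + [field.strip().upper() for field in fields]
    # Length-prefix every part so ("AB", "C") and ("A", "BC") cannot collide.
    message = b"".join(
        len(part.encode("utf-8")).to_bytes(2, "big") + part.encode("utf-8")
        for part in parts
    )
    return hmac.new(ANCHOR_KEY, message, hashlib.sha256).hexdigest()


# Made-up inputs: the raw values are used once here and never stored.
document_anchor = derive_anchor("document", "X1234567")
person_root = derive_anchor("person-root", "X1234567", "1990-01-31")
```

Only the two hex digests would be persisted in a design like this. Without the key, they cannot be recomputed from a guessed document number, which is the property a keyed hash adds over a plain one.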

What sites never get: cross-site linkage or real-world identity. A site sees only its own PPID, and other Lemma-enabled sites see different PPIDs for the same person. Sites can block their own PPIDs via the site-block API and enforce blocks through their own backend policy layer.
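
One way to get identifiers that differ per site is to bind the site's domain into a keyed hash. The sketch below assumes that approach. The function name, the key, the input layout, and the truncated output are illustrative assumptions, and the real PPID format is not reproduced here.

```python
import hashlib
import hmac


def derive_ppid(issuer_secret: bytes, person_root: str, site_domain: str) -> str:
    """Derive an opaque, site-specific identifier for one person on one site."""
    # Binding the site domain into the MAC input gives each site a different value.
    message = f"{person_root}|{site_domain.strip().lower()}".encode("utf-8")
    # Truncated for readability; the real identifier format is not reproduced here.
    return hmac.new(issuer_secret, message, hashlib.sha256).hexdigest()[:32]


secret = b"example-only-ppid-key"  # hypothetical issuer-held key
root = "a3f1" * 16                 # stand-in for a person-root hash

ppid_shop = derive_ppid(secret, root, "shop.example")
ppid_forum = derive_ppid(secret, root, "forum.example")
```

In this sketch, a site holding `ppid_shop` has no way to compute or recognize `ppid_forum` without the issuer's key, while the same person returning to the same site always maps to the same value.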

No advertising: we don't use verification outcomes, credentials, wallet data, or PPIDs to advertise to anyone, on lemma.id or anywhere else. Product analytics and fraud prevention are kept apart from marketing. Developer accounts may still get service emails, as covered in the privacy policy.

And no, the control plane doesn't disappear. Issuance, revocation, and recovery all need infrastructure, and we run it. What we can honestly promise is that routine access checks don't call back to an IDV provider, and no credential carries a stable identifier that works across sites.

The name is the model

Why “lemma.id”

In mathematics, a lemma is an intermediate result you prove once and then lean on to prove bigger things. That's exactly the job here. Your lemma.id is the continuous identity object where proofs accumulate: continuity on a site, a presence proof when the holder must be at the device, a human proof when a site needs one person behind the account.

Each proof is a signed credential in your browser wallet. When a site asks, it gets an answer to the specific claim it asked about, bound to its own private PPID. Not your profile, not your history, just that claim. Return visits verify locally, with no callback and no reusable ID crossing site boundaries.

The reason we like the name: proofs compose. Each new claim builds on anchors already established, the way lemmas build toward a theorem, and nobody along the way has to become a KYC operator or assemble a cross-site surveillance graph to make it work.

Three-party trust

What each side gets out of this

For users

You verify once into a wallet you control, like a physical ID. Each site sees only its own private PPID, and lemma.id doesn't watch your day-to-day checks because they happen locally. Your verification data is never used to advertise to you, you can export your wallet, and you can ask us to erase what we hold.

For integrators

You get lemma.id continuity, and human proof where you need it, without building a KYC stack or holding a single document image. If your database leaks, the attacker gets opaque PPIDs. Not passport scans.

For the public web

The web's two defaults so far have been no real accountability and centralized identity surveillance. This is a third option: enforcement backed by verified people, with site-private identifiers and local verification, and no requirement that every website become a regulated identity operator.

Who sees what at runtime

Party	Sees at verification	Stores long-term
IDV issuer (Didit)	Document, selfie, and liveness during the check itself	Purged after lemma.id issues credential (Didit path)
lemma.id	IDV outcome fields for anchor derivation	Derived hashes, PPIDs, revocation linkage
Your site	`{ human, ppid, reason, timeMs }`	PPID on user record (your choice)
User wallet	Full signed credential	Credentials, passkey, wallet secret (encrypted locally)
Other Lemma sites	Different PPID for their domain only	Their own site-private ID, which can't be linked to yours
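
On the site side, handling the verdict can stay small. The sketch below assumes the verdict arrives as a JSON object with the four fields shown in the table. The field types, the helper names, and the example `reason` value are assumptions, and the blocklist stands in for whatever policy layer your backend already has.

```python
from dataclasses import dataclass
from typing import AbstractSet, Mapping


@dataclass(frozen=True)
class Verdict:
    human: bool
    ppid: str
    reason: str
    time_ms: int


def parse_verdict(payload: Mapping) -> Verdict:
    """Build a Verdict from a decoded JSON object (field types are assumed)."""
    return Verdict(
        human=payload["human"] is True,  # anything other than literal True is rejected
        ppid=str(payload["ppid"]),
        reason=str(payload.get("reason", "")),
        time_ms=int(payload.get("timeMs", 0)),
    )


def admit(verdict: Verdict, blocked_ppids: AbstractSet[str]) -> bool:
    """Site-side policy: require a human verdict and honor the site's own blocks."""
    return verdict.human and verdict.ppid not in blocked_ppids


def user_record(verdict: Verdict) -> dict:
    """The only identity field persisted on the user record is the opaque PPID."""
    return {"ppid": verdict.ppid}


verdict = parse_verdict({"human": True, "ppid": "abc123", "reason": "ok", "timeMs": 12})
```

Because the blocklist is keyed by PPID, a ban applies to that identifier on your site only and reveals nothing about the person elsewhere.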

Enforcement you can explain to legal

Require one verified human per account where it matters, and keep your backend out of the identity-honeypot business. Everything on this page is in the docs, in more detail.


Enforcement-grade assurance without identity-grade liability.