Matrix E2EE + Threshold ("Multisig") Cloud-Backup Recovery

Matrix / Olm / Megolm | SSSS + Key Backup | Shamir · Feldman VSS · DKG | t-of-n recovery | April 2026
The picture in one paragraph. You don't need to change Matrix wire formats. Matrix already ships encrypted cloud history via Server-Side Key Backup + SSSS. The whole trust root is two 32-byte secrets — the SSSS master key and the Megolm backup Curve25519 seed — and in practice Element wraps the latter inside SSSS, so you're replacing one 32-byte user secret. Swap the single Recovery Key for a t-of-n threshold scheme and you get M-of-N recovery of everything. MVP is Shamir. V2 adds Feldman VSS (custodians can't silently cheat). V3 uses DKG + threshold ElGamal decryption — the seed never exists on any single machine. All three work without touching Matrix, vodozemac, or any MSC.
Hard design decision up front. If the same ceremony serves both "user lost phone" and "regulator served warrant," regulators gain a standing capability, not a case-by-case one. Keys Under Doormats (Abelson et al., 2015) is the canonical warning. Separate your custodian sets and ceremonies. Same primitive, disjoint people, different governance. And be honest in marketing — once a compliance path exists, "E2EE" no longer means what Signal/iMessage-ADP users think.

1. The exact surface you're modifying

Layer A — Server-Side Key Backup

m.megolm_backup.v1.curve25519-aes-sha2. 32-byte Curve25519 private key; public half signed by your cross-signing master. Every Megolm inbound session is wrapped with ECIES: ephemeral Curve25519 → ECDH → HKDF-SHA-256 (→ aes_key/mac_key/iv) → AES-256-CBC + HMAC-SHA-256. Ciphertext uploaded to /_matrix/client/v3/room_keys/keys. Homeserver sees opaque blobs.
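The key-derivation step of that ECIES wrap can be sketched in a few lines of Python. This is an illustration only, not vodozemac's API: the function names are made up for the example, the ECDH output is a placeholder, and the 32-byte zero salt and empty info string reflect my reading of the Matrix spec and should be checked against the current version.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_backup_keys(shared_secret: bytes) -> tuple[bytes, bytes, bytes]:
    """Split 80 bytes of HKDF output into (aes_key, mac_key, iv)."""
    # Assumption: salt = 32 zero bytes, info = empty string.
    okm = hkdf_sha256(shared_secret, salt=bytes(32), info=b"", length=80)
    return okm[:32], okm[32:64], okm[64:80]


# Placeholder input: a real client computes X25519(ephemeral_priv, backup_pub).
shared_secret = bytes(range(32))
aes_key, mac_key, iv = derive_backup_keys(shared_secret)
print(len(aes_key), len(mac_key), len(iv))  # 32 32 16
```

The three outputs feed AES-256-CBC (key and IV) and HMAC-SHA-256 (MAC key). Nothing here depends on how the backup private key is stored, which is why the threshold layer can sit entirely above it.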

Layer B — SSSS

m.secret_storage.v1.aes-hmac-sha2 (MSC1946 + MSC2472). 32-byte master key derived from passphrase (PBKDF2-HMAC-SHA-512, 500k iters) or from a printed Recovery Key (0x8B 0x01 + 32 bytes + parity, Bitcoin-alphabet Base58, ≈48 chars). Each secret wrapped via HKDF(info=secret name) → AES-CTR-256 + HMAC-SHA-256. Stores: Megolm backup seed + 3 cross-signing private keys.

What you replace

The SSSS master key: 32 bytes that unlock everything. Replace the passphrase / Recovery Key with a t-of-n threshold scheme and you get M-of-N recovery of all four stored secrets (the Megolm backup seed plus the three cross-signing private keys) for free. No changes to Matrix, Megolm, Olm, or MSCs.

2. "Multisig" — a terminology fix

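In crypto-wallet speak, "multisig" usually means Bitcoin-style M-of-N signatures on transactions. That is not what you want: Matrix backups are encrypted blobs, not transactions. What you actually want is threshold secret sharing (Shamir, optionally made verifiable with Feldman or Pedersen VSS) or threshold decryption (DKG + threshold ElGamal), the schemes compared in sections 3 and 7.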

Use the term "t-of-n threshold recovery" in design docs. It's precise and maps to the literature.

3. Implementation tiers

| Tier | Scheme | Custodians see | Rust library | Effort |
|---|---|---|---|---|
| MVP | Shamir over GF(2^8) or SLIP-0039 | Raw seed at reconstruction | vsss-rs, sharks | ~1 week |
| V2 (recommended V1) | Feldman / Pedersen VSS | Raw seed, but cheating detected | vsss-rs (same lib) | +2–3 weeks |
| V3 | DKG + threshold ElGamal on Curve25519 | Never see the seed | frost-core + custom threshold ElGamal | +2–3 months |

Ship Feldman VSS over Curve25519 as V1. Same library as plain Shamir, tiny incremental cost, closes the "malicious custodian returns garbage" attack. Keep V3 on the roadmap for when "custodians never see keys" becomes a product promise.
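To make the MVP row concrete, here is a minimal byte-wise Shamir split over GF(2^8). It is a Python sketch for readability, not the vsss-rs or sharks API: the function names and the choice of the AES reduction polynomial 0x11B are assumptions made for the example, and a production build should use an audited library.

```python
import secrets


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def gf_inv(a: int) -> int:
    """Inverse of a non-zero element, computed as a^254."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


def split(secret: bytes, t: int, n: int) -> dict[int, bytes]:
    """Return {x: share} for x = 1..n; any t shares recover the secret."""
    if not 1 <= t <= n <= 255:
        raise ValueError("need 1 <= t <= n <= 255")
    shares = {x: bytearray() for x in range(1, n + 1)}
    for byte in secret:
        # One random degree t-1 polynomial per secret byte, constant term = byte.
        coeffs = [byte] + [secrets.randbelow(256) for _ in range(t - 1)]
        for x in shares:
            y = 0
            for c in reversed(coeffs):  # Horner evaluation
                y = gf_mul(y, x) ^ c
            shares[x].append(y)
    return {x: bytes(s) for x, s in shares.items()}


def combine(shares: dict[int, bytes]) -> bytes:
    """Lagrange-interpolate every byte position at x = 0."""
    xs = list(shares)
    basis = {}
    for i in xs:
        num, den = 1, 1
        for j in xs:
            if j != i:
                num = gf_mul(num, j)      # (0 - x_j) is x_j in characteristic 2
                den = gf_mul(den, i ^ j)  # (x_i - x_j) is x_i XOR x_j
        basis[i] = gf_mul(num, gf_inv(den))
    length = len(shares[xs[0]])
    out = bytearray()
    for k in range(length):
        acc = 0
        for i in xs:
            acc ^= gf_mul(shares[i][k], basis[i])
        out.append(acc)
    return bytes(out)


ssss_key = secrets.token_bytes(32)
shares = split(ssss_key, t=3, n=5)
assert combine({x: shares[x] for x in (1, 3, 5)}) == ssss_key
```

Note what the sketch cannot do: a custodian who returns a corrupted share makes `combine` return garbage with no indication of who cheated. That gap is what the Feldman tier closes.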

MVP flow — setup

user client                             homeserver                custodian[i]
    │                                       │                         │
generate 32-byte SSSS key                   │                         │
Shamir-split → shares[1..N]                 │                         │
Feldman commitments g^a_i                   │                         │
    │                                       │                         │
for each i:                                 │                         │
  ECDH(custodian[i].pubkey)                 │                         │
  HKDF → AES-GCM encrypt share              │                         │
    │                                       │                         │
POST m.secret_storage.threshold.v1 ──►  stored as account_data        │
    │                                       │                         │
commitments → transparency log (Sigstore / Matrix room)

MVP flow — recovery

new device                                    custodian[i]              transparency log
    │                                             │                         │
authenticate user                                 │                         │
fetch m.secret_storage.threshold.v1 from homeserver                         │
fetch commitments from transparency log ◄───────────────────────────────────│
    │                                             │                         │
for i in chosen t:                                │                         │
  request share_i ───────────────────────────────►│ decrypt own share_ct    │
    │                                             │ return share_i          │
  verify g^share_i ?= g^f(i) (Feldman check)      │                         │
    │                                             │                         │
Lagrange-interpolate → SSSS master key
unlock m.megolm_backup.v1 + 3 cross-signing secrets
restore cross-signing + backup pubkey
GET /_matrix/client/v3/room_keys/keys ──► decrypt → history restored
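The Feldman check and the Lagrange step in the recovery flow can be sketched as follows. This Python example uses a deliberately tiny toy group (P = 2039, Q = 1019, G = 4, all chosen for the illustration) so the arithmetic stays readable; a real deployment would use a prime-order elliptic-curve group through a library such as vsss-rs. The value g^f(i) is never sent by anyone: the recovering device computes it from the published coefficient commitments.

```python
import secrets

# Toy parameters (assumption): P = 2Q + 1 is a safe prime, G generates the
# order-Q subgroup of Z_P*. Far too small for real use.
P, Q, G = 2039, 1019, 4


def deal(secret: int, t: int, n: int):
    """Shamir shares mod Q plus Feldman commitments C_j = G^a_j mod P."""
    coeffs = [secret % Q] + [secrets.randbelow(Q) for _ in range(t - 1)]
    shares = {
        i: sum(c * pow(i, j, Q) for j, c in enumerate(coeffs)) % Q
        for i in range(1, n + 1)
    }
    commitments = [pow(G, c, P) for c in coeffs]
    return shares, commitments


def verify_share(i: int, share: int, commitments: list[int]) -> bool:
    """Check G^share == prod_j C_j^(i^j), which equals G^f(i)."""
    expected = 1
    for j, c in enumerate(commitments):
        expected = expected * pow(c, pow(i, j, Q), P) % P
    return pow(G, share, P) == expected


def reconstruct(shares: dict[int, int]) -> int:
    """Lagrange interpolation at x = 0 over Z_Q."""
    secret = 0
    for i, s in shares.items():
        num = den = 1
        for j in shares:
            if j != i:
                num = num * (-j) % Q
                den = den * (i - j) % Q
        secret = (secret + s * num * pow(den, -1, Q)) % Q
    return secret


shares, commitments = deal(secret=123, t=3, n=5)
assert all(verify_share(i, s, commitments) for i, s in shares.items())
assert not verify_share(2, (shares[2] + 1) % Q, commitments)  # tampered share
assert reconstruct({i: shares[i] for i in (1, 3, 5)}) == 123
```

The first commitment is G^secret, which is why Feldman commitments are only computationally hiding. That matters little for a uniformly random 32-byte key but is the reason Pedersen VSS exists.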

Storage cost on the homeserver: N encrypted shares + t Feldman commitments (one per polynomial coefficient), ≈ N × 200 bytes in total. Negligible.
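A quick way to sanity-check the size estimate is to build a mock account_data payload and measure it. The sketch below is hypothetical throughout: the field names (custodian_id, ephemeral_pub, nonce, ciphertext, threshold, commitments) are invented for illustration, random bytes stand in for real ciphertext, and it assumes one commitment per coefficient of a degree t-1 Feldman polynomial.

```python
import base64
import json
import os


def b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def mock_share_entry(index: int) -> dict:
    """One custodian's wrapped share; random bytes replace real crypto output."""
    return {
        "custodian_id": f"custodian-{index}",     # hypothetical identifier
        "ephemeral_pub": b64(os.urandom(32)),     # X25519 ephemeral public key
        "nonce": b64(os.urandom(12)),             # AES-GCM nonce
        "ciphertext": b64(os.urandom(32 + 16)),   # 32-byte share + 16-byte tag
    }


n, t = 5, 3
event = {
    "type": "m.secret_storage.threshold.v1",
    "content": {
        "threshold": t,
        "shares": [mock_share_entry(i) for i in range(1, n + 1)],
        "commitments": [b64(os.urandom(32)) for _ in range(t)],
    },
}

compact = {"separators": (",", ":")}
per_share = len(json.dumps(event["content"]["shares"][0], **compact))
total = len(json.dumps(event, **compact))
print(per_share, total)
```

Under these assumptions each wrapped share serializes to roughly 200 bytes of JSON, and a 3-of-5 setup stays well under 2 KB.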

4. Threat model & mitigations

| Threat | Mitigation |
|---|---|
| M custodians collude → total compromise | Diversify jurisdictions, legal entities, infra. Signal SVR3 uses 3 different cloud-enclave vendors (SGX/Azure + SEV-SNP/GCP + Nitro/AWS). |
| Custodian loses their share | Set M < N (e.g. 3-of-5, 5-of-9). Optional time-locked recovery share with a notary. |
| Malicious dealer at setup | Feldman VSS. Commitments on a transparency log. |
| Eclipse on recovery channel | Noise-XK or pinned-TLS to each custodian. User sees custodian fingerprints at both enrollment and recovery. |
| Replay of old shares after rotation | Proactive Secret Sharing (Herzberg et al.); vsss-rs supports it. |
| Custodian platform coercion | Jurisdictional diversity. Warrant canaries. Public custodian list + transparency report. |
| User forgets second-factor | Tier-2 social-recovery set, separate custodians from primary. |
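The proactive-refresh mitigation is worth a small illustration, because it explains why old shares stop being useful after rotation. In the Python sketch below every custodian deals a fresh sharing of zero and each share absorbs all the deltas. The prime field (2^127 - 1) and all function names are assumptions for the example, and the verifiability and networking that a real Herzberg-style protocol needs are omitted.

```python
import secrets

PRIME = 2**127 - 1  # Mersenne prime; illustrative field, not a recommendation


def eval_poly(coeffs: list[int], x: int) -> int:
    y = 0
    for c in reversed(coeffs):  # Horner evaluation
        y = (y * x + c) % PRIME
    return y


def deal(constant: int, t: int, n: int) -> dict[int, int]:
    """Shamir shares of `constant` on a random degree t-1 polynomial."""
    coeffs = [constant] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    return {i: eval_poly(coeffs, i) for i in range(1, n + 1)}


def refresh(shares: dict[int, int], t: int) -> dict[int, int]:
    """Each custodian deals a sharing of 0; shares move, the secret does not."""
    new = dict(shares)
    for _dealer in shares:
        delta = deal(0, t, len(shares))
        for i in new:
            new[i] = (new[i] + delta[i]) % PRIME
    return new


def reconstruct(shares: dict[int, int]) -> int:
    total = 0
    for i, s in shares.items():
        num = den = 1
        for j in shares:
            if j != i:
                num = num * (-j) % PRIME
                den = den * (i - j) % PRIME
        total = (total + s * num * pow(den, -1, PRIME)) % PRIME
    return total


secret = secrets.randbelow(PRIME)
old = deal(secret, t=3, n=5)
new = refresh(old, t=3)

assert reconstruct({i: new[i] for i in (1, 2, 3)}) == secret
# Mixing epochs fails: two old shares plus one new share do not interpolate.
assert reconstruct({1: old[1], 2: old[2], 3: new[3]}) != secret
```

An attacker who collected fewer than t shares before the refresh gains nothing by collecting more afterwards, provided custodians actually delete their old shares.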

Custodian selection patterns

Social recovery (Argent-style)

User picks 5 contacts, 3 required. Pure consumer. Works for user disaster recovery; useless for regulatory access.

Institutional custodians

3 legal entities in different jurisdictions, 2 required. Professional SLAs. UX cost of custodian setup.

HSM fleet (Signal SVR3 style)

3 hardware-attested enclaves you operate across SGX / SEV-SNP / Nitro (the SVR3 vendor mix). Fastest recovery UX. Requires trust in enclave attestation.

Hybrid

User chooses social or institutional at enrollment. Different sets, different ceremonies.

5. Regulatory access vs user recovery — keep them separate

If the same t-of-n ceremony covers both cases, regulators gain a permanent capability — the mechanism exists, custodians are identified, compelling t of them always returns plaintext. This is the single biggest technical-policy mistake you can make.

Better — dual encryption with disjoint custodian sets. Each session_data backup row is encrypted to both P_user_recovery (user ceremony) and P_compliance (regulator ceremony). The two threshold sets are disjoint legal entities with disjoint governance. Storage doubles for backups only (Megolm backups are tiny vs media).

Marketing honesty. Signal / iMessage-ADP / WhatsApp-password-backup have set the semantic bar for "E2EE". Once a standing compliance path exists, you must not market this as E2EE without qualification. Publish a clear explanation of what is accessible to whom, under which conditions. Ledger Recover's 2023–24 backlash is the cautionary tale.

6. UAE practical constraint

You're in Abu Dhabi. The UAE has consistently required lawful access to comms: BlackBerry standoff (2010), persistent OTT-voice blocks, TDRA whitelist-only VoIP, Federal Decree-Law 34/2021 (Cybercrimes & Rumors) and 45/2021 (PDPL) both presuming state access under investigation. A TDRA-licensed UAE messenger will almost certainly be required to ship a compliance path. Design for it explicitly with disjoint custodians and public governance — don't bolt it on after a regulator asks.

7. Comparison — which scheme

| Property | Shamir (MVP) | Feldman VSS | Pedersen VSS | DKG + Threshold ElGamal |
|---|---|---|---|---|
| Seed exists during recovery | Yes | Yes | Yes | Never |
| Detects malicious custodian | No | Yes | Yes | Yes |
| Hides secret in commitment | Info-theoretic | Reveals g^secret | Info-theoretic | Yes |
| Custodians online at recovery | No | No | No | Yes |
| Rounds | 1 | 1 | 1 | 2–3 |
| Rust library (2026) | vsss-rs, sharks, SLIP-0039 | vsss-rs | vsss-rs | frost-core + custom |
| Proactive refresh | Add-on | Yes | Yes | Yes (DKG re-run) |
| Effort | Low | Low+ | Low+ | High |
| Production reference | Vault unseal · Trezor Shamir | Ledger Recover | Academic + some wallets | Signal SVR3 model |

8. Build checklist

  1. Matrix client stack: matrix-rust-sdk (Apache-2.0) + vodozemac. Don't build your own — Element X is the UX reference.
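  2. Homeserver: Conduit (Apache-2.0) is the permissive option. Dendrite was Apache-2.0 until Element moved its ongoing development, together with Synapse, to AGPL-3.0 in late 2023; use either only if you're ok with AGPL.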
  3. Threshold recovery crate: vendor vsss-rs, implement m.secret_storage.threshold.v1 account_data. Publish as a draft MSC for community review.
  4. Custodian service spec: Noise-XK API, share upload/download, fingerprint verification UX, audit log to a transparency log.
  5. Enrollment UI: user picks N custodians, sets t, confirms fingerprints, gets a printable "multisig descriptor" sheet.
  6. Recovery UI: new device → contact custodians → verify shares (Feldman) → reconstruct → re-sign cross-signing → restore Megolm backup. Mirror Element's existing Secure Backup flow.
  7. Governance docs: custodian list, fingerprints, ceremony protocol, compliance-path terms. Warrant canary. Transparency report.
  8. Honest product page: explain what E2EE means in your system, where recovery exists, who can invoke it.

Effort: 4–6 engineer-months for a V1 ceremony wrapped into matrix-rust-sdk + custodian service + UI + governance. Doesn't include the rest of the product.