Content & hashing

How Label 309 binds a record to its content — the hashes map, what the digest commits to, and Merkle commitments for anchoring many items under one root.

The content hash is the claim. Everything a Label 309 record asserts about existence flows from a cryptographic digest of the content bytes, anchored on chain under metadata label 309. This page defines how that digest is carried, exactly what it commits to, and how a single 32-byte root can stand in for an arbitrarily large batch of items.

The hashes map

Each item in a record carries a hashes map: a CBOR map from an algorithm identifier to a raw 32-byte digest.

CBOR
hashes = {
  "sha2-256": h'…32 bytes…',      ; key = algorithm id, value = raw digest
}

Keys are text-string identifiers drawn from the hash registry; values are raw byte strings, never hex-encoded. The map must contain at least one entry, and every registered hash algorithm produces exactly 32 bytes:

IdentifierAlgorithmReferenceDigest
sha2-256SHA-256FIPS 180-432 B
blake2b-256BLAKE2b-256RFC 769332 B

Both identifiers are mandatory for verifiers to implement, so a single-hash record under either one validates everywhere. A verifier that encounters an unrecognized identifier rejects the record with a stable error code rather than silently skipping the entry. The full registry, including reserved post-quantum slots, lives on Algorithm registries.

Using a CBOR map — rather than parallel arrays or a list of {alg, digest} sub-objects — has three consequences that are part of the wire contract. Duplicate algorithms are impossible by construction, because CBOR map keys are unique. Canonical ordering is automatic, because canonical CBOR sorts keys by their encoded bytes, so two producers expressing the same hash set emit byte-identical maps and any record-level signature over them is stable. And the shape needs no per-entry validation: a structural validator only checks that each key is registered and each value has the algorithm's digest length.

What the hash commits to

The digest commits to the content bytes — the exact byte sequence the producer is timestamping. Every entry in a hashes map must be the digest of that same byte sequence under its named algorithm; a record whose entries describe different plaintexts is non-conformant. When the content bytes are available to a verifier, it must recompute every digest and reject the record if any one fails to match.

When a record carries an encryption envelope (enc), the hash binds the plaintext, never the ciphertext. This is deliberate: a Proof of Existence exists so an author can later reveal plaintext and prove it existed at a given time. Hashing the ciphertext would prove only that some encrypted blob existed, which says nothing about the underlying content. So a sealed record still proves exactly which plaintext was timestamped — the recipient decrypts, recomputes the plaintext digests, and matches them against the on-chain commitment. An enc-bearing item therefore must carry at least one content-hash entry; without one there would be no plaintext claim to recompute against.

Plaintext binding, even when sealed

The on-chain digest of a sealed record is the digest of the cleartext. A public verifier can confirm the ciphertext bytes are the ones the producer published; a recipient decrypts and recomputes the plaintext hash to close the loop back to the on-chain claim.

One hash, or several

A single content hash is fully conformant. For all 256-bit hashes in the registry, the best known second-preimage attacks sit at or near 2^256 classically — a single sound 256-bit hash already covers the realistic threat model over a record's archival lifetime, and structural validators emit no warning for single-entry records.

A producer may add a second entry from an independent design family as optional defense-in-depth — pairing sha2-256 (SHA-2: Merkle–Damgård) with blake2b-256 (BLAKE2: a HAIFA construction over a ChaCha-derived permutation). Because the two families share no structural lineage, a record carrying both is weakened only if both families fall to cryptanalysis at once. The cost is one extra 32-byte digest plus its short identifier per item; the choice is the producer's, and it is never required.

Merkle batch commitments

A single content hash anchors a single content. To anchor an arbitrarily large collection — a 500-file CI artifact set, a stream of IoT events, an audit-log batch — Label 309 defines a top-level merkle[] array. Each entry commits to an ordered list of 32-byte leaves with one 32-byte root published on chain; the ordered leaves themselves live off chain.

CBOR
merkle = [
  {
    "alg":        "rfc9162-sha256",
    "root":       h'…32 bytes…',   ; canonical root over the ordered leaves
    "leaf_count": 4,               ; binds the on-chain root to the leaf-list size
    "uris":       [ … ],           ; OPTIONAL — where the off-chain leaves list lives
  },
]

The registered commitment algorithm is rfc9162-sha256: the Merkle Tree Hash of RFC 9162 §2.1.1, with SHA-256 as the underlying hash. It is a list-commitment construction, distinct from the content-hash registry — a Merkle root commits to a leaf-list structure, a sha2-256 digest commits to plaintext bytes — and so it sits in its own array rather than inside hashes. The on-chain leaf_count binds the root to the size of the off-chain list, foreclosing a substitution that rebuilds a different-sized tree sharing the root for some leaf position.

Tree construction

The construction distinguishes leaves from internal nodes with a one-byte domain-separation prefix — 0x00 for leaves, 0x01 for internal nodes — so an attacker cannot craft an internal node that collides with a leaf. For an ordered list L = (d_0, …, d_{n-1}) of 32-byte values with n ≥ 1, the Merkle Tree Hash is defined recursively:

MTH(L) = SHA-256(0x00 || d_0)                            when n == 1
MTH(L) = SHA-256(0x01 || MTH(L[0:k]) || MTH(L[k:n]))     when n > 1
         where k is the largest power of 2 strictly less than n

A crucial consequence: a single leaf hashes as SHA-256(0x00 || d_0), not as the bare leaf. The root of a one-leaf tree is therefore never equal to the leaf itself. Producers who want to timestamp a single content must use a plain sha2-256 or blake2b-256 entry directly, not a one-leaf Merkle tree. An empty tree (n == 0) is forbidden.

The construction is order-sensitive — permuting the leaves yields a different root — so producers must treat the leaf list as an ordered sequence and preserve that order across publication, archival, and any later proof generation.

The off-chain leaves list

The root is useless without the leaf list, so producers persist the ordered leaves off chain. The canonical artifact is a cardano-poe-merkle-leaves-v1 document, encoded as canonical CBOR (RFC 8949): a 32-byte root, the ordered array of 32-byte leaves, and the leaf count.

CDDL
leaves-list = {
  "format":     "cardano-poe-merkle-leaves-v1",
  "tree_alg":   "rfc9162-sha256",
  "root":       bytes .size 32,        ; raw 32 bytes, not hex
  "leaves":     [ + bytes .size 32 ],  ; ordered raw 32-byte leaves
  "leaf_count": uint,
}

A verifier resolves the off-chain list, recomputes the root from its leaves using the construction above, and matches it against the on-chain merkle[i].root byte-for-byte; the leaf_count in the file must equal both the on-chain leaf_count and len(leaves). A human-readable JSON projection of the same data exists for inspection, but the CBOR form is the normative one — it is what producers emit and what verifiers recompute the root against.

Inclusion proofs

The point of batching is selective disclosure: prove that one item was in the committed list without republishing — or even revealing — the rest. An inclusion proof for a leaf is the ordered list of sibling node hashes along the path from that leaf to the root: an O(log n) sibling path. A verifier folds the leaf and the siblings back up the tree per RFC 9162 and accepts the proof if, and only if, the reconstructed root equals the published root byte-for-byte.

Because RFC 9162 trees are not padded to a power of two, a right-edge leaf in an unbalanced tree can have a shorter path than a leaf on the full side. The authoritative check is therefore algorithmic — does the fold reproduce the root — never a comparison of proof length.

Why batching matters

One transaction and one 32-byte root can stand in for thousands or millions of leaves. Anyone holding an O(log n) proof can later show "this item was in my list," while every undisclosed leaf stays private — the root reveals nothing about the leaves it commits to.