I’m trying to understand the serialisation format of an ots file. As I understand it commitment operations are encoded like

1 Byte Tag | variable bytes data length | variable bytes operation payload

For example

F0 10 0e 9b 49 ab 01 34 50 80 35 ce 26 97 f1 ad 09 85

can be read as append (F0) the next sixteen (10) bytes.

As I can see in your code data length (10 in this case) is encoded as a signed integer stored in big endian encoding. When I compare this to the Bitcoin wiki I see that there it is the exact opposite, unsigned integer stored in little endian.

https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_integer

Of course, both ways are valid, but as your implementation is very close to the Bitcoin protocol I want to understand the reason why you are deviating here.

The same is true for the block height of the Bitcoin Attestation. But in this case, there seems to be no “official” way to encode it.

Feb 06 '19 10:02 etschelp

@petertodd I bet you have good reasons. Could you explain? :)

Feb 22 '19 13:02 domwoe

Sorry! Missed this.

Honestly, I think the scheme I chose was probably a mistake. Easier to just use fixed-width integers, and more likely to have compatibility across implementations. As for exactly why I chose that particular scheme, I guess it seemed efficient and rational?

For future work I'm doing I'm sticking to fixed-width, little-endian, encoding integers wherever possible. The encoding scheme Cap'n Proto has a decent rational for that kind of design, and notable also suggests using a separate simple compression scheme to get both simplicity and compactness: https://capnproto.org/

Feb 23 '19 00:02 petertodd

Is there any white paper explaining:

the ots format (Also, why don't you just use a simple json (potentially compessed with tar.gz or alike) where binary are encoded as base64 or in base 16 (like sha256sum I guess)? For such small files readability is preferable over file size in my opinion, especially when people may not trust the decoding algorihm.)
the precise mechanism used to do the timestamp, so that I can manually check if an .ots is valid? For instance, do you create one transaction per timestamp (costly), or do you aggregate multiple timestamps into a single request?

Having these information would allow people to actually verify themself that a file is valid, without trusting the code used by the platform.

Jul 06 '22 09:07 tobiasBora

A couple of years ago I dug through the code and documented this for myself. I'm pasting what I documented below, maybe it is of use to you. Basically the .ots file is a variable length byte stream that you need to parse bit by bit and decide how to decode the next byte block depending on byte tag and byte length.

OTS Proof File Format

The content of an ots file is a hex encoded byte stream that contains not only the hash of a claim but also a tree of commitment operations that when executed proof that the claim is linked to the merkle root of a blockchain transaction and can hence attest the claims existence at a given point of time.

File Format

The beginning of the file is static whereas the payload can vary in length and it might also change in case an timestamp is upgraded.

Field	Description	Size
File Header (Magic Number)	\x00 O p e n T i m e s t a m p s \x00 \x00 P r o o f \x00 \xbf \x89 \xe2 \xe8 \x84 \xe8 \x92 \x94 or in pure hex \x00 \x4f \x70 \x65 \x6e \x54 \x69 \x6d \x65 \x73 \x74 \x61 \x6d \x70 \x73 \x00 \x00 \x50 \x72 \x6f \x6f \x66 \x00 \xbf \x89 \xe2 \xe8 \x84 \xe8 \x92 \x94	31 Bytes
Major Version	Defaults to \x01 - which can be read as version one	1 Byte
File Hash	Length of the hash is deducted from the File Hash Algorithm, e.g. 32 Bytes when using SHA256	variable
Payload	Merkle Path(s) and Attestation(s)	variable
File Hash Algorithm	Operation is determined by the Tag of a Unary Operation, e.g. 0x08 for SHA256	1 Byte

Payload

The payload of an ots file consists of the commitment operations that are necessary to reach the merkle root of the Bitcoin block header. An operation tree always ends in an attestation. The payload always starts with an tag, a tag is a single byte marker that describes how the byte stream should be processed next. If the tag is a binary (f0, f1) or unary (02, 03, 08, 67) operation, it is followed by one or more bytes encoding an signed integer that describes the length of the data to which the operation should be applied, followed by the data (payload).

Operation block

Tag	Storage length	Payload
1 Byte	variable	variable

List of marker tags

Type	Tag	Result length
Binary Operations
OpAppend	\xf0	variable
OpPrepend	\xf1	variable
Unary Operations
OpSha1	\x02	20 Bytes
OpRIPEMD160	\x03	20 Bytes
OpSha256	\x08	32 Bytes
OpKECCAK256	\x67	32 Bytes
Attestation
BlockHeaderAttestation	\x00	variable
BlockHeaderAttestation	\x00FF	variable

Attestation Block

If the tag is 00 or 00FF the operation tag is followed by an attestation tag.

Tag	Attestation Tag	Storage length	Payload
1 Byte	8 Bytes	variable	variable

List of attestation tags

Type	Attestation Tag	Attestation Length
BitcoinBlockHeaderAttestation	\x05 \x88 \x96 \x0d \x73 \xd7 \x19 \x01	Block height as signed integer
EthereumBlockHeaderAttestation	\x30 \xfe \x80 \x87 \xb5 \xc7 \xea \xd7	Block height as signed integer
LitecoinBlockHeaderAttestation	\x06 \x86 \x9a \x0d \x73 \xd7 \x1b \x45	Block height as signed integer
PendingAttestation	\x83 \xdf \xe3 \x0d \x2e \xf9 \x0c \x8e	Calendar Server URI Length

If there is more than one attestation in the file the attestation is followed by a unary or binary operation again.

Example - Append Operation

Tag	Data Length	Data
F0	10	0e 9b 49 ab 01 34 50 80 35 ce 26 97 f1 ad 09 85

Example - Bitcoin Attestation

Tag	Attestation Tag	Data Length	Data
00	05 88 96 0D 73 D7 19 01	03	F0 9D 1A

Data Length Encoding

Bitcoin encodes Storage length in an unsigned int, to quote the Bitcoin wiki.

Wiki: Variable_length_integer

Integer can be encoded depending on the represented value to save space. Variable length integers always precede an array/vector of a type of data that may vary in length. Longer numbers are encoded in little endian.

Value	Storage length	Format
< 0xFD	1	uint8_t
<= 0xFFFF	3	0xFD followed by the length as uint16_t
<= 0xFFFF FFFF	4	0xFE followed by the length as uint32_t
-	9	0xFF followed by the length as uint64_t

Whereas OpenTimestamps encodes Storage length signed, therefore all bytes need to be truncated to 7 bits to get the actual values.

The same is true for the encoding of the height of the Bitcoin Attestation, but there seems to be no "official" way to do it. The only reference can be found in the book "Mastering Bitcoin":

https://www.oreilly.com/library/view/mastering-bitcoin/9781491902639/ch08.html

The first byte, 03, instructs the script execution engine to push the next three bytes onto the script stack (see Table A-1). The next three bytes, 0x443b04, are the block height encoded in little-endian format (backward, least significant byte first). Reverse the order of the bytes and the result is 0x043b44, which is 277,316 in decimal.

Raw Transaction

To link the local merkle tree to an Bitcoin transaction the raw transaction is made a part of the file structure as well. Which gives us the following structure:

Local Merkle tree - Ending in the local Merkle root - prepend raw transaction (until OP_RETURN 0x6a) - append rest of the raw transaction - hash sha 256 - hash sha 256 → which gives us the Bitcoin Transaction ID - rest of the bitcoin merkle tree - bitcoin attestation represented by the block height.

Therefore only the block height is needed to verify the timestamp.

Jul 07 '22 09:07 etschelp

Whoo, thanks a lot for this precise documentation! This should be made public IMHO.

Jul 07 '22 11:07 tobiasBora

Whoo, thanks a lot for this precise documentation! This should be made public IMHO.

Note that the documentation in the above comment has a number of errors. For instance, the order of fields in the file header is wrong: https://github.com/opentimestamps/python-opentimestamps/blob/5ac3b0642b29e2092c75d164561d37842cd50d29/opentimestamps/core/timestamp.py#L316

For the most precise documentation, you really need executable code. There's no substitute and that's why the OTS spec will continue to be in the form of code. In the future, most likely a Rust library to make types and error conditions more clear.

Jul 07 '22 16:07 petertodd

Is there any white paper explaining:

1. the `ots` format (Also, why don't you just use a simple json (potentially compessed with tar.gz or alike) where binary are encoded as base64 or in base 16 (like sha256sum I guess)? For such small files readability is preferable over file size in my opinion, especially when people may not trust the decoding algorihm.)

The reason why OTS uses a binary format is because one way or another, a format needs to be parsed. A textual format simply means there are two parsing steps rather than one.

2. the precise mechanism used to do the timestamp, so that I can manually check if an `.ots` is valid? For instance, do you create one transaction per timestamp (costly), or do you aggregate multiple timestamps into a single request?

You can use the ots info command to dump information on what exactly is in an OTS proof. That information is sufficient to evaluate a fully upgraded proof yourself by hand. With the -v verbose flag, it'll even show you, step by step, what the result of each opcode is to check your work.

Jul 07 '22 16:07 petertodd

python-opentimestamps
python-opentimestamps copied to clipboard

Question about data length encoding

OTS Proof File Format

File Format

Payload

Operation block

List of marker tags

Attestation Block

List of attestation tags

Example - Append Operation

Example - Bitcoin Attestation

Data Length Encoding

Raw Transaction

python-opentimestamps python-opentimestamps copied to clipboard

Question about data length encoding

OTS Proof File Format

File Format

Payload

Operation block

List of marker tags

Attestation Block

List of attestation tags

Example - Append Operation

Example - Bitcoin Attestation

Data Length Encoding

Raw Transaction

python-opentimestamps
python-opentimestamps copied to clipboard