
Extract a subsection of the original bencoded input

Open: kit-ty-kate opened this issue 1 year ago • 1 comment

The BitTorrent Protocol Specification specifies an info_hash value defined as:

The 20 byte sha1 hash of the bencoded form of the info value from the metainfo file. This value will almost certainly have to be escaped.

Note that this is a substring of the metainfo file. The info-hash must be the hash of the encoded form as found in the .torrent file, which is identical to bdecoding the metainfo file, extracting the info dictionary and encoding it if and only if the bdecoder fully validated the input (e.g. key ordering, absence of leading zeros). Conversely that means clients must either reject invalid metainfo files or extract the substring directly. They must not perform a decode-encode roundtrip on invalid data.
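As a rough illustration of the two validation points the note names (leading zeros and key ordering), here is a minimal OCaml sketch. The helpers `is_canonical_int` and `keys_strictly_sorted` are hypothetical names made up for this example and are not part of the bencode library; they only show what a strict decoder would have to check, under the assumption that keys are compared as raw byte strings.

```ocaml
(* Hypothetical helpers for illustration; not part of the bencode library. *)

(* [is_canonical_int body] checks the text between 'i' and 'e' of a
   bencoded integer: either "0", or an optional '-' followed by digits
   that do not start with '0'. So "04", "-0" and "" are rejected. *)
let is_canonical_int (body : string) : bool =
  let n = String.length body in
  let rec all_digits i =
    i >= n || (body.[i] >= '0' && body.[i] <= '9' && all_digits (i + 1))
  in
  let start = if n > 0 && body.[0] = '-' then 1 else 0 in
  start < n
  && (body.[start] <> '0' || n = 1)
  && all_digits start

(* [keys_strictly_sorted keys] checks that dictionary keys, in the order
   they appear in the input, are strictly increasing as byte strings
   (which also rules out duplicates). *)
let rec keys_strictly_sorted (keys : string list) : bool =
  match keys with
  | a :: (b :: _ as rest) -> String.compare a b < 0 && keys_strictly_sorted rest
  | _ -> true
```

A decoder that accepts input failing either check cannot safely be used for a decode-encode roundtrip when computing the info-hash.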

The bencode library currently doesn't seem to offer a way to keep the original bencoded substring corresponding to a part of the decoded structure, so for now I'm simply re-encoding the decoded data.

So given the above note, I'm wondering:

  • if the decoder always fails on input that is not fully valid
  • or, if not, if there is a way to extract a subsection of the original bencoded input

kit-ty-kate, Apr 24 '24 16:04

After a short chat with @c-cube, I realise that the spec's mention of "fully-validated input" led me away from what I really want to know:

  • "is the encode/decode operation isomorphic?" (that is, does decoding and then re-encoding always reproduce the original bytes?) I can answer that myself: it is not, as a very simple example shows:
# let buf = Buffer.create 10;;
val buf : Buffer.t = <abstr>
# Bencode.encode (`Buffer buf) (Bencode.decode (`String "i04e"));;
- : unit = ()
# Buffer.contents buf;;
- : string = "i4e"

Since it's not, my question then becomes only:

  • is there a way to extract a subsection of the original bencoded input?
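
Without library support, one possible workaround is to scan the raw input for value boundaries and slice out the bytes directly. The sketch below is an assumption about how that could look, not an existing API of the bencode library: `find`, `string_bounds`, `skip`, `skip_items` and `raw_value` are hypothetical names. It only locates boundaries and does not validate the input (for example, it happily skips over `i04e`).

```ocaml
(* Hypothetical sketch; none of these functions exist in the bencode library. *)

(* Index of the first [c] at or after offset [i], or Failure if absent. *)
let find (s : string) (i : int) (c : char) : int =
  match String.index_from_opt s i c with
  | Some j -> j
  | None -> failwith (Printf.sprintf "missing %C after offset %d" c i)

(* For a byte string "<len>:<payload>" starting at [i], return the
   offsets (start, stop) of the payload; [stop] is exclusive. *)
let string_bounds (s : string) (i : int) : int * int =
  let colon = find s i ':' in
  let len = int_of_string (String.sub s i (colon - i)) in
  if len < 0 || len > String.length s - colon - 1 then
    failwith "truncated string";
  (colon + 1, colon + 1 + len)

(* [skip s i] returns the offset just past the value starting at [i]. *)
let rec skip (s : string) (i : int) : int =
  if i >= String.length s then failwith "unexpected end of input";
  match s.[i] with
  | 'i' -> find s i 'e' + 1
  | 'l' | 'd' -> skip_items s (i + 1)
  | '0' .. '9' -> snd (string_bounds s i)
  | c -> failwith (Printf.sprintf "unexpected byte %C at offset %d" c i)

(* Skip successive values until the closing 'e' of a list or dictionary. *)
and skip_items (s : string) (i : int) : int =
  if i >= String.length s then failwith "unterminated list or dictionary";
  if s.[i] = 'e' then i + 1 else skip_items s (skip s i)

(* [raw_value s key] returns the exact original bytes of the value bound
   to [key] in the top-level dictionary of [s], or None if [key] is absent. *)
let raw_value (s : string) (key : string) : string option =
  if String.length s = 0 || s.[0] <> 'd' then failwith "not a dictionary";
  let rec loop i =
    if i >= String.length s then failwith "unterminated dictionary"
    else if s.[i] = 'e' then None
    else if s.[i] < '0' || s.[i] > '9' then failwith "key is not a string"
    else begin
      let k_start, k_stop = string_bounds s i in
      let v_stop = skip s k_stop in
      if String.sub s k_start (k_stop - k_start) = key
      then Some (String.sub s k_stop (v_stop - k_stop))
      else loop v_stop
    end
  in
  loop 1
```

For instance, `raw_value "d4:infod6:lengthi04eee" "info"` yields the substring `d6:lengthi04ee` with the non-canonical `i04e` left untouched, which is what should be hashed (or rejected) instead of a re-encoded copy.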

kit-ty-kate, Apr 24 '24 17:04