rust-bencode

Reading a string which contains invalid UTF8

Open aochagavia opened this issue 11 years ago • 11 comments

In a .torrent file, the hashes of the pieces are saved as a bencode string. However, this string cannot be stored in Rust's String type, because it contains bytes that are not valid UTF-8. What we actually want is to save the hashes in a Vec<u8>, but then the library will try to parse a list and produce an error.
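To make the problem concrete, here is a minimal sketch using only the standard library. The helper `sample_piece_hash` is hypothetical and stands in for one 20-byte SHA-1 entry of the `pieces` value; it is not part of rust-bencode.

```rust
// Hypothetical stand-in for one entry of the `pieces` value in a .torrent
// file: a 20-byte SHA-1 digest, i.e. arbitrary binary data.
fn sample_piece_hash() -> Vec<u8> {
    vec![
        0xda, 0x39, 0xa3, 0xee, 0x5e, 0x6b, 0x4b, 0x0d, 0x32, 0x55,
        0xbf, 0xef, 0x95, 0x60, 0x18, 0x90, 0xaf, 0xd8, 0x07, 0x09,
    ]
}

fn main() {
    let hash = sample_piece_hash();

    // A strict conversion to String rejects the digest: 0xda announces a
    // two-byte UTF-8 sequence, but 0x39 is not a continuation byte.
    match String::from_utf8(hash.clone()) {
        Ok(s) => println!("valid UTF-8: {s}"),
        Err(e) => println!("not valid UTF-8: {e}"),
    }

    // Kept as raw bytes, the digest is preserved exactly.
    println!("{} bytes kept as Vec<u8>", hash.len());
}
```

A lossy conversion such as `String::from_utf8_lossy` would not help here either, since it replaces the offending bytes and so corrupts the hash.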

Is there a work-around for this?

aochagavia avatar Jul 13 '14 15:07 aochagavia

The only way to do this at the moment is to use a custom FromBencode implementation.

arjantop avatar Jul 14 '14 15:07 arjantop

I think it would be a good idea to create a BencodeString type, which contains a Vec<u8> without the guarantee of valid UTF-8 (similar to the Key struct). You could provide some additional methods such as to_string, as_bytes, etc. This seems to be the best way to solve this issue.
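A rough sketch of what such a type could look like, assuming nothing about the library's internals. The name `BencodeString` comes from the proposal above; the constructor `from_bytes` and the exact signatures are illustrative guesses, not the library's API.

```rust
use std::str::Utf8Error;

/// Hypothetical wrapper: a bencode string held as raw bytes,
/// with no guarantee that the bytes are valid UTF-8.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BencodeString(Vec<u8>);

impl BencodeString {
    /// Wraps the bytes without validating them.
    pub fn from_bytes(bytes: Vec<u8>) -> Self {
        BencodeString(bytes)
    }

    /// Borrows the raw bytes; this always succeeds.
    pub fn as_bytes(&self) -> &[u8] {
        &self.0
    }

    /// Copies the content into a String, failing if it is not valid UTF-8.
    pub fn to_string(&self) -> Result<String, Utf8Error> {
        std::str::from_utf8(&self.0).map(|s| s.to_owned())
    }
}

fn main() {
    let name = BencodeString::from_bytes(b"ubuntu.iso".to_vec());
    let pieces = BencodeString::from_bytes(vec![0xda, 0x39, 0xa3]);

    // Textual fields convert cleanly.
    println!("{:?}", name.to_string());
    // Binary fields report an error instead of panicking or losing data.
    println!("{:?}", pieces.to_string());
    println!("{:?}", pieces.as_bytes());
}
```

The point of the design is that validation happens only when the caller asks for text, so binary fields such as the piece hashes never pass through a UTF-8 check.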

Should I make some experiments and submit a PR? If you prefer to do it yourself it is also ok!

aochagavia avatar Jul 14 '14 18:07 aochagavia

That was the plan for Key (which should be renamed to something more general), but only custom encoding is implemented. There is a problem getting the required data from a decoder (I can't make every string unchecked).

But I have an idea: I can add another DecoderResult error named StringEncoding that would contain the original &[u8]. A custom decode implementation can then read the string, check whether the error is StringEncoding, and return the bytes it carries.
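The idea can be sketched roughly as follows. This is not the library's code: `DecoderError`, `read_str`, and `read_bytes` are hypothetical names, and the error variant owns a `Vec<u8>` rather than borrowing a `&[u8]` to keep the example free of lifetimes.

```rust
/// Hypothetical error type for a decoder.
#[derive(Debug, PartialEq)]
enum DecoderError {
    /// The string was not valid UTF-8; the original bytes are kept.
    StringEncoding(Vec<u8>),
    /// Any other failure (unused in this sketch).
    #[allow(dead_code)]
    Message(String),
}

/// Stand-in for the decoder's string reader: strict UTF-8 only.
fn read_str(raw: &[u8]) -> Result<String, DecoderError> {
    String::from_utf8(raw.to_vec())
        // FromUtf8Error::into_bytes hands back the rejected bytes.
        .map_err(|e| DecoderError::StringEncoding(e.into_bytes()))
}

/// What a custom decode implementation could do: try the string reader,
/// and recover the raw bytes when only the encoding check failed.
fn read_bytes(raw: &[u8]) -> Result<Vec<u8>, DecoderError> {
    match read_str(raw) {
        Ok(s) => Ok(s.into_bytes()),
        Err(DecoderError::StringEncoding(bytes)) => Ok(bytes),
        Err(other) => Err(other),
    }
}

fn main() {
    println!("{:?}", read_bytes(b"announce"));
    println!("{:?}", read_bytes(&[0xda, 0x39, 0xa3]));
}
```

Strings that are valid UTF-8 still take the normal path, so ordinary derived decoding is unaffected; only code that explicitly handles StringEncoding sees the raw bytes.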

arjantop avatar Jul 14 '14 18:07 arjantop

Is this because the Decoder trait only provides a read_str method?

aochagavia avatar Jul 14 '14 19:07 aochagavia

@aochagavia let me know if there are any issues with the current implementation.

But you still can't use deriving to get the implementation.

arjantop avatar Jul 14 '14 20:07 arjantop

I will try to come up with an idea to solve this...

aochagavia avatar Jul 15 '14 08:07 aochagavia

It looks like there is no way to do it...

aochagavia avatar Jul 15 '14 10:07 aochagavia

I have opened an issue (https://github.com/rust-lang/rust/issues/15683) in rustc to extend the Encoder and Decoder traits.

aochagavia avatar Jul 15 '14 10:07 aochagavia

FWIW I pattern-match the vector out of the ByteString, see here. It's not ideal (and now broken because of the changes) but it works.

andor44 avatar Jul 18 '14 15:07 andor44

What exactly does not work? You just have to use util::ByteString namespaced, or import it under a different name.

arjantop avatar Jul 18 '14 15:07 arjantop

I meant that the changes broke the commit that the link is pointing to, not that it's wrong in any way; quite the opposite actually, I much prefer the new way.

andor44 avatar Jul 18 '14 16:07 andor44