
Can't deserialize entire file

Open · StuartHadfield opened this issue 1 year ago · 8 comments

I can't deserialize an entire file, because rmp_serde's Deserializer does not implement into_iter the way some other serde format crates (e.g. serde_json) do.

How can I get around this?

My code so far:

use std::fs::File;
use std::io::{BufReader, BufWriter, Write};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file_path = "./src/foo.msgpack";
    let reader = BufReader::new(File::open(file_path)?);
    let writer = BufWriter::new(File::create("./src/results.json")?);

    let mut deserializer = rmp_serde::Deserializer::from_read(reader);

    // let mut serializer = serde_json::Serializer::new(std::io::stdout());
    let mut serializer = serde_json::Serializer::pretty(writer);

    // Transcodes exactly one MessagePack value into JSON.
    serde_transcode::transcode(&mut deserializer, &mut serializer)?;
    serializer.into_inner().flush()?;

    Ok(())
}

StuartHadfield, Oct 14 '22

> How can I get around this?

Make a PR that adds into_inner.

kornelski, Oct 14 '22

@kornelski 🤔 do you mean into_iter, not into_inner?

StuartHadfield, Oct 14 '22

(I'm happy to have a bash, but I'm a real newbie to Rust, so not sure I'll manage haha)

StuartHadfield, Oct 14 '22

I assume you mean into_inner, because Iterator doesn't make sense here.

kornelski, Oct 14 '22

Ah... Hmmm 🤔 What does into_inner look like?

I was thinking of an iterator, because that seems to be how Python's msgpack implementation works (https://github.com/msgpack/msgpack-python/blob/500a238028bdebe123b502b07769578b5f0e8a3a/msgpack/_unpacker.pyx#L539-L540).

into_inner conventionally just returns the wrapped object, right? So we'd return the Reader? Which means we can...?

Also, into_inner is already implemented for Deserializer<ReadReader>.

StuartHadfield, Oct 17 '22

In that case I'm completely confused about what you want.

Serde fundamentally creates a single object of a given type. There is nothing to iterate in the decoder. Even if you deserialize a vector, you iterate the vector, not the decoder.

I thought you meant an into_inner that returns the underlying io::Read reader so that you can reuse it for other I/O operations. That's not related to iteration.

kornelski, Oct 17 '22

Ah - okay - let me clarify.

If you have serialized the following objects, one after another, into msgpack:

{
  "foo": "bar"
},
{
  "lorem": "ipsum"
}

We should be able to read all of them out of a file stream. However, once rmp_serde reaches the end of the first object (probably some delineating character?), it stops decoding, even though there's still plenty of data left to read from the buffer. You can see this if you compare the bytes read by fs::read with what rmp_serde decodes.
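On the "delineating character" guess: MessagePack has no separator between values. Each value starts with a marker byte that encodes its type and, for short maps and strings, its length (per the MessagePack spec, 0x80 to 0x8f is a fixmap with up to 15 entries and 0xa0 to 0xbf a fixstr of up to 31 bytes). A decoder therefore knows exactly where one value ends and stops there, leaving the rest unread. The toy decoder below is a hypothetical, std-only sketch that handles only fixmaps of fixstrs (it is not rmp_serde's code): it decodes one value, returns the unread tail, and a loop over the tail recovers both objects from the example above.

```rust
/// Hypothetical type for this sketch: a map decoded as key/value string pairs.
type Pairs = Vec<(String, String)>;

/// Decodes one fixstr (marker 0xa0..=0xbf) and returns it plus the unread tail.
fn decode_fixstr(input: &[u8]) -> Option<(String, &[u8])> {
    let (&marker, rest) = input.split_first()?;
    if marker & 0xe0 != 0xa0 {
        return None; // not a fixstr
    }
    let len = (marker & 0x1f) as usize; // length lives in the low 5 bits
    if rest.len() < len {
        return None; // truncated input
    }
    let (bytes, tail) = rest.split_at(len);
    let s = String::from_utf8(bytes.to_vec()).ok()?;
    Some((s, tail))
}

/// Decodes exactly one fixmap (marker 0x80..=0x8f) whose keys and values are
/// fixstrs. Everything after that value is returned untouched.
fn decode_fixmap(input: &[u8]) -> Option<(Pairs, &[u8])> {
    let (&marker, mut rest) = input.split_first()?;
    if marker & 0xf0 != 0x80 {
        return None; // not a fixmap
    }
    let len = (marker & 0x0f) as usize; // entry count lives in the low 4 bits
    let mut pairs = Vec::with_capacity(len);
    for _ in 0..len {
        let (k, r) = decode_fixstr(rest)?;
        let (v, r) = decode_fixstr(r)?;
        pairs.push((k, v));
        rest = r;
    }
    Some((pairs, rest))
}

/// Keeps decoding values from the tail until the input is exhausted.
fn decode_all(mut input: &[u8]) -> Option<Vec<Pairs>> {
    let mut values = Vec::new();
    while !input.is_empty() {
        let (value, rest) = decode_fixmap(input)?;
        values.push(value);
        input = rest;
    }
    Some(values)
}
```

A single decode call stopping after the first map is the same behaviour the question observes with rmp_serde; reading "the whole file" means calling the decoder again on what is left.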

I got the idea of into_iter from serde_json's Deserializer: https://docs.rs/serde_json/latest/serde_json/de/struct.Deserializer.html#method.into_iter.

Does that make any more sense @kornelski ?

StuartHadfield, Oct 18 '22

I don't think that's a correct usage of serde. Serde is a type-based one-shot deserializer, not a streaming deserializer. It gives you one and exactly one object of the type you've requested. If you've requested a single struct, that's all you will ever get. Two objects next to each other is not a type. If you have multiple objects to deserialize with serde, then deserialize them all into a single Vec<Object>.

kornelski, Oct 20 '22