add new trait to encode values to bytes
allow choosing return type of BytesEncode
- allow any type that holds bytes instead of just
Cow<'_, [u8]> - allow any type that holds an error instead of just
BoxedError - add hint to use writer instead of returning bytes (zero_copy method), retuns BoxedError for now
- hint to using writers for serde serialization types
Pull Request
Related issues
#165, #257
What does this PR do?
Introduces a new trait that allows more efficient implementations of encoding values to bytes by allowing users to return any AsRef<[u8]> + Into<Vec<u8>> type, possibly borrowed.
TODOs
- [x] fix tests
- [x] fix remaining deprecation warnings (first I only fixed the errors so the code compiles again)
- [x] fix cookbook
- [x] FixedSizeBytes: use references or not? values are expected to be small so no reference seems like the best
- [ ] statically dispatchable error values are still converted to
dyn std::error::Error(#166 uses a generic Error type, seems like the best solution) - [ ] optimize methods like
Database::rangewhere byte types are converted toVec<u8>, possibly unnecessary because the type maybe already is owned (does not depend on'a), like[u8; N]? - [ ] actually use it, in methods like
Database::put(see "Example Usage" in #257 for an example)
Hey @antonilol 👋
Thank you again for your help and PR. I am wondering if we could follow a little more #257 and provide a "writer" (maybe start with a Vec<u8>) to experiment. So that we can return either owned or borrowed data but the writer could directly be a buffer provided by LMDB, cf: Database::put_reserved.
Would it be possible to:
- Implement a version of the trait from #257
- Directly remove the previous trait (not deprecate it)
I am wondering if we could follow a little more #257 and provide a "writer" (maybe start with a
Vec<u8>) to experiment. So that we can return either owned or borrowed data but the writer could directly be a buffer provided by LMDB, cf:Database::put_reserved.
This is certainly possible, what I have in mind right now is a method that by default forwards to to_bytes, but can be optionally implemented if desired.
Would it be possible to:
- Implement a version of the trait from #257
With ZERO_COPY associated const and size hint function?
- Directly remove the previous trait (not deprecate it)
I sure can, with the original names (BytesEncode, EItem, bytes_encode)?
With
ZERO_COPYassociated const and size hint function?
No, with the zero_copy method. Not the const one.
I sure can, with the original names (BytesEncode, EItem, bytes_encode)?
Yup, would be perfect.
No, with the
zero_copymethod. Not the const one.
Do you maybe mean the encode_writer method? Right now I have BytesEncode::bytes_encode_into_writer which is like it, it uses &mut Vec<u8> instead of &mut impl Write ~(for now because of error handling)~. I see no zero_copy method in that linked issue.
In a5c96f1 I used Either (a generic 2 value enum) to return different error types without using dynamic dispatch (dyn StdError), this could also solve the issue in #166 with the function that decodes 2 different types and thus needs to handle 2 different errors.
A new Either like 2 value enum instead of Either will work too if the dependency is an issue.
In a5c96f1 I used Either (a generic 2 value enum) to return different error types without using dynamic dispatch (
dyn StdError), this could also solve the issue in #166 with the function that decodes 2 different types and thus needs to handle 2 different errors.A new Either like 2 value enum instead of Either will work too if the dependency is an issue.
Thank you for that, but I would prefer we keep these PR changes only for trait improvement. So that it's easier to review as less code changes. I'm also not a huge fan of returning a Result<Cow<[u8]>, Either<Self::Error, io::Error>>, but let's not discuss this here but on #165 (or #166).
Do you maybe mean the
encode_writermethod? Right now I haveBytesEncode::bytes_encode_into_writerwhich is like it, it uses&mut Vec<u8>instead of&mut impl Write~(for now because of error handling)~. I see nozero_copymethod in that linked issue.
I was talking about the zero_copy method I am talking about in my other comment.
Thank you for that, but I would prefer we keep these PR changes only for trait improvement. So that it's easier to review as less code changes. I'm also not a huge fan of returning a
Result<Cow<[u8]>, Either<Self::Error, io::Error>>, but let's not discuss this here but on #165 (or #166).
When accepting std::io::Write implementation to write to errors need to be handled (returned) in some way. I can revert it back to using &mut Vec<u8> instead, for which extend_from_slice can be used (will never return errors) instead of Write::write_all, or leave out bytes_encode_into_writer completely.
I was talking about the
zero_copymethod I am talking about in my other comment.
Do you mean dynamically choosing to borrow or return an owned type? This is still possible: to implement the trait the return value (ReturnBytes) need to implement Into<Vec<u8>> and AsRef<[u8]>, which Cow<[u8]> does. No functionality is lost with this new trait.
When accepting
std::io::Writeimplementation to write to errors need to be handled (returned) in some way. I can revert it back to using&mut Vec<u8>instead, for whichextend_from_slicecan be used (will never return errors) instead ofWrite::write_all, or leave outbytes_encode_into_writercompletely.
Can you Box the error for now? We will figure that out later when we have a good trait.
Do you mean dynamically choosing to borrow or return an owned type? This is still possible: to implement the trait the return value (
ReturnBytes) need to implementInto<Vec<u8>>andAsRef<[u8]>, whichCow<[u8]>does. No functionality is lost with this new trait.
In issue #257, we are talking about a BytesEncode::ZERO_COPY associated const type that is a boolean, allowing optimizations by avoiding allocating a vector. However, I proposed changing this const to a method instead BytesEncode::zero_copy so that we can dynamically decide whether this value can be used or must be serialized into a buffer.
Can you
Boxthe error for now? We will figure that out later when we have a good trait.
Sure, done
In issue #257, we are talking about a
BytesEncode::ZERO_COPYassociated const type that is a boolean, allowing optimizations by avoiding allocating a vector. However, I proposed changing this const to a method insteadBytesEncode::zero_copyso that we can dynamically decide whether this value can be used or must be serialized into a buffer.
Now I get it, thanks for the patience in explaining it :), added the method to the trait
BytesEncode::size_hint should also be needed as lmdb wants to know the size before giving the reserved space (right?).
I assume then that the writer method will only be used when zero_copy outputs false, and size_hint outputs Some(_). The same can be achieved by combining the 2 into one method, that outputs None when bytes_encode does not copies or a size hint can't be computed, and Some(size_hint) when bytes_encode does copies and a size hint can be computed.
This can save some double work needed for certain types that need the same computation for zero_copy and size_hint.
Should I add this, and if yes, how should that function be called? Thinking about writer_size_hint, can also be just size_hint but with clear docs on returning None for cheap-to-encode types, etc.
Instead of Option<usize> some custom type can also be used that is more clear what it does, like
pub enum EncodingHint {
PreferWriter { size_hint: usize },
PreferBytesEncode,
}
Let me know what you think about this!