heed icon indicating copy to clipboard operation
heed copied to clipboard

add new trait to encode values to bytes

Open antonilol opened this issue 1 year ago • 10 comments

allow choosing return type of BytesEncode

  • allow any type that holds bytes instead of just Cow<'_, [u8]>
  • allow any type that holds an error instead of just BoxedError
  • add hint to use writer instead of returning bytes (zero_copy method), retuns BoxedError for now
  • hint to using writers for serde serialization types

Pull Request

Related issues

#165, #257

What does this PR do?

Introduces a new trait that allows more efficient implementations of encoding values to bytes by allowing users to return any AsRef<[u8]> + Into<Vec<u8>> type, possibly borrowed.

TODOs

  • [x] fix tests
  • [x] fix remaining deprecation warnings (first I only fixed the errors so the code compiles again)
  • [x] fix cookbook
  • [x] FixedSizeBytes: use references or not? values are expected to be small so no reference seems like the best
  • [ ] statically dispatchable error values are still converted to dyn std::error::Error (#166 uses a generic Error type, seems like the best solution)
  • [ ] optimize methods like Database::range where byte types are converted to Vec<u8>, possibly unnecessary because the type maybe already is owned (does not depend on 'a), like [u8; N]?
  • [ ] actually use it, in methods like Database::put (see "Example Usage" in #257 for an example)

antonilol avatar Aug 17 '24 14:08 antonilol

Hey @antonilol 👋

Thank you again for your help and PR. I am wondering if we could follow a little more #257 and provide a "writer" (maybe start with a Vec<u8>) to experiment. So that we can return either owned or borrowed data but the writer could directly be a buffer provided by LMDB, cf: Database::put_reserved.

Would it be possible to:

  • Implement a version of the trait from #257
  • Directly remove the previous trait (not deprecate it)

Kerollmops avatar Aug 17 '24 15:08 Kerollmops

I am wondering if we could follow a little more #257 and provide a "writer" (maybe start with a Vec<u8>) to experiment. So that we can return either owned or borrowed data but the writer could directly be a buffer provided by LMDB, cf: Database::put_reserved.

This is certainly possible, what I have in mind right now is a method that by default forwards to to_bytes, but can be optionally implemented if desired.

Would it be possible to:

  • Implement a version of the trait from #257

With ZERO_COPY associated const and size hint function?

  • Directly remove the previous trait (not deprecate it)

I sure can, with the original names (BytesEncode, EItem, bytes_encode)?

antonilol avatar Aug 17 '24 15:08 antonilol

With ZERO_COPY associated const and size hint function?

No, with the zero_copy method. Not the const one.

I sure can, with the original names (BytesEncode, EItem, bytes_encode)?

Yup, would be perfect.

Kerollmops avatar Aug 17 '24 17:08 Kerollmops

No, with the zero_copy method. Not the const one.

Do you maybe mean the encode_writer method? Right now I have BytesEncode::bytes_encode_into_writer which is like it, it uses &mut Vec<u8> instead of &mut impl Write ~(for now because of error handling)~. I see no zero_copy method in that linked issue.

antonilol avatar Aug 17 '24 18:08 antonilol

In a5c96f1 I used Either (a generic 2 value enum) to return different error types without using dynamic dispatch (dyn StdError), this could also solve the issue in #166 with the function that decodes 2 different types and thus needs to handle 2 different errors.

A new Either like 2 value enum instead of Either will work too if the dependency is an issue.

antonilol avatar Aug 18 '24 18:08 antonilol

In a5c96f1 I used Either (a generic 2 value enum) to return different error types without using dynamic dispatch (dyn StdError), this could also solve the issue in #166 with the function that decodes 2 different types and thus needs to handle 2 different errors.

A new Either like 2 value enum instead of Either will work too if the dependency is an issue.

Thank you for that, but I would prefer we keep these PR changes only for trait improvement. So that it's easier to review as less code changes. I'm also not a huge fan of returning a Result<Cow<[u8]>, Either<Self::Error, io::Error>>, but let's not discuss this here but on #165 (or #166).

Do you maybe mean the encode_writer method? Right now I have BytesEncode::bytes_encode_into_writer which is like it, it uses &mut Vec<u8> instead of &mut impl Write ~(for now because of error handling)~. I see no zero_copy method in that linked issue.

I was talking about the zero_copy method I am talking about in my other comment.

Kerollmops avatar Aug 19 '24 11:08 Kerollmops

Thank you for that, but I would prefer we keep these PR changes only for trait improvement. So that it's easier to review as less code changes. I'm also not a huge fan of returning a Result<Cow<[u8]>, Either<Self::Error, io::Error>>, but let's not discuss this here but on #165 (or #166).

When accepting std::io::Write implementation to write to errors need to be handled (returned) in some way. I can revert it back to using &mut Vec<u8> instead, for which extend_from_slice can be used (will never return errors) instead of Write::write_all, or leave out bytes_encode_into_writer completely.

I was talking about the zero_copy method I am talking about in my other comment.

Do you mean dynamically choosing to borrow or return an owned type? This is still possible: to implement the trait the return value (ReturnBytes) need to implement Into<Vec<u8>> and AsRef<[u8]>, which Cow<[u8]> does. No functionality is lost with this new trait.

antonilol avatar Aug 20 '24 16:08 antonilol

When accepting std::io::Write implementation to write to errors need to be handled (returned) in some way. I can revert it back to using &mut Vec<u8> instead, for which extend_from_slice can be used (will never return errors) instead of Write::write_all, or leave out bytes_encode_into_writer completely.

Can you Box the error for now? We will figure that out later when we have a good trait.

Do you mean dynamically choosing to borrow or return an owned type? This is still possible: to implement the trait the return value (ReturnBytes) need to implement Into<Vec<u8>> and AsRef<[u8]>, which Cow<[u8]> does. No functionality is lost with this new trait.

In issue #257, we are talking about a BytesEncode::ZERO_COPY associated const type that is a boolean, allowing optimizations by avoiding allocating a vector. However, I proposed changing this const to a method instead BytesEncode::zero_copy so that we can dynamically decide whether this value can be used or must be serialized into a buffer.

Kerollmops avatar Aug 20 '24 18:08 Kerollmops

Can you Box the error for now? We will figure that out later when we have a good trait.

Sure, done

In issue #257, we are talking about a BytesEncode::ZERO_COPY associated const type that is a boolean, allowing optimizations by avoiding allocating a vector. However, I proposed changing this const to a method instead BytesEncode::zero_copy so that we can dynamically decide whether this value can be used or must be serialized into a buffer.

Now I get it, thanks for the patience in explaining it :), added the method to the trait

antonilol avatar Aug 20 '24 21:08 antonilol

BytesEncode::size_hint should also be needed as lmdb wants to know the size before giving the reserved space (right?). I assume then that the writer method will only be used when zero_copy outputs false, and size_hint outputs Some(_). The same can be achieved by combining the 2 into one method, that outputs None when bytes_encode does not copies or a size hint can't be computed, and Some(size_hint) when bytes_encode does copies and a size hint can be computed. This can save some double work needed for certain types that need the same computation for zero_copy and size_hint.

Should I add this, and if yes, how should that function be called? Thinking about writer_size_hint, can also be just size_hint but with clear docs on returning None for cheap-to-encode types, etc. Instead of Option<usize> some custom type can also be used that is more clear what it does, like

pub enum EncodingHint {
    PreferWriter { size_hint: usize },
    PreferBytesEncode,
}

Let me know what you think about this!

antonilol avatar Jul 14 '25 15:07 antonilol