compact_str icon indicating copy to clipboard operation
compact_str copied to clipboard

WIP: Implement Cow type for CompactString

Open Kijewski opened this issue 3 years ago • 8 comments

Kijewski avatar Aug 30 '22 12:08 Kijewski

Can you please comment if this addition is wanted at all? The prime usecase is serde, which would allow deserialization without copying or allocations is some cases.

Kijewski avatar Aug 30 '22 12:08 Kijewski

This is great! I think a CompactCow type would be useful, what I mainly want to focus on though is the user API. A few questions we should ask ourselves:

  1. Should CompactCow be a separate struct, or should we attempt to add a Clone-on-write functionality to CompactString?
  2. How do we present CompactCow in the documenation?
  3. When borrowing a short string, do we always stay borrowed, or do we copy the data onto the stack?

IMO providing more options to the user is not always the best thing to do. We should aim for an opinionated and (ideally) simpler API

Thanks for the PR :)

ParkMyCar avatar Aug 30 '22 13:08 ParkMyCar

I tried to replicate the methods and implemented traits of Cow<'_, str>, so a user familiar with ~~cattle~~ Cows 😉 should find it easy to use CompactCow.

  1. The problem is that borrowing a string means that we have to keep track of its lifetime. But I guess it would be possible to pub type CompactString = CompactCow<'static>, as not to make using CompactString more difficult. I guess CompactString has much more use cases than CompactCow.
  2. I don't think that we need to draw too much attention to it. Maybe a sentence between the "Properties" and "Traits" sections in the documentation. And a reference at the serde implemention might be useful, too.
  3. Right now I inline string if it fits, and borrow it otherwise. I think that this is the best implementation, and I'd guess without it there would be no functional difference between CompactCow and Cow<str>.

2/3 of the code are trait implementation. The API of the type itself is small, because you can always call cow.to_mut().xyz(), to call CompactString::xyz(). And Deref gives you access to all methods in str. I am not sure if all the other traits are needed, but Cow has them, so copied them. :)

Kijewski avatar Aug 30 '22 14:08 Kijewski

Should CompactCow be a separate struct, or should we attempt to add a Clone-on-write functionality to CompactString?

IMHO I think CompactCow should definitely be a separate struct.

For clone-on-write functionality, we might as well add a new type CompactStrArc that employs the old behavior (<0.4?) which uses an Arc for large string.

Also, I think compact_str can employ tricts used in beef to provide a CompactCow with the same size as CompactString.

P.S. beef is able to store the entire String in 2 u64 by packing the len and cap in one u64. I haven't looked into the details, but I wish CompactString can also adopt similar design so that its minimum size can be 16 on 64bit platform.

NobodyXu avatar Aug 30 '22 14:08 NobodyXu

Right now I inline string if it fits, and borrow it otherwise. I think that this is the best implementation, and I'd guess without it there would be no functional difference between CompactCow and Cow.

Cow<str> uses String for owned, CompactCow uses CompactString for owned, that is enough difference for me. I think using reference might be cheaper there if you is constrained by the lifetime anyway.

NobodyXu avatar Aug 30 '22 14:08 NobodyXu

But I guess it would be possible to pub type CompactString = CompactCow<'static>,

No please, IMO that will be a nightmare. Newtype solution is better than alias for this.

NobodyXu avatar Aug 30 '22 14:08 NobodyXu

I haven't looked at beef. I made CompactCow so that it has the same representation as CompactString. It uses 253 as its last byte to tell if the string is borrowed.

crate::asserts::assert_size_eq!(CompactString, CompactCow, Option<CompactCow>);

Kijewski avatar Aug 30 '22 14:08 Kijewski

I haven't looked at beef. I made CompactCow so that it has the same representation as CompactString. It uses 253 as its last byte to tell if the string is borrowed.

That's good to hear!

I mentioned beef here because I feel it is relevant and it uses some interesting tricks that compact_str might be able to use.

NobodyXu avatar Aug 30 '22 14:08 NobodyXu

Would it make sense for this struct to be "copy on read" for < 24 byte borrowed strings and "copy on write" otherwise? Essentially caching the bytes on the stack for reads to reduce indirections in read heavy loads that don't end up writing much.

maboesanman avatar Feb 23 '23 05:02 maboesanman

@maboesanman, that's exactly how I implemented it :)

https://github.com/ParkMyCar/compact_str/blob/fde06e5dae3077e4051202d3ce56897740c40a8f/compact_str/src/cow.rs#L103-L118 https://github.com/ParkMyCar/compact_str/blob/fde06e5dae3077e4051202d3ce56897740c40a8f/compact_str/src/cow.rs#L434-L439

Kijewski avatar Feb 24 '23 13:02 Kijewski

Implemented for 'static. Making any 'lifetime usable would still be useful, but the implementation is this PR is not needed anymore.

Kijewski avatar Sep 25 '23 17:09 Kijewski