flexstr icon indicating copy to clipboard operation
flexstr copied to clipboard

Offer a boxed string by default?

Open epage opened this issue 2 years ago • 13 comments

It'd be interesting to see under what situations box might be useful.

epage avatar Mar 25 '22 19:03 epage

I played with a box but couldn't find a reason to add it. Allocation seemed about the same (I may need to revisit and check again in light of your benchmarks), but clone is slow given the allocation/copy. Also, this is kinda the antithesis to FlexStr which is about fast clones and not needlessly copying data.

nu11ptr avatar Mar 25 '22 19:03 nu11ptr

I also should add you can do this on your own. This was my example in the "make your own string" section. If you can give me a use case I'm not against adding necessarily either, but need to see a use case where it outperforms Arc

nu11ptr avatar Mar 25 '22 19:03 nu11ptr

I'd recommend reconsidering. The cost of a Box option is fairly low for maintenance / documentation. A big benefit is it'll help the user selecting the option that works for their benchmarks. I'm thinking of this from the perspective of re-evaluate what strings to use in toml_edit. Profiling is what led me to use small-string optimization. I'm going to be profiling what crate and crate settings work best for my application. I'll want to evaluate this option but having to make my own gets in the way of doing so and sours me a bit on the crate.

epage avatar Mar 25 '22 19:03 epage

So your use case is that you've found Box to allocate faster than Arc? Just want to understand the driver.

nu11ptr avatar Mar 25 '22 19:03 nu11ptr

The biggest issue I'm having is one of identity. Right now LocalStr and SharedStr are a pair and I plan to release optimizations to go back and forth. If I add a 3rd it would be "on the side" and not integrated into the overall scheme. Plus I'm trying to find the "why" - none of my benchmarks showed it made sense, so I'm looking for what Box does better. I want to reproduce your benchmarks because mine didn't show an advantage.

nu11ptr avatar Mar 25 '22 19:03 nu11ptr

I'm thinking back to when I first tried FlexStr in toml_edit and it was slower. That might have been other factors but I'll want to double check Box vs Arc to be sure.

Keep in mind, this is giving users the opportunity to evaluate against their own applications. I can't do that to show its faster without implementing it myself which I have no incentive to do, making this circular.

epage avatar Mar 25 '22 19:03 epage

FlexStr 0.8.0 had alignment issues due to the enum. That made it's performance inconsistent. Mostly it seems in inline strings but I can't seem to nail it down. 0.9.x doesn't have that issue due to the union - it is very fast for many ops (and has consistent micro bms)

nu11ptr avatar Mar 25 '22 20:03 nu11ptr

Yes, that could be it but it introduced enough doubt. In part, I filed all of these issues thinking about Josh's comment about what I would suggest. I suspect FlexStr is getting close to one I might generally suggest but I have various doubts or use cases so I filed these.

epage avatar Mar 25 '22 20:03 epage

I appreciate that. I'm interested in getting use and willing to do the work, so let me know what you need. I argue because I want to understand the 'why'. I usually get there even if not right away. :-)

nu11ptr avatar Mar 25 '22 20:03 nu11ptr

I definitely agree about seeking the "why"

Another one I just remembered on top of the previous ones I gave, when I pulled kstring into toml_edit, I benchmarked the various feature flags to see what would get the best performance. In my test cases, Arc was slower and I didn't enable it.

See https://github.com/ordian/toml_edit/blob/master/Cargo.toml#L44

epage avatar Mar 25 '22 20:03 epage

I would be interested to hear how you used it. In my benchmarks Arc and Rc are slightly slower than String on allocation (like 5-10% tops) and are much faster on clone. The only thing I didn't test yet (and it is important) is contended cloning of Arc in multiple threads. Real world might be different of course and sounds like it might be.

nu11ptr avatar Mar 25 '22 20:03 nu11ptr

toml_edits easy API is a fairly low-clone situation. You basically allocate the memory to then turn around and give it back to the user in their preferred data type via serde (we care more about the performance of this API than the editing API).

Yes, the focus of flexstr is on the cloning case but adding box support would make it more universally usable. There would be less need to have a user pick and choose which crate to use in which situation.

epage avatar Mar 25 '22 20:03 epage

Fair points - really it is as easy as me adding this:

type BoxStr = FlexStrBase<Box<str>>;

nu11ptr avatar Mar 25 '22 21:03 nu11ptr