string-interner

Symbol overflow in `BufferBackend` doesn't panic

Open Gaming32 opened this issue 4 weeks ago • 5 comments

If I use a `BufferBackend<SymbolU16>`, the same value can sometimes return different symbols. Neither `BufferBackend<SymbolU32>` nor `StringBackend<SymbolU16>` has this issue. It only happens sometimes. With 12,286 strings, I ended up with between 12,800 and 13,200 symbols, whereas the others always ended up with 12,286 symbols.

Gaming32 • Dec 04 '25 15:12

Tracking this down further, this appears to actually be the fault of `gen_symbol_for`. Specifically, I believe my symbols got too big for `SymbolU16` because the symbol values are non-sequential. However, the `try_from_usize` implementation uses `as` to convert from `usize` instead of `try_into`, so the value silently overflows without hitting the "encountered invalid symbol" panic.
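To illustrate the suspected failure mode, here is a minimal standard-library sketch of the difference between an `as` cast and a checked conversion. The function names `lossy_index` and `checked_index` are hypothetical and are not string-interner's API; they only stand in for the kind of conversion described above.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a lossy conversion:
/// `as` keeps only the low 16 bits of the index and never fails.
fn lossy_index(index: usize) -> u16 {
    index as u16
}

/// Checked alternative: returns `None` when the index does not fit in a `u16`,
/// so the caller can panic or report an error instead of silently wrapping.
fn checked_index(index: usize) -> Option<u16> {
    u16::try_from(index).ok()
}
```

For example, `lossy_index(70_000)` wraps around to `4464` (70,000 minus 65,536), while `checked_index(70_000)` returns `None`. A wrapped value like this would look like a perfectly valid symbol, which would be consistent with no panic being raised.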

Gaming32 • Dec 04 '25 16:12

> Tracking this down further, this appears to actually be the fault of `gen_symbol_for`. Specifically, I believe my symbols got too big for `SymbolU16` because the symbol values are non-sequential. However, the `try_from_usize` implementation uses `as` to convert from `usize` instead of `try_into`, so the value silently overflows without hitting the "encountered invalid symbol" panic.

Yep, this is exactly the issue. I wonder why `TryInto` was not used there; maybe for performance reasons? In my opinion it should be used, so good catch!

The major difference between `BufferBackend` and the other backends is that `BufferBackend` indeed has no contiguous index space. Therefore, you easily run out of symbols, especially when only using `SymbolU16`. If possible, I'd use `BufferBackend` with `usize`-based symbols. That way you avoid any checks and conversions, and it should usually still be really fast.
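A rough sketch of why a non-contiguous index space exhausts `u16` quickly. It assumes, as my reading of the backend rather than something stated in this thread, that a symbol is the byte offset of the string inside one shared buffer. `ToyBuffer` and `strings_until_u16_overflow` are hypothetical names, and the toy neither deduplicates strings nor mirrors the real encoding.

```rust
/// Toy model (not string-interner's actual code): every string is appended to
/// one byte buffer behind a 1-byte length prefix, and its id is its start offset.
struct ToyBuffer {
    bytes: Vec<u8>,
}

impl ToyBuffer {
    fn new() -> Self {
        Self { bytes: Vec::new() }
    }

    /// Appends `s` and returns the offset at which it was stored.
    /// Only strings shorter than 256 bytes are supported, so the prefix fits in one byte.
    fn push(&mut self, s: &str) -> usize {
        assert!(s.len() < 256, "toy model only supports short strings");
        let offset = self.bytes.len();
        self.bytes.push(s.len() as u8);
        self.bytes.extend_from_slice(s.as_bytes());
        offset
    }
}

/// Counts how many strings of `len` bytes can be stored
/// before an offset no longer fits in a `u16`.
fn strings_until_u16_overflow(len: usize) -> usize {
    let mut buf = ToyBuffer::new();
    let s = "x".repeat(len);
    let mut count = 0;
    while buf.push(&s) <= u16::MAX as usize {
        count += 1;
    }
    count
}
```

Under these assumptions, `strings_until_u16_overflow(7)` gives 8,192: with 8 bytes per entry, the offsets pass 65,535 long before 65,536 distinct strings exist. The limit depends on total bytes stored rather than on the number of strings, which would fit the report of trouble at 12,286 strings.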

Robbepop • Dec 04 '25 16:12

The thing with using `BufferBackend` with bigger symbols is that it negates all the memory savings I get from it to begin with. In fact, it uses several MB more.

Gaming32 • Dec 04 '25 16:12

> The thing with using `BufferBackend` with bigger symbols is that it negates all the memory savings I get from it to begin with. In fact, it uses several MB more.

It's all a trade-off. Maybe using `u32` would be a nice sweet spot then? That is exactly why string-interner provides different backends for different needs.

Robbepop • Dec 04 '25 16:12

The several MB more was actually with `u32`. The `StringBackend` is actually the perfect sweet spot. Thanks for providing these different backends!

Gaming32 • Dec 04 '25 16:12