string-interner
Fallible APIs for StringInterner for #70
Hi @Robbepop!
This is my take on issue #70.
This pull request is NOT READY to merge. I haven't added any tests nor did I check whether new code breaks anything in the existing test suites.
Let's iterate on it here instead.
First, I haven't found any issues adding a fallible API to the Backend implementations (except for the capacity issue, below).
The most problematic change involves the dedup HashMap:
- to implement `try_reserve` I needed to directly access `HashTable`, thus a `raw` feature needed to be enabled for `hashbrown`
- there's a substantial addition here: https://github.com/royaltm/string-interner/blob/e8a7616160f8834f53927d72532aab12ec27bf2a/src/interner.rs#L163 which might beg for a bigger refactor of this function
I'm also a bit concerned about the sanity of the `with_capacity` and the new `try_reserve` APIs, since the implementation relies on guesswork: the average word size was estimated via Google, and like everything "on average" it will be of little use in most particular cases.
What if instead we gave the user a way to estimate the "average" string size? Or perhaps even a separate additional capacity for the string data.
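A hypothetical shape of such a hint (names invented for illustration, not part of the crate): the caller supplies both the expected number of strings and their expected total byte length, so the interner never has to guess an average word size.

```rust
// Hypothetical capacity hint (names invented, not part of the crate).
struct CapacityHint {
    symbols: usize,   // expected number of strings to intern
    str_bytes: usize, // expected total byte length of those strings
}

/// Worst-case byte capacity a buffer backend would need for `hint`:
/// the string bytes themselves plus a var7 length marker per entry
/// (at most 10 marker bytes each on 64-bit targets).
fn buffer_capacity(hint: &CapacityHint) -> usize {
    hint.str_bytes + hint.symbols * 10
}
```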
@royaltm thank you for the PR!
I think this PR generally goes into the right direction.
The RawMap API should have been implemented and used by the StringInterner long ago, but I never got to it. Maybe it would be nice to separate this change into another PR and base this PR on it. The idea is to no longer use hashbrown's RawEntry API and instead use its RawMap API for everything.
CI is currently unhappy with docs and formatting. Also a memory consumption test is broken because the number of expected allocations decreased. Can you explain why?
I am a bit skeptical about the usefulness of the reserve API for StringInterner in general after seeing the mess that it introduces. The reserve API was once useful when we still had the SimpleBackend. However, the SimpleBackend was removed because it was too inefficient and thus served no real purpose. Without it the reserve API depends too much on heuristics; it should probably be removed, or redesigned to be based not on the expected number of strings to intern but on the expected total bytes of the strings to be interned, though even then it wouldn't be ideal imo.
The different number of allocations probably has something to do with my changes to the buffer backend, where I introduced the calculate_var7_size function so that I can reserve space for the next entry, including its var7 size marker, ahead of pushing to the backend buffer.
Now the buffer backend is resized at most once, before encode_var_usize is called and the string itself is pushed.
Previously the var7 size marker was pushed separately from the string itself, and each operation could possibly reallocate the backend buffer. But that's just my guess.
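To make the single-reservation idea concrete, here is a minimal sketch (the PR's actual `calculate_var7_size` and `encode_var_usize` may differ; this uses plain LEB128-style var7 encoding for illustration):

```rust
/// Number of bytes needed to encode `value` in a 7-bit var-int
/// ("var7") format: one byte per 7 significant bits, at least one byte.
/// (Sketch only; the PR's actual function may differ.)
fn calculate_var7_size(value: usize) -> usize {
    let bits = usize::BITS - value.leading_zeros();
    core::cmp::max(1, ((bits + 6) / 7) as usize)
}

/// Push the length marker and the string with a single reservation,
/// so at most one reallocation can happen per interned string.
fn push_entry(buffer: &mut Vec<u8>, s: &str) {
    let len = s.len();
    buffer.reserve(calculate_var7_size(len) + len);
    // Inlined var7 encoding of `len` (continuation bit in 0x80):
    let mut v = len;
    loop {
        let byte = (v & 0x7F) as u8;
        v >>= 7;
        if v == 0 {
            buffer.push(byte);
            break;
        }
        buffer.push(byte | 0x80);
    }
    buffer.extend_from_slice(s.as_bytes());
}
```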
As for the RawTable API, I'm not quite sure what advantage it would have over the current implementation. Perhaps just fewer intermediary function calls?
I have read somewhere that the RawEntry API of the hashbrown crate was to be phased out in favor of the newer and more versatile RawMap type. Since you are now making use of the raw crate feature I thought it would make sense to go all in on the change. Not necessary though. We can go ahead without it.
I suppose moving to RawMap interface should be only an internal change. I propose we can tackle this issue after the one at hand.
Regarding the try_reserve and with_capacity interface, I agree that its usefulness is pretty questionable. One thing is that it's really hard to make predictions ahead of time, especially with the BucketBackend; what makes the problem even worse is that we can't shrink_to_fit the head String due to the known problems with reallocation.
Perhaps I'll remove the try_reserve calls from the API or make them internal where they are still required, and we'll see where that gets us.
The try_reserve probably does not make sense for the Backend to implement since it is too backend specific. At the moment no backend has a proper and well defined way to do it. For the StringInterner itself try_reserve could still be useful for its hashmap to deduplicate strings, however, it is probably not much help in practice.
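For the dedup map alone, a fallible reserve could be a thin wrapper over the hash map's own `try_reserve`. A sketch with simplified, hypothetical types (the crate's real dedup map stores more than a plain hash-to-symbol mapping):

```rust
use std::collections::{HashMap, TryReserveError};

// Simplified stand-in for the interner; field and type are hypothetical.
struct Interner {
    dedup: HashMap<u64, u32>, // hash -> symbol (greatly simplified)
}

impl Interner {
    /// Try to reserve space for `additional` more entries in the
    /// deduplication map, propagating allocation failure to the caller
    /// instead of aborting on out-of-memory.
    fn try_reserve(&mut self, additional: usize) -> Result<(), TryReserveError> {
        self.dedup.try_reserve(additional)
    }
}
```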
Hi @Robbepop!
Sorry for the delay; I've had a whirlwind of obligations to fulfill in the past weeks, and I'd rather not contribute at all than contribute haphazardly during my lunch break, as "good" and "quickly" seldom meet.
- I removed `Backend::try_reserve` from the trait and backend implementations,
- I left `StringInterner::try_reserve` with a doc note about what it does exactly,
- fixed docs issues,
- added a mention of `try_get_or_intern` in the root doc of the `StringInterner`,
- testing of `calculate_var7_size` together with each `encode_var_usize`,
- updated the number of (de)allocations in `buffer_backend::test_memory_consumption`.
If this meets your criteria, I don't think there's anything left to do within the scope of this PR.
Hi @Robbepop,
I've fixed the formatting errors; I'm not sure, however, what I can do to fix the tarpaulin issue. Is there anything else on my end that needs to be done?
I just ran benchmarks for this branch locally and compared them to the current main branch.
Unfortunately I saw quite some big performance regressions with this PR:
- `get_or_intern/fill-empty/new/`: ~10% regression
- `get_or_intern/fill-empty/with_capacity/`: ~20-30% regression
- `get_or_intern/already-filled/`: ~5% regression
- `get_or_intern_static`: `BucketBackend` improved by 5%, other backends regressed by 10%
All in all this looks good. The benchmark regressions need to be fixed before we can merge though.
I'm beginning to think that without stabilization of push_within_capacity we can't do much to improve the backends' fill performance, as we ended up with an algorithm that checks the vector's capacity multiple times for a single push operation. Please see the comment above https://github.com/Robbepop/string-interner/pull/71#discussion_r1648808767 for more on this.
If I'm not mistaken, another area of improvement might be the raw hash-map department: with the latest changes I have also introduced redundant capacity checks in the try_get_or_intern_using function.
Also, while pondering the var-int buffer encoding issue: if we are going to address the "two-phase encoding step" and decide to try the single-step array encoding approach, I'd also suggest changing the var-int algorithm to a more efficient one:
I've created a gist with more info on the topic and playground examples: https://gist.github.com/royaltm/f31ce5b1c8878e7c10ff7d335f3826e2
Please feel free to check it out at your leisure.
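For clarity, the single-step array encoding idea can be sketched like this (standard LEB128-style var7 shown here, not the adjusted algorithm from the gist): encode the var-int into a small stack buffer first, then append it with a single `extend_from_slice`, so the vector's capacity is checked once per marker instead of once per byte.

```rust
/// Encode `value` as a var7 int via a stack scratch buffer and append
/// it to `buffer` in one call; returns the number of bytes written.
fn encode_var_usize_array(buffer: &mut Vec<u8>, mut value: usize) -> usize {
    let mut scratch = [0u8; 10]; // a 64-bit usize needs at most 10 var7 bytes
    let mut len = 0;
    loop {
        let byte = (value & 0x7F) as u8;
        value >>= 7;
        if value == 0 {
            scratch[len] = byte;
            len += 1;
            break;
        }
        scratch[len] = byte | 0x80; // set the continuation bit
        len += 1;
    }
    // Single capacity check / possible reallocation happens here:
    buffer.extend_from_slice(&scratch[..len]);
    len
}
```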
Perhaps a better approach is to use https://doc.rust-lang.org/alloc/vec/struct.Vec.html#method.spare_capacity_mut.
Something along the lines of: https://play.rust-lang.org/?version=stable&mode=release&edition=2021&gist=bd41840d6595d2787c2f60b86a67e84b
Although a lot of additional unsafety is introduced, the generated assembly looks neat:
There's a very tight loop at: .LBB10_5 and a single capacity pre-check that branches to .LBB10_2 to make more room.
playground::encode_var_usize:
push r14
push rbx
push rax
mov rbx, rsi
lea rdx, [rsi + 64]
mov rax, qword ptr [rdi]
mov rsi, qword ptr [rdi + 16]
sub rax, rsi
cmp rax, rdx
jb .LBB10_2
mov rcx, qword ptr [rdi + 8]
add rcx, rsi
xor eax, eax
cmp rbx, 128
jb .LBB10_6
.LBB10_4:
mov rdx, rbx
.LBB10_5:
mov r8d, edx
or r8b, -128
mov byte ptr [rcx + rax], r8b
inc rax
add rdx, -128
mov rbx, rdx
shr rbx, 7
cmp rdx, 16383
mov rdx, rbx
ja .LBB10_5
.LBB10_6:
mov byte ptr [rcx + rax], bl
lea rcx, [rsi + rax]
inc rcx
inc rax
mov qword ptr [rdi + 16], rcx
add rsp, 8
pop rbx
pop r14
ret
.LBB10_2:
mov r14, rdi
call alloc::raw_vec::RawVec<T,A>::reserve::do_reserve_and_handle
mov rdi, r14
mov rsi, qword ptr [r14 + 16]
mov rcx, qword ptr [r14 + 8]
add rcx, rsi
xor eax, eax
cmp rbx, 128
jae .LBB10_4
jmp .LBB10_6
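A minimal sketch of the `spare_capacity_mut` approach, simplified from the playground snippet linked above (it uses standard LEB128-style var7 rather than the adjusted algorithm, and a fixed worst-case reserve): one `reserve` up front is the only capacity check, bytes are written into uninitialized spare capacity, and a single `set_len` commits them.

```rust
use core::mem::MaybeUninit;

/// Append the var7 encoding of `value` to `buffer` using spare
/// capacity; returns the number of bytes written.
fn encode_var_usize(buffer: &mut Vec<u8>, mut value: usize) -> usize {
    const MAX_LEN: usize = 10; // a 64-bit usize needs at most 10 var7 bytes
    buffer.reserve(MAX_LEN); // the only capacity check / branch
    let start = buffer.len();
    let spare: &mut [MaybeUninit<u8>] = buffer.spare_capacity_mut();
    let mut len = 0;
    loop {
        let byte = (value & 0x7F) as u8;
        value >>= 7;
        if value == 0 {
            spare[len].write(byte);
            len += 1;
            break;
        }
        spare[len].write(byte | 0x80); // set the continuation bit
        len += 1;
    }
    // SAFETY: the first `len` spare bytes were just initialized above,
    // and `start + len <= capacity` is guaranteed by the `reserve`.
    unsafe { buffer.set_len(start + len) };
    len
}
```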