arrow icon indicating copy to clipboard operation
arrow copied to clipboard

[Discuss][C++] Replace MemoTable with a SwissTable implementation

Open js8544 opened this issue 1 year ago • 21 comments

Describe the enhancement requested

This is a follow up of https://github.com/apache/arrow/issues/36059#issuecomment-1592037527. There are many cases where we use a MemoTable, e.g. set lookup functions, vector hash functions, the count_distinct aggregate function, dictionary unification etc. Their performance can be boosted with a SwissTable.

We can either:

  1. Use an existing swiss table library. This requires some work on dependency management. I recommend absl::flat_hash_map since they are the original author of swiss tables and we already has absl in our 3rd party toolchain.
  2. Write one by ourselves. The current one in acero is too customized for the join node and it seems hard to extract a general hashtable from it. If I were to write one I would probably follow the structure of absl's and replace things like memory management and bit tweaking with Arrow ones.

What do you think? @pitrou @westonpace

Component(s)

C++

js8544 avatar Oct 20 '23 08:10 js8544

I don't know what a "Swiss table" is. Can you point to a general description?

pitrou avatar Oct 20 '23 08:10 pitrou

More generally, I'm all for a faster associative table implementation.

pitrou avatar Oct 20 '23 08:10 pitrou

Maybe we should first having an memory-usage / performance benchmark on this?

mapleFU avatar Oct 20 '23 08:10 mapleFU

There's an Abseil's article and a CppCon talk that provide a good intro. The key idea is, when looking up a key via linear probing, it can check multiple slots at once.

js8544 avatar Oct 20 '23 08:10 js8544

Ah, so it's a hash table with SIMD-optimized lookup?

pitrou avatar Oct 20 '23 09:10 pitrou

Maybe we should first having an memory-usage / performance benchmark on this?

Okay, I'll try to draft a proof of concept with the is_in kernel.

js8544 avatar Oct 20 '23 09:10 js8544

If you're going to use SIMD lookups, then I would ask that it relies on xsimd instead of intrinsics.

pitrou avatar Oct 20 '23 09:10 pitrou

Ah, so it's a hash table with SIMD-optimized lookup?

Well it's a speedup even without SIMD instructions. To put it in a oversimplified term each slot occupies 1 byte so with int64 we can already check 8 slots at the same time. But SIMD can definitely make it even faster.

If you're going to use SIMD lookups, then I would ask that it relies on xsimd instead of intrinsics.

Sure, if we decide to write our own version.

js8544 avatar Oct 20 '23 09:10 js8544

Well it's a speedup even without SIMD instructions. To put it in a oversimplified term each slot occupies 1 byte so with int64 we can already check 8 slots at the same time. But SIMD can definitely make it even faster.

I see, thanks for the explanation.

Sure, if we decide to write our own version.

Well, we don't want to depend on Abseil, so our own version it should probably be :-)

pitrou avatar Oct 20 '23 09:10 pitrou

There's an Abseil's article and a CppCon talk that provide a good intro. The key idea is, when looking up a key via linear probing, it can check multiple slots at once.

Thank you for explaining. I was also curious about what Swiss Join is too. :)

llama90 avatar Oct 20 '23 09:10 llama90

I wrote a simple proof of concept for IndexIn [1] with a slightly tweaked version of absl::flat_hash_map. (Tweak is [2], a couple of operations can be skipped because our MemoTable doesn't need to support deletion). And the existing IndexIn benchmarks (results below) show about 20% ~ 60% improvement depending on the value_set size, the larger the better.

The benchmarks are run on my Mac M1 so the results may vary on a x86 machine. But I think the diff is compelling enough for us to proceed with a Swiss table implementation.

[1] https://github.com/apache/arrow/compare/main...js8544:jinshang/swiss_proof_of_concept?diff=split#diff-d288c253c3a5959aa1b6840a649a8dafe42d14ac49d1837b19dd3a44c58f7fd9R93 [2] https://github.com/apache/arrow/commit/7d14d10d67826f4be7588b8b36d7aef7655dbcb3

Benchmark Results

Before

image

After

image

js8544 avatar Oct 24 '23 10:10 js8544

Can you post benchmark numbers on string values? While hashing integers is nice, I'm not sure that's a dominant use case.

pitrou avatar Oct 24 '23 10:10 pitrou

Sure. I chose Int32 because it was the only one tested with a large value set. I just wrote a similar set of benchmarks for strings and here's the result.

Before

image

After

image

js8544 avatar Oct 24 '23 10:10 js8544

Ok. I agree we could replace MemoTable with something better. However, we don't want to have a dependency on Abseil, so it will have to be reimplemented.

pitrou avatar Oct 24 '23 11:10 pitrou

Ok. I agree we could replace MemoTable with something better. However, we don't want to have a dependency on Abseil, so it will have to be reimplemented.

Yes. I'll implement one using xsimd as you suggested. Also the swiss table may be worse than MemoTable on super small tables, since it checks 8 slots at once and reserves at least 8 slots on construction, as shown in the first two string benchmarks. We'll need further benchmarking to decide which hashtable to use in each individual use cases, or maybe with a dynamic dispatching mechanism.

js8544 avatar Oct 24 '23 11:10 js8544

I guess this might benifit the parquet dictionary encoding, which is widely used in parquet. If nobody move forward, I'll have a try next month

mapleFU avatar Mar 08 '24 02:03 mapleFU

Yeah I'm afraid I won't have time to do this recently. Please have a try if you wish. Thanks!

js8544 avatar Mar 08 '24 04:03 js8544

Will the swiss table currently being used in acero's hash join (aka. swiss join) help on this?

zanmato1984 avatar Mar 08 '24 04:03 zanmato1984

The swiss table for hash join is very specialized and kind of hard to generalize. It can't be directly used for memotable.

js8544 avatar Mar 08 '24 04:03 js8544

Could I try to solve this issue?

SGZW avatar Mar 12 '24 17:03 SGZW

@SGZW Anyone can work on an issue without asking for permission. Feel free to experiment and submit a PR when ready.

pitrou avatar Mar 12 '24 18:03 pitrou