Yunsong Wang comments

Results 178 comments of



Use invoke_one when possible

[FEA] Add kernel launch wrapper

> That would be a simple `if(std::distance(input_begin, input_end) == 0) return;`?

@sleeepyjack Yeah, an early exit like that in a variadic template.

Size computation slows bulk insert significantly

Related to asynchronous size computation #102.

@esoha-nvidia Thanks for reporting this. We are aware of this issue, and it will be addressed during our refactoring work in #110.

Switch to cuda::stream_ref

Update: this is still an experimental feature that requires defining `LIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE`: https://godbolt.org/z/jfWsjarTz

We do provide performance guidance in the probing sequence doc, e.g.:

- https://github.com/NVIDIA/cuCollections/blob/4bdf6063de349be7af8da987cea743aa88e28470/include/cuco/probe_sequences.cuh#L26-L28
- https://github.com/NVIDIA/cuCollections/blob/4bdf6063de349be7af8da987cea743aa88e28470/include/cuco/probe_sequences.cuh#L52-L55

Having a performance tuning section in `README` doesn't seem right.


Trie

[ENHANCEMENT]: Perf guide

[BUG]: static_multimap<> insert() not working as expected
