MithunR
Null distribution | | 0% nulls | 30% nulls | 50% nulls | 3 nulls out of 4 | 7 nulls out of 8 | 90% nulls -- | --...
This issue is partially addressed by the following PR: https://github.com/NVIDIA/spark-rapids/pull/7013 In the 22.12 release, there is support for default Hive text tables (i.e. `^A` separated fields, newline separated records, `\\`...
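For illustration, the default layout mentioned above (`^A`, i.e. byte `0x01`, between fields and a newline between records) can be split with a few lines of host-side C++. This is a minimal sketch, not the `spark-rapids` reader: the function name `parse_hive_text` is hypothetical, and null markers and escape handling are deliberately left out.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper: splits newline-separated records into
// '\x01'-separated fields. No escape or null-marker handling.
std::vector<std::vector<std::string>> parse_hive_text(std::string_view data)
{
  constexpr char field_delim  = '\x01';  // ^A
  constexpr char record_delim = '\n';

  std::vector<std::vector<std::string>> rows;
  std::size_t pos = 0;
  while (pos < data.size()) {
    // Find the end of the current record (or the end of the input).
    auto end = data.find(record_delim, pos);
    if (end == std::string_view::npos) { end = data.size(); }
    std::string_view const record = data.substr(pos, end - pos);

    // Split the record on the field delimiter.
    std::vector<std::string> fields;
    std::size_t fpos = 0;
    while (true) {
      auto const fend = record.find(field_delim, fpos);
      if (fend == std::string_view::npos) {
        fields.emplace_back(record.substr(fpos));
        break;
      }
      fields.emplace_back(record.substr(fpos, fend - fpos));
      fpos = fend + 1;
    }
    rows.push_back(std::move(fields));
    pos = end + 1;
  }
  return rows;
}
```

An empty record yields a row with a single empty field, which keeps row counts aligned with line counts in the input.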
This issue should be safe to close now. The pending items are as follows: 1. Support for complex types (`ARRAY`, `STRUCT`, `MAP`, `BINARY`). 2. Support for escape characters. 3. Support...
> I'll have to check the spec to see if these padding bytes are forbidden or this is just a gray area. Thank you for the analysis, @etseidl. This has...
Tagging @davidwendt.
I've placed this in draft, for the moment. I should do some more testing from `spark-rapids`, to make sure this fits.
I should have updated here sooner. It's possible that I was mistaken regarding the improvement from this change. I can't seem to reproduce the perf gains from last week. From...
Hat tip to @tgujar for this one: For the multi-string kernel, [this part](https://github.com/rapidsai/cudf/pull/15536/files#diff-048f86c21559b14f64f86aaeaa57776d366c3a4948a5aba7c0ab1a3801be87bcR389-R391) is interesting: ```c++ auto const lane_idx = idx % cudf::detail::warp_size; auto const str_idx = (idx / cudf::detail::warp_size)...
On my last update, I indicated some positive news regarding total kernel times: Batching calls to `strings::contains()` for multiple targets reduces the total time spent in the kernel. For 100M...
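To make the batching idea concrete, here is a host-only C++ sketch that visits each string once and tests every target against it, instead of making one pass over the column per target. It is an illustration under assumptions, not the cudf kernel: `contains_multi` is a hypothetical name, and the sketch says nothing about GPU kernel times.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: result[t][s] is true when strings[s] contains targets[t].
// Each input string is read once and checked against all targets.
std::vector<std::vector<bool>> contains_multi(std::vector<std::string> const& strings,
                                              std::vector<std::string> const& targets)
{
  std::vector<std::vector<bool>> result(targets.size(),
                                        std::vector<bool>(strings.size(), false));
  for (std::size_t s = 0; s < strings.size(); ++s) {
    std::string_view const str{strings[s]};
    for (std::size_t t = 0; t < targets.size(); ++t) {
      result[t][s] = str.find(targets[t]) != std::string_view::npos;
    }
  }
  return result;
}
```

The output holds one boolean column per target, so a caller that previously issued one `contains` call per target receives the same set of columns from a single call.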
From my experimentation, it appears that we might have taken this nearly as far as we can with this algo. There will be incremental improvements if we allocate the multiple...