MithunR comments

Results: 156 comments by MithunR

[FEA] Improved performance for strings finder_warp_parallel_fn / contains_warp_parallel_fn kernels

[FEA] Support HiveTableScanExec to scan a Hive text table

This issue is partially addressed by the following PR: https://github.com/NVIDIA/spark-rapids/pull/7013

In the 22.12 release, there is support for default Hive text tables (i.e. `^A` separated fields, newline separated records, `\\`...

[FEA] Support HiveTableScanExec to scan a Hive text table

This issue should be safe to close now. The pending items are as follows:

1. Support for complex types (`ARRAY`, `STRUCT`, `MAP`, `BINARY`).
2. Support for escape characters.
3. Support...

[BUG] String columns written with `fastparquet` seem to be read incorrectly via CUDF's Parquet reader

> I'll have to check the spec to see if these padding bytes are forbidden or this is just a gray area.

Thank you for the analysis, @etseidl. This has...

`strings::contains()` for multiple scalar search targets

Tagging @davidwendt.

`strings::contains()` for multiple scalar search targets

I've placed this in draft, for the moment. I should do some more testing from `spark-rapids`, to make sure this fits.

`strings::contains()` for multiple scalar search targets

I should have updated here sooner. It's possible that I was mistaken regarding the improvement from this change. I can't seem to reproduce the perf gains from last week. From...

`strings::contains()` for multiple scalar search targets

Hat tip to @tgujar for this one: For the multi-string kernel, [this part](https://github.com/rapidsai/cudf/pull/15536/files#diff-048f86c21559b14f64f86aaeaa57776d366c3a4948a5aba7c0ab1a3801be87bcR389-R391) is interesting:

```c++
auto const lane_idx = idx % cudf::detail::warp_size;
auto const str_idx = (idx / cudf::detail::warp_size)...
```
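The quoted snippet is cut off, so the following is only a rough host-side sketch of the general warp-per-string indexing idea that it appears to start from. It is not the code from the PR. The names `warp_slot`, `map_thread`, and `nth_offset` are hypothetical, the warp width of 32 is an assumption, and the strided walk in `nth_offset` is a common pattern assumed here for illustration, not something the comment confirms.

```cpp
#include <cstddef>

// Assumed warp width (32 threads per warp on current NVIDIA GPUs).
constexpr std::size_t warp_size = 32;

// Hypothetical result type: which string a thread works on, and its lane in the warp.
struct warp_slot {
  std::size_t str_idx;   // index of the string (row) assigned to this thread's warp
  std::size_t lane_idx;  // position of this thread within its warp, 0..warp_size-1
};

// Hypothetical helper: map a flat thread index to (string, lane),
// assuming exactly one warp is assigned to each string.
constexpr warp_slot map_thread(std::size_t idx)
{
  return warp_slot{idx / warp_size, idx % warp_size};
}

// Hypothetical helper: the byte offset a lane would inspect on its n-th step
// if the lanes of a warp stride across one string together.
constexpr std::size_t nth_offset(std::size_t lane_idx, std::size_t step)
{
  return lane_idx + step * warp_size;
}
```

Under these assumptions, thread 70 would belong to the warp handling string 2 and would be lane 6 of that warp.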

`strings::contains()` for multiple scalar search targets

On my last update, I indicated some positive news regarding total kernel times: Batching calls to `strings::contains()` for multiple targets reduces the total time spent in the kernel. For 100M...
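To make the batching idea concrete, here is a small CPU-only sketch, assuming "batching" means evaluating every target against a row in a single pass instead of making one full pass over the column per target. It is not the libcudf API or kernel. `contains_multi` is a hypothetical name, and `std::string::find` stands in for the device-side search.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical CPU illustration of a batched "contains" over several targets.
// result[t][r] is true when rows[r] contains targets[t].
std::vector<std::vector<bool>> contains_multi(std::vector<std::string> const& rows,
                                              std::vector<std::string> const& targets)
{
  std::vector<std::vector<bool>> result(targets.size(),
                                        std::vector<bool>(rows.size(), false));
  // Visit each row once and test all targets against it before moving on,
  // rather than scanning the whole column again for every target.
  for (std::size_t r = 0; r < rows.size(); ++r) {
    for (std::size_t t = 0; t < targets.size(); ++t) {
      result[t][r] = rows[r].find(targets[t]) != std::string::npos;
    }
  }
  return result;
}
```

The output has one boolean column per target, so a caller gets the same answers as separate per-target calls would give while each row is visited only once.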

`strings::contains()` for multiple scalar search targets

From my experimentation, it appears that we might have taken this nearly as far as we can with this algo. There will be incremental improvements if we allocate the multiple...