cudf icon indicating copy to clipboard operation
cudf copied to clipboard

Implemented UDF Filters

Open lamarrr opened this issue 6 months ago • 1 comments

Description

Checklist

  • [ ] I am familiar with the Contributing Guidelines.
  • [ ] New or existing tests cover these changes.
  • [ ] The documentation is up to date with these changes.

lamarrr avatar Jun 03 '25 13:06 lamarrr

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

copy-pr-bot[bot] avatar Jun 03 '25 13:06 copy-pr-bot[bot]

/ok to test 7f97737

lamarrr avatar Jun 18 '25 06:06 lamarrr

I'll follow up with benchmarks once this is merged

lamarrr avatar Jun 30 '25 13:06 lamarrr

std::vector<std::unique_ptr<column>> filter(
  std::vector<column_view> const& columns,
  std::string const& predicate_udf,
  bool is_ptx,
  std::optional<void*> user_data             = std::nullopt,
  std::optional<std::vector<bool>> copy_mask = std::nullopt,
  rmm::cuda_stream_view stream               = cudf::get_default_stream(),
  rmm::device_async_resource_ref mr          = cudf::get_current_device_resource_ref());

@davidwendt do you think this should be consistent with other APIs and use a table_view instead?

i.e:

std::unique_ptr<table> filter(
  table_view const& table,
  std::string const& predicate_udf,
  bool is_ptx,
  std::optional<void*> user_data             = std::nullopt,
  std::optional<std::vector<bool>> copy_mask = std::nullopt,
  rmm::cuda_stream_view stream               = cudf::get_default_stream(),
  rmm::device_async_resource_ref mr          = cudf::get_current_device_resource_ref());

lamarrr avatar Jul 09 '25 05:07 lamarrr

do you think this should be consistent with other APIs and use a table_view instead?

Yes, a table_view makes sense. Also since all the columns have to be the same size.

davidwendt avatar Jul 09 '25 12:07 davidwendt

Yes, a table_view makes sense. Also since all the columns have to be the same size.

Just remembered it won't work since it can have scalars as input as well. I think we can proceed in its current state and make necessary changes later.

lamarrr avatar Jul 09 '25 12:07 lamarrr

CI is taking forever

lamarrr avatar Jul 10 '25 21:07 lamarrr

/merge

lamarrr avatar Jul 11 '25 00:07 lamarrr

/merge

lamarrr avatar Jul 11 '25 09:07 lamarrr