arrow icon indicating copy to clipboard operation
arrow copied to clipboard

ARROW-17610: [C++] Support additional source types in SourceNode

Open rtpsw opened this issue 3 years ago • 8 comments

See https://issues.apache.org/jira/browse/ARROW-17610

rtpsw avatar Sep 22 '22 12:09 rtpsw

cc @westonpace, @pitrou

rtpsw avatar Sep 22 '22 12:09 rtpsw

https://issues.apache.org/jira/browse/ARROW-17610

github-actions[bot] avatar Sep 22 '22 17:09 github-actions[bot]

@westonpace, any idea how to fix the Windows linker error here? In the past I was able to fix this by adding inline but this time it didn't work.

rtpsw avatar Sep 25 '22 14:09 rtpsw

@westonpace, any idea how to fix the Windows linker error here? In the past I was able to fix this by adding inline but this time it didn't work.

I suspect this is due to adding Executor, which is not a public symbol, in a constructor parameter as you requested. If so and we insist on exposing the IO executor, I see a couple of options to consider:

  1. Make Executor a public symbol. I'm not sure what issues that might lead to.
  2. Make Executor derive from a new type which would be a public symbol, and use this type in the constructor. This would lead to a not-so-pretty dynamic cast internally.
  3. Expose one or more different (public-symbol) parameters that are used to internally setup the IO executor. This restricts the kind of IO executors that the caller may set up.

rtpsw avatar Sep 25 '22 15:09 rtpsw

Executor is exported and it is included in other places that are part of the public API (e.g. SourceNodeOptions, CSV reader).

I think the problem is that SchemaSourceNodeOptions is templated. Can you try adding something like (note, the first two lines already exist in your PR)...

using ArrayVectorIteratorMaker = std::function<Iterator<std::shared_ptr<ArrayVector>>()>;
using ArrayVectorSourceNodeOptions = SchemaSourceNodeOptions<ArrayVectorIteratorMaker>;
template class ARROW_EXPORT ArrayVectorSourceNodeOptions; // Explicitly export the specialization

westonpace avatar Sep 25 '22 17:09 westonpace

It turns out template instantiations and attributes (in this case, visibility) do not work together nicely. When I try adding code like template<> class ARROW_EXPORT ArrayVectorSourceNodeOptions, I get compiler errors like:

/mnt/user1/tscontract/github/rtpsw/arrow/cpp/src/arrow/compute/exec/options.h:113:31: error: using typedef-name ‘using ArrayVectorSourceNodeOptions = class arrow::compute::SchemaSourceNodeOptions<std::function<arrow::Iterator<std::shared_ptr<std::vector<std::shared_ptr<arrow::Array> > > >()> >’ after ‘class’
  113 | template<> class ARROW_EXPORT ArrayVectorSourceNodeOptions;
      |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~

and when I try adding code like template<> class ARROW_EXPORT SchemaSourceNodeOptions<ArrayVectorIteratorMaker>, I get compiler warnings like:

/mnt/user1/tscontract/github/rtpsw/arrow/cpp/src/arrow/compute/exec/options.h:113:31: warning: attributes ignored on template instantiation [-Wattributes]
  113 | template<> class ARROW_EXPORT SchemaSourceNodeOptions<ArrayVectorIteratorMaker>;
      |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

So, both tries do not work, at least with my compiler. There seems to be a proposal documenting this limitation, but I'm not sure what its status is.

In the commit I just made, I opted instead to explicitly define and export non-template symbols.

rtpsw avatar Sep 26 '22 11:09 rtpsw

The CI failures look unrelated to the PR.

rtpsw avatar Sep 27 '22 06:09 rtpsw

@westonpace, to make sure we're on the same page, what remains for this PR to go through?

rtpsw avatar Oct 13 '22 14:10 rtpsw

@westonpace, is it a good time to get back to this PR?

rtpsw avatar Nov 06 '22 14:11 rtpsw

Yes, apologies. Since it's been a month lets give it one last CI pass. I've rebased this (all tests pass locally) and will merge later today.

westonpace avatar Nov 21 '22 18:11 westonpace

Benchmark runs are scheduled for baseline = 7198676ac4d69f0a10bc750647cab1f7fd12a7db and contender = c9293039b5454c99e8a34f8fc3a4602d74874114. c9293039b5454c99e8a34f8fc3a4602d74874114 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. Conbench compare runs links: [Finished :arrow_down:0.0% :arrow_up:0.0%] ec2-t3-xlarge-us-east-2 [Finished :arrow_down:0.57% :arrow_up:0.0%] test-mac-arm [Finished :arrow_down:0.0% :arrow_up:0.0%] ursa-i9-9960x [Finished :arrow_down:0.38% :arrow_up:0.0%] ursa-thinkcentre-m75q Buildkite builds: [Finished] c9293039 ec2-t3-xlarge-us-east-2 [Finished] c9293039 test-mac-arm [Finished] c9293039 ursa-i9-9960x [Finished] c9293039 ursa-thinkcentre-m75q [Finished] 7198676a ec2-t3-xlarge-us-east-2 [Finished] 7198676a test-mac-arm [Finished] 7198676a ursa-i9-9960x [Finished] 7198676a ursa-thinkcentre-m75q Supported benchmarks: ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True test-mac-arm: Supported benchmark langs: C++, Python, R ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

ursabot avatar Nov 23 '22 12:11 ursabot