arrow
arrow copied to clipboard
ARROW-17610: [C++] Support additional source types in SourceNode
See https://issues.apache.org/jira/browse/ARROW-17610
cc @westonpace, @pitrou
https://issues.apache.org/jira/browse/ARROW-17610
@westonpace, any idea how to fix the Windows linker error here? In the past I was able to fix this by adding inline but this time it didn't work.
@westonpace, any idea how to fix the Windows linker error here? In the past I was able to fix this by adding
inlinebut this time it didn't work.
I suspect this is due to adding Executor, which is not a public symbol, in a constructor parameter as you requested. If so and we insist on exposing the IO executor, I see a couple of options to consider:
- Make
Executora public symbol. I'm not sure what issues that might lead to. - Make
Executorderive from a new type which would be a public symbol, and use this type in the constructor. This would lead to a not-so-pretty dynamic cast internally. - Expose one or more different (public-symbol) parameters that are used to internally setup the IO executor. This restricts the kind of IO executors that the caller may set up.
Executor is exported and it is included in other places that are part of the public API (e.g. SourceNodeOptions, CSV reader).
I think the problem is that SchemaSourceNodeOptions is templated. Can you try adding something like (note, the first two lines already exist in your PR)...
using ArrayVectorIteratorMaker = std::function<Iterator<std::shared_ptr<ArrayVector>>()>;
using ArrayVectorSourceNodeOptions = SchemaSourceNodeOptions<ArrayVectorIteratorMaker>;
template class ARROW_EXPORT ArrayVectorSourceNodeOptions; // Explicitly export the specialization
It turns out template instantiations and attributes (in this case, visibility) do not work together nicely. When I try adding code like template<> class ARROW_EXPORT ArrayVectorSourceNodeOptions, I get compiler errors like:
/mnt/user1/tscontract/github/rtpsw/arrow/cpp/src/arrow/compute/exec/options.h:113:31: error: using typedef-name ‘using ArrayVectorSourceNodeOptions = class arrow::compute::SchemaSourceNodeOptions<std::function<arrow::Iterator<std::shared_ptr<std::vector<std::shared_ptr<arrow::Array> > > >()> >’ after ‘class’
113 | template<> class ARROW_EXPORT ArrayVectorSourceNodeOptions;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
and when I try adding code like template<> class ARROW_EXPORT SchemaSourceNodeOptions<ArrayVectorIteratorMaker>, I get compiler warnings like:
/mnt/user1/tscontract/github/rtpsw/arrow/cpp/src/arrow/compute/exec/options.h:113:31: warning: attributes ignored on template instantiation [-Wattributes]
113 | template<> class ARROW_EXPORT SchemaSourceNodeOptions<ArrayVectorIteratorMaker>;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
So, both tries do not work, at least with my compiler. There seems to be a proposal documenting this limitation, but I'm not sure what its status is.
In the commit I just made, I opted instead to explicitly define and export non-template symbols.
The CI failures look unrelated to the PR.
@westonpace, to make sure we're on the same page, what remains for this PR to go through?
@westonpace, is it a good time to get back to this PR?
Yes, apologies. Since it's been a month lets give it one last CI pass. I've rebased this (all tests pass locally) and will merge later today.
Benchmark runs are scheduled for baseline = 7198676ac4d69f0a10bc750647cab1f7fd12a7db and contender = c9293039b5454c99e8a34f8fc3a4602d74874114. c9293039b5454c99e8a34f8fc3a4602d74874114 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] ec2-t3-xlarge-us-east-2
[Finished :arrow_down:0.57% :arrow_up:0.0%] test-mac-arm
[Finished :arrow_down:0.0% :arrow_up:0.0%] ursa-i9-9960x
[Finished :arrow_down:0.38% :arrow_up:0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] c9293039 ec2-t3-xlarge-us-east-2
[Finished] c9293039 test-mac-arm
[Finished] c9293039 ursa-i9-9960x
[Finished] c9293039 ursa-thinkcentre-m75q
[Finished] 7198676a ec2-t3-xlarge-us-east-2
[Finished] 7198676a test-mac-arm
[Finished] 7198676a ursa-i9-9960x
[Finished] 7198676a ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java