arrow icon indicating copy to clipboard operation
arrow copied to clipboard

RFC: [C++][Java][FlightRPC] Substrait, transaction, cancellation for Flight SQL

Open lidavidm opened this issue 3 years ago • 9 comments

lidavidm avatar Jul 01 '22 18:07 lidavidm

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW

Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

github-actions[bot] avatar Jul 01 '22 18:07 github-actions[bot]

Generally seems OK to me.

emkornfield avatar Jul 19 '22 06:07 emkornfield

I've been pushing various tweaks to the spec while implementing it in C++.

One thing I will need to change: transaction IDs should be supplied when creating a prepared statement, not when executing them (since generally existing APIs associate the statement with a particular connection).

lidavidm avatar Jul 29 '22 15:07 lidavidm

There's now implementations in C++ and Java.

TODOs:

  • Integration tests in C++
  • https://issues.apache.org/jira/browse/ARROW-17254
  • Ensure integration tests work across languages
  • Fix C++ API/Proto comments (methods should accept only Transaction values)
  • Add wrapper types for new types in Java (stop using the Protobuf classes directly)
  • Implement transactions in the Derby example (depends on #13710)

lidavidm avatar Aug 02 '22 18:08 lidavidm

Hmm, Windows builds fail because of a similar issue to https://github.com/apache/arrow/pull/13434 - Protobuf and DLLs don't interact well, since you can't get protoc to insert the dllimport/dllexport declarations correctly.

The easiest thing might be to just punt on CancelQuery for now. Or else, it would have to be

message ActionCancelQueryRequest {
  // XXX(ARROW-16902): A serialized FlightInfo   
  bytes info = 1;
}

and then rely on FlightInfo::Deserialize.

lidavidm avatar Aug 05 '22 18:08 lidavidm

@lidavidm Since this is a draft, are you looking for a detailed review or more for general opinions?

pitrou avatar Aug 08 '22 13:08 pitrou

The Protobuf definitions deserve more scrutiny; for the code, I'm just looking for general opinions.

lidavidm avatar Aug 08 '22 14:08 lidavidm

I need to figure out the AppVeyor failure here, but this should generally be ready.

lidavidm avatar Aug 10 '22 20:08 lidavidm

@jduo any further comments on the spec (or implementation) here?

lidavidm avatar Aug 11 '22 17:08 lidavidm

I will fix ARROW-17254/ARROW-17420 and rebase this, but otherwise I think this is ready

lidavidm avatar Aug 15 '22 20:08 lidavidm

ARROW-17420 was merged, rebased on top of PR for ARROW-17254 (#13898) and updated

lidavidm avatar Aug 17 '22 14:08 lidavidm

@pitrou were you planning on taking a second look or were you just passing by? :slightly_smiling_face:

@zeroshade if you had any comments on the definitions before implementation, they would be appreciated!

lidavidm avatar Aug 23 '22 16:08 lidavidm

@lidavidm I don't have time for a second look, unless you really need it :-)

pitrou avatar Aug 23 '22 16:08 pitrou

No worries, just want to make sure!

lidavidm avatar Aug 23 '22 18:08 lidavidm

Updated again, since @jvanstraten pointed out that the server may want to know the client's Substrait release version since otherwise it may be unclear how to interpret the plan (even if it parses properly).

Also, adds some validation for SqlInfo values to the integration test + adds SqlInfo values so the server can report Substrait version support.

lidavidm avatar Aug 29 '22 21:08 lidavidm

(I guess these are mostly minor changes and we shouldn't need to restart the vote?)

Yes, definitely.

Also, sorry, the review contains C++ comments that I did some days/weeks ago but had forgotten to submit apparently :-S

pitrou avatar Sep 06 '22 13:09 pitrou

Rebased + updated (minus the timeout since we'd have to change the result set schema for GetSqlInfo to add floating point)

lidavidm avatar Sep 06 '22 15:09 lidavidm

https://issues.apache.org/jira/browse/ARROW-17688

github-actions[bot] avatar Sep 12 '22 17:09 github-actions[bot]

Updated, thanks Antoine!

lidavidm avatar Sep 13 '22 18:09 lidavidm

CI failures here are addressed/fixed elsewhere

lidavidm avatar Sep 16 '22 15:09 lidavidm

Benchmark runs are scheduled for baseline = d571e93ad24d5111800540b42a3b8d56459edd9b and contender = 3ce40143f8a836df058ec5fe1b29d9da5ede169d. 3ce40143f8a836df058ec5fe1b29d9da5ede169d is a master commit associated with this PR. Results will be available as each benchmark for each run completes. Conbench compare runs links: [Finished :arrow_down:0.0% :arrow_up:0.0%] ec2-t3-xlarge-us-east-2 [Failed :arrow_down:0.1% :arrow_up:0.0%] test-mac-arm [Failed :arrow_down:0.28% :arrow_up:0.0%] ursa-i9-9960x [Finished :arrow_down:0.21% :arrow_up:0.0%] ursa-thinkcentre-m75q Buildkite builds: [Finished] 3ce40143 ec2-t3-xlarge-us-east-2 [Failed] 3ce40143 test-mac-arm [Failed] 3ce40143 ursa-i9-9960x [Finished] 3ce40143 ursa-thinkcentre-m75q [Finished] d571e93a ec2-t3-xlarge-us-east-2 [Failed] d571e93a test-mac-arm [Failed] d571e93a ursa-i9-9960x [Finished] d571e93a ursa-thinkcentre-m75q Supported benchmarks: ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True test-mac-arm: Supported benchmark langs: C++, Python, R ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

ursabot avatar Sep 17 '22 12:09 ursabot

['Python', 'R'] benchmarks have high level of regressions. test-mac-arm ursa-i9-9960x

ursabot avatar Sep 17 '22 12:09 ursabot