calcite-avatica
calcite-avatica copied to clipboard
[WIP][CALCITE-1806] Add Apache Spark JDBC test to Avatica server
This branch is a work in progress to show how Apache Spark and Avatica don't seem to be playing along nicely together. Spark JDBC against Avatica returns an empty result even though it determines the correct schema.
I'm expecting the build to fail with 0 rows being returned when there is actually a row there.
Tests run: 2, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 7.373 sec <<< FAILURE! - in org.apache.calcite.avatica.remote.SparkClientTest
testSpark[JSON](org.apache.calcite.avatica.remote.SparkClientTest) Time elapsed: 6.864 sec <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<0>
at org.apache.calcite.avatica.remote.SparkClientTest.testSpark(SparkClientTest.java:108)
testSpark[PROTOBUF](org.apache.calcite.avatica.remote.SparkClientTest) Time elapsed: 0.48 sec <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<0>
at org.apache.calcite.avatica.remote.SparkClientTest.testSpark(SparkClientTest.java:108)
@joshelser - Any ideas on where to look for this? I'm surprised that there are no errors but 0 rows returned.
That is strange, @risdenk. I'd start with turning on trace logging for the Avatica protocol (see CALCITE-1879 for details). I'd probably take your simple example with Spark here and compare it to the same thing without Spark.
One thing to double-check before you try that though... you are using the avatica shaded client jar? org.apache.calcite.avatica:avatica as opposed to org.apache.calcite.avatica:avatica-core.
Thanks @joshelser I'll take a look at the trace output.
I tried the shaded client jar with "real" Spark outside the testing framework. Inside the testing framework, not 100% sure what jar is being picked up right now.
Trace output from SELECT 1 FROM test over standard JDBC (this is to match what spark is doing for .count())
2018-03-11 09:28:44,690 [qtp1769193365-18 - /] TRACE - request: {"request":"prepareAndExecute","connectionId":"74d1ae39-7f4b-49f8-9c84-3a9c13e686f5","statementId":0,"sql":"SELECT 1 from test","maxRowsInFirstFrame":-1,"maxRowCount":-1}
2018-03-11 09:28:44,692 [qtp1769193365-18 - /] TRACE - prepAndExec statement 74d1ae39-7f4b-49f8-9c84-3a9c13e686f5::0
2018-03-11 09:28:44,694 [qtp1769193365-18 - /] TRACE - response: Response: {"response":"executeResults","missingStatement":false,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"localhost:54186"},"results":[{"response":"resultSet","connectionId":"74d1ae39-7f4b-49f8-9c84-3a9c13e686f5","statementId":0,"ownStatement":true,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":false,"currency":false,"nullable":0,"signed":true,"displaySize":11,"label":"C1","columnName":"C1","schemaName":"","precision":32,"scale":0,"tableName":"","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":null,"parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":null},"firstFrame":{"offset":0,"done":true,"rows":[[1]]},"updateCount":-1,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"localhost:54186"}}]}, Status:200
Trace output from Spark .count() on the test table
2018-03-11 09:28:53,471 [qtp1769193365-19 - /] TRACE - request: {"request":"prepare","connectionId":"65d00d7e-85f3-4edc-b6cb-23d8c0af21e3","sql":"SELECT 1 FROM test ","maxRowCount":-1}
2018-03-11 09:28:53,473 [qtp1769193365-19 - /] TRACE - prepared statement 65d00d7e-85f3-4edc-b6cb-23d8c0af21e3::2
2018-03-11 09:28:53,474 [qtp1769193365-19 - /] TRACE - response: Response: {"response":"prepare","statement":{"connectionId":"65d00d7e-85f3-4edc-b6cb-23d8c0af21e3","id":2,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":false,"currency":false,"nullable":0,"signed":true,"displaySize":11,"label":"C1","columnName":"C1","schemaName":"","precision":32,"scale":0,"tableName":"","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":"SELECT 1 FROM test ","parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":null}},"rpcMetadata":{"response":"rpcMetadata","serverAddress":"localhost:54186"}}, Status:200
...
2018-03-11 09:28:53,486 [qtp1769193365-17 - /] TRACE - request: {"request":"execute","statementHandle":{"connectionId":"65d00d7e-85f3-4edc-b6cb-23d8c0af21e3","id":2,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":false,"currency":false,"nullable":0,"signed":true,"displaySize":11,"label":"C1","columnName":"C1","schemaName":"","precision":32,"scale":0,"tableName":"","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"NUMBER"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":"SELECT 1 FROM test ","parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":null}},"parameterValues":[],"maxRowCount":0}
2018-03-11 09:28:53,488 [qtp1769193365-17 - /] TRACE - response: Response: {"response":"executeResults","missingStatement":false,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"localhost:54186"},"results":[{"response":"resultSet","connectionId":"65d00d7e-85f3-4edc-b6cb-23d8c0af21e3","statementId":2,"ownStatement":true,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":false,"currency":false,"nullable":0,"signed":true,"displaySize":11,"label":"C1","columnName":"C1","schemaName":"","precision":32,"scale":0,"tableName":"","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":"SELECT 1 FROM test ","parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":"SELECT"},"firstFrame":{"offset":0,"done":true,"rows":[]},"updateCount":-1,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"localhost:54186"}}]}, Status:200
The big difference is that one is a prepareAndExecute and the other is a create prepared and then execute separately. The results are also different for the first frame:
"firstFrame":{"offset":0,"done":true,"rows":[[1]]} # as part of standard JDBC
"firstFrame":{"offset":0,"done":true,"rows":[]} # as part of Spark .count()
Still digging in further. Will try to reproduce the create prepared and separate execute next with standard JDBC.
So it looks like Spark is setting fetch size to be 0 on the prepared statement. This is causing the maxRows to get set to 0 during the request.
For standard JDBC when the execute request is made: "maxRowCount":100} which results in at least 100 rows.
For Spark when the execute request is made: "maxRowCount":0} which results in 0 rows.
I can reproduce this by setFetchSize(0) on the PreparedStatement before executing it. Using setMaxRows(n) seems to have no affect. It looks like the two (fetch size and max size) are being confused somewhere in Avatica.
I'm going to push my current Spark test which shows this failure without Spark even being invoked.
Test failure as expected:
Failed tests:
SparkClientTest.testSpark:119
SparkClientTest.testSpark:119
This is because the call ResultSet.next() fails when there are no rows when there should be rows.
The request is incorrectly getting sent with maxRowCount:0 because setFetchSize(0) was called..
2018-03-11 15:29:11,100 [qtp1411675719-124] TRACE - request: {"request":"execute","statementHandle":{"connectionId":"c5fbb060-4de7-424f-9013-6d2465616063","id":5,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":false,"currency":false,"nullable":0,"signed":true,"displaySize":11,"label":"C1","columnName":"C1","schemaName":"","precision":32,"scale":0,"tableName":"","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"NUMBER"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":"SELECT 1 from test","parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":null}},"parameterValues":[],"maxRowCount":0}
I don't think the overall problem is related to Spark support but Spark found the issue.
I found a solution to the setFetchSize issue. However I'm not entirely sure it is correct. It looks like fetchSize, maxRowsInFirstFrame, and maxRowCount could be misused throughout Avatica. It looks like at some point maxRowsinFirstFrame was added but not all the variable names were changed? I don't think I understand the different between fetchSize and maxRowsInFirstFrame since they should be similar if not the same?
@joshelser - Any thoughts on the above? I can dig in further just going to take longer to understand how it is all put together.
Looks like the test failure is related to JDK7 and bumping the hsqldb version? This doesn't mean the workaround for maxRowsInFirstFrame I put in is correct. There seems to be some lacking test coverage around setFetchSize, setMaxRows, and maxRowsInFirstFrame.
I can reproduce this by setFetchSize(0) on the PreparedStatement before executing it.
Oof. Getting fetchSize and maxResultSize mixed up is bad.
I found a solution to the setFetchSize issue. However I'm not entirely sure it is correct. It looks like fetchSize, maxRowsInFirstFrame, and maxRowCount could be misused throughout Avatica. It looks like at some point maxRowsinFirstFrame was added but not all the variable names were changed? I don't think I understand the different between fetchSize and maxRowsInFirstFrame since they should be similar if not the same?
maxRowsInFirstFrame and maxRowCount are irritating. This was trying to fix an old bug in a backwards-compat manner. IIRC, maxRowCount is supposed to be a global "I want no more than $maxRowCount rows from this query", whereas maxRowsInFirstFrame is a protocol-specific thing that limits the number of rows we'll pack into a single ResultSetResponse message from server->client.
I don't recall how fetchSize fits into this, but it seems to be getting used in Avatica somewhere, instead of just getting set onto the "real" Statement inside of the server. (Avatica could try to do something smart with the fetchSize, but an acceptable starting point would just be to pass the value along).
Does that help? If you have a unit test you can share, that would help me get up to speed in debugging.
@joshelser - Thanks for the insight. I think that helps.
As far as a unit test, adding ps.setFetchSize(0) on Line 563 of RemoteDriverTest and adding statement.setFetchSize(0) on Line 569 of RemoteDriverTest will get the RemoteDriverTest to fail for a few of the existing tests.
Rebased on latest master to look at this again.
This PR has been inactive for quite some time now and it seems there is no interest from the authors to push this forward thus I am closing it down.