incubator-livy
[LIVY-771][THRIFT] Do not remove trailing zeros from decimal values.
What changes were proposed in this pull request?
The decimal SQL type is mapped to java.math.BigDecimal. Livy removes trailing zeros from the BigDecimal before storing its string representation in a ColumnBuffer. This causes loss of information in queries of tables with columns of decimal(precision, scale) type. In particular, if you connect to the Livy Thrift server using JDBC, execute a query, and get a java.sql.ResultSet, then ResultSet#getBigDecimal on the index of a column of decimal type can return a BigDecimal value of incorrect scale.

In this change, we simply do not remove trailing zeros from the BigDecimal.
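To make the information loss concrete, here is a minimal Scala sketch of the underlying java.math.BigDecimal behavior (the value mirrors the beeline example further down):

```scala
val v = new java.math.BigDecimal("9.40")      // a decimal(5, 2) value, scale 2
val stored = v.stripTrailingZeros().toString  // what was stored: "9.4"
// Parsing the stored string back yields scale 1; the column's scale 2
// cannot be recovered from the string alone.
assert(new java.math.BigDecimal(stored).scale() == 1)
```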
How was this patch tested?
Added cases to integration tests, which fail without this change and pass with it.
Fixed ColumnBufferTest to account for the change in behavior.
Also tested manually by running beeline against a Livy Thrift server with this change.
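As a hypothetical end-to-end sketch of what the JDBC-level assertion looks like (the connection URL, table, and expected value are assumptions based on the examples below; the actual integration tests may differ):

```scala
import java.sql.DriverManager

// Hypothetical connection to a local Livy Thrift server.
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10090")
val rs = conn.createStatement().executeQuery(
  "select f_decimal from hive_types_test order by f_int limit 1")
rs.next()
val bd = rs.getBigDecimal(1)
// For a decimal(5, 2) column, the returned BigDecimal should keep scale 2.
assert(bd.scale() == 2)                        // fails without this change
assert(bd == new java.math.BigDecimal("9.40"))
```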
CI completed and passed. I don't know why the status did not get updated here. @mgaido91 @jerryshao can you please review?
I am not sure about this change; could you please elaborate on the previous behavior and why it is problematic? An example or some screenshots would help clarify. Thanks.
@mgaido91 I explained the issue in the description of https://issues.apache.org/jira/browse/LIVY-771. I also called out in comments on the code what the problem is. To repeat myself: if you have a table column of type decimal(5, 2), for example, then its values should have 2 places after the decimal point. Livy TS returns values that do not conform to that type, such as 123.4 instead of 123.40, 123 instead of 123.00, or 1.2E+2 instead of 120.00.
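For reference, each of those values can be reproduced directly with stripTrailingZeros:

```scala
new java.math.BigDecimal("123.40").stripTrailingZeros().toString  // "123.4"
new java.math.BigDecimal("123.00").stripTrailingZeros().toString  // "123"
new java.math.BigDecimal("120.00").stripTrailingZeros().toString  // "1.2E+2"
```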
beeline to HS2:
> select f_decimal from hive_types_test order by f_int limit 1;
+------------+
| f_decimal |
+------------+
| 9.40 |
+------------+
1 row selected (14.31 seconds)
beeline to Livy TS:
> select f_decimal from hive_types_test order by f_int limit 1;
+------------+
| f_decimal |
+------------+
| 9.4 |
+------------+
1 row selected (11.544 seconds)
spark-shell:
scala> val df = spark.sql("select f_decimal from hive_types_test order by f_int")
df: org.apache.spark.sql.DataFrame = [f_decimal: decimal(5,2)]
scala> val row = df.first()
row: org.apache.spark.sql.Row = [9.40]
scala> val elem = row.get(0)
elem: Any = 9.40
scala> elem.isInstanceOf[java.math.BigDecimal]
res2: Boolean = true
scala> elem == new java.math.BigDecimal("9.40")
res3: Boolean = true
scala> elem == new java.math.BigDecimal("9.4")
res4: Boolean = false
scala> new java.math.BigDecimal("9.40").stripTrailingZeros() == new java.math.BigDecimal("9.4")
res5: Boolean = true
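The comparisons above behave this way because BigDecimal#equals is scale-sensitive, while compareTo ignores scale; a quick check:

```scala
val a = new java.math.BigDecimal("9.40")
val b = new java.math.BigDecimal("9.4")
// equals compares the unscaled value AND the scale, so these differ:
a == b           // false
// compareTo compares only the numeric value:
a.compareTo(b)   // 0
```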
Thanks for the explanation @wypoon. From the PR description I was confused by the sentence "Livy removes trailing zeros ... before storing its string representation ...", so I was not sure whether the problem occurred only at display time.
Codecov Report
Merging #296 into master will increase coverage by 0.01%. The diff coverage is n/a.
@@ Coverage Diff @@
## master #296 +/- ##
============================================
+ Coverage 68.24% 68.26% +0.01%
- Complexity 965 966 +1
============================================
Files 104 104
Lines 5952 5952
Branches 900 900
============================================
+ Hits 4062 4063 +1
Misses 1311 1311
+ Partials 579 578 -1
Impacted Files | Coverage Δ | Complexity Δ |
---|---|---|
...c/main/scala/org/apache/livy/repl/ReplDriver.scala | 30.76% <0.00%> (-2.57%) | 7.00% <0.00%> (ø%) |
...in/java/org/apache/livy/rsc/driver/JobWrapper.java | 88.57% <0.00%> (+5.71%) | 9.00% <0.00%> (+1.00%) |
I updated the Jira as well as the PR description to point out that the BigDecimal values returned in JDBC can be incorrect due to this bug.
Hi @mgaido91 , I have responded to your point about negative scale in the decimal type in Spark SQL, and addressed the nit about the comment in the ColumnBufferTest. I have also clarified the Jira description and the PR description. This is a simple but important fix. Can we move forward now?
@mgaido91 I think this is an important bug, and the fix is straightforward. I think I have provided an adequate test. I do not plan to spend any more time on this. I think it would be good for Livy to have the fix merged. However, you or anyone else can feel free to take this over if you don't agree.