
[LIVY-771][THRIFT] Do not remove trailing zeros from decimal values.

wypoon opened this pull request • 8 comments

What changes were proposed in this pull request?

The decimal SQL type is mapped to java.math.BigDecimal. Livy removes trailing zeros from the BigDecimal before storing its string representation in a ColumnBuffer. This causes loss of information in queries against tables with columns of decimal(precision, scale) type. In particular, if you connect to the Livy Thrift server over JDBC, execute a query, and get a java.sql.ResultSet, then ResultSet#getBigDecimal on the index of a decimal column can return a BigDecimal with an incorrect scale. With this change, we simply no longer remove trailing zeros from the BigDecimal.
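The scale loss described above can be reproduced directly with `java.math.BigDecimal`; a minimal sketch (class name hypothetical, not part of the Livy code):

```java
import java.math.BigDecimal;

public class DecimalScaleDemo {
    public static void main(String[] args) {
        // A decimal(5, 2) column value as Spark hands it to Livy: scale 2.
        BigDecimal original = new BigDecimal("123.40");

        // What Livy did before this fix: strip trailing zeros before
        // storing the string representation.
        BigDecimal stripped = original.stripTrailingZeros();

        System.out.println(stripped.scale());   // 1 -- the declared scale 2 is lost
        System.out.println(stripped);           // 123.4
        System.out.println(original.scale());   // 2
        System.out.println(original);           // 123.40

        // The two values are numerically equal, but BigDecimal#equals also
        // compares scale -- which is why JDBC clients see a mismatch.
        System.out.println(original.compareTo(stripped) == 0);  // true
        System.out.println(original.equals(stripped));          // false
    }
}
```

Because `ResultSet#getBigDecimal` hands the stored value back to the client, a consumer that relies on the column's declared scale (for example, when formatting currency) receives a value that no longer conforms to the column type.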

How was this patch tested?

Added cases to integration tests, which fail without this change and pass with it. Fixed ColumnBufferTest to account for the change in behavior. Also tested manually by running beeline against a Livy Thrift server with this change.

wypoon avatar May 20 '20 20:05 wypoon

CI completed and passed. I don't know why the status did not get updated here. @mgaido91 @jerryshao can you please review?

wypoon avatar May 21 '20 17:05 wypoon

I am not sure about this change; could you please elaborate on the previous behavior and why it is problematic? Maybe an example or some screenshots would help clarify. Thanks.

mgaido91 avatar May 21 '20 19:05 mgaido91

@mgaido91 I explained the issue in the description of https://issues.apache.org/jira/browse/LIVY-771, and I have also called out in comments on the code what the problem is. To repeat: if you have a column of decimal(5, 2) type, e.g., then its values should have 2 places after the decimal point. Livy TS is returning values that do not conform to that type: it returns values such as 123.4 instead of 123.40, 123 instead of 123.00, or 1.2E+2 instead of 120.00.

beeline to HS2:

> select f_decimal from hive_types_test order by f_int limit 1;
+------------+
| f_decimal  |
+------------+
| 9.40       |
+------------+
1 row selected (14.31 seconds)

beeline to Livy TS:

> select f_decimal from hive_types_test order by f_int limit 1;
+------------+
| f_decimal  |
+------------+
| 9.4        |
+------------+
1 row selected (11.544 seconds)

spark-shell:

scala> val df = spark.sql("select f_decimal from hive_types_test order by f_int")
df: org.apache.spark.sql.DataFrame = [f_decimal: decimal(5,2)]

scala> val row = df.first()
row: org.apache.spark.sql.Row = [9.40]                                          

scala> val elem = row.get(0)
elem: Any = 9.40

scala> elem.isInstanceOf[java.math.BigDecimal]
res2: Boolean = true

scala> elem == new java.math.BigDecimal("9.40")
res3: Boolean = true

scala> elem == new java.math.BigDecimal("9.4")
res4: Boolean = false

scala> new java.math.BigDecimal("9.40").stripTrailingZeros() == new java.math.BigDecimal("9.4")
res5: Boolean = true
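The 1.2E+2 display mentioned above comes from the same stripTrailingZeros call: once the zeros are stripped, the scale can go negative, and BigDecimal#toString then switches to scientific notation. A short sketch (class name hypothetical):

```java
import java.math.BigDecimal;

public class SciNotationDemo {
    public static void main(String[] args) {
        BigDecimal v = new BigDecimal("120.00");       // decimal(5, 2) value, scale 2
        BigDecimal stripped = v.stripTrailingZeros();  // unscaled value 12, scale -1

        // BigDecimal#toString uses scientific notation when the scale is
        // negative, which is how 120.00 ends up rendered as 1.2E+2.
        System.out.println(stripped.scale());  // -1
        System.out.println(stripped);          // 1.2E+2
        System.out.println(v);                 // 120.00
    }
}
```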

wypoon avatar May 21 '20 21:05 wypoon

thanks for the explanation @wypoon. From the PR description I was confused by the sentence " Livy removes trailing zeros ... before storing its string representation ...", so I was not sure whether this was only in the visualization phase.

mgaido91 avatar May 22 '20 14:05 mgaido91

Codecov Report

Merging #296 into master will increase coverage by 0.01%. The diff coverage is n/a.


@@             Coverage Diff              @@
##             master     #296      +/-   ##
============================================
+ Coverage     68.24%   68.26%   +0.01%     
- Complexity      965      966       +1     
============================================
  Files           104      104              
  Lines          5952     5952              
  Branches        900      900              
============================================
+ Hits           4062     4063       +1     
  Misses         1311     1311              
+ Partials        579      578       -1     
Impacted Files                                          Coverage Δ              Complexity Δ
...c/main/scala/org/apache/livy/repl/ReplDriver.scala   30.76% <0.00%> (-2.57%)  7.00% <0.00%> (ø%)
...in/java/org/apache/livy/rsc/driver/JobWrapper.java   88.57% <0.00%> (+5.71%)  9.00% <0.00%> (+1.00%)

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 3b9bbef...623ef5e.

codecov-commenter avatar May 23 '20 01:05 codecov-commenter

I updated the Jira as well as the PR description to point out that the BigDecimal values returned in JDBC can be incorrect due to this bug.

wypoon avatar May 23 '20 22:05 wypoon

Hi @mgaido91, I have responded to your point about negative scale in the decimal type in Spark SQL, and addressed the nit about the comment in ColumnBufferTest. I have also clarified the Jira description and the PR description. This is a simple but important fix. Can we move forward now?

wypoon avatar Jun 01 '20 18:06 wypoon

@mgaido91 I think this is an important bug, and the fix is straightforward. I think I have provided an adequate test. I do not plan to spend any more time on this. I think it would be good for Livy to have the fix merged. However, you or anyone else can feel free to take this over if you don't agree.

wypoon avatar Oct 29 '20 20:10 wypoon