gis-tools-for-hadoop icon indicating copy to clipboard operation
gis-tools-for-hadoop copied to clipboard

ST_Bin does not accept decimal numbers as first argument

Open doublebyte1 opened this issue 6 years ago • 18 comments

I am trying to run the tutorial on this page. However, this instruction throws an error:

FROM (SELECT ST_Bin(0.001, ST_Point(dropoff_longitude,dropoff_latitude)) bin_id, *FROM taxi_demo) bins SELECT ST_BinEnvelope(0.001, bin_id) shape, COUNT(*) count GROUP BY bin_id;

FAILED: SemanticException [Error 10014]: Line 1:13 Wrong arguments 'dropoff_latitude': Argument 0 must be a number

It seems that it is throwing an error on the initialization of the object, but I cannot understand why.

Also tried replacin "0.001" by ".001", and got the same error. Using an integer ("1"), or the result of a division ("1/1000") works, but it gives me unexpected results. For instance, when I use:

FROM (SELECT ST_Bin(1/1000, ST_Point(dropoff_longitude,dropoff_latitude)) bin_id, *FROM taxi_demo) bins SELECT ST_BinEnvelope(1/1000, bin_id) shape, COUNT(*) count GROUP BY bin_id;

The coordinates of the resulting shape are invalid lat, long (EPSG:4326):

{"type":"Polygon","coordinates":[[[-0.20249999989755452,3210.3374999998377],[-0.20149999989755452,3210.3374999998377],[-0.20149999989755452,3210.338499999838],[-0.20249999989755452,3210.338499999838],[-0.20249999989755452,3210.3374999998377]]],"crs":null} 1.0 {"type":"Polygon","coordinates":[[[-40.75249999994412,1727.0164999998417],[-40.75149999994412,1727.0164999998417],[-40.75149999994412,1727.0174999998417],[-40.75249999994412,1727.0174999998417],[-40.75249999994412,1727.0164999998417]]],"crs":null} 1.0 {"type":"Polygon","coordinates":[[[-40.719499999890104,1651.5524999999218],[-40.718499999890106,1651.5524999999218],[-40.718499999890106,1651.5534999999218],[-40.719499999890104,1651.5534999999218],[-40.719499999890104,1651.5524999999218]]],"crs":null} 1.0 {"type":"Polygon","coordinates":[[[-73.98149999999441,400.7774999997821],[-73.98049999999441,400.7774999997821],[-73.98049999999441,400.77849999978207],[-73.98149999999441,400.77849999978207],[-73.98149999999441,400.7774999997821]]],"crs":null} 1.0 {"type":"Polygon","coordinates":[[[-0.0004999998491257429,74.0134999998156],[0.0005000001508742571,74.0134999998156],[0.0005000001508742571,74.0144999998156],[-0.0004999998491257429,74.0144999998156],[-0.0004999998491257429,74.0134999998156]]],"crs":null} 1.0 {"type":"Polygon","coordinates":[[[-0.0004999998491257429,73.98749999997764],[0.0005000001508742571,73.98749999997764],[0.0005000001508742571,73.98849999997765],[-0.0004999998491257429,73.98849999997765],[-0.0004999998491257429,73.98749999997764]]],"crs":null} 1.0 {"type":"Polygon","coordinates":[[[-0.10349999996833503,73.98249999985657],[-0.10249999996833503,73.98249999985657],[-0.10249999996833503,73.98349999985658],[-0.10349999996833503,73.98349999985658],[-0.10349999996833503,73.98249999985657]]],"crs":null} 1.0 {"type":"Polygon","coordinates":[[[-0.0004999998491257429,73.97849999989941],[0.0005000001508742571,73.97849999989941],[0.0005000001508742571,73.97949999989942],[-0.0004999998491257429,73.97949999989942],[-0.0004999998491257429,73.97849999989941]]],"crs":null} 3.0 {"type":"Polygon","coordinates":[[[-0.4684999999590218,73.97449999994225],[-0.4674999999590218,73.97449999994225],[-0.4674999999590218,73.97549999994226],[-0.4684999999590218,73.97549999994226],[-0.4684999999590218,73.97449999994225]]],"crs":null} 1.0 {"type":"Polygon","coordinates":[[[-0.0004999998491257429,73.96049999997578],[0.0005000001508742571,73.96049999997578],[0.0005000001508742571,73.96149999997579],[-0.0004999998491257429,73.96149999997579],[-0.0004999998491257429,73.96049999997578]]],"crs":null} 3.0 Version 2.0.0

doublebyte1 avatar May 17 '19 10:05 doublebyte1

Thanks for reporting this. I assume "version 2.0.0" refers to Spatial Framework for Hadoop. Please let us know the versions of Hive and Hadoop.

randallwhitman avatar May 17 '19 15:05 randallwhitman

@randallwhitman Hadoop 2.8.5, Hive 2.3.4

doublebyte1 avatar May 17 '19 16:05 doublebyte1

Thanks for the details. We do not have Hive-2.3.4 (nor Hadoop-2.8.5) installed, and unfortunately the testing framework has fallen behind - for which I went ahead and filed Esri/spatial-framework-for-hadoop#163. Maybe it will reproduce with another version of Hive or with SparkSql.

randallwhitman avatar May 17 '19 17:05 randallwhitman

Ok. If it does not reproduce, let me know what is the highest version of Hadoop/Hive you have installed!

doublebyte1 avatar May 17 '19 17:05 doublebyte1

Under Spark-SQL-2.2.0 SELECT ST_Bin(0.001, ST_Point(0,0)); output a large integer without error.

randallwhitman avatar May 20 '19 21:05 randallwhitman

I wonder if the exception has anything to do with data in a header row. Note dropoff_latitude is an argument to ST_Point, not directly to ST_Bin.

randallwhitman avatar May 20 '19 22:05 randallwhitman

@doublebyte1 Are you able to try out the query on a copy of trip_data_1.csv that omits the header row (first line), and see what happens?

randallwhitman avatar May 21 '19 15:05 randallwhitman

@randallwhitman yes, I confirm that the error reproduces on a copy of trip_data_1.csv, without a header.

doublebyte1 avatar May 22 '19 11:05 doublebyte1

Under Spark-SQL-2.2.0 SELECT ST_Bin(0.001, ST_Point(0,0)); output a large integer without error.

The output of this query on hive 2.3.4. is:

hive> SELECT ST_Bin(0.001, ST_Point(0,0)); FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments '0': Argument 0 must be a number

doublebyte1 avatar May 22 '19 11:05 doublebyte1

Thanks for the additional details. That it reproduces with a minimal query could be very helpful - it might be possible to reproduce without a full installation.

randallwhitman avatar May 22 '19 16:05 randallwhitman

I'm curious what Hive-2.3 will do on the following:

SELECT ST_Bin(1e-3, ST_Point(0,0));
SELECT ST_Bin(1.0-0.999, ST_Point(0,0));
SELECT ST_Bin(0.001, ST_Point("Point(0 0)"));

Hive-0.14/HDP-2.2.6 :

hive> SELECT ST_Bin(1e-3, ST_Point(0,0));
4611686015463124500
hive> SELECT ST_Bin(1.0-0.999, ST_Point(0,0));
4611686015463124500
hive> SELECT ST_Bin(0.001, ST_Point("Point(0 0)"));
4611686015463124500

Spark-SQL-2.2 :

spark-sql> SELECT ST_Bin(1e-3, ST_Point(0,0));
4611686015463124500
Time taken: 0.097 seconds, Fetched 1 row(s)
spark-sql> SELECT ST_Bin(1.0-0.999, ST_Point(0,0));
Time taken: 0.296 seconds, Fetched 1 row(s)
spark-sql> SELECT ST_Bin(0.001, ST_Point("Point(0 0)"));
4611686015463124500
Time taken: 0.085 seconds, Fetched 1 row(s)

randallwhitman avatar May 22 '19 20:05 randallwhitman

Hive 2.3:

hive> SELECT ST_Bin(1e-3, ST_Point(0,0));
4611686015463124500
hive> SELECT ST_Bin(1.0-0.999, ST_Point(0,0));
FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments '0': Argument 0 must be a number
hive> SELECT ST_Bin(0.001, ST_Point("Point(0 0)"));
FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments '"Point(0 0)"': Argument 0 must be a number
hive> 

doublebyte1 avatar May 23 '19 10:05 doublebyte1

Differing between 0.001 versus 1e-3 is really strange and makes we wonder if there's an issue with Hive's parser.

randallwhitman avatar May 23 '19 15:05 randallwhitman

And/or maybe some new ObjectInspector types have been introduced in Hive API. and missing from - https://github.com/Esri/spatial-framework-for-hadoop/blob/master/hive/src/main/java/com/esri/hadoop/hive/ST_Bin.java#L37..L44

randallwhitman avatar May 24 '19 19:05 randallwhitman

PrimitiveCategory.DECIMAL maybe? Per skimming https://hive.apache.org/javadocs/r2.3.5/api/org/apache/hadoop/hive/serde2/objectinspector/PrimitiveObjectInspector.PrimitiveCategory.html , https://hive.apache.org/javadocs/r2.3.5/api/org/apache/hadoop/hive/serde2/objectinspector/primitive/package-summary.html

randallwhitman avatar May 31 '19 21:05 randallwhitman

@doublebyte1 Are you able to test the linked branch Esri/spatial-framework-for-hadoop#164 ?

randallwhitman avatar Jun 03 '19 18:06 randallwhitman

@randallwhitman unfortunately, I am not working with Hadoop anymore. I am not able to build this branch.

doublebyte1 avatar Jun 03 '19 18:06 doublebyte1

Today I got Hive-2.3.5 to run, and reproduced:

hive> SELECT ST_Bin(1.0-0.999, ST_Point(0,0));
FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments '0': Argument 0 must be a number

randallwhitman avatar Jun 07 '19 21:06 randallwhitman