pathling
Support datetime literals with less than second precision
Currently, due to FHIR datetime parser limitations, DateTime literals with less than second precision (e.g. hour or minute precision) fail to parse and silently produce null in UDF-based operations.
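To illustrate the gap, here is a minimal sketch (plain Python, not Pathling's implementation) of a fallback parser that accepts FHIR dateTime strings at hour, minute or second precision — the first two of which the HAPI parser rejects:

```python
from datetime import datetime

# Candidate formats, most precise first. On Python 3.7+, %z accepts "Z",
# "+04:00" and "-04:00". These format strings are illustrative only.
_FORMATS = [
    "%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%dT%H:%M%z",    "%Y-%m-%dT%H:%M",
    "%Y-%m-%dT%H%z",       "%Y-%m-%dT%H",
]

def parse_partial(value: str) -> datetime:
    """Parse a FHIR dateTime literal at hour-or-finer precision."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            pass
    raise ValueError(f"Invalid date/time format: {value!r}")

print(parse_partial("2018-02-18T12:00"))  # minute precision parses
print(parse_partial("2018-02-18T12"))     # hour precision parses too
```

Both of the printed values are exactly the strings that trigger `DataFormatException: Invalid date/time format` in the stack traces below.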
Details
Expected: false but got: null ==> expected: <false> but was: <null>
@2019-02-03T02:00Z = @2019-02-02T21:00-04:00 [null]
Expected: true but got: null ==> expected: <true> but was: <null>
@2019-02-03T01:00Z = @2019-02-02T21:00-04:00 [null]
(Equality exclusions)
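The two equality cases above reduce to normalizing both operands to a common instant: @2019-02-02T21:00-04:00 is the same instant as @2019-02-03T01:00Z, but not @2019-02-03T02:00Z. A hedged sketch of that check (plain Python, not Pathling code):

```python
from datetime import datetime

def same_instant(a: str, b: str) -> bool:
    """Compare two zoned ISO dateTime strings as instants."""
    # Normalize the trailing "Z", which older fromisoformat versions reject.
    a_dt = datetime.fromisoformat(a.replace("Z", "+00:00"))
    b_dt = datetime.fromisoformat(b.replace("Z", "+00:00"))
    # Comparing offset-aware datetimes compares the underlying instants.
    return a_dt == b_dt

print(same_instant("2019-02-03T01:00Z", "2019-02-02T21:00-04:00"))  # True
print(same_instant("2019-02-03T02:00Z", "2019-02-02T21:00-04:00"))  # False
```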
Details
Expected: false but got: null ==> expected: <false> but was: <null>
@2018-12-21T12:01 > @2018-12-21T12:01 [Same date, same time]
@2018-12-21T12:01 > @2018-12-21T12:02 [Same date, different time (1)]
@2018-12-20T12:02 > @2018-12-21T12:01 [Different dates & times (1)]
@2018-12-21T12:02+03:00 > @2018-12-21T12:01+02:00 [Time zone differences (1)]
@2018-02-02T11 = @2018-02-02T12 [DateTime literal compared with different DateTime (hour)]
@2018-02-02T22-04:00 = @2018-02-03T05:03+04:00 [DateTime literal compared with different precision DateTime (hour, tz diff)]
@2018-02-02T11+04:00 = @2018-02-02T12+04:00 [DateTime literal compared with different DateTime (hour,tz)]
@2018-02-02T11:01 = @2018-02-02T11:02 [DateTime literal compared with different DateTime (minute)]
@2018-02-02T11:01+04:00 = @2018-02-02T11:02+04:00 [DateTime literal compared with different DateTime (minute, tz)]
Expected: true but got: null ==> expected: <true> but was: <null>
@2018-12-21T12:02 > @2018-12-21T12:01 [Same date, different time (2)]
@2018-12-22T12:02 > @2018-12-21T12:01 [Different dates & times (1)]
@2018-12-20T12 > @2018-12-20T11:01 [Comparison with differnet precision (2)]
@2018-12-21T12:02+02:00 > @2018-12-21T12:01+03:00 [Time zone differences (2)]
@2018-02-02T11 = @2018-02-02T11 [DateTime literal compared with DateTime (hour)]
@2018-02-02T11+04:00 = @2018-02-02T11+04:00 [DateTime literal compared with DateTime (hour, tz)]
@2018-02-02T22-04:00 = @2018-02-03T06+04:00 [DateTime literal compared with same DateTime (hour, tz different)]
@2018-02-02T11:01 = @2018-02-02T11:01 [DateTime literal compared with DateTime (minute)]
@2018-02-02T11:01+04:00 = @2018-02-02T11:01+04:00 [DateTime literal compared with DateTime (minute, tz)]
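One way to reason about the mixed-precision cases above (e.g. @2018-12-20T12 > @2018-12-20T11:01) is to treat a partial dateTime as the range of instants it covers, and answer the comparison only when the ranges don't overlap. This is a sketch under that assumption, not Pathling's actual algorithm:

```python
from datetime import datetime

def bounds(value: str):
    """Earliest/latest second-precision instants covered by a partial value.
    Illustrative only: handles naive hour-, minute- and second-precision."""
    if len(value) == 13:    # "YYYY-MM-DDTHH" -> whole hour
        lo = datetime.strptime(value, "%Y-%m-%dT%H")
        hi = lo.replace(minute=59, second=59)
    elif len(value) == 16:  # "YYYY-MM-DDTHH:MM" -> whole minute
        lo = datetime.strptime(value, "%Y-%m-%dT%H:%M")
        hi = lo.replace(second=59)
    else:
        lo = hi = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
    return lo, hi

def greater_than(a: str, b: str):
    """True/False when the ranges do not overlap, None otherwise."""
    a_lo, a_hi = bounds(a)
    b_lo, b_hi = bounds(b)
    if a_lo > b_hi:
        return True
    if a_hi < b_lo:
        return False
    return None  # overlapping ranges: result is empty, not true/false

print(greater_than("2018-12-20T12", "2018-12-20T11:01"))    # True
print(greater_than("2018-12-21T12:01", "2018-12-21T12:02")) # False
```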
org.apache.spark.SparkException: Job aborted due to stage failure: ?? (executor driver): org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user defined function (`datetime_add_duration (UDFRegistration$$Lambda$1739/0x0000007800a055f8)`: (string, struct<id:void,value:decimal(32,6),value_scale:int,comparator:void,unit:void,system:string,code:string,_value_canonicalized:struct<value:decimal(38,0),scale:int>,_code_canonicalized:string,_fid:void>) => string).
    at org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala:198)
    at org.apache.spark.sql.errors.QueryExecutionErrors.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
    at org.apache.spark.scheduler.Task.run(Task.scala:141)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:842)
Caused by: ca.uhn.fhir.parser.DataFormatException: Invalid date/time format: "2018-02-18T12:00"
    at org.hl7.fhir.r4.model.BaseDateTimeType.throwBadDateFormat(BaseDateTimeType.java:797)
    at org.hl7.fhir.r4.model.BaseDateTimeType.validateLengthIsAtLeast(BaseDateTimeType.java:865)
    at org.hl7.fhir.r4.model.BaseDateTimeType.parse(BaseDateTimeType.java:468)
    at org.hl7.fhir.r4.model.BaseDateTimeType.parse(BaseDateTimeType.java:52)
    at org.hl7.fhir.r4.model.PrimitiveType.fromStringValue(PrimitiveType.java:108)
    at org.hl7.fhir.r4.model.PrimitiveType.setValueAsString(PrimitiveType.java:164)
    at org.hl7.fhir.r4.model.BaseDateTimeType.setValueAsString(BaseDateTimeType.java:741)
    at org.hl7.fhir.r4.model.BaseDateTimeType.<init>(BaseDateTimeType.java:99)
    at org.hl7.fhir.r4.model.DateTimeType.<init>(DateTimeType.java:102)
    at au.csiro.pathling.sql.dates.datetime.DateTimeArithmeticFunction.lambda$parseEncodedValue$0(DateTimeArithmeticFunction.java:44)
    at au.csiro.pathling.sql.dates.TemporalArithmeticFunction.call(TemporalArithmeticFunction.java:91)
    at au.csiro.pathling.sql.dates.TemporalArithmeticFunction.call(TemporalArithmeticFunction.java:39)
    at org.apache.spark.sql.UDFRegistration.$anonfun$register$354(UDFRegistration.scala:767)
    ... 20 more
Driver stacktrace:
@2018-02-18T12:00 + 59 seconds = @2018-02-18T12:00 [null]
@2018-02-18T12:00 + 60 seconds = @2018-02-18T12:01 [null]
org.apache.spark.SparkException: Job aborted due to stage failure: ?? (executor driver): org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user defined function (`datetime_add_duration (UDFRegistration$$Lambda$1739/0x0000007800a055f8)`: (string, struct<id:void,value:decimal(32,6),value_scale:int,comparator:void,unit:void,system:string,code:string,_value_canonicalized:void,_code_canonicalized:void,_fid:void>) => string).
    at org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala:198)
    at org.apache.spark.sql.errors.QueryExecutionErrors.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
    at org.apache.spark.scheduler.Task.run(Task.scala:141)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:842)
Caused by: ca.uhn.fhir.parser.DataFormatException: Invalid date/time format: "2018-02-18T12"
    at org.hl7.fhir.r4.model.BaseDateTimeType.throwBadDateFormat(BaseDateTimeType.java:797)
    at org.hl7.fhir.r4.model.BaseDateTimeType.validateLengthIsAtLeast(BaseDateTimeType.java:865)
    at org.hl7.fhir.r4.model.BaseDateTimeType.parse(BaseDateTimeType.java:468)
    at org.hl7.fhir.r4.model.BaseDateTimeType.parse(BaseDateTimeType.java:52)
    at org.hl7.fhir.r4.model.PrimitiveType.fromStringValue(PrimitiveType.java:108)
    at org.hl7.fhir.r4.model.PrimitiveType.setValueAsString(PrimitiveType.java:164)
    at org.hl7.fhir.r4.model.BaseDateTimeType.setValueAsString(BaseDateTimeType.java:741)
    at org.hl7.fhir.r4.model.BaseDateTimeType.<init>(BaseDateTimeType.java:99)
    at org.hl7.fhir.r4.model.DateTimeType.<init>(DateTimeType.java:102)
    at au.csiro.pathling.sql.dates.datetime.DateTimeArithmeticFunction.lambda$parseEncodedValue$0(DateTimeArithmeticFunction.java:44)
    at au.csiro.pathling.sql.dates.TemporalArithmeticFunction.call(TemporalArithmeticFunction.java:91)
    at au.csiro.pathling.sql.dates.TemporalArithmeticFunction.call(TemporalArithmeticFunction.java:39)
    at org.apache.spark.sql.UDFRegistration.$anonfun$register$354(UDFRegistration.scala:767)
    ... 20 more
Driver stacktrace:
@2018-02-18T12 + 59 minutes = @2018-02-18T12 [null]
@2018-02-18T12 + 60 minutes = @2018-02-18T13 [null]
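The expected results for the arithmetic cases imply that the sum is rendered back at the precision of the input literal: adding 59 seconds to a minute-precision value leaves the rendered minute unchanged, while adding 60 seconds rolls it over. A sketch of that behaviour (assumed semantics, plain Python):

```python
from datetime import datetime, timedelta

def add_seconds_minute_precision(value: str, seconds: int) -> str:
    """Add a duration to a minute-precision dateTime and render the result
    back at the same (minute) precision."""
    t = datetime.strptime(value, "%Y-%m-%dT%H:%M")
    return (t + timedelta(seconds=seconds)).strftime("%Y-%m-%dT%H:%M")

print(add_seconds_minute_precision("2018-02-18T12:00", 59))  # 2018-02-18T12:00
print(add_seconds_minute_precision("2018-02-18T12:00", 60))  # 2018-02-18T12:01
```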