[BUG] GetJsonObject should return null for invalid query instead of throwing an exception
Describe the bug There are some invalid paths, like in python
f.get_json_object('jsonStr', '''$.'a''').alias('qsqa2'),
which results in the cudf code throwing an exception saying the path is invalid. But the Spark code just returns a null for an invalid path. We should do the same.
ai.rapids.cudf.CudfException: CUDF failure at:/.../json_path.cu:657: Encountered invalid JSONPath input string
at ai.rapids.cudf.ColumnView.getJSONObject(Native Method)
at ai.rapids.cudf.ColumnView.getJSONObject(ColumnView.java:2995)
or
f.get_json_object('jsonStr', 'a').alias('just_a'),
ai.rapids.cudf.CudfException: CUDF failure at:/.../json_path.cu:593: Unrecognized JSONPath operator
at ai.rapids.cudf.ColumnView.getJSONObject(Native Method)
at ai.rapids.cudf.ColumnView.getJSONObject(ColumnView.java:2995)
This should be addressed by PR 15082 for the cases where JSONPath (query) contains invalid syntax.
Cudf PR 15082 does fix the issue (returns null instead of throwing an error) in the cases where cudf thinks the query has an invalid argument. This means the example stated in the description will not result in cudf exception.
However, an attempt to handle it within the spark-rapids is being made by PR 10466 by handling it before passing to cudf. I'll let authors of PR 10466 close this issue.
@thirtiseven has this issue been verified to be closed by https://github.com/NVIDIA/spark-rapids/pull/10466 ?
cc: @revans2
@thirtiseven has this issue been verified to be closed by #10466 ?
Thanks for remind. Technically, this issue was fixed by #15082 and #10466, but the test case test_get_json_object_invalid_path still fails, so I didn't notice.
Now the case fails for a different reason:
$.'a is a valid path from Spark's check, so it is normalized to $[''a'] and passed to kernel in plugin, which results in
ai.rapids.cudf.CudfException: CUDF failure on: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-690-cuda11/thirdparty/cudf/cpp/src/json/json_path.cu:631: Invalid empty name in JSONPath query string
I'm going to move this case out of test_get_json_object_invalid_path and find/create an issue and test to place it. Before that I think we can keep this issue open for a while.
Filed issue for the above case: https://github.com/NVIDIA/spark-rapids/issues/10537
The PR #15082 doesn't address the issue with $[''a'], but it does fix the problem with $.'a.
The exception "Invalid empty name in JSONPath query string" occurs for $[''a'] both before and after #15082.
This issue remains a limitation of cudf's get_json_object function.
Now new implementation parses path in Spark-Rapids by using Spark code, so we should check the invalid path in Spark-Rapids before calling Kernel code.
Will be fixed by: https://github.com/NVIDIA/spark-rapids/pull/10581