Robert (Bobby) Evans
> Would it make sense to have RapidsMeta have the ability for expressions to be disabled for AST individually if spark.rapids.sql.expression.ast._classname_ is false? We probably still want an overall AST...
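A minimal sketch of how a per-expression AST flag could sit under one overall switch. It is written in Python for illustration, not in the plugin's own code. Each expression class gets its own flag, which defaults to enabled, and AST is allowed only if both that flag and the overall switch are on. Only the per-class prefix comes from the quoted comment. The overall key `spark.rapids.sql.ast.enabled` and the helper names are hypothetical.

```python
# Hypothetical names: only the per-class prefix comes from the discussion.
AST_GLOBAL_KEY = "spark.rapids.sql.ast.enabled"  # hypothetical overall AST switch
AST_EXPR_PREFIX = "spark.rapids.sql.expression.ast."  # + expression class name


def _conf_bool(conf: dict, key: str, default: bool) -> bool:
    """Read a boolean config value that may arrive as a string."""
    value = conf.get(key)
    if value is None:
        return default
    return str(value).strip().lower() == "true"


def ast_enabled_for(conf: dict, expr_class_name: str) -> bool:
    """True only if both the overall switch and the per-class flag allow AST.
    Both default to enabled when unset."""
    if not _conf_bool(conf, AST_GLOBAL_KEY, True):
        return False
    return _conf_bool(conf, AST_EXPR_PREFIX + expr_class_name, True)


conf = {"spark.rapids.sql.expression.ast.Multiply": "false"}
print(ast_enabled_for(conf, "Add"))       # True
print(ast_enabled_for(conf, "Multiply"))  # False
```

Defaulting each per-class flag to enabled means that only the expressions someone explicitly turns off are affected. The overall switch remains a single place to disable AST entirely.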
It looks like this might be related to how Spark/Python interprets the string 'a\x85'.
```
from pyspark import SparkContext
import pyspark.sql.types

# Run in the pyspark shell, where `spark` is already defined.
df = spark.createDataFrame(SparkContext.getOrCreate().parallelize([("a\x85")]), pyspark.sql.types.StringType())
spark.conf.set("spark.rapids.sql.enabled", False)
df.selectExpr('CAST(value as BINARY)').show()
+----------+
|     value|
+----------+...
```
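A quick check in plain Python, with no Spark needed, of what 'a\x85' actually contains. It assumes the value reaches Spark as the single code point U+0085 (NEL, a C1 control character). Spark stores strings as UTF-8, so `CAST(value AS BINARY)` on the CPU would then be expected to produce the UTF-8 bytes shown first, not the single Latin-1 byte.

```python
# U+0085 is NEL ("next line"), a C1 control character.
s = "a\x85"

# Spark strings are UTF-8, so these are the bytes a CAST to BINARY should see.
utf8 = s.encode("utf-8")
print(utf8.hex(" "))  # 61 c2 85

# If something treated the string as Latin-1 instead, it would be one byte.
print(s.encode("latin-1").hex(" "))  # 61 85

# Python also treats U+0085 as whitespace and as a line boundary,
# which matters if the value is stripped or split along the way.
print("\x85".isspace(), s.splitlines())  # True ['a']
```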
The CUDF issue is https://github.com/rapidsai/cudf/issues/12228
@vyasr It stalled because of other priorities, and this is not technically a requirement. I'll probably get back to it once all of the JSON work I am doing is...
Just be aware that I have hit other issues with this data too: IllegalMemoryAccess errors, a hang where it looks like we are in an infinite loop, and exceptions saying...
I am seeing two test failures around NBSP in a quoted string. I need to do some more debugging to see if it is my code changes or yours that...
Sorry it took me so long to respond to this. I would want the whitespace removal to follow what CUDF already does for validation, where it ignores white...
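A minimal sketch of removing only insignificant whitespace outside quoted strings. It assumes the characters to remove are JSON's four whitespace characters from RFC 8259: space, tab, LF, and CR. Whether that matches what CUDF's validation actually ignores is an assumption and has not been confirmed here. Characters inside quotes, including an NBSP (U+00A0), are left untouched. An NBSP outside a string is kept, so a validator can still reject it. The function name is hypothetical.

```python
import json

# JSON "insignificant whitespace" per RFC 8259: space, tab, LF, CR only.
JSON_WS = {" ", "\t", "\n", "\r"}


def strip_json_whitespace(text: str) -> str:
    """Drop JSON whitespace outside quoted strings; keep string contents as-is."""
    out = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False  # this character was escaped; it cannot end the string
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch not in JSON_WS:
            # Everything else is kept, including an NBSP outside a string.
            out.append(ch)
    return "".join(out)


doc = ' { "a" : "x\u00a0y" } '
compact = strip_json_whitespace(doc)
print(compact == '{"a":"x\u00a0y"}')       # True
print(json.loads(compact) == json.loads(doc))  # True

try:
    json.loads(strip_json_whitespace("[1,\u00a02]"))
except json.JSONDecodeError as err:
    print("rejected:", err.msg)
```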
Thanks for looking into this, @sleeepyjack. A sort-based join as a fallback sounds like a great option. I do have a few questions about your proposal. I am not...
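A sort-based join finds matches by sorting both sides and merging them, so no hash table is built. This is a minimal single-machine sketch of an inner sort-merge join in plain Python. It only illustrates the algorithm and says nothing about how CUDF or cuCollections would implement a fallback. It assumes join keys are non-null and mutually comparable, and the function name is hypothetical.

```python
from typing import Any, List, Tuple

Row = Tuple[Any, ...]


def sort_merge_inner_join(left: List[Row], right: List[Row], key: int = 0) -> List[Tuple[Row, Row]]:
    """Inner join on column `key` by sorting both inputs and merging them.
    Keys must be non-null and comparable."""
    ls = sorted(left, key=lambda row: row[key])
    rs = sorted(right, key=lambda row: row[key])
    i = j = 0
    out: List[Tuple[Row, Row]] = []
    while i < len(ls) and j < len(rs):
        lk, rk = ls[i][key], rs[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end = i
            while i_end < len(ls) and ls[i_end][key] == lk:
                i_end += 1
            j_end = j
            while j_end < len(rs) and rs[j_end][key] == rk:
                j_end += 1
            for lrow in ls[i:i_end]:
                for rrow in rs[j:j_end]:
                    out.append((lrow, rrow))
            i, j = i_end, j_end
    return out


left = [(1, "a"), (2, "b"), (2, "c"), (3, "d")]
right = [(2, "x"), (2, "y"), (4, "z"), (1, "w")]
for pair in sort_merge_inner_join(left, right):
    print(pair)
```

Runs of equal keys still produce their full cross product. Sorting changes how matches are found, not how many rows are emitted.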
Thanks, @sleeepyjack, for the detailed explanation. In Spark, we do a count aggregation on the build table before doing a join. (In Spark the build table will not always match...
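This is a minimal sketch of a group-by count over the build side's join keys, in plain Python. The comment is cut off before it says what the count is used for. The `prefer_sort_join` check is therefore a hypothetical policy that links the count to the sort-based fallback idea. It does not describe what Spark or the plugin actually does, and both function names and the threshold are assumptions.

```python
from collections import Counter
from typing import Any, Iterable, Tuple


def build_side_key_counts(build_rows: Iterable[Tuple[Any, ...]], key: int = 0) -> Counter:
    """Group-by count: how many build-side rows share each join key."""
    return Counter(row[key] for row in build_rows)


def prefer_sort_join(counts: Counter, max_dup_threshold: int) -> bool:
    """Hypothetical policy: prefer a sort-based join when any single key is
    duplicated more than `max_dup_threshold` times on the build side."""
    return max(counts.values(), default=0) > max_dup_threshold


build = [(1, "a"), (2, "b"), (2, "c"), (2, "d")]
counts = build_side_key_counts(build)
print(dict(counts))                 # {1: 1, 2: 3}
print(prefer_sort_join(counts, 2))  # True
```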
Added "needs triage" back in because we really need to understand what is happening. If we cannot do something simple with DB like this, it is either a bug...