Robert (Bobby) Evans
The `could not find parent` error appears to be related to the order in which some columns show up in the code in question. https://github.com/rapidsai/cudf/blob/933e32ab9ad8e5057282c48129ddbd745c538967/cpp/src/io/json/json_column.cu#L657 Appears to be related to the...
Looks like you need to run it more than once.
```
import ai.rapids.cudf._
val sb = Schema.builder()
sb.addColumn(DType.STRING, "key_0_0")
val s = sb.build
val opts = JSONOptions.builder.withKeepQuotes(true).withLines(true).withNormalizeSingleQuotes(true).withRecoverWithNull(true).withNormalizeWhitespace(true).build()
val t =...
```
It looks like it is related to memory pooling, and possibly reading uninitialized memory. It works fine if pooling is disabled, but fails regularly if it is enabled. (at least...
I was able to cut the file down to 861262 lines (about 26 MiB) and I am still able to see errors. Will keep working on this...
I should clarify. I have not been able to make it fail with C++ yet. Just Java.
Digging deeper, the tokenization is returning a different set of tokens. I am not sure why yet. The data looks fine for the first part of the run, and then...
A little more info. This is only happening in Java on the async allocator. Not the arena. This is all really confusing to me.
If I set the config for recover with null to false, that appears to fix the problem. Recover with nulls is odd because it is updating the data inline in...
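For reference, a minimal sketch of the options with recover with null turned off, assuming everything else stays the same as in the repro snippet earlier in this thread. The name `optsNoRecover` is hypothetical, and the read call itself is left out because it is not shown in full above.

```scala
import ai.rapids.cudf._

// Same builder calls as the repro options, with only
// withRecoverWithNull flipped from true to false.
// `optsNoRecover` is a hypothetical name used for illustration.
val optsNoRecover = JSONOptions.builder
  .withKeepQuotes(true)
  .withLines(true)
  .withNormalizeSingleQuotes(true)
  .withRecoverWithNull(false)
  .withNormalizeWhitespace(true)
  .build()
```

Swapping this in for `opts` in the repro should isolate whether the recover with null path is involved, since no other setting changes.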
@GregoryKimball @shrshi I really would appreciate some help in understanding what the next steps should be for debugging this. I have a test case that I can repro nearly 100%...
Okay, it is a race somewhere. I put in a bunch of `stream.synchronize` calls in the JSON parsing code and the problem appears to have gone away. I will try...