Rocco Cammisola

3 comments by Rocco Cammisola

Thanks for the quick response. Apologies that I haven't been able to get information back to you faster, but I can't disclose the real Parquet schema I'm using, so I'm going...

Here's the Spark schema:

```
root
 |-- current_phase: integer (nullable = true)
 |-- bx: decimal(10,2) (nullable = true)
 |-- by: decimal(10,2) (nullable = true)
 |-- bz: decimal(10,2) (nullable = true)
...
```

On my reduced data set (approx. 200k rows), `start_id` was the only column that caused problems, but `id_in_phase` caused a lot of problems on the full data (4 million rows).