Gryllos Prokopis

Results 16 comments of Gryllos Prokopis

hi folks, is someone working on porting the crawler?

oh, that makes sense; what do you see as good alternatives for the same objective? I guess TensorFlow by itself offers advanced APIs for distributed training; I recently also heard...

@martindurant I am currently in a similar situation where I am trying to load a dataframe created by Spark with a lot of nullable columns and I get the >...

@martindurant I can see that if I try to load specific columns with small chunks of the data it succeeds most of the time; so I assume there are sparse...

@martindurant if I try using the pyarrow engine I get a `NotImplementedError` coming from a call to `to_pandas_dtype`; it looks like this: > NotImplementedError: struct

oops :/ didn't realise that. So there is no way to read in nested structures? Unfortunately restructuring Spark is not an option. The schema is fairly big and with a...

the docs here seem to state that fastparquet can read nested schemas: https://fastparquet.readthedocs.io/en/latest/details.html#reading-nested-schema

@martindurant it does actually! still a bit confused about what exactly this means. I am keen on putting in a little work myself to make our ingestion work with Dask;...

how do you propose I go from here? does it make sense to investigate and open a PR? btw big thanks for taking the time :)

@martindurant what I understand is happening is that, for every column, this loop checks whether the column may have null values based on heuristic checks on meta_data and if it...