Iaroslav Igoshev
BUG: Excessive log file generation when using Modin[ray] with Parquet files and DataFrame operations
@rkooo567, thanks for commenting on this! I also think it might be related to object spilling. Yes, the line `train_value = train.values` materializes the data, which is most likely...
@xixibaobei, maybe you could avoid materializing that amount of data? You could try to first convert the Modin DataFrame to a pandas DataFrame and then to a NumPy array, though...
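A minimal sketch of the two-step conversion suggested above (Modin DataFrame to pandas DataFrame to NumPy array). It assumes the installed Modin version provides `modin.utils.to_pandas`; the helper name `to_numpy_via_pandas` is hypothetical. This still gathers the whole dataset into the driver process, so it does not reduce the total amount of materialized data, and whether it changes Ray's spilling behavior in this case is not established here.

```python
import numpy as np
import pandas as pd


def to_numpy_via_pandas(df) -> np.ndarray:
    """Hypothetical helper: convert a Modin or pandas DataFrame to NumPy in two steps."""
    if not isinstance(df, pd.DataFrame):
        # Assumption: `df` is a Modin DataFrame (it does not subclass pandas.DataFrame).
        # modin.utils.to_pandas collects all partitions into one in-memory pandas DataFrame.
        from modin.utils import to_pandas

        df = to_pandas(df)
    # pandas -> NumPy: returns a single 2-D array, one row per DataFrame row.
    return df.to_numpy()
```

With Modin installed, call it as `to_numpy_via_pandas(modin_df)`; a plain pandas DataFrame skips the first step and goes straight to `DataFrame.to_numpy()`.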
@rkooo567, btw, is it possible to fully disable Ray logs?
@xixibaobei, to narrow down the problem, let's assume you are running only the `read_parquet` operation. Which warnings do you see during execution? How much time does it take to read...
@xixibaobei, can you share a log file that gets generated, or tell us which messages you see there?
I guess the `ray_spilled_objects` dir contains the data spilled by Ray to disk. @rkooo567, please correct me if I am wrong. I wonder why so many objects get spilled during...
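If the goal is only to control where spilled objects land (not how much gets spilled), Ray's object spilling documentation describes a filesystem spill directory set through `ray.init(_system_config=...)`. A minimal sketch, assuming Ray is started explicitly before Modin runs any operation so that Modin reuses that instance; the helper names and the `/tmp/spill` path are hypothetical, and `_system_config` is an underscore-prefixed option that may change between Ray versions.

```python
import json


def build_spilling_system_config(directory_path: str) -> dict:
    """Hypothetical helper: build a Ray `_system_config` that spills objects to `directory_path`."""
    return {
        # Ray expects the spilling config as a JSON string, not a nested dict.
        "object_spilling_config": json.dumps(
            {"type": "filesystem", "params": {"directory_path": directory_path}}
        )
    }


def init_ray_with_spill_dir(directory_path: str) -> None:
    """Hypothetical helper: start Ray with a custom spill directory (requires Ray)."""
    import ray  # imported lazily so build_spilling_system_config works without Ray

    ray.init(_system_config=build_spilling_system_config(directory_path))
```

Calling `init_ray_with_spill_dir("/tmp/spill")` before the first Modin operation only relocates spilled files; it does not explain or reduce the spilling itself.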
@xixibaobei, thanks for providing this info! 1T should be enough to read the data. I wonder why Ray starts spilling objects to disk in this case. The issue...
@cometta, if your code is as follows, please file an issue in the Ray GitHub repository. It looks like a problem with the connection to [GCS](https://docs.ray.io/en/latest/ray-core/fault_tolerance/gcs.html).

```python
import ray

ray.init()
ray_ds = ray.data.read_parquet()
...
```
> does that mean that is a bug in modin pd.read_parquet('data.parquet') ?

It looks like a bug in `read_parquet` on the Modin side, which causes an OOM error.

> what about the...
We also have a bunch of other methods like `from_*`/`to_*` that we deprecated. Can we remove them all in one go?