Iaroslav Igoshev

189 comments by Iaroslav Igoshev

@rkooo567, thanks for commenting on this! I also think it might be related to object spilling. Yes, the line `train_value = train.values` materializes the data, which is most likely...

@xixibaobei, maybe you could avoid materializing that amount of data? You could try to first convert a Modin DataFrame to a pandas DataFrame and then to a numpy array though...

@xixibaobei, to narrow down the problem, let's assume you are running only the read_parquet operation. Which warnings do you see during the execution? How much time does it take to read...

@xixibaobei, can you share the log file that gets generated, or tell us which messages you see there?

I guess the ray_spilled_objects dir contains the data spilled by Ray to disk. @rkooo567, please correct me if I am wrong. It is very interesting that so many objects get spilled during...

@xixibaobei, thanks for providing this info! 1T should be enough to read the data. It is very interesting that Ray starts spilling objects to disk in this case. The issue...

@cometta, if your code is as follows, please file an issue on the Ray GitHub. It looks like a problem with the connection to [GCS](https://docs.ray.io/en/latest/ray-core/fault_tolerance/gcs.html).

```python
import ray
ray.init()
ray_ds = ray.data.read_parquet()...
```

> does that mean that is a bug in modin pd.read_parquet('data.parquet') ?

It looks like a bug in read_parquet on the Modin side, which causes an OOM error.

> what about the...

We also have a bunch of other methods, like from_\*/to_\*, that we deprecated. Can we remove them in one go?