Kirill Suvorov comments

9 comments by Kirill Suvorov

FEAT-#5394: Reduce amount of remote calls for Map operator

> @Retribution98 I see your graphs above, but I don’t really understand what the axes mean. Please label them.

@anmyachev Thanks, updated it.
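To illustrate why fewer remote calls can matter, here is a toy sketch. It is not the PR's implementation and does not use Modin or Ray. `CallCounter`, `map_one_task_per_block` and `map_batched` are hypothetical names. A stand-in executor only counts submitted tasks, and the sketch compares one task per partition block with grouping several blocks into a single task.

```python
from typing import Callable, List

Block = List[int]


class CallCounter:
    """Hypothetical stand-in for a remote executor: it runs the function
    locally and only counts how many tasks were submitted."""

    def __init__(self) -> None:
        self.calls = 0

    def submit(self, fn: Callable, *args):
        self.calls += 1
        return fn(*args)


def map_one_task_per_block(executor: CallCounter, func: Callable[[Block], Block],
                           blocks: List[Block]) -> List[Block]:
    # One submitted task for every block.
    return [executor.submit(func, b) for b in blocks]


def map_batched(executor: CallCounter, func: Callable[[Block], Block],
                blocks: List[Block], batch_size: int) -> List[Block]:
    # Several blocks travel together in one task, so the task count drops
    # to ceil(len(blocks) / batch_size).
    def apply_to_batch(batch: List[Block]) -> List[Block]:
        return [func(b) for b in batch]

    out: List[Block] = []
    for start in range(0, len(blocks), batch_size):
        out.extend(executor.submit(apply_to_batch, blocks[start:start + batch_size]))
    return out


if __name__ == "__main__":
    blocks = [[i, i + 1] for i in range(10)]
    double = lambda b: [x * 2 for x in b]
    per_block, batched = CallCounter(), CallCounter()
    assert map_one_task_per_block(per_block, double, blocks) == map_batched(batched, double, blocks, 4)
    print(per_block.calls, batched.calls)
```

Both strategies give the same result. The batched version submits 3 tasks instead of 10, and that is where per-call scheduling overhead would be saved in a real distributed engine.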

FEAT-#5394: Reduce amount of remote calls for TreeReduce and GroupByReduce operators

> @Retribution98 do you have any performance numbers?

@anmyachev This case is similar to the previous PR, so we can expect similar performance. Using 112 CPUs: df.count | partitions...

FEAT-#5394: Reduce amount of remote calls for TreeReduce and GroupByReduce operators

> It's also a good idea to add tests for the new operators, which now work a little differently.

Since the logic is at a lower level, I modified the...

DOC: Add a link to Modin on the ecosystem page.

@phofl, @fjetter, could you take a look at this PR? Is it OK that the test is failing? I don't quite understand why this happened.

Poor performance of df.insert and df.to_parquet

Hi @yx367563, thanks for the details. I was able to reproduce your problem, but unfortunately I don't have a solution for you. First of all, "achieving performance as good as...

[RAY] to_parquet() fails when spilled objects reach 64gig... Also my data is just 40gig

Hi @Liquidmasl, thanks for your interest in Modin. As I see from your reproducer in issue #7359, you are trying to work with a large DataFrame that is bigger...

BUG: [RAY] ray initialisation sets _memory and object_store_memory to the same value, leading to crashes and less flexibility

Hi @Liquidmasl, Modin uses these default values because they give good performance in general. If you have a specific case and Modin's [configuration variables](https://modin.readthedocs.io/en/stable/flow/modin/config.html) don't help you, you...

to_parquet() needs option of how many files to create, or like rays implementation: num_rows_per_file

Hi @Liquidmasl, the number of files in the output depends on the number of partitions in the DataFrame. If you want to customize the number of files, you need to...
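The reply says the number of output files follows the number of partitions. This sketch shows the arithmetic that connects a target `num_rows_per_file` (the option requested in the issue title) to a partition count. It assumes one file per row partition and rows split as evenly as possible. Both helpers are hypothetical and are not part of Modin. How to actually repartition a Modin DataFrame is not shown here.

```python
import math
from typing import List


def partitions_for_rows_per_file(n_rows: int, num_rows_per_file: int) -> int:
    """Hypothetical helper: smallest number of row partitions such that no
    partition (and thus, by assumption, no output file) exceeds
    num_rows_per_file rows."""
    if num_rows_per_file <= 0:
        raise ValueError("num_rows_per_file must be positive")
    return math.ceil(n_rows / num_rows_per_file) if n_rows > 0 else 0


def split_rows(n_rows: int, n_partitions: int) -> List[int]:
    """Hypothetical helper: split n_rows into n_partitions near-equal chunks;
    the first (n_rows % n_partitions) chunks get one extra row."""
    if n_partitions <= 0:
        raise ValueError("n_partitions must be positive")
    base, extra = divmod(n_rows, n_partitions)
    return [base + 1 if i < extra else base for i in range(n_partitions)]


if __name__ == "__main__":
    n = partitions_for_rows_per_file(1_000_000, 300_000)
    print(n, split_rows(1_000_000, n))
```

Splitting evenly across `ceil(n_rows / num_rows_per_file)` partitions never puts more than `num_rows_per_file` rows in one chunk. The files also come out close to equal in size instead of leaving one small remainder file.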

BUG: outofmemory read from big file and dump to a new one

Hi @wanghaisheng, sorry, I'm not sure I understood you correctly. Modin is not responsible for the Ray parameter `RAY_memory_usage_threshold`. Your reproducer seems to be correct. Please contact Ray for more...