jbrockmendel
Looking at this, isn't The Right Way to handle this to copy pandas' _MergeOperation code and adapt it?
> Looking at this, isn't The Right Way to handle this to copy pandas' _MergeOperation code and adapt it?

I think so, I'm just not familiar with this code. will...
Is there a standard pattern for "do X on each partition and collect the new partitions as a new series"?
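The pattern being asked about could be sketched generically like this; `map_partitions` is a hypothetical helper (not Modin's actual internal API), using stdlib futures as a stand-in for whatever execution engine distributes the work:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def map_partitions(partitions, func):
    """Hypothetical helper: apply func to each partition and
    collect the results as the new set of partitions."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(func, partitions))

# two toy "partitions" standing in for a partitioned series
parts = [np.arange(3), np.arange(3, 6)]
new_parts = map_partitions(parts, lambda p: p * 2)
```

The real implementation would presumably route through the query compiler rather than calling partitions directly, but the shape of the operation is the same: one function applied per partition, results gathered into a new object.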
> Is the default_to_pandas change unrelated to the cached_property change? If so, I think the changes should get separate issues and PRs, even though each one is very small.

Sure.
Cool. Are there any other methods that we should get while we're at it? Are there any scenarios in which these don't make copies? If so, then "take" might not...
> So I guess I would count the results as _not_ copies.

I think I worded the question poorly. I meant "copy" as in "copying the underlying data, which can...
> A typical loc or iloc will create a new pandas API object (series or dataframe), query compiler, modin frame, and partitions, which will ultimately get new references to new...
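On the "does this copy the underlying data" question, one quick way to check in plain pandas is `np.shares_memory` on the backing arrays. A small sketch (pure pandas/numpy, not Modin):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.arange(5, dtype=np.int64)})

# take materializes a new array: no shared memory with the original
taken = df.take([0, 1, 2])
take_shares = np.shares_memory(df["a"].to_numpy(), taken["a"].to_numpy())

# a plain iloc slice keeps a reference to the original block
# (under Copy-on-Write it still shares memory until first mutation)
sliced = df.iloc[:3]
slice_shares = np.shares_memory(df["a"].to_numpy(), sliced["a"].to_numpy())
```

So even when the API objects, query compiler, and frame wrappers are all new, the underlying buffers may or may not be: `take` copies the data, while the slice only re-references it.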
> This suggests to me that calculating widths is not itself the problem.

Agreed. However, I'm finding that 7-8% of my runtime is in _row_lengths (tentatively appears to be via...
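For pinning down a hot spot like this, the stdlib profiler is enough to get a per-function breakdown. A minimal sketch; `_row_lengths_stand_in` is a toy stand-in, not the actual Modin call:

```python
import cProfile
import io
import pstats

def _row_lengths_stand_in():
    # stand-in for the expensive metadata computation
    return sum(len(range(1000)) for _ in range(100))

def workload():
    for _ in range(50):
        _row_lengths_stand_in()

pr = cProfile.Profile()
pr.enable()
workload()
pr.disable()

buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("cumtime").print_stats(5)
report = buf.getvalue()
```

Sorting by cumulative time makes it easy to see what fraction of the run a single metadata helper accounts for.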
Looks like the relevant cases in compute_dtypes are where `self._partitions.shape[1] > 1`. More specifically, I'm seeing cases with partition shapes (1, 13) and (1, 4) compute in a few hundredths...
If I disable the `run_f_on_minimally_updated_metadata` portion of _compute_dtypes (specifically, by not going through tree_reduce), the hot spot moves to Partition.to_numpy. If I use BenchmarkMode.put(True), the time taken by compute_dtypes is...
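The BenchmarkMode observation is consistent with how lazy execution shifts where time shows up: with async partitions, the timer around the submitting call barely moves, and the cost surfaces at the first blocking access instead. A generic illustration with stdlib futures (not Modin code):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_partition_op():
    time.sleep(0.2)  # stand-in for real per-partition work
    return 42

pool = ThreadPoolExecutor(max_workers=1)

t0 = time.perf_counter()
fut = pool.submit(slow_partition_op)   # returns almost immediately
submit_elapsed = time.perf_counter() - t0

t0 = time.perf_counter()
result = fut.result()                  # the real cost surfaces here
wait_elapsed = time.perf_counter() - t0

pool.shutdown()
```

Forcing synchronous execution (what BenchmarkMode does) collapses these two measurements into one, so profiles attribute time to the operation that actually incurred it rather than to whichever later call happened to block first.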