
Copy on write for views

Open wesm opened this issue 8 years ago • 6 comments

I will work on a full document for this to get the conversation started, but this can be the placeholder for our discussion about copy-on-write (COW).

wesm avatar Sep 01 '16 13:09 wesm

As nice as copy-on-write would be, it's not strictly necessary in pandas 2.0 because we can choose our own consistent rules for copying once we divorce our storage from NumPy.

For example, we could say:

  1. Any indexing operation on columns uses views.
  2. Any indexing operation on rows makes a copy (all indexing operations on Series make a copy).
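To make the two rules concrete, here is a minimal sketch in plain Python. `ToyFrame`, `select_columns`, and `select_rows` are hypothetical names made up for illustration, not pandas API; the only assumption is that each column lives in its own 1-D buffer.

```python
from array import array


class ToyFrame:
    """Hypothetical column store: each column is its own 1-D buffer."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def select_columns(self, names):
        # Rule 1: column selection shares the existing buffers (a view).
        return ToyFrame({name: self.columns[name] for name in names})

    def select_rows(self, start, stop):
        # Rule 2: row selection copies. Slicing an array.array allocates
        # a new array, so the result owns its data.
        return ToyFrame({name: col[start:stop] for name, col in self.columns.items()})


df = ToyFrame({
    "a": array("d", [1.0, 2.0, 3.0]),
    "b": array("d", [4.0, 5.0, 6.0]),
})

cols = df.select_columns(["a"])
rows = df.select_rows(0, 2)

cols.columns["a"][0] = 100.0  # write through a column view: visible in df
rows.columns["b"][0] = -1.0   # write to a row copy: df is unaffected
```

After these writes, `df.columns["a"][0]` is `100.0` while `df.columns["b"][0]` is still `4.0`, which is the predictability the rules are after.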

Given that we plan to ditch the BlockManager anyways, we would get (1) basically for free.

I'm sure there are a few use cases for view based slicing of DataFrame rows, but these are quite niche in comparison to selecting columns, and in my opinion, the unpredictability it introduces into the data model is not worth the trouble.

Copy on write for column views (and eventually, maybe row slicing) would still be nice in making pandas more intuitive, but could possibly wait until a later 2.1 or 3.0 release (supposing we're doing semantic versioning).

shoyer avatar Sep 01 '16 16:09 shoyer

I agree COW isn't a strict necessity for the 1 -> 2 transition. I think it's worth keeping in mind during the development process, as there are a number of things we can do that would make adding it later easier or more difficult. Step 1 is keeping track of parent-child relationships in a lightweight way, and, to start, we can keep permitting mutation in accordance with current behavior.
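One possible shape for that bookkeeping, as a rough sketch only: instead of explicit parent pointers, each buffer keeps a weak set of the objects that currently share it. `Buffer`, `Column`, and the `copy_on_write` flag are hypothetical names, not anything that exists in pandas. With the flag off, writes show through to every sharer (roughly today's view behavior); with it on, a writer that shares its buffer copies first.

```python
import weakref
from array import array


class Buffer:
    """Hypothetical storage that remembers which columns share it."""

    def __init__(self, data):
        self.data = data
        # Weak references, so tracking never keeps a dead view alive.
        self.owners = weakref.WeakSet()


class Column:
    def __init__(self, buffer):
        self.buffer = buffer
        buffer.owners.add(self)

    @classmethod
    def from_values(cls, values):
        return cls(Buffer(array("d", values)))

    def view(self):
        # A child shares its parent's buffer; no data is copied here.
        return Column(self.buffer)

    def get(self, i):
        return self.buffer.data[i]

    def set(self, i, value, copy_on_write=True):
        if copy_on_write and len(self.buffer.owners) > 1:
            # Someone else shares this buffer: detach onto a private copy.
            old = self.buffer
            old.owners.discard(self)
            self.buffer = Buffer(old.data[:])
            self.buffer.owners.add(self)
        self.buffer.data[i] = value


parent = Column.from_values([1.0, 2.0, 3.0])
child = parent.view()

child.set(0, 10.0, copy_on_write=False)  # legacy mode: parent sees the write
legacy = parent.get(0)

child.set(1, 20.0)   # COW mode: child copies first, parent keeps 2.0
parent.set(2, 30.0)  # parent is now the sole owner, so this writes in place
```

Because the tracking is in place from the start, switching the default from mutation to copy-on-write later is a change to one flag rather than to the data model.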

wesm avatar Sep 01 '16 17:09 wesm

See discussion in https://github.com/pydata/pandas/pull/11500

wesm avatar Sep 05 '16 14:09 wesm

I've expressed my views on COW in pretty extensive detail elsewhere (#10954), so I'll save everyone the trouble of repeating them all here, but in short: any behavior that's consistent and easy to understand is fine by me!

Have we abandoned trying to get this in before v1.0?

nickeubank avatar Sep 06 '16 18:09 nickeubank

It's probably not too likely, since it would be an API change whose impact would take some time to fully understand. If anyone has other thoughts on this (separate from the behavior of C-O-W), please chime in.

wesm avatar Sep 06 '16 20:09 wesm

A notable benefit of copy-on-write is that operations like reset_index become zero-copy operations.
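A toy illustration of why, assuming a frame is just an index buffer plus named column buffers (`ToyFrame` and its `reset_index` are hypothetical, not the pandas implementation): the result can reuse every existing buffer and only needs a lazy default index. The sketch leaves out the write path; the sharing is only safe if a later write to either frame triggers a copy, which is exactly what copy-on-write guarantees.

```python
from array import array


class ToyFrame:
    """Hypothetical frame: an index buffer plus named column buffers."""

    def __init__(self, index, columns):
        self.index = index
        self.columns = dict(columns)

    def reset_index(self, name="index"):
        # Move the old index into the columns without copying any data.
        new_columns = {name: self.index, **self.columns}
        # A range is lazy, so the new default index allocates no data.
        new_index = range(len(self.index))
        return ToyFrame(new_index, new_columns)


df = ToyFrame(array("q", [10, 20, 30]), {"a": array("d", [1.0, 2.0, 3.0])})
out = df.reset_index()

shares_index = out.columns["index"] is df.index
shares_a = out.columns["a"] is df.columns["a"]
```

Both `shares_index` and `shares_a` are `True` here: the new frame holds the same buffer objects as the original.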

wesm avatar Sep 19 '16 20:09 wesm