
memoryview: add multi-dimensional indexing and slicing

Open 5531d0d8-2a9c-46ba-8b8b-ef76132a492c opened this issue 13 years ago • 10 comments

BPO 14130
Nosy @ncoghlan, @abalkin, @pitrou, @pv, @skrah

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2012-02-26.12:20:55.331>
labels = ['interpreter-core', 'type-feature']
title = 'memoryview: add multi-dimensional indexing and slicing'
updated_at = <Date 2014-10-14.17:31:14.047>
user = 'https://github.com/skrah'

bugs.python.org fields:

activity = <Date 2014-10-14.17:31:14.047>
actor = 'skrah'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2012-02-26.12:20:55.331>
creator = 'skrah'
dependencies = []
files = []
hgrepos = []
issue_num = 14130
keywords = []
message_count = 9.0
messages = ['154336', '194072', '196523', '210583', '210597', '210598', '210660', '210883', '210918']
nosy_count = 8.0
nosy_names = ['teoliphant', 'ncoghlan', 'belopolsky', 'pitrou', 'pv', 'undercoveridiot', 'skrah', 'DLowell']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue14130'
versions = ['Python 3.3']

The PEP-3118 authors originally planned to have support for multi-dimensional indexing and slicing in memoryview.

Since memoryview already supports multi-dimensional list representations and comparisons, this would be a nice addition to the feature set.

Is this issue still being worked on?

I would probably work on it (it's basically implemented in _testbuffer.c), but I'm not sure if the NumPy community will actually use the feature.

If there is any way to get this implemented, it is needed. For one, the memoryview docs make no mention that indexing and slicing don't work with multi-dimensional data, which led me to believe it was supported until I tried using it. Second, this currently represents a loss of functionality compared with the buffer type in Python 2. When porting code that uses the Python 2 buffer type to Python 3, you get a very unhelpful "NotImplementedError" with no description when trying to slice a memoryview. The only workaround is to call tobytes() and copy the data into an object that supports slicing, but for very large objects this defeats the primary purpose of using buffers in the first place, which is to avoid memory copies.

memoryview supports slicing - it just doesn't support NumPy style *multi-dimensional* slicing (and buffer doesn't support that either).

ncoghlan, Feb 08 '14 09:02

(However, if you're on Python 3.2, then you'll likely need to upgrade to Python 3.3 - memoryview *does* have a lot of additional limitations in Python 3.2)

ncoghlan, Feb 08 '14 09:02

Ian, could you please provide an example where multi-dimensional indexing and slicing works in 2.x but not in 3.3?

It's not multi-dimensional slicing to get a subset of objects as in NumPy, but rather the ability to slice a buffer containing a multi-dimensional array as raw bytes. Buffer objects in Python 2.7 are dimensionality-naive, so this works fine. You were correct that I was testing against Python 3.2; in Python 3.3, slicing with ndim > 1 works, but only for reading from the buffer. I still can't write back into a memoryview object with ndim > 1 in Python 3.3.

Python 2.7.3:
>>> import numpy as np
>>> arr = np.zeros(shape=(100,100))
>>> type(arr.data)
<type 'buffer'>
>>> arr.data[0:10]
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> 

Python 3.2.3:
>>> import numpy as np
>>> arr = np.zeros(shape=(100,100))
>>> type(arr.data)
<class 'memoryview'>
>>> arr.data[0:10]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError
>>> 

Python 3.3.3:
>>> import numpy as np
>>> arr = np.zeros(shape=(100,100))
>>> type(arr.data)
<class 'memoryview'>
>>> arr.data[0:10]
<memory at 0x7faaf1d03a48>
>>> 

However, to write data back into a buffer:

Python 2.7.3:
>>> import numpy as np
>>> arr = np.zeros(shape=(100,100))
>>> arr.data[0:10] = '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> 

Python 3.2.3:
>>> import numpy as np
>>> arr = np.zeros(shape=(100,100))
>>> arr.data[0:10] = b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError
>>> 

Python 3.3.3:
>>> import numpy as np
>>> arr = np.zeros(shape=(100,100))
>>> arr.data[0:10] = b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError: memoryview assignments are currently restricted to ndim = 1
>>> 

Also, a slice in Python 3.3 is not the same as a chunk of raw bytes from the memory buffer. Instead of returning a bytes object, the indexing behaves like NumPy array indexing: you get the (sub)array items back, so the slice 0:10 covers 10 rows of the array rather than 10 bytes.

Python 2.7.3:
>>> import numpy as np
>>> arr = np.zeros(shape=(100,100))
>>> arr.data[0:10]
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> len(bytes(arr.data[0:10]))
10

Python 3.3.3:
>>> import numpy as np
>>> arr = np.zeros(shape=(100,100))
>>> arr.data[0:10]
<memory at 0x7f109a71ea48>
>>> len(bytes(arr.data[0:10]))
8000
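
The 8000 above follows from the shape: the slice selects 10 rows of 100 float64 items (8 bytes each), not 10 bytes. A minimal sketch that reproduces the numbers without NumPy, assuming a zero-filled bytearray stands in for the array's memory (the names raw, view2d and rows are illustrative only):

```python
# Stand-in for np.zeros(shape=(100, 100)): 100 * 100 doubles, 8 bytes each.
raw = bytearray(100 * 100 * 8)

# Reshape the flat byte buffer into a 2-D view of C doubles (no copy).
view2d = memoryview(raw).cast('d', shape=[100, 100])

# A single slice on a multi-dimensional memoryview selects along the
# first dimension, so this is 10 rows, not 10 bytes.
rows = view2d[0:10]

print(rows.shape)        # (10, 100)
print(rows.nbytes)       # 8000
print(len(bytes(rows)))  # 8000
```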

This is not a big deal in my case: since I already have NumPy arrays, I can just use bytes(arr.flat[start:end]) to scan through the array contents as byte chunks. That would not be possible with just a memoryview object, as it was with the Python 2 buffer object, without converting it to something else or dropping to ctypes, iterating over the memory addresses, and dereferencing the contents.

So Python 3.3 is halfway to the functionality of Python 2.7: I can send chunks of the data through a compressed or encrypted stream, but I can't rebuild the data on the other side without first creating a bytearray and eating the cost of a copy into a memoryview. All I really need is a way to reconstruct the original memoryview buffer in memory from a stream of bytes, without having to build a temporary object first and then copy its contents into the final memoryview object once it is complete.

Thanks, Ian. It seems to me that these issues should be sorted out on the NumPy lists:

memoryview is not a drop-in replacement for buffer, so it has different semantics.

What might help you is that you can cast any memoryview to simple bytes without making a copy:

memoryview.cast('B')
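
A minimal sketch of how that cast could be used to fill a multi-dimensional buffer in place from a byte stream, using only the standard library. Here a bytearray stands in for the array's memory and io.BytesIO stands in for a decompressed or decrypted stream; both, along with the names fill_from_stream and chunk_size, are assumptions made for illustration.

```python
import io
import struct

def fill_from_stream(target, stream, chunk_size=4096):
    """Read bytes from `stream` directly into the memory behind `target`.

    `target` must be a writable, C-contiguous memoryview (any shape).
    Returns the number of bytes written.
    """
    flat = target.cast('B')      # 1-D byte view of the same memory, no copy
    offset = 0
    while offset < flat.nbytes:
        end = min(offset + chunk_size, flat.nbytes)
        n = stream.readinto(flat[offset:end])  # writes in place
        if not n:                # stream exhausted early
            break
        offset += n
    return offset

# 4 x 4 array of doubles backed by a writable bytearray.
storage = bytearray(4 * 4 * 8)
view2d = memoryview(storage).cast('d', shape=[4, 4])

# Hypothetical incoming data: the doubles 0.0 .. 15.0 in native byte order.
payload = struct.pack('16d', *range(16))
written = fill_from_stream(view2d, io.BytesIO(payload), chunk_size=24)

print(written)
print(view2d.tolist()[1])
```

The slice flat[offset:end] is itself a memoryview over the original storage, so readinto() writes straight into the final buffer and no temporary object holding the whole payload is needed.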

Hi,

I stumbled upon this as I tried the following code in Python 3.10:


# 7-line color image with 2048 px per line in RGBA format (plain buffer)
b = bytes(range(256)) * 32 * 7
# Create reshaped memoryview
m = memoryview(b).cast('B', shape=[7, 8192])

color_plains = {
    "r": m[::, 0::4],
    "g": m[::, 1::4],
    "b": m[::, 2::4],
}

I would have liked to extract each color plane without the need to create a copy. Is there any plan to implement multi-dimensional slicing for memoryviews? Slicing first and then reshaping does not work because the sliced view is not contiguous.

Bosi1024, Nov 08 '22 15:11
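
One possible zero-copy workaround for the layout in the snippet above is to fall back on 1-D slicing, which memoryview supports with a step: take a contiguous 1-D slice per image row, then a strided slice per channel. This is only a sketch; height, width_bytes, rows and channel_planes are illustrative names, and the result is a list of 1-D views per channel rather than a single 2-D view.

```python
height, width_bytes = 7, 8192            # 7 rows of 2048 RGBA pixels
b = bytes(range(256)) * 32 * height      # same plain buffer as above
flat = memoryview(b)                     # 1-D view of unsigned bytes

# One contiguous 1-D view per image row (slicing a memoryview does not copy).
rows = [flat[r * width_bytes:(r + 1) * width_bytes] for r in range(height)]

# Per channel, a strided 1-D view of every 4th byte in each row.
channel_planes = {
    name: [row[offset::4] for row in rows]
    for offset, name in enumerate("rgb")
}

print(len(channel_planes["r"]), len(channel_planes["r"][0]))
```

Every view in channel_planes still refers to the original bytes object, so no pixel data is duplicated; the cost is that each channel is handled row by row.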