Array tags are immutable views, not editable ndarrays
Any particular reason for creating them as read-only views? This forces clients to write the whole array to change a single value.
The conversion from ndarray t its view is done in this line and the following one.
Speaking of arrays, any particular reason for return int.__new__(self.wrapper, super().__getitem__(index)) in this line instead of simply return self.wrapper(super().__getitem__(index)) ?
After a couple hours reading the official docs on sub-classing ndarray, I guess there are 2 different approaches to allow changing contents:
- In
Array.__new__, callsuper().__new__(cls, ...)instead ofnp.asarray(...).view(cls), as show in this example, then populate the values of the instance before returning it. I.e.: instead of a view, return a "normal" ndarray - Keep it as a view, and add a
__setitem__that invokesself.base.__setitem__, so it operates on the view's data source rather then on the (immutable) view, as (unintentionally) show in this other example
Both approaches would have to be carefully studied for eventual gotchas and corner cases, and there might have other (or better) approaches
Hey! Thanks a lot for being so active lately! I'm busy with school stuff atm but I'm looking forward to dive into all this, hopefully next week or the week after. In the meantime keep'em coming :D
No problem, take your time... I'll keep sending PRs and issues
As for this issue, take a look at the workaround currently needed to set items in an Array:
def apply_pixels(target: Map, pixels: t.Iterable[MapPixel]) -> None:
arr = target.data['colors']
# NBT Arrays might be implemented by the NBT backend as read-only ndarray views
# If so, set the pixels in a (writeable) copy, then write back preserving original type
cls = None
if isinstance(arr, np.ndarray) and not arr.flags.writeable:
cls = arr.__class__
arr = arr.copy()
for pixel in pixels:
arr[pixel[0]] = pixel[1]
if cls:
target.data['colors'] = cls(arr)
(actual code from my map-deduper project)
Yeah I remember looking into how to subclass ndarray and I think that's why I ended up with the view approach. But I agree that it's a problem. Before diving deeper into numpy stuff though, I kind of want to see what's possible with pure python lists in combination with mypyc (as mentioned #156). The array tags pretty much only use numpy for packing/unpacking and handling endianness efficiently, so there's a chance that mypyc can achieve comparable performance for this specific use-case. This would also mean that nbtlib could let go of numpy and any C extension for people using it in constrained environments by gracefully falling back to the pure python source used by mypyc.
Good idea. If you go this route, use the Python's native bytearray for.. well, ByteArray ;-)
And maybe a simple Struct on bytearrays could also efficiently cover the case for Int and Long Arrays
Originally byte array tags used the builtin bytearray but it was a bit slower than numpy due to the separate byteswap required when dealing with big-endian nbt. Also it was a bit more confusing to maintain because the int array tags needed to be implemented differently. But yeah I'm gonna revisit all that and benchmark.
Btw, I'm perfectly fine with ndarray for all Arrays, as long as:
- It's mutable like any list
- It inherits from
MutableSequence, likelist. I think it currently doesn't. Which is convenient for me, as currently I always want to handle them separately anyway (in pretty prints and NBT Explorer-like trees), but it's also wrong. Hey, those poor arrays are mutable sequences after all. They're just fixed length. If I want to segregate them, It's my prejudice on them for being so damn long when printing :-)
Wait... no. Being fixed length might change everything. They lack .append() and .extend(). Are those in MutableSequence's API? Anyway, let the ABC be dictated by whatever the raw NBT data qualifies for. And the underlying structure should reflect that ABC, be it a wrapper on ndarray or something else.
Speaking of arrays, any particular reason for
return int.__new__(self.wrapper, super().__getitem__(index))in this line instead of simplyreturn self.wrapper(super().__getitem__(index))?
i did some searching. and i found this. https://github.com/vberlier/nbtlib/pull/38#discussion_r316645869
it use int.__new__ because wrapper(x) would check if x OutOfRange or not.
(we dont need to test if the number get from the Array is in the range)