nbtlib Array tags are immutable views, not editable ndarrays

Any particular reason for creating them as read-only views? This forces clients to write the whole array to change a single value.

The conversion from ndarray t its view is done in this line and the following one.

Oct 29 '21 03:10 MestreLion

Speaking of arrays, any particular reason for return int.__new__(self.wrapper, super().__getitem__(index)) in this line instead of simply return self.wrapper(super().__getitem__(index)) ?

Oct 29 '21 04:10 MestreLion

After a couple hours reading the official docs on sub-classing ndarray, I guess there are 2 different approaches to allow changing contents:

In Array.__new__, call super().__new__(cls, ...) instead of np.asarray(...).view(cls), as show in this example, then populate the values of the instance before returning it. I.e.: instead of a view, return a "normal" ndarray
Keep it as a view, and add a __setitem__ that invokes self.base.__setitem__, so it operates on the view's data source rather then on the (immutable) view, as (unintentionally) show in this other example

Both approaches would have to be carefully studied for eventual gotchas and corner cases, and there might have other (or better) approaches

Oct 29 '21 05:10 MestreLion

Hey! Thanks a lot for being so active lately! I'm busy with school stuff atm but I'm looking forward to dive into all this, hopefully next week or the week after. In the meantime keep'em coming :D

Oct 29 '21 05:10 vberlier

No problem, take your time... I'll keep sending PRs and issues

As for this issue, take a look at the workaround currently needed to set items in an Array:

def apply_pixels(target: Map, pixels: t.Iterable[MapPixel]) -> None:
    arr = target.data['colors']
    # NBT Arrays might be implemented by the NBT backend as read-only ndarray views
    # If so, set the pixels in a (writeable) copy, then write back preserving original type
    cls = None
    if isinstance(arr, np.ndarray) and not arr.flags.writeable:
        cls = arr.__class__
        arr = arr.copy()
    for pixel in pixels:
        arr[pixel[0]] = pixel[1]
    if cls:
        target.data['colors'] = cls(arr)

(actual code from my map-deduper project)

Oct 29 '21 06:10 MestreLion

Yeah I remember looking into how to subclass ndarray and I think that's why I ended up with the view approach. But I agree that it's a problem. Before diving deeper into numpy stuff though, I kind of want to see what's possible with pure python lists in combination with mypyc (as mentioned #156). The array tags pretty much only use numpy for packing/unpacking and handling endianness efficiently, so there's a chance that mypyc can achieve comparable performance for this specific use-case. This would also mean that nbtlib could let go of numpy and any C extension for people using it in constrained environments by gracefully falling back to the pure python source used by mypyc.

Nov 03 '21 10:11 vberlier

Good idea. If you go this route, use the Python's native bytearray for.. well, ByteArray ;-)

And maybe a simple Struct on bytearrays could also efficiently cover the case for Int and Long Arrays

Nov 03 '21 12:11 MestreLion

Originally byte array tags used the builtin bytearray but it was a bit slower than numpy due to the separate byteswap required when dealing with big-endian nbt. Also it was a bit more confusing to maintain because the int array tags needed to be implemented differently. But yeah I'm gonna revisit all that and benchmark.

Nov 03 '21 12:11 vberlier

Btw, I'm perfectly fine with ndarray for all Arrays, as long as:

It's mutable like any list
It inherits from MutableSequence, like list. I think it currently doesn't. Which is convenient for me, as currently I always want to handle them separately anyway (in pretty prints and NBT Explorer-like trees), but it's also wrong. Hey, those poor arrays are mutable sequences after all. They're just fixed length. If I want to segregate them, It's my prejudice on them for being so damn long when printing :-)

Wait... no. Being fixed length might change everything. They lack .append() and .extend(). Are those in MutableSequence's API? Anyway, let the ABC be dictated by whatever the raw NBT data qualifies for. And the underlying structure should reflect that ABC, be it a wrapper on ndarray or something else.

Nov 03 '21 15:11 MestreLion

Speaking of arrays, any particular reason for return int.__new__(self.wrapper, super().__getitem__(index)) in this line instead of simply return self.wrapper(super().__getitem__(index)) ?

i did some searching. and i found this. https://github.com/vberlier/nbtlib/pull/38#discussion_r316645869

it use int.__new__ because wrapper(x) would check if x OutOfRange or not. (we dont need to test if the number get from the Array is in the range)

Aug 21 '23 16:08 ch-yx