zarr-python icon indicating copy to clipboard operation
zarr-python copied to clipboard

[v3] `LocalStore.get` clarification

Open d-v-b opened this issue 10 months ago • 4 comments

in v3, the following function gets bytes for the LocalStore.get method:

def _get(path: Path, byte_range: Optional[Tuple[int, Optional[int]]] = None) -> bytes:
    if byte_range is not None:
        start = byte_range[0]
        end = (start + byte_range[1]) if byte_range[1] is not None else None
    else:
        return path.read_bytes()
    with path.open("rb") as f:
        size = f.seek(0, io.SEEK_END)
        if start is not None:
            if start >= 0:
                f.seek(start)
            else:
                f.seek(max(0, size + start))
        if end is not None:
            if end < 0:
                end = size + end
            return f.read(end - f.tell())
        return f.read()

I have a few questions about this function that would help me write tests for it. I think @jhamman @normanrz are the main people I need answers from, but I'd be curious to hear if anyone else has thoughts.

  • the annotation of the function signature is Optional[Tuple[int, Optional[int]]], but the body of the function can handle tuple[None, int]. Should we change annotation of the function signature to tuple[int | None, int | None] | None?
  • I was initially confused about the semantics of byte_range -- I thought it defined an interval as a start and a stop, but now I understand that byte_range defines a start and a step size. This would be more transparent if we used keyword arguments like start and step instead of a tuple. Would switching to keyword arguments be a problem for this API? I think the win in clarity might be worth it.

d-v-b avatar Apr 05 '24 10:04 d-v-b

actually, the second conditional block can handle byte_range[0] being None, but this line

 end = (start + byte_range[1]) if byte_range[1] is not None else None

from the first conditional block errors if byte_range[0] is None, due to adding an int to None.

d-v-b avatar Apr 05 '24 10:04 d-v-b

  • the annotation of the function signature is Optional[Tuple[int, Optional[int]]], but the body of the function can handle tuple[None, int]. Should we change annotation of the function signature to tuple[int | None, int | None] | None?

The current API is:

class ByteRange(NamedTuple):
      start: int  # Can be negative for indexing from the end 
      end: Optional[int] = None

def _get(path: Path, byte_range: Optional[ByteRange] = None) -> bytes:

Not sure why we would allow start == None. Would that be equivalent to start == 0?

  • I was initially confused about the semantics of byte_range -- I thought it defined an interval as a start and a stop, but now I understand that byte_range defines a start and a step size. This would be more transparent if we used keyword arguments like start and step instead of a tuple. Would switching to keyword arguments be a problem for this API? I think the win in clarity might be worth it.

I don't think the byte range defines a step size. Where are you getting that from?

normanrz avatar Apr 15 '24 08:04 normanrz

I don't think the byte range defines a step size. Where are you getting that from?

That was my impression from the implementation

end = (start + byte_range[1]) if byte_range[1] is not None else None

d-v-b avatar Apr 15 '24 08:04 d-v-b

Ah, I would call that length or count. Anyways, seems you're right. I don't think I ever supplied the second tuple component in zarrita 🤷

normanrz avatar Apr 15 '24 08:04 normanrz