ak.from_json should take a `limit` parameter to only read what is necessary
Description of new feature
- `limit=NONNEGATIVE-INT` should be passed to C++ to stop the JSON-reading as soon as the number of entries in the `ArrayBuilder` reaches `limit`.
- `limit=(NONNEGATIVE-INT, NONNEGATIVE-INT)` should pass both values; the first is a lower limit. They're both non-negative: they do not/cannot count from the end of a JSON document or stream. The lower limit doesn't prevent any reading or parsing, but it prevents data from being passed into the `ArrayBuilder`, which can save a lot of memory.

These should apply equally well to single-document and line-delimited mode.
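To make the intended semantics concrete, here is a rough pure-Python sketch for the line-delimited case. The function name `read_json_lines` is hypothetical and a plain list stands in for the `ArrayBuilder`; the real implementation would live in C++, so this only illustrates the behavior, not how it would be built.

```python
import io
import json
from itertools import islice


def read_json_lines(stream, limit=None):
    """Hypothetical stand-in for the proposed `limit` semantics."""
    if limit is None:
        start, stop = 0, None
    elif isinstance(limit, int):
        start, stop = 0, limit
    else:
        start, stop = limit
    if start < 0 or (stop is not None and stop < 0):
        raise ValueError("limit values must be non-negative")

    builder = []  # stands in for the ArrayBuilder
    # Upper limit: islice stops pulling lines from the stream at `stop`.
    for index, line in enumerate(islice(stream, stop)):
        record = json.loads(line)  # entries below `start` are still parsed...
        if index < start:
            continue               # ...but never reach the builder
        builder.append(record)
    return builder


data = "\n".join(json.dumps({"x": i}) for i in range(5)) + "\n"
first_two = read_json_lines(io.StringIO(data), limit=2)
middle = read_json_lines(io.StringIO(data), limit=(1, 3))
```

With `limit=2` only the first two lines are read; with `limit=(1, 3)` three lines are read and parsed, but only the last two are kept.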
A similar feature for `ak.from_iter` is not needed because Python already has an `itertools.islice` that users can use. (If we were to implement `limit` on Python iterators for symmetry, we'd just use `itertools.islice` internally.)
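For reference, this is roughly what the `itertools.islice` route looks like today. The `records` generator is a made-up data source for illustration.

```python
from itertools import islice


def records():
    # Hypothetical endless source of records.
    i = 0
    while True:
        yield {"x": i, "y": [i] * (i % 3)}
        i += 1


first_three = list(islice(records(), 3))  # analogous to limit=3
window = list(islice(records(), 2, 5))    # analogous to limit=(2, 5)

# The islice object is itself an iterable, so it could be handed
# directly to ak.from_iter, e.g. ak.from_iter(islice(records(), 3)).
```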
`ak.from_parquet` has a way to select row groups, but it would be more intuitive to be able to work with the same sort of `limit` argument; we'd just need to look in the Parquet metadata to translate entry `limit` into row group numbers (and then slice the unwanted parts of the first and last row groups, like Uproot already does with `entry_start` and `entry_stop`). For a format like Parquet, negative limits, counting from the end of the file, would be doable.
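The translation from an entry range to row groups could look something like the sketch below. `entries_to_row_groups` is a hypothetical helper, not an existing Awkward Array function; it assumes the per-row-group entry counts have already been read from the Parquet metadata (with pyarrow, for instance, from `ParquetFile(path).metadata.row_group(i).num_rows`).

```python
from bisect import bisect_right
from itertools import accumulate


def entries_to_row_groups(row_group_sizes, entry_start, entry_stop):
    """Hypothetical helper: map an entry range onto row groups.

    Returns (row_groups, local_start, local_stop), where the local
    bounds slice the concatenation of the selected row groups.
    """
    total = sum(row_group_sizes)
    # Negative values count from the end of the file, as in Python slices.
    entry_start, entry_stop, _ = slice(entry_start, entry_stop).indices(total)
    if entry_start >= entry_stop:
        return [], 0, 0

    # offsets[i] is the global index of the first entry in row group i.
    offsets = [0] + list(accumulate(row_group_sizes))
    first = bisect_right(offsets, entry_start) - 1
    last = bisect_right(offsets, entry_stop - 1) - 1

    row_groups = list(range(first, last + 1))
    local_start = entry_start - offsets[first]
    local_stop = entry_stop - offsets[first]
    return row_groups, local_start, local_stop


sizes = [100, 100, 50]  # made-up row group sizes
spanning = entries_to_row_groups(sizes, 150, 220)
from_end = entries_to_row_groups(sizes, -30, None)
```

Here entries 150 to 220 touch row groups 1 and 2, and the unwanted first 50 entries of row group 1 are trimmed by the local slice. The negative start resolves against the total entry count, which is available up front in the Parquet metadata.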