hist
[FEATURE] Fill from awkward arrays
.fill() should intelligently take awkward arrays. It would be very convenient to do things like:
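import uproot
from hist import Hist

blist = ['Jet_pt', 'Jet_eta']
h = Hist(...)  # axes left unspecified in the request
for chunk in f['Events'].iterate(blist, step_size=1000):  # f: an open uproot file
    h.fill(chunk)  # proposed: fill directly from the chunk of records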
Ideally, this call would work even if chunk had more fields than the histogram has axes.
Hi! Any updates on this and a new release?
Sorry, not yet. I'll discuss the details with @henryiii later. Thanks.
I think a structure like this:
[
[{'Jet_pt': 1, 'Jet_eta': 2}],
[{'Jet_pt': 1, 'Jet_eta': 2}, {'Jet_pt': 1, 'Jet_eta': 2}],
]
should convert to this hist fill: Jet_pt=ak.flatten(awkarr['Jet_pt']), Jet_eta=ak.flatten(awkarr['Jet_eta'])
ak.flatten with axis=None will merge all of the numbers in an Awkward Array into one big, 1-dimensional array for plotting, though that's usually undesirable, as described here:
https://awkward-array.org/how-to-restructure-flatten.html
Who would want a plot with pt and eta mixed together into the same histogram?
Oh, I see now: you want to turn each field of an array of records into a different histogram dimension, coupling knowledge of the array structure with the histogram builder. That makes a lot of sense. You can use ak.fields and ak.unzip to generically get the field names (for labeling axes) and the field arrays in the same order. Then, in case they contain multiple levels of lists or multiple levels of missing values, you can pass axis=None to ak.flatten to squash everything on every level within a field.
You might also want to recursively call the ak.fields/ak.unzip pair, since records can contain records, and these functions only unzip the first level of record depth. If you do that, you'll never mix different types of data (e.g. pt and eta) on the same histogram axis when using axis=None, and axis=None will eliminate any number of list boundaries and missing values.