[FEATURE] Fill from awkward arrays

Open andrzejnovak opened this issue 4 years ago • 4 comments

.fill() should intelligently take awkward arrays. It would be very convenient to do things like

blist = ['Jet_pt', 'Jet_eta']
h = Hist(...)

for chunk in f['Events'].iterate(blist, step_size=1000):
    h.fill(chunk)

Ideally, this would work even if chunk had more fields than the histogram has axes.

andrzejnovak, Nov 17 '20 10:11

Hi! Any updates on this and a new release?

DraTeots, Jan 19 '21 00:01

Sorry, not yet. I'll discuss the details with @henryiii later. Thanks.

LovelyBuggies, Jan 19 '21 05:01

I think a structure like this:

[
  [{'Jet_pt': 1, 'Jet_eta': 2}],
  [{'Jet_pt': 1, 'Jet_eta': 2}, {'Jet_pt': 1, 'Jet_eta': 2}],
]

should convert to this hist fill: h.fill(Jet_pt=ak.flatten(awkarr['Jet_pt']), Jet_eta=ak.flatten(awkarr['Jet_eta']))

henryiii, Jun 25 '21 14:06

ak.flatten with axis=None will merge all of the numbers in an Awkward Array into one big, 1-dimensional array for plotting, though that's usually undesirable, as described here:

https://awkward-array.org/how-to-restructure-flatten.html

Who would want a plot with pt and eta mixed together into the same histogram?

Oh—I see now—you want to turn each field of an array of records into a different histogram dimension, coupling knowledge of the array structure with the histogram-builder. That makes a lot of sense. You can use ak.fields and ak.unzip to generically get field names (for labeling axes) and field arrays in the same order. Then, in case they contain multiple levels of lists or multiple levels of missing values, you can pass axis=None to ak.flatten to squash everything on every level, within a field.

You might also want to recursively call the ak.fields/ak.unzip pair, since records can contain records, and these will only unzip the first level of record depth. If you do that, you'll never mix different types of data (e.g. pt and eta) at the same histogram axis depth when using axis=None, and axis=None will eliminate any number of list boundaries and missing values.

jpivarski, Jun 25 '21 15:06