uproot5 icon indicating copy to clipboard operation
uproot5 copied to clipboard

feat: add tree to virtual array conversion

Open pfackeldey opened this issue 10 months ago • 8 comments

This waits for https://github.com/scikit-hep/awkward/pull/3364 (and a corresponding awkward release).

I may likely not have attempted the most optimal solution here. Happy for feedback & input.

pfackeldey avatar Feb 26 '25 20:02 pfackeldey

@pfackeldey - what is the plan for this PR? thanks!

ianna avatar Apr 11 '25 16:04 ianna

@pfackeldey - what is the plan for this PR? thanks!

I think we were discussing with Peter to add some caching support. Uproot will deserialize each electron branch for example separately with its own offsets. All those will have the same count_branch however so it's probably best to not deserialize the same offsets dozens of times. It's probably best to cache count_branch deserialization result (length) and use it for the other branches that have the same count_branch.

ikrommyd avatar Apr 11 '25 16:04 ikrommyd

I think we were discussing with Peter to add some caching support.

Isn't there already some caching being done in Uproot? When trying to read the count_branch multiple times it should already be hitting the cache

ariostas avatar Apr 11 '25 17:04 ariostas

I think we were discussing with Peter to add some caching support.

Isn't there already some caching being done in Uproot? When trying to read the count_branch multiple times it should already be hitting the cache

Yeah I need to try if it's hitting it, haven't done that yet. Will do today. Do you the best way to log that (the number of deserializations per branch)?

ikrommyd avatar Apr 11 '25 17:04 ikrommyd

Do you the best way to log that (the number of deserializations per branch)?

I'm not sure. I've just skimmed the code since at some point I'll have to do that for RNTuples

ariostas avatar Apr 11 '25 17:04 ariostas

@pfackeldey - what is the plan for this PR? thanks!

I'm not sure. I'm not a big fan of this implementation, but I also don't know how it can be done in a better way. I was hoping for some input here.

pfackeldey avatar Apr 11 '25 17:04 pfackeldey

Hi @ianna, Finally, I found a good implementation! This is now handling every awkward Content case in a programmatic way. I could reuse a similar logic as for uproot.dask. Could you have a look?

pfackeldey avatar Apr 23 '25 18:04 pfackeldey

Needs: https://github.com/scikit-hep/awkward/pull/3482/ and thus a new awkward release

pfackeldey avatar Apr 25 '25 17:04 pfackeldey

Hi, I just found this while wondering how to use / demonstrate virtual arrays without coffea and was surprised to not see it in a high-level API in uproot yet. There have been a few awkward releases in the meantime, I am curious if there is something left blocking this?

alexander-held avatar Aug 04 '25 18:08 alexander-held

Hi, I just found this while wondering how to use / demonstrate virtual arrays without coffea and was surprised to not see it in a high-level API in uproot yet. There have been a few awkward releases in the meantime, I am curious if there is something left blocking this?

No, nothing is blocking it. In fact we have talked about it briefly at a meeting. Now that virtual arrays are in the release, it should go ahead. Thanks for bringing it up!

ianna avatar Aug 04 '25 19:08 ianna

I think the only blocking thing was Pyodide, since it required an older version of Awkward. There was a Pyodide release yesterday, so I'll wrap up #1464 so that that this can proceed.

ariostas avatar Aug 05 '25 13:08 ariostas

If I may add a comment here. Merging this would mean the uproot has to require a relatively new version of awkward (which doesn't have to without this PR). I think there are two ways to go with this: A) Uproot indeed pins awkward to 2.8.2 (or whatever works around that) B) Uproot allows earlier versions of awkward but this particular function .virtual_arrays() errors of awkward is not new enough and tells the user to install awkward >= something.

Everything is up to you of course, I would just like to mention this.

ikrommyd avatar Aug 05 '25 14:08 ikrommyd

If I may add a comment here. Merging this would mean the uproot has to require a relatively new version of awkward (which doesn't have to without this PR). I think there are two ways to go with this: A) Uproot indeed pins awkward to 2.8.2 (or whatever works around that) B) Uproot allows earlier versions of awkward but this particular function .virtual_arrays() errors of awkward is not new enough and tells the user to install awkward >= something.

Everything is up to you of course, I would just like to mention this.

I'd prefer doing this via requirements and not parsing versions in the code. This is much more robust and we don't have to maintain/think of these if conditions in the future.

pfackeldey avatar Aug 05 '25 14:08 pfackeldey

I'd prefer doing this via requirements and not parsing versions in the code.

Also, I think it's fine since Awkward and Uproot are co-maintained, so if one can be upgraded, so can the other one. It's not like Awkward dropped support for Python 3.9, while Uproot still supported it or something like that.

ariostas avatar Aug 05 '25 14:08 ariostas

@pfackeldey - thanks! Looks good to me. Please merge it if you’re done with it. Thanks

Oh, it’s still in a draft mode 😅

ianna avatar Sep 05 '25 20:09 ianna

Hi @ariostas, I've updated the PR as we've discussed. The usage is now:

import uproot

tree = uproot.open("nano.root")["Events"]

array = tree.arrays(virtual=True, access_log=(log := []))
# >> <Array [{run: ??, ...}, ..., {run: ??, ...}] type='40 * {run: uint32, lumin...'>

ak.materialize(array.run)
# >> <Array [1, 1, 1, 1, 1, 1, 1, 1, ..., 1, 1, 1, 1, 1, 1, 1] type='40 * float64'>

print(log)
# >> [Accessed(branch='run', buffer_key="('<root>', 'run')-data")]

and I also made sure that proper errors are raised for non-sense kwarg combinations depending on the new virtual= kwarg. Let me know what you think :)

@ianna this PR is in principle ready now. I would only add some tests in addition.

pfackeldey avatar Oct 22 '25 11:10 pfackeldey