feat: add tree to virtual array conversion
This waits for https://github.com/scikit-hep/awkward/pull/3364 (and a corresponding awkward release).
I may likely not have attempted the most optimal solution here. Happy for feedback & input.
@pfackeldey - what is the plan for this PR? thanks!
@pfackeldey - what is the plan for this PR? thanks!
I think we were discussing with Peter to add some caching support. Uproot will deserialize each electron branch for example separately with its own offsets. All those will have the same count_branch however so it's probably best to not deserialize the same offsets dozens of times. It's probably best to cache count_branch deserialization result (length) and use it for the other branches that have the same count_branch.
I think we were discussing with Peter to add some caching support.
Isn't there already some caching being done in Uproot? When trying to read the count_branch multiple times it should already be hitting the cache
I think we were discussing with Peter to add some caching support.
Isn't there already some caching being done in Uproot? When trying to read the count_branch multiple times it should already be hitting the cache
Yeah I need to try if it's hitting it, haven't done that yet. Will do today. Do you the best way to log that (the number of deserializations per branch)?
Do you the best way to log that (the number of deserializations per branch)?
I'm not sure. I've just skimmed the code since at some point I'll have to do that for RNTuples
@pfackeldey - what is the plan for this PR? thanks!
I'm not sure. I'm not a big fan of this implementation, but I also don't know how it can be done in a better way. I was hoping for some input here.
Hi @ianna,
Finally, I found a good implementation!
This is now handling every awkward Content case in a programmatic way. I could reuse a similar logic as for uproot.dask.
Could you have a look?
Needs: https://github.com/scikit-hep/awkward/pull/3482/ and thus a new awkward release
Hi, I just found this while wondering how to use / demonstrate virtual arrays without coffea and was surprised to not see it in a high-level API in uproot yet. There have been a few awkward releases in the meantime, I am curious if there is something left blocking this?
Hi, I just found this while wondering how to use / demonstrate virtual arrays without
coffeaand was surprised to not see it in a high-level API inuprootyet. There have been a fewawkwardreleases in the meantime, I am curious if there is something left blocking this?
No, nothing is blocking it. In fact we have talked about it briefly at a meeting. Now that virtual arrays are in the release, it should go ahead. Thanks for bringing it up!
I think the only blocking thing was Pyodide, since it required an older version of Awkward. There was a Pyodide release yesterday, so I'll wrap up #1464 so that that this can proceed.
If I may add a comment here. Merging this would mean the uproot has to require a relatively new version of awkward (which doesn't have to without this PR). I think there are two ways to go with this:
A) Uproot indeed pins awkward to 2.8.2 (or whatever works around that)
B) Uproot allows earlier versions of awkward but this particular function .virtual_arrays() errors of awkward is not new enough and tells the user to install awkward >= something.
Everything is up to you of course, I would just like to mention this.
If I may add a comment here. Merging this would mean the uproot has to require a relatively new version of awkward (which doesn't have to without this PR). I think there are two ways to go with this: A) Uproot indeed pins awkward to 2.8.2 (or whatever works around that) B) Uproot allows earlier versions of awkward but this particular function
.virtual_arrays()errors of awkward is not new enough and tells the user to install awkward >= something.Everything is up to you of course, I would just like to mention this.
I'd prefer doing this via requirements and not parsing versions in the code. This is much more robust and we don't have to maintain/think of these if conditions in the future.
I'd prefer doing this via requirements and not parsing versions in the code.
Also, I think it's fine since Awkward and Uproot are co-maintained, so if one can be upgraded, so can the other one. It's not like Awkward dropped support for Python 3.9, while Uproot still supported it or something like that.
@pfackeldey - thanks! Looks good to me. Please merge it if you’re done with it. Thanks
Oh, it’s still in a draft mode 😅
Hi @ariostas, I've updated the PR as we've discussed. The usage is now:
import uproot
tree = uproot.open("nano.root")["Events"]
array = tree.arrays(virtual=True, access_log=(log := []))
# >> <Array [{run: ??, ...}, ..., {run: ??, ...}] type='40 * {run: uint32, lumin...'>
ak.materialize(array.run)
# >> <Array [1, 1, 1, 1, 1, 1, 1, 1, ..., 1, 1, 1, 1, 1, 1, 1] type='40 * float64'>
print(log)
# >> [Accessed(branch='run', buffer_key="('<root>', 'run')-data")]
and I also made sure that proper errors are raised for non-sense kwarg combinations depending on the new virtual= kwarg.
Let me know what you think :)
@ianna this PR is in principle ready now. I would only add some tests in addition.