[DF] `AsNumpy` should automatically convert RVecs into numpy arrays

Open eguiraud opened this issue 4 years ago • 3 comments

Explain what you would like to see improved

`df.AsNumpy(["vector_branch"])` produces a numpy array of RVec objects. Given that converting RVecs into numpy arrays is a zero-copy operation via `np.asarray`, we could/should instead return a numpy array of numpy arrays, which Python users would most likely be happier with.

To Reproduce

>>> import ROOT
>>> ROOT.RDataFrame(10).Define("v", "std::vector<int>{1,2,3}").Snapshot("t", "f.root")
>>> print(ROOT.RDataFrame("t", "f.root").AsNumpy(["v"])["v"])

Additional context

Triggered by the discussion at https://root-forum.cern.ch/t/reading-vector-branch-from-root-file-and-converting-it-to-numpy-array-on-pyroot/44152

eguiraud avatar Mar 25 '21 11:03 eguiraud

Is this a simple pythonisation or do we need something more?

dpiparo avatar Feb 03 '24 07:02 dpiparo

What should happen if the array is not regular? It should be possible to convert regular arrays (a tree whose vectors all have the same number of elements) into a numpy array, but this would break whenever those vectors have different numbers of elements. We could do this conditionally (first checking whether all vectors have the same number of elements), but then a 2D numpy array would be returned in one case and a 1D array of objects in the other. Is this what's intended? Thanks!

lobis avatar Feb 13 '24 16:02 lobis

As discussed, we can opt for a new implementation of the pythonised method that:

  1. Returns a numpy array of numpy arrays instead of a numpy array of RVec<T>s
  2. In case the length of all the RVec<T>s is the same, e.g. when reading from the same readout electronics, returns a 2D numpy array

dpiparo avatar Feb 14 '24 10:02 dpiparo
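
For illustration, the two-case behaviour proposed above could be sketched in plain numpy roughly as follows. This is only a sketch, not the ROOT implementation: the helper name `to_numpy_arrays` is hypothetical, Python lists stand in for the `RVec<T>` elements that `AsNumpy` currently returns, and every element is assumed to be one-dimensional.

```python
import numpy as np


def to_numpy_arrays(column):
    """Hypothetical helper: convert a sequence of 1D vectors to numpy arrays.

    Returns a 2D array when all vectors have the same length, otherwise
    a 1D object array whose elements are numpy arrays.
    """
    # With real RVecs, np.asarray is described in this issue as zero-copy;
    # with the Python lists used here as stand-ins it makes a copy.
    arrays = [np.asarray(v) for v in column]

    # Regular case: every vector has the same length, so stack into 2D.
    lengths = {a.shape[0] for a in arrays}
    if len(lengths) == 1:
        return np.stack(arrays)

    # Ragged (or empty) case: fill a 1D object array element by element,
    # which avoids numpy trying to build a rectangular array.
    out = np.empty(len(arrays), dtype=object)
    for i, a in enumerate(arrays):
        out[i] = a
    return out


regular = to_numpy_arrays([[1, 2, 3], [4, 5, 6]])
print(regular.shape)  # (2, 3)

ragged = to_numpy_arrays([[1, 2], [3]])
print(ragged.dtype, ragged.shape)  # object (2,)
```

Note that the return type then depends on the data, which is the trade-off raised earlier in the thread: callers get either a 2D array or a 1D array of objects.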