[DF] `AsNumpy` should automatically convert RVecs into numpy arrays
Explain what you would like to see improved
df.AsNumpy(["vector_branch"]) produces a numpy array of RVec objects. Given that converting RVecs into numpy arrays is a zero-copy operation via np.asarray, we could/should instead return a numpy array of numpy arrays, which Python users would most likely be happier with.
To Reproduce
>>> import ROOT
>>> ROOT.RDataFrame(10).Define("v", "std::vector<int>{1,2,3}").Snapshot("t", "f.root")
>>> print(ROOT.RDataFrame("t", "f.root").AsNumpy(["v"])["v"])
Additional context
Triggered by the discussion at https://root-forum.cern.ch/t/reading-vector-branch-from-root-file-and-converting-it-to-numpy-array-on-pyroot/44152
Is this a simple pythonisation or do we need something more?
What should happen if the array is not regular? It should be possible to convert regular arrays (a tree of vectors that all have the same number of elements) into a 2D numpy array, but this breaks as soon as the vectors have different numbers of elements. We could do this conditionally (first checking whether all vectors have the same number of elements), but then a 2D numpy array would be returned in one case and a 1D array of objects in the other. Is this what's intended? Thanks!
As discussed, we can opt for a new implementation of the pythonised method that:
- Returns a numpy array of numpy arrays instead of a numpy array of RVec<T>s
- Returns a 2D numpy array when all the RVec<T>s have the same length, e.g. when reading from the same readout electronics
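
The two points above could be sketched roughly as follows. This is only an illustration in plain numpy, not the actual pythonisation: `to_numpy_arrays` is a hypothetical helper name, and plain Python lists stand in for the RVec<T> objects that AsNumpy currently returns, on the assumption that np.asarray handles both the same way.

```python
import numpy as np


def to_numpy_arrays(column):
    """Convert a 1D sequence of array-likes into numpy arrays.

    Hypothetical helper, not part of ROOT. `column` stands in for the
    object array that AsNumpy currently returns for a vector branch.
    """
    # For RVecs, np.asarray is described in this issue as zero-copy;
    # here any array-like works as a stand-in.
    rows = [np.asarray(v) for v in column]

    # Regular case: every row has the same shape, so stack into a 2D array.
    if rows and all(r.shape == rows[0].shape for r in rows):
        return np.stack(rows)

    # Jagged (or empty) case: 1D object array whose elements are numpy arrays.
    out = np.empty(len(rows), dtype=object)
    for i, r in enumerate(rows):
        out[i] = r
    return out


regular = to_numpy_arrays([[1, 2, 3], [4, 5, 6]])
jagged = to_numpy_arrays([[1, 2, 3], [4, 5]])
print(regular.shape)             # 2D array for equal-length vectors
print(jagged.shape, jagged.dtype)  # 1D object array otherwise
```

Note that np.stack copies the data, so in this sketch only the jagged case keeps the per-row views; whether the 2D case can avoid the copy is left open here.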