                        Registry of methods attached to an Array due to its behavior
Description of new feature
This came up while working on behavior compatibility with dask-awkward.
TLDR: It would be useful to have some registry of methods originating from behaviors that are available to an array collection (likely through its typetracer array).
More explanation:
I'll use the muons dataset as an example. This will be reproducible with the following (with the latest version of awkward installed):
$ pip install git+https://github.com/ContinuumIO/dask-awkward@main
$ pip install fsspec s3fs
>>> import numpy as np
>>> import awkward._v2 as ak
>>> import dask_awkward as dak
>>> ds = dak.from_json("s3://ddavistemp/compressed_json/higgs.00*", storage_options={"anon": True})
>>> ds
dask.awkward<from-json, npartitions=10>
We have a dataset (ds) with 10 partitions (from 10 files). Let's make a muons array which gives us pairs of muons, and then make some muon record arrays:
>>> muons = ds.muons[dak.num(ds.muons, axis=1) == 2]
>>> mu1 = muons[:, 0]
>>> mu2 = muons[:, 1]
>>> mu1._meta
<Array-typetracer type='?? * {pt: float64, eta: float64, phi: float64, mass...'>
let's define some behaviors:
>>> class Muon(ak.Record):
...     pass
... 
>>> class MuonArray(ak.Array):
...     def mass_with(self, mu2):
...         return np.sqrt(
...             self.pt
...             * mu2.pt
...             * 2
...             * (np.cosh(self.eta - mu2.eta) - np.cos(self.phi - mu2.phi))
...         )
...
>>> ak.behavior["muon"] = Muon
>>> ak.behavior["*", "muon"] = MuonArray
First, the good stuff. As expected, everything works with the typetracer:
>>> ak.Array(mu1._meta, with_name="muon")
<MuonArray-typetracer type='?? * muon[pt: float64, eta: float64, phi: float...'>
we can use our mass_with method:
>>> ak.Array(mu1._meta, with_name="muon").mass_with(mu2._meta)
<Array-typetracer type='?? * float64'>
Let's try to give the behavior to the actual collection; we can use map_partitions with a lambda:
>>> muons_withbehavior = dak.map_partitions(
...    lambda x, name: ak.Array(x, with_name=name),
...    muons,
...    "muon",
... )
If we check the typetracer of this new collection we'll get (as expected) a MuonArray typetracer:
>>> muons_withbehavior._meta
<MuonArray-typetracer type='?? * var * muon[pt: float64, eta: float64, phi:...'>
Same thing if we grab the individual muons again:
>>> mu1_wb = muons_withbehavior[:, 0]
>>> mu2_wb = muons_withbehavior[:, 1]
>>> mu1_wb._meta
<MuonArray-typetracer type='?? * muon[pt: float64, eta: float64, phi: float...'>
The method can be used with the typetracer:
>>> mu1_wb._meta.mass_with(mu2._meta)
<Array-typetracer type='?? * float64'>
And finally, if we compute this new collection we'll get (as expected) a MuonArray:
>>> muons_withbehavior.compute()
<MuonArray [...] type='18333 * var * muon[pt: float64, eta: float64, phi: f...'>
The (expected) problem surfaces if we try to use the method on the collection. The collection doesn't know about mass_with():
>>> mu1_wb.mass_with(mu2_wb)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ddavis/software/repos/dask-awkward/src/dask_awkward/core.py", line 744, in __getattr__
    raise AttributeError(f"{attr} not in fields.")
AttributeError: mass_with not in fields.
A kind of brute-force solution (we'll use tt as shorthand for an array's typetracer):
>>> tt = mu1_wb._meta
>>> possible_methods = set(dir(tt)) - set(tt.fields) - set(dir(ak.Array))
>>> possible_methods
{'__self_class__', '_layout', 'mass_with', '__self__', '_behavior', '__thisclass__', '_numbaview', '__get__'}
Upon calling dak.Array.__getattr__ we can do the above check, filter the possible_methods set to remove anything with a leading underscore, and then do a map_partitions which calls the known "good" method on each node. Example:
>>> getattrarg = "mass_with"
>>> possible_methods = list(filter(lambda x: not x.startswith("_"), possible_methods))
>>> if getattrarg in possible_methods:
...     f = lambda mu1, mu2, a: getattr(mu1, a)(mu2)
...     m = dak.map_partitions(f, mu1_wb, mu2_wb, getattrarg)
... 
>>> m
dask.awkward<lambda, npartitions=10>
>>> m.compute()
<Array [36.9, 22.9, 70.1, 59.8, ..., 83.3, 62.3, 90.9] type='18333 * float64'>
And it works, but only because we've added a step to the collection's task graph which "behaviorized" the array on a node before this last map_partitions call. We'll likely want a way to automate this step.
Anyway, in conclusion, this whole story led @martindurant and me to think it would be nice to have direct access to the known methods provided by a behavior class, and go straight to the map_partitions call without the logic to figure out whether the method may exist.
Another thing we thought a little bit about was syncing behaviors (or passing) to workers in a distributed cluster, but I'll leave that for another issue/discussion 😄
We don't want to put more burden on authors of behaviors to maintain a list of methods independently of the methods defined on overriding classes, so this list of methods would have to be derived from the ak._v2.behavior dict and the __dict__ of each overriding class in the __mro__. So it should be a calculation, similar to the possible_methods calculation in your example, but ensuring that nothing has been missed.
I had been thinking that the Awkward Dask collection would get these methods automatically because it would inherit from the same classes, but I was wrong: the classes that override behaviors are strict subclasses of ak._v2.Array and ak._v2.Record, but a Dask collection is not a subclass of ak._v2.Array. So that doesn't work automatically and you need to explicitly check to see if a potential method really is a method, by looking at the type tracer.
The __dir__ method is already a calculation: it adds in the field names, which are the names you want to take out.
https://github.com/scikit-hep/awkward-1.0/blob/73ea78257d7a42008fd723f7f2632b53acbad0c3/src/awkward/_v2/highlevel.py#L1122-L1137
For that reason, maybe it's best to check the type(tt).__dict__ keys all the way up the tt mro until you get to ak._v2.Array? I don't think any intermediate classes in a hierarchy can have __slots__, but if so, then __dict__ and __slots__ may be the only things. The function that computes a list of method names can live in the Awkward codebase, though it would not be a very high-level function.
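On the __dict__ versus __slots__ question, here is a minimal sketch using plain stand-in classes (Base, Mid, and Leaf are hypothetical and are not Awkward classes). It suggests that a name declared in __slots__ also appears in the class __dict__ as a descriptor, so scanning vars() of each class up the MRO would pick it up along with methods and properties:

```python
import types


class Base:
    """Stand-in for ak.Array (hypothetical, for illustration only)."""

    def sum(self):
        return 0


class Mid(Base):
    __slots__ = ("cache",)  # a slot declared on an intermediate class

    def helper(self):
        return "helper"


class Leaf(Mid):
    @property
    def rho(self):
        return 1.0

    def mass_with(self, other):
        return 2.0


def names_below(cls, stop):
    """Collect public names defined on classes that precede `stop` in the MRO."""
    mro = cls.__mro__
    names = set()
    for entry in mro[: mro.index(stop)]:
        names |= {k for k in vars(entry) if not k.startswith("_")}
    return names


added = names_below(Leaf, Base)
slot_descriptor = vars(Mid)["cache"]  # slots live in the class __dict__ too
print(sorted(added), type(slot_descriptor).__name__)
```

Note that this collects keys rather than filtering on callable, so it includes the slot and the property as well as the methods; whether those belong in the final list is a separate decision.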
Vector is still a good test case for this because its inheritance is really complicated. Getting that right means it's right!
thanks @jpivarski! here's what I've come up with based on your suggestion:
import awkward._v2 as ak
def possible_methods(tt: ak.Array) -> set[str]:
    mros = type(tt).mro()
    methods = set()
    for entry in mros[: mros.index(ak.Array)]:
        methods |= set([v.__name__ for k, v in vars(entry).items() if callable(v)])
    return methods
With my muons example I get {'mass_with'} if I call possible_methods(mu1_wb._meta); all good.
For testing with vector:
import awkward as ak1
import vector
vector.register_awkward()
a = vector.awk(
    [
        [{"x": 1, "y": 1.1, "z": 0.1}, {"x": 2, "y": 2.2, "z": 0.2}],
        [],
        [{"x": 3, "y": 3.3, "z": 0.3}],
        [
            {"x": 4, "y": 4.4, "z": 0.4},
            {"x": 5, "y": 5.5, "z": 0.5},
            {"x": 6, "y": 6.6, "z": 0.6},
        ],
    ]
)
def possible_methods_ak1(tt: ak1.Array) -> set[str]:
    mros = type(tt).mro()
    methods = set()
    for entry in mros[: mros.index(ak1.Array)]:
        methods |= set([v.__name__ for k, v in vars(entry).items() if callable(v)])
    return methods
possible_methods_ak1(a)
gives:
Long output
{'VectorArray2D',
 'VectorArray3D',
 'VectorArray4D',
 '__getitem__',
 '_wrap_result',
 'add',
 'allclose',
 'cross',
 'deltaR',
 'deltaR2',
 'deltaangle',
 'deltaeta',
 'deltaphi',
 'dot',
 'equal',
 'is_antiparallel',
 'is_parallel',
 'is_perpendicular',
 'isclose',
 'not_equal',
 'rotateX',
 'rotateY',
 'rotateZ',
 'rotate_axis',
 'rotate_euler',
 'rotate_nautical',
 'rotate_quaternion',
 'scale',
 'scale2D',
 'scale3D',
 'subtract',
 'to_Vector2D',
 'to_Vector3D',
 'to_Vector4D',
 'to_rhophi',
 'to_rhophieta',
 'to_rhophietat',
 'to_rhophietatau',
 'to_rhophitheta',
 'to_rhophithetat',
 'to_rhophithetatau',
 'to_rhophiz',
 'to_rhophizt',
 'to_rhophiztau',
 'to_xy',
 'to_xyeta',
 'to_xyetat',
 'to_xyetatau',
 'to_xytheta',
 'to_xythetat',
 'to_xythetatau',
 'to_xyz',
 'to_xyzt',
 'to_xyztau',
 'transform2D',
 'transform3D',
 'unit'}
I haven't done a detailed comparison but it looks promising.
Do you think this is going down the right track?
Instead of callable, I would use inspect.ismethod.
Looks like we lose the ability to use inspect.ismethod if we don't actually have an instance.
using the type:
In [131]: inspect.ismethod(type(muons._meta).mass_with)
Out[131]: False
using an instance:
In [129]: inspect.ismethod(MuonsArray(muons._meta).mass_with)
Out[129]: True
This is not surprising: it's at binding time that a method becomes a method (you can call MuonsArray.mass_with() if you like, but it will fail if you don't pass the right thing for the first argument).
A more convoluted form, relying on python conventions:
list(inspect.signature(type(muons._meta).mass_with).parameters)[0] == "self"
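The binding-time point can be reproduced without Awkward at all. The following is a small sketch with a plain stand-in class (this MuonArray is hypothetical and does not subclass ak.Array): looked up on the class, mass_with is an ordinary function, and it only becomes a bound method when looked up on an instance.

```python
import inspect


class MuonArray:
    """Plain stand-in class (not an Awkward Array), for illustration only."""

    def mass_with(self, other):
        return 0.0


on_class = MuonArray.mass_with       # plain function when looked up on the class
on_instance = MuonArray().mass_with  # bound method when looked up on an instance

checks = {
    "ismethod_on_class": inspect.ismethod(on_class),
    "isfunction_on_class": inspect.isfunction(on_class),
    "ismethod_on_instance": inspect.ismethod(on_instance),
    # The convention-based check from above: first parameter is named "self".
    "first_parameter": next(iter(inspect.signature(on_class).parameters)),
}
print(checks)
```

So when only the type is available, inspect.isfunction is one possible substitute for inspect.ismethod, with the caveat that it does not distinguish instance methods from functions stored as plain class attributes.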
Ah I see, yeah, makes sense. This also made me think of something: maybe we can try something like the below and avoid the derivation of a list-of-all-possible-methods altogether:
# in class Array:
def __getattr__(self, attr):
    # in the case where attr is not a field let's see if it's a method (behavior path)
    if attr not in (self.fields or []):
        try:
            maybe_method = getattr(self._meta, attr)
            if inspect.ismethod(maybe_method):
                def wrapper(*args, **kwargs):
                    return self._call_behavior(attr, *args, **kwargs)
                return wrapper
        except AttributeError:
            raise AttributeError(f"{attr} not in fields.")
    # if not a behavior try the field access path
    try:
        return self.__getitem__(attr)
    except (IndexError, KeyError):
        raise AttributeError(f"{attr} not in fields.")    
Relying on exceptions is another option, yes. It'd be nice to compute the list of methods, though, so we can include them in dir().
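To illustrate why a computed list helps with dir(), here is a rough sketch using stand-in classes (Meta and Collection are hypothetical and are not the dask-awkward implementation). The same set of names drives both __dir__ and __getattr__; the real collection would defer the call through map_partitions, whereas this sketch simply delegates to the meta object to stay self-contained.

```python
class Meta:
    """Stand-in for a typetracer that carries a behavior method (hypothetical)."""

    def __init__(self):
        self.fields = ["pt", "eta", "phi"]

    def mass_with(self, other):
        return "mass"


class Collection:
    """Stand-in for a lazy collection wrapping a meta object (hypothetical)."""

    def __init__(self, meta):
        self._meta = meta

    def _behavior_names(self):
        # Public names defined directly on the meta's class.
        return {k for k in vars(type(self._meta)) if not k.startswith("_")}

    def __dir__(self):
        # Advertise the default names, the fields, and the behavior names.
        extra = set(self._meta.fields) | self._behavior_names()
        return sorted(set(super().__dir__()) | extra)

    def __getattr__(self, attr):
        # Only reached when normal attribute lookup fails.
        if attr in self._behavior_names():
            # Assumption: a real collection would wrap this in map_partitions.
            return getattr(self._meta, attr)
        raise AttributeError(f"{attr} not in fields.")


c = Collection(Meta())
print("mass_with" in dir(c), c.mass_with(None))
```

With the exception-only approach, attribute access would still work, but the names would not show up in dir() or tab completion.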
For callable vs ismethod: do you need the results to be callable at all? You're wanting dak.Array to have the same names in its namespace as ak.Array; is it important for that to only include things that can be called? (Because you'll be delaying their actions, and therefore only callables will do? I'm not asking this question because I disagree, but because I'm not sure of the right answer.)
Actually, it is missing some things because of that constraint: it's missing the properties. I know you'll want to include them.
>>> import awkward as ak1
>>> import vector
>>> vector.register_awkward()
>>> vector = ak1.Array([{"x": 1.1, "y": 2.2}], with_name="Vector2D")
>>> vector
<VectorArray2D [{x: 1.1, y: 2.2}] type='1 * Vector2D["x": float64, "y": float64]'>
>>> vector.rho
<Array [2.46] type='1 * float64'>
>>> callable(vector.rho)
False
>>> import inspect
>>> inspect.ismethod(vector.rho)
False
The list from Vector didn't include all of the properties that a 3D vector would have: x, y, z (even though these are fields, they're also properties, and the properties go through __getitem__ to avoid an infinite recursion), rho, phi, theta, eta for all of the coordinates. There might be some properties that are not coordinates.
To help you check your results, here's a full list of what 2D, 3D, 4D vectors (with and without momentum names) add to an array:
https://github.com/scikit-hep/vector/blob/b9c60a8320b39812354c48182281480087565f5e/src/vector/_methods.py#L130-L1415
These "protocol" classes are nothing but interface, which should make it easier to scan or convert into a set to compare to what your possible_methods_ak1 generates.
Another question: would you want to include or exclude staticmethod/classmethod?
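One way to see what each choice would include or exclude is to classify the raw class attributes by kind instead of testing callable. This is a sketch on stand-in classes (Base, Vec2D, and classify are hypothetical names, not part of Awkward or Vector); it relies on vars() returning the unbound descriptors, so property, staticmethod, and classmethod objects can be told apart with isinstance.

```python
import inspect


class Base:
    """Stand-in for ak.Array (hypothetical)."""


class Vec2D(Base):
    @property
    def rho(self):
        return 1.0

    def dot(self, other):
        return 0.0

    @staticmethod
    def from_xy(x, y):
        return (x, y)

    @classmethod
    def zeros(cls):
        return cls()


def classify(cls, stop):
    """Group public names defined below `stop` in the MRO by attribute kind."""
    kinds = {"method": set(), "property": set(),
             "staticmethod": set(), "classmethod": set()}
    mro = cls.__mro__
    for entry in mro[: mro.index(stop)]:
        for name, value in vars(entry).items():
            if name.startswith("_"):
                continue
            if isinstance(value, property):
                kinds["property"].add(name)
            elif isinstance(value, staticmethod):
                kinds["staticmethod"].add(name)
            elif isinstance(value, classmethod):
                kinds["classmethod"].add(name)
            elif inspect.isfunction(value):
                kinds["method"].add(name)
    return kinds


kinds = classify(Vec2D, Base)
print(kinds)
```

The collection could then expose whichever groups make sense, for example methods and properties only.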
I'll just say that attributes and class methods don't feel like things a dask collection normally does.
It sounds like we provided a solution to this that you integrated into dask-awkward. Is that right?
If I'm wrong in closing this, just say so and I'll reopen it, and then we'll figure out what to do with it.