sisl Change categories for general purpose

Describe the feature

Currently the Categories used for geometries are limited in use. @tfrederiksen initial idea in #202 spawned this whole category stuff. Now the problem with the categories is that they are only based on 1 atoms at a time. Which doesn't really make it useful for other categories.

Currently a category dose more or less these things:

Create the categories via boolean operators making composite categories.
When trying to categorise a geometry every atom is looped individually meaning that every atom gets assigned a single category (possibly a NullCategory).
```
import sisl as si
gr = si.geom.graphene()
print(si.geom.AtomOdd().categorize(gr))
```
returns [ø, odd].
The categorize function will only ever return a single category for each atom. This means that one cannot create categories that encompass multiple atoms, say dihedral angle requirements etc.

What I suggest is that we change the categories to return a CategoryResult which contains 1) the category it resulted in and 2) any arguments that fits the category. I.e. a CategoryNeighbour may contain the atom index, and that atoms neighbours. This means that it has some state knowledge which I kind of objected against previously. However, since now the results are deferred to another object I think it becomes more stable. This however makes it a bit more problematic to use since one needs to easily decide which atoms belongs to which categories, and vice versa.

I am thinking about an interface that looks something like this:

# using addition we request that both be evaluated
odd = AtomOdd()
even = AtomEven()
cat = odd + even
# using boolean operators silently disregards those that are not part of them
# in this case they will return the same as above since they are from two distinct sets
cat = odd | even
results = cat.categorize(geometry)
# get the list of atoms that matches the category odd
# I am thinking that result(odd) should do the same?
# Is there anything a result should be doing
atoms = results.get(odd)
# get the categories that atom 1 matches (there may be more)
cats = results.get(1)
# one should be able to loop the different results over the different categories
for result, atoms in results:
    # the result is a class of CategoryResult as defined above
    result.category
    # each result may have specific attributes associated such that one can see
    # specific details for the category, say the neighbour indices etc.

Note that since a category result may have additional data associated we could expect it to retain information related to the category it self.

neighbours = AtomNeighbours(n=3) # or something
results = neighbours.categorize(geometry)
for result, atoms in results:
    cat = result.category

It isn't totally clear to me exactly what the interface should look like here, perhaps the easiest is if the result category for neighbours only has a single atom associated, and the result.neighbours is the list of neighbour indices?

ideas; suggestions etc. are most welcome @pfebrer and @tfrederiksen.

Aug 27 '21 08:08 zerothi

For now, I have just used categories to pass them on the atoms argument. So from that point of view all that concerns me is that atoms are correctly selected when I pass a category.

I think up until now using categorize doesn't add any value to the end user because you don't get additional information (vs just getting the indices that satisfy the category). So anything that gives more information of how atoms were matched is probably great, mostly to build things on top. I'm just worried that this might slow down the selection of atoms.

So maybe we could keep both? The current one would be used by _sanitize_atoms and the one that you propose now could be used by users or internally when needed. In that regard, could you share a quick sketch of how would a category that profits from this work (e.g. the one that selects by dihedral angle)?

Aug 27 '21 16:08 pfebrer

For now, I have just used categories to pass them on the atoms argument. So from that point of view all that concerns me is that atoms are correctly selected when I pass a category.

Yes, this needs some careful thoughts here.

I think up until now using categorize doesn't add any value to the end user because you don't get additional information (vs just getting the indices that satisfy the category). So anything that gives more information of how atoms were matched is probably great, mostly to build things on top. I'm just worried that this might slow down the selection of atoms.

Exactly, the plan would be to make categories more useful, i.e. find specific atoms meeting certain criteria. So having the result contain this auxiliary information might be what pushes it.
Regarding performance, actually the current categorize is horrendously slow since it can never parse multiple atoms at the same time. Everything is done on a single loop. My suggestion would allow any way the categorization sees fit to handle the details. So I think it would only be much faster! :)

So maybe we could keep both? The current one would be used by _sanitize_atoms and the one that you propose now could be used by users or internally when needed. In that regard, could you share a quick sketch of how would a category that profits from this work (e.g. the one that selects by dihedral angle)?

I think they could easily co-exist with some minor changes to _sanitize_atoms to allow the details outlined here. I.e. all atoms belonging to one of the categories would be selected, in effect the same way it is done now.

Aug 27 '21 19:08 zerothi

This is superseeded in #687

Apr 15 '24 17:04 zerothi

sisl sisl copied to clipboard

Change categories for general purpose

sisl
sisl copied to clipboard