sisl Classification of atoms using categories

Following discussion in #688

I think it would be nice if categories facilitated classification. Imagine you have three categories: cat1, cat2 and cat3. There could be a nice way of grouping atoms based on which of those categories they belong to. E.g., if I have a geometry, I could do something like:

geometry.classify((cat1, cat2, cat3))

And I would get an array saying to which group each atom belongs:

[0, 0, 0, 2, 1, 2, ..., -1, 0]

This I think is basically what the "Or" category does, but returning indices instead of the categories themselves.
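
A minimal sketch of this first proposal, assuming each category can be reduced to a boolean mask over the atoms. `classify_by_masks` is a hypothetical helper, not an existing sisl function, and the "first matching category wins" rule is just one possible choice:

```python
import numpy as np


def classify_by_masks(masks):
    """Return one group index per atom from a sequence of boolean masks.

    Hypothetical helper (not a sisl API). masks[i][a] is True when atom `a`
    belongs to category `i`. The first matching category wins; atoms that
    match no category are labelled -1.
    """
    masks = np.asarray(masks, dtype=bool)
    if masks.ndim != 2:
        raise ValueError("expected one boolean mask per category")
    # argmax returns the index of the first True along the category axis
    labels = masks.argmax(axis=0)
    # atoms without any True entry are unclassified
    labels[~masks.any(axis=0)] = -1
    return labels


# Toy data: 3 categories, 5 atoms (assumed masks, not computed from a geometry)
cat1 = [True, True, False, False, False]
cat2 = [False, False, False, True, False]
cat3 = [False, False, True, False, False]
labels = classify_by_masks([cat1, cat2, cat3])
print(labels)
```

In this toy case the last atom matches nothing and ends up with -1, like the second-to-last entry of the array above.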

Another thing that would be useful for classification is to provide some property and ask for the creation of groups based on that property:

geometry.classify(AtomNneighbors)

and you get maybe a tuple with the group indices and the meaning of each group:

[0, 0, 0, 1, 0, 2, 0, 1, 2], (AtomNneighbors(0), AtomNneighbors(1), AtomNneighbors(2))
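
A sketch of how the property-based variant could work, assuming the per-atom property (here the number of neighbors) has already been computed as a plain array. `classify_by_property` is a hypothetical name, and the returned group values stand in for the `AtomNneighbors(n)` objects:

```python
import numpy as np


def classify_by_property(values):
    """Group atoms by the distinct values of a per-atom property.

    Hypothetical helper (not a sisl API). Returns (labels, groups) such that
    groups[labels[a]] == values[a] for every atom `a`.
    """
    values = np.asarray(values)
    # np.unique sorts the distinct values; return_inverse gives, for each
    # atom, the position of its value in that sorted array
    groups, labels = np.unique(values, return_inverse=True)
    return labels, groups


# Assumed neighbor counts for 9 atoms, chosen to match the example above
n_neighbors = [0, 0, 0, 1, 0, 2, 0, 1, 2]
labels, groups = classify_by_property(n_neighbors)
print(labels, groups)
```

Group `i` would then correspond to something like `AtomNneighbors(groups[i])`. For continuous properties one would need binning or thresholds instead of exact distinct values.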

Feb 23 '24 10:02 pfebrer

The question, I guess, is more whether we want a new classify as opposed to categorize.

Also, what should happen when classify inadvertently assigns an atom to multiple groups? This is the same problem Category has.

I.e.

catA = ...
catB = ...
geometry.classify((catA, catB, catA | catB))

Consider in particular the case where a user creates the last one (call it catC) by accident.
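
To make the overlap problem concrete, here is a small sketch that flags atoms matched by more than one category. It assumes categories reduced to boolean masks; `ambiguous_atoms` is a hypothetical helper, not part of sisl:

```python
import numpy as np


def ambiguous_atoms(masks):
    """Return indices of atoms that belong to more than one category.

    Hypothetical helper (not a sisl API); masks[i][a] is True when atom `a`
    matches category `i`.
    """
    masks = np.asarray(masks, dtype=bool)
    # count, per atom, how many categories it matches
    return np.flatnonzero(masks.sum(axis=0) > 1)


# Assumed toy masks for 4 atoms
cat_a = np.array([True, False, False, True])
cat_b = np.array([False, True, False, True])
cat_c = cat_a | cat_b  # the accidental third category

print(ambiguous_atoms([cat_a, cat_b]))         # atoms in both catA and catB
print(ambiguous_atoms([cat_a, cat_b, cat_c]))  # every atom in catA or catB
```

With the accidental `catA | catB` included, every atom in catA or catB becomes ambiguous, so a classify would have to either raise or apply a documented precedence rule.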

It seems to me the functionality could be simple enough:

result = geometry.categorize(catA | catB | catC)
catA in result[0]

wouldn't this work equivalently?

Feb 23 '24 11:02 zerothi

Or perhaps catA.indices(result) would give back the indices that are the same?

Feb 23 '24 11:02 zerothi

From what I see in the example by Pol, I understand that the implementation you are thinking about would require each atom to be uniquely associated to one category, right?

Feb 23 '24 12:02 ialcon

> From what I see in the example by Pol, I understand that the implementation you are thinking about would require each atom to be uniquely associated to one category, right?

Yes, and I think that can be surprising to end users when that isn't the case. Then I guess it would be better to return the category that matches any of them.

Feb 23 '24 12:02 zerothi

> From what I see in the example by Pol, I understand that the implementation you are thinking about would require each atom to be uniquely associated to one category, right?

If the goal is to classify, yes :)

Maybe it would be best to have a separate thing for classifying, so Classifier would be separate from Category. I just saw some synergies between the two. But I guess what both share is that they compute some atom-wise property and then perform different operations on it.

So perhaps it could be worth it to implement AtomProperty and then make categories and classifiers potentially use them.

Feb 23 '24 12:02 pfebrer

> > From what I see in the example by Pol, I understand that the implementation you are thinking about would require each atom to be uniquely associated to one category, right?
>
> If the goal is to classify, yes :)
>
> Maybe it would be best to have a separate thing for classifying, so Classifier would be separate from Category. I just saw some synergies between the two. But I guess what both share is that they compute some atom-wise property and then perform different operations on it.
>
> So perhaps it could be worth it to implement AtomProperty and then make categories and classifiers potentially use them.

But can we do without? The problem is that the distinction between Classifier and Category will not be obvious, and if 90% of cases can be handled by one, and the remaining 10% require some tweaking, then I think readability has higher priority.

Feb 23 '24 12:02 zerothi

Why isn't it obvious? A category just determines whether or not an atom fulfills a certain criterion, and a classifier groups atoms based on properties. The grouping might even have thresholds to tweak how many groups you want to create, for example, which are not so easy to determine by hand.

An example that is impossible (or I wouldn't know how to do it) to implement with categories: I once had a box of water molecules and I wanted to determine which atoms belonged to the same molecule, so that I could plot the PDOS per water molecule. I used some sklearn clustering algorithm (I don't remember which) and got an array stating which water molecule (cluster) each atom belonged to. This is a classification based on some property (distance between atoms) and some grouping algorithm. I don't know how to achieve this with categories, and I believe this kind of thing will be useful to many people who don't even know how to start implementing it themselves.
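
As an illustration of this kind of grouping, here is a minimal sketch that does not reproduce the sklearn approach mentioned above. Instead it swaps in a plain distance-cutoff search for connected components, written with NumPy only. The function name, the cutoff, and the coordinates are all assumptions, and periodic images are ignored:

```python
import numpy as np


def group_by_distance(xyz, cutoff):
    """Label atoms so that atoms closer than `cutoff` share a label, transitively.

    Hypothetical helper (not a sisl API). Ignores periodic boundary conditions
    and builds the full distance matrix, so it only suits small systems.
    """
    xyz = np.asarray(xyz, dtype=float)
    n = len(xyz)
    dist = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    bonded = dist < cutoff

    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # already assigned to a group
        # flood-fill the connected component containing `start`
        labels[start] = current
        stack = [start]
        while stack:
            atom = stack.pop()
            for neighbor in np.flatnonzero(bonded[atom] & (labels == -1)):
                labels[neighbor] = current
                stack.append(neighbor)
        current += 1
    return labels


# Two water-like molecules 5 Ang apart, with atoms deliberately interleaved
xyz = [
    [0.00, 0.00, 0.0],   # O  (molecule 0)
    [0.96, 0.00, 0.0],   # H  (molecule 0)
    [5.00, 0.00, 0.0],   # O  (molecule 1)
    [-0.24, 0.93, 0.0],  # H  (molecule 0)
    [5.96, 0.00, 0.0],   # H  (molecule 1)
    [4.76, 0.93, 0.0],   # H  (molecule 1)
]
labels = group_by_distance(xyz, cutoff=1.2)
print(labels)
```

The resulting array has the same shape as the classify output discussed earlier: one group index per atom, here one group per molecule.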

Feb 23 '24 12:02 pfebrer

This, I would argue, points towards changing what categories can do. That is, a category should possibly be able to hold more content than just index-like data.

I am primarily worried about the implications of documenting Category vs Classifier for end users. At first glance they look very similar, and I bet Category would just be a subset of Classifier, so we might as well change everything to do what you want the Classifier to do, no?

Feb 23 '24 12:02 zerothi

Hmm yeah that's true, a category is kind of a classifier with just two groups.

As I said in the other issue, I'm too busy to think about anything useful now, so I'll continue this discussion next week :sweat_smile:

Feb 23 '24 13:02 pfebrer

Just one question for me to follow... What is the difference between classifier and categorize?

Feb 23 '24 13:02 ialcon

> Just one question for me to follow... What is the difference between classifier and categorize?

Currently we don't have a Classifier; @pfebrer is advocating adding one. I am trying to figure out if we can have a single way to do both (i.e. only have a Category, or name it Classifier).
So it's an open discussion for now. :)

Feb 23 '24 13:02 zerothi

Ok, thanks Nick. I will try to read your various issues on this topic early next week, so that I have a better overall picture. For me it is hard to see how classifier/category could be used to classify atoms into different hexagons within, say, a GNR, especially because such hexagons would share atoms. But there might be a way to do this.
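
One way to picture the shared-atom case is to drop the "one label per atom" array and store a boolean membership matrix instead. The sketch below is only an assumption about how that could look. `membership_matrix` is a hypothetical helper, and the ring indices are made up for two fused hexagons:

```python
import numpy as np


def membership_matrix(groups, n_atoms):
    """Build a (n_groups, n_atoms) boolean matrix from lists of atom indices.

    Hypothetical helper (not a sisl API). Unlike a single label array, this
    representation lets an atom belong to several groups at once.
    """
    member = np.zeros((len(groups), n_atoms), dtype=bool)
    for i, atoms in enumerate(groups):
        member[i, list(atoms)] = True
    return member


# Two fused hexagons sharing atoms 4 and 5 (assumed indices)
rings = [[0, 1, 2, 3, 4, 5], [4, 5, 6, 7, 8, 9]]
member = membership_matrix(rings, n_atoms=10)

# atoms that sit in more than one hexagon
shared = np.flatnonzero(member.sum(axis=0) > 1)
print(shared)
```

Each row is then equivalent to one category mask, while a unique classification is only recoverable when no column has more than one True.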

Feb 23 '24 13:02 ialcon

> Ok, thanks Nick. I will try to read your various issues on this topic early next week, so that I have a better overall picture. For me it is hard to see how classifier/category could be used to classify atoms into different hexagons within, say, a GNR, especially because such hexagons would share atoms. But there might be a way to do this.

The idea here would be to fundamentally restructure the Category class so that it can react to its surroundings (atoms other than itself).

Feb 28 '24 11:02 zerothi

We have too many category issues; see #687, which will be the main thread from now on.

Apr 15 '24 17:04 zerothi
