sisl
sisl copied to clipboard
Classification of atoms using categories
Following discussion in #688
I think it would be nice that categories facilitated classification. Imagine you have three categories cat1
, cat2
and cat3
. There could be a nice way of grouping atoms based on whether they belong to those categories or not. I.e. if I have a geometry, I could do something like:
geometry.classify((cat1, cat2, cat3))
And I would get an array saying to which group each atom belongs:
[0, 0, 0, 2, 1, 2, ..., -1, 0]
This I think is basically what the "Or" category does, but returning indices instead of the categories themselves.
Another thing that would be useful for classification is to provide some property and ask for the creation of groups based on that property:
geometry.classify(AtomNneighbors)
and you get maybe a tuple with the group indices and the meaning of each group:
[0, 0, 0, 1, 0, 2, 0, 1, 2], (AtomNneighbors(0), AtomNneighbors(1), AtomNneighbors(2) )
the question, I guess, is more whether we want a new classify vs categorize
.
Also, what should happen when classify
can inadvertently categorize as multiple versions, same problem as Category
.
I.e.
catA = ...
catB = ...
geometry.classify((catA, catB, catA | catB))
Consider more the case when users creates the last catC
by accident?
It seems to me the functionality could be simple enough:
result = geometry.categorize(catA | catB | catC)
catA in result[0]
wouldn't this work equivalently?
Or perhaps catA.indices(result)
would give back the indices that are the same?
From what I see in the example by Pol, I understand that the implementation you are thinking about would require each atom to be uniquely associated to one category, right?
From what I see in the example by Pol, I understand that the implementation you are thinking about would require each atom to be uniquely associated to one category, right?
Yes, and I think that can be surprising to end users when that isn't the case. Then I guess it would be better to return the category that matches any of them.
From what I see in the example by Pol, I understand that the implementation you are thinking about would require each atom to be uniquely associated to one category, right?
If the goal is to classify, yes :)
Maybe it would be best to have a separate thing for classifying. So Classifier
would be separate from Category
. I just saw some sinergies between the two things. But I guess what both things share is to compute some atom-wise property, and then they perform different operations with it.
So perhaps it could be worth it to implement AtomProperty
and then make categories and classifiers potentially use them.
From what I see in the example by Pol, I understand that the implementation you are thinking about would require each atom to be uniquely associated to one category, right?
If the goal is to classify, yes :)
Maybe it would be best to have a separate thing for classifying. So
Classifier
would be separate fromCategory
. I just saw some sinergies between the two things. But I guess what both things share is to compute some atom-wise property, and then they perform different operations with it.So perhaps it could be worth it to implement
AtomProperty
and then make categories and classifiers potentially use them.
But can we do without? The problem is that distinguishing between Classifier
and Category
will not be obvious, and if 90% cases can be handled via one, and the rest 10% requires some tweaking, then I think readability has higher priority.
Why isn't it obvious? A category just determines whether an atom fulfills or not a certain criteria, and a classifier groups atoms based on properties. The grouping might even have thresholds to tweak how many groups you want to create, for example, which are not so easy to determine by hand.
An example that is impossible (or I wouldn't know how to do it) to implement with categories: I once had a box of water molecules and I wanted to determine which atoms belonged to the same molecule, so that I could plot the PDOS per water molecule. I used some sklearn
clustering algorithm (I don't remember which) and then I got an array containing to which water molecule (cluster) each atom belonged. This is a classification based on some property (distance between atoms) and some grouping algorithm. I don't know how to acheive this with categories, and I believe this kind of things will be useful to many people that don't even know how to start implementing them themselves.
This I would argue aims at us changing what the categories can do. Meaning, that possibly a category should be able to contain more content, and not just index like.
I am primarily worried about the implications on documenting Category
vs Classifier
for end-users. From a first glance they look very similar, and I bet Category
would just be a subset of Classifier
, so hence we might as well change everything to do what you want the Classifier
to do, no?
Hmm yeah that's true, a category is kind of a classifier with just two groups.
As I said in the other issue, I'm too busy to think about anything useful now, so I'll continue this discussion next week :sweat_smile:
Just one question for me to follow... What is the difference between classifier and categorize?
Just one question for me to follow... What is the difference between classifier and categorize?
currently we don't have a Classifier
@pfebrer is advocating on adding this. I am trying to figure out if we can have a single way to do both (i.e. only have a Category
, or name it Classifier
).
So an open discussion on that for now. :)
Ok, thanks Nick. I will try to read your various issues commenting on this topic early next week - so that I have a better overall picture. For me it is hard to see how classifier/category could be used to classify atoms in different hexagons within, say, a GNR - specially because such hexagons would have common (sharing) atoms.. But there might be the way to do this.
Ok, thanks Nick. I will try to read your various issues commenting on this topic early next week - so that I have a better overall picture. For me it is hard to see how classifier/category could be used to classify atoms in different hexagons within, say, a GNR - specially because such hexagons would have common (sharing) atoms.. But there might be the way to do this.
The idea here, would be to fundamentally restructure the Category
class. So that it can re-act to the surroundings (other atoms than it-self).
We have too many categories
issues, see #687 which will be the main topic now.