mdanalysis Implement context aware guesser

trafficstars

Is your feature request related to a problem?

The current Guesser class used by the users or inside parsers hava a downside of being generic, which makes it doesn't fit all topologies and force fields . This result in various errors while guessing different atrributes (#2348 # 3218 #2331 #2265). That’s why we need a guesser class that is aware of the context of the universe of which it is a part.

Describe the solution you'd like

I got inspired by @lilyminium (#2630), @jbarnoud , and @mnmelo (#598) proposals, so I mixed them with my vision to get the following methodology: • Guessing is not an automatic process; by default, guessing is off unless the user chose to guess which property. • Guesser should raise warning/messages when succeeding or failing in guessing a property (one single message for the whole guessed attribute). • Guessed properties should be easily modifiable. • Passing the context to the universe should be available at and after the initiation level. • Modifications and maintenance of guessers should be convenient.

Implementation:

The BaseGuesser class will be the parent class for each new context-specific guesser, it will hold no guessing methods; to make space for implementing customized child guessers, yet it will define the behavior by which child guessers should behave for organizing the guesser structure.

 
class GuesserMeta(type):
    #guessers registeration
    def __init__(cls, name, bases, classdict):
        type.__init__(type, name, bases, classdict)
        try:
            context = classdict["context"]
        except KeyError:
            pass
        else:
            _GUESSERS[context.upper()] = cls
 
#basic guesser class
class BaseGuesser(GuesserMeta):
    context = "base"
    
# a dictionary to keep track of the properties currently guessed by the class
    __guess={}
    
#check if the class has the desired guessing method for the desired attribute
    def is_guessed(to_guess):
         for a in to_guess:
             if (a not in self.guess.key()):
                 raise ValueError ('Invalid property: ', a)
         return True
    
#Martini-speciifc class that inherit from the Guesser class    
class MartiniGuesser(BaseGuesser):
    
    context = 'Martini'
    __guess = {'mass': self.guess_mass, 
                 'charge' : self.guess_charge}
    
    def guess_mass(atoms):
        #TODO
               
    def guess_charge(atoms):
        #TODO

At the universe initiation level, the context will pass as an argument (either as a name or a guesser object), in addition, a to_guess list will pass the desired attributes to be guessed.


 
#pass context by name
u = mda.Universe("E://1rds.pdb", context="PDB", to_guess=["mass", "bonds"])
 
#pass context by object
pdb = md.guesser.PDBGuesser
u = mda.Universe("E://1rds.pdb", context=pdb, to_guess = ["mass", "bonds"])

Also, the guesser class could be called after the universe's creation with the guess_topology_attr() API with the same spirit as the above implementation.

#pass context by name
u.guess_topology_attr(context="pdb", to_guess=["mass"])
#pass context by object
pdb = md.guesser.PDBGuesser
u.guess_topology_attr(context=pdb, to_guess["mass"])

If the user didn’t specify a context to the universe, a DefaultGuesser will be called, this guesser class will have the current generic guess methods, so we don't not break any existing code.
The universe then will check the validity of the passed arguments and raise the appropriate errors accordingly.

 
#method used by the universe to check the validity of the passed arguments
    def get_guesser(guesser, to_guess):
        if isinstance(guesser, BaseGuesser):
#check if the guesser has guessing method(s) for the 'to_guess' list
            try:
                guesser.isguessed(to_guess)
            except ValueError as e:
                print('Value error')
        else:
            try:
                guesser = _GUESSER[guesser]
            except KeyError:
                print("invalid guesser")
        return guesser

Different guessing methods will through warnings/error messages depending on the results it got, the output messages should precisely describe the universe updates with a warning about failed processes.

u = mda.Universe("E://1rds.pdb", context="PDB", to_guess=["mass", "bonds"])
#output of succusseful mass guessing:
# sucessful guessing: guessed masses 90/90
#output when guessing fail for some atoms:
# guessed masses 87/90
# UserWarning: Failed to guess the mass for the following atom(s):
# id 3 name XX
# id 9 name foo
# id 34 name bla
#output for bonds guessing:
# guessed bonds 100
# new fragments 80

The mission of deciding how an attribute will be guessed will be carried out by the corresponding attribute guesser method related to each class. I think in this way the user doesn’t have to bother about how a guesser should work, and in the spirit of implementing a context-specific guesser, we have an abstraction power by having aware and smart guessers that know how exactly any attribute should be guessed for a specific environment (for example guessing mass for PDB is related to the element property, while for Martini is more related to bead type (atom type in MDAnalysis). the following diagram shows attributes dependencies for the current default behavior, prospective PDB and prospective Martini forcefield respectively.

(white is given, orange is guessed)

Current default behavior

default

PDB

pdb final

Martini

martini final

Jun 05 '22 00:06 aya9aladdin

Thank you for opening this issue. Could you please:

make sure the code samples are well formatted using the appropriate GitHub syntax. It will make them much easier to read and make it more likely to get feedback from the others.
have a look at where the guessing happens now and what are the attributes that are guessed throughout the code base.

Just to clarify. When you write:

Guessing is not an automatic process; by default, guessing is off unless the user chose to guess which property.

This is the ideal scenario but we cannot have that until version 3.0 as it is a breaking change. You do address this, but it is worth repeating.

Because no default guessing is a desirable, it would be good to have a way to request it. It would also be useful to prepare a transition with future versions of mdanalysis.

Jun 06 '22 09:06 jbarnoud

Different guessing methods will through warnings/error messages depending on the results it got, the output messages should precisely describe the universe updates with a warning about failed processes.

Where do you plan the output for successful guessing to be? Do we have anything comparable at the moment?

Jun 08 '22 08:06 jbarnoud

The mission of deciding how an attribute will be guessed will be carried out by the corresponding attribute guesser method related to each class. I think in this way the user doesn’t have to bother about how a guesser should work, and in the spirit of implementing a context-specific guesser, we have an abstraction power by having aware and smart guessers that know how exactly any attribute should be guessed for a specific environment (for example guessing mass for PDB is related to the element property, while for Martini is more related to bead type (atom type in MDAnalysis).

It might be really cool to have some dependency diagrams for this! Both for documentation, and for clarity in the project. For example, as you point out with PDB guessing, guessing the mass depends on knowing the element. In turn, guessing the element depends largely on knowing the atom name and residue name. Having this clarity gives us room for adding future guessers and features more easily. For example, with the PDB, it could be possible to guess the residue name from elements + bonds (which is likely out of scope for the current project). We could maybe look into decorators to make sure that the attribute dependencies are either already present or guessed first.

Edit: and the diagrams would look great in a blog post.

Jun 08 '22 19:06 lilyminium

The mission of deciding how an attribute will be guessed will be carried out by the corresponding attribute guesser method related to each class. I think in this way the user doesn’t have to bother about how a guesser should work, and in the spirit of implementing a context-specific guesser, we have an abstraction power by having aware and smart guessers that know how exactly any attribute should be guessed for a specific environment (for example guessing mass for PDB is related to the element property, while for Martini is more related to bead type (atom type in MDAnalysis).

It might be really cool to have some dependency diagrams for this! Both for documentation, and for clarity in the project. For example, as you point out with PDB guessing, guessing the mass depends on knowing the element. In turn, guessing the element depends largely on knowing the atom name and residue name. Having this clarity gives us room for adding future guessers and features more easily. For example, with the PDB, it could be possible to guess the residue name from elements + bonds (which is likely out of scope for the current project). We could maybe look into decorators to make sure that the attribute dependencies are either already present or guessed first.

Edit: and the diagrams would look great in a blog post. the diagrams idea is really great, I will work on finding how to deliver it in the most descriptive way then adding it to my blog

Jun 09 '22 17:06 aya9aladdin

Edit: and the diagrams would look great in a blog post. the diagrams idea is really great, I will work on finding how to deliver it in the most descriptive way then adding it to my blog

@aya9aladdin I use draw.io (now https://app.diagrams.net). It's a very nice tool with lots of cool options to design custom diagrams. I used it for general stuff and also used them for docs.

Jun 09 '22 19:06 ojeda-e

It might be really cool to have some dependency diagrams for this!

100%!

I always found the name "guesser" somewhat confusing, because some attributes are not technically guessed but they are read from a table once other (guessed) attributes are available. A diagram would clarify this confusion by showing what is "guessed" (i.e. inferred with some predefined rules, which might not be perfect) first, and what it is simply set based on the guessed attributes later on.

I think it would be great to color-code (or distinguish in other ways) which attributes are guessed and which are set on the diagram.

Jun 09 '22 20:06 RMeli

I have added a dependency diagrams to the issue for declaration, you can also see my project updates on my personal blog https://sites.google.com/pharma.asu.edu.eg/aya-gsoc/home

Jun 13 '22 15:06 aya9aladdin

Thank you @aya9aladdin. Is your diagram reflecting what you plan on going, or is it what we currently have in the code?

Jun 14 '22 06:06 jbarnoud

Thank you @aya9aladdin. Is your diagram reflecting what you plan on going, or is it what we currently have in the code?

it's more of a reflection of what I'm going to build in both PDB and Martini guessers

Jun 16 '22 21:06 aya9aladdin

Thank you @aya9aladdin. Is your diagram reflecting what you plan on going, or is it what we currently have in the code?

it's more of a reflection of what I'm going to build in both PDB and Martini guessers

In that case, could you do the same diagram but with the current state of the code. It is what we had in mind when asking for the diagram. Looking at this, will require you to find the various places where the guessing is done, and you may realise oddities. For instance, that some guessing functions take arguments which you will need to consider when writing your code.

Jun 18 '22 13:06 jbarnoud

mdanalysis mdanalysis copied to clipboard

Implement context aware guesser

Is your feature request related to a problem?

Describe the solution you'd like

mdanalysis
mdanalysis copied to clipboard