AtomsBase.jl Supporting "arbitrary" properties

trafficstars

This has come up a couple of times now. Examples could be:

a charge associated with every atom (atom-wise property, scalar)
a dipole moment associated with a system (system-level property, vector)
...and plenty of others.

@mfherbst, I recall a discussion of a relatively flexible/generic way to do this at some point but now can't find it, or recall the details of the proposal, but since this came up again recently over at https://github.com/FermiQC/Fermi.jl/issues/120#issuecomment-1079655811, I figured I'd file an issue here for discussion.

Mar 26 '22 18:03 rkurchin

Atom-level properties already work (see Atom struct). Basically just overload getproperty and hasproperty. IMHO there is nothing additional needed for that. Same thing could be done with an identical mechanism for the system. Again no need to define an extra interface as Base already provides this.

I think the only thing we have to do in AtomsBase is to agree (and document) the names of the standard keys and their expected size / type of the returned fields.

Mar 26 '22 18:03 mfherbst

In light of what I just said, let's brainstorm a little on what people would need / are already using (feel free to amend this post):

System

...

Atom

magnetic_moment: Number or vector of three floats (used in DFTK)
pseudopotential: String, identifier or file name of a pseudopotential file (used in DFTK)

Mar 26 '22 18:03 mfherbst

🤦‍♀️ that's why I couldn't find the atom-level one in an issue, because we already did it. Haha, my bad!

This approach sounds great. I suppose charge and dipole moment could be system-level for molecules? As far as atom-level properties, isotopic information might be useful. I'm not sure if we want to throw in everything we can imagine now, though, or do things more sparingly on an as-needed basis...

Mar 28 '22 15:03 rkurchin

I think we should first collect the things someone has an actual use case for to not avoid clutter

Mar 28 '22 16:03 mfherbst

A property which might be relevant would be the "Topology", which stores the bonding information of the system. In many of the formats used for classical MD calculations this information is also present in the trajectory files.

For example, The Frame type in Chemfiles.jl which is the analogue of an AbstractSystem, stores Topology in addition to positions, velocities and the unitcell.

Mar 29 '22 06:03 rashidrafeek

Here are some examples of system level properties I've used recently:

Total energy
Kinetic energy
Potential energy
Free energy (of ion-electron system)
Charge
Dipole moment
Polarisability
Stress tensor
Strain tensor

I think it will be very hard to anticipate what users would like to store - for example I'd often want to store multiple energies coming from DFT and a variety of more approximate models - so I'd advocate for making it as flexible as possible, but I agree there's nothing wrong with identifying and documenting some standard properties.

Mar 31 '22 08:03 jameskermode

See my implementation of particle States in ACE.jl - that's exactly what it provides. It could potentially move here, or it could inspire a new and better implementation here.

Aug 04 '22 19:08 cortner

Coming late to the discussion, I have just posted a comment on the Zulip chat about something related.

I think both the for System or for Atom, we could have the additional data represented as a typed generic field:

struct Atom{D, L<:Unitful.Length, V<:Unitful.Velocity, M<:Unitful.Mass, ExtraData}
    position::SVector{D, L}
    velocity::SVector{D, V}
    atomic_symbol::Symbol
    atomic_number::Int
    atomic_mass::M
    data::ExtraData
end

We can let the current interface for when ExtraData <: Dict, but with that we can also include type-stable data carrieres for anything the user wants. With the important advantage of being able to overload functions for custom Atom types, such as, for example, show. In my case, I want several additional fields (to represent all ATOM data of a PDB file, and want to print the atoms as:

julia> pdb = PDBTools.wget("1LBD")

julia> pdb
   Array{Atoms,1} with 1870 atoms with fields:
   index name resname chain   resnum  residue        x        y        z  beta occup model segname index_pdb
       1    N     SER     A      225        1   45.228   84.358   70.638 67.05  1.00     1       -         1
       2   CA     SER     A      225        1   46.080   83.165   70.327 68.73  1.00     1       -         2
       3    C     SER     A      225        1   45.257   81.872   70.236 67.90  1.00     1       -         3
                                                       ⋮
    1868  OG1     THR     A      462      238  -27.462   74.325   48.885 79.98  1.00     1       -      1868
    1869  CG2     THR     A      462      238  -27.063   71.965   49.222 78.62  1.00     1       -      1869
    1870  OXT     THR     A      462      238  -25.379   71.816   51.613 84.35  1.00     1       -      1870

as for now I need to define a custom Atom structure to be able to overload show, but if I had a parametric type I could just extend AtomsBase.Atom.

I think that would not be breaking at all, and if people feel that is a good idea, I can try to implement it.

Apr 28 '23 19:04 lmiq

This is in my opinnion a good suggestion and I can see several cases where it would be useful.

In addition to this, it might be a good idea to change the type of atomic_symbol to a parameter. Because that would allow making the Atom type a bitstype, when using e.g. SVector{2,Char}. But this could have some other issues, so I am a bit unsure of it.

Apr 29 '23 13:04 tjjarvinen

For exactly that reason we used a wrapped integer in JuLIP but allowed easy translation to a Symbol. With a bit of work the display could even show the symbol instead of the atomic number.

Apr 29 '23 14:04 cortner

the type of atomic_symbol to a parameter.

I would follow what Cortner suggested, just store the atomic number and make the symbol, or string atomic name, something to be get on the flight.

(although this is a separate issue, and would be breaking, better treat it separately)

Apr 30 '23 11:04 lmiq

The only issue with the last suggestion is that it requires a one to one mapping between species and atomic number, which is maybe ok for quantum mechanics but not for classical MD where there is often more than one type of each atom (even in QM there could be different pseudo potentials)

Apr 30 '23 11:04 jameskermode

That right, one more reason to allow a typed extra data field. Then the atomic type information can be stored there in any format needed, and there are many formats possible (Ints, or strings, etc).

For instance someone could implement Atom{D,L,M,OPLSAA}, or Atom{D,L,M,CHARMM}, etc.

Apr 30 '23 11:04 lmiq

Then I'll promote again my implementation in ACE.jl of wrapped NamedTuples.

https://github.com/ACEsuit/ACE.jl/blob/main/src/states.jl

https://github.com/ACEsuit/ACE.jl/blob/main/test/test_states.jl

Apr 30 '23 14:04 cortner

I did this in ExtXYZ.jl already. Could be moved here.

https://github.com/libAtoms/ExtXYZ.jl/blob/master/src/atoms.jl

Apr 30 '23 14:04 jameskermode

The issue I see with that approach is that it does not enforce anything. It is almost as saying that the interface is just a commitment to the implementation of some functions. IMO the interface is useful if at all agree with at least some properties to be always present.

(I would say positions and atomic number - velocities maybe, but mass, symbol, name, are either redundant with the atomic number or have to be extended arbitrarily for new types).

But if all agree that the interface is only related to the implementation of a subtype of AbstractAtom and some getter and setter functions, there isn't need for this discussion, each package could implement the structure at will. In that case, the base interface should not depend on any type parameter or internal field of the Atom struct.

Apr 30 '23 14:04 lmiq

I thought we are talking about a prototype implementation. Indeed I don't think the interface can depend on any type parameters at all. Just on getter functions.

Apr 30 '23 15:04 cortner

I guess if you want an abstract CHARMM then just implement a new function isCHARMM which defaults to false?

Apr 30 '23 16:04 cortner

But what interoperability could we get from that interface, at all? That's what I miss.

Why then Atom here is exported at all?

Apr 30 '23 17:04 lmiq

But what interoperability could we get from that interface, at all?

If the getters are defined and documented in a public interface, that's all you need for interoperability I think? Is there something concrete that you can't do?

Why then Atom here is exported at all?

I think it is intended as a minimal prototype.

I wouldn't be against helping implement another publicly share concrete type say FlexibleAtom or something similar that has more functionality. The standard functionality could be the same as for Atom and hence guaranteed, but if could have arbitrary additional typed field. E.g. in a wrapped NamedTuple, which would also give you the type information you want.

But I wouldn't necessarily want to rely on that concrete implementation, and rather have documented getters that extract the information.

Apr 30 '23 17:04 cortner

I think it is intended as a minimal prototype.

Ok, so from this point of view the Atom struct is a minimal prototype. Then, a few questions:

Is the Atom constructor part of the mandatory interface? The docs seem to suggest that.

If the Atom struct is not part of the expected interface, then IMO it should not be exported. And should be better provided as an example. In any case, the present discussion is then of course unnecessary, as arbitrary properties are already accepted by any implementation of a struct for which the list of getters is defined. Is that the correct way to interpret this? The use or not of a symbol to represent the atomic name is also irrelevant as is only the choice for the prototype exemplified.

In this case, wouldn't be sensible to have an AbstractAtom type, to at least carry the information that one expects that an atomic type defined in an independent package is a subtype of AtomsBase.AbstractAtom and, as such, is expected to satisfy the interface?

May 01 '23 00:05 lmiq

What you suggest makes sense in Java but not in Julia. Abstract super types are not needed to define an interface but are only needed to share methods. At the moment it is not clear what would be shared. Maybe in the future this will change.

May 01 '23 00:05 cortner

What I can imagine is something like:

julia> abstract type AbstractAtom end

julia> charge(::AbstractAtom) = "not implemented"
charge (generic function with 1 method)

julia> atomic_number(::AbstractAtom) = "not implemented"
atomic_number (generic function with 1 method)

julia> struct Atom <: AbstractAtom
           atomic_number::Int
           charge::Float64
       end

julia> a = Atom(1,1.0)
Atom(1, 1.0)

julia> charge(a)
"not implemented"

such that methods not implemented (but possibly expected) can be documented and provide sensible responses.

May 01 '23 00:05 lmiq

Why do you need this? The compiler will tell you that it isn't implemented.

But even if you insist you can just use ::Any

May 01 '23 01:05 cortner

Well, it is a way to inform what the interface expects. And methods implemented across packages can be type annotated on the abstract type to inform that the function in case is expected to follow the interface.

I'm fine not using, but then I don't see even why a package would have to depend on AtomsBase at all.

And, back to the topic, I don't see the point of this issue in particular. The fact that this issue is open makes me believe that it is not clear to everyone what to expect from the interface.

May 01 '23 10:05 lmiq

We need to depend on AtomsBase to share the functions that we use to access information stored in Atoms or Structures ie systems of particles.

The interface is defined via documentation.

May 01 '23 13:05 cortner

Do people here agree, at least, that this issue of arbitrary properties is not really fundamental, at least? Meaning, that if one implements these 6 getters:

https://juliamolsim.github.io/AtomsBase.jl/stable/apireference/#Species-/-atom-properties

the interface of the atomic properties is followed as expected?

Seems to me that there is a confusion in what is the interface expected, and what is the Atom structure which is defined as an example of an implementation following the interface, which stores arbitrary properties in a dict, but that's totally secondary.

In any case I don´t think the documentation is clear.

edit: no, I'm not sure about that. Those functions are defined as getters from an abstract system, not an atom. That's confusing. We want an interface to individual atoms or not? Is we do, the natural way (IMO) would be to have an AbstractAtom and the expected functions defined for it, as it is done for AbstractSystem. What is implemented as a concrete type of Atom, if is part of the expected interface, is very limiting (current issue among others), if it is not, it should be placed, IMO, separately, in a example package (or very clearly stated as an example).

May 01 '23 14:05 lmiq

Maybe the documentation can be improved.
Atoms is not part of the expected interface as far as I understand, but an implementation detail? I could be wrong.

May 01 '23 15:05 cortner

I've said this many times in many discussions and I'll say it again : in the absence of multiple inheritence (but to some extent even with) any sort of AbstractSomethingOrOther limits you severely. Abstract supertypes are only needed to share methods, but it is not needed to document an interface.

May 01 '23 15:05 cortner

👆 dropped in to say exactly this – we talked about having something like AbstractAtom a lot early on (as well as several other abstract types), but that ends up putting severe constraints on things because you can't inherit from multiple abstract types in Julia (honestly, I've had some second thoughts about even just having AbstractSystem, since it's already created a few challenges in implementation...)

May 02 '23 20:05 rkurchin

AtomsBase.jl AtomsBase.jl copied to clipboard

Supporting "arbitrary" properties

System

Atom

AtomsBase.jl
AtomsBase.jl copied to clipboard