AtomsBase.jl
AtomsBase.jl copied to clipboard
Supporting "arbitrary" properties
This has come up a couple of times now. Examples could be:
- a charge associated with every atom (atom-wise property, scalar)
- a dipole moment associated with a system (system-level property, vector)
- ...and plenty of others.
@mfherbst, I recall a discussion of a relatively flexible/generic way to do this at some point but now can't find it, or recall the details of the proposal, but since this came up again recently over at https://github.com/FermiQC/Fermi.jl/issues/120#issuecomment-1079655811, I figured I'd file an issue here for discussion.
Atom-level properties already work (see Atom struct). Basically just overload getproperty and hasproperty. IMHO there is nothing additional needed for that. Same thing could be done with an identical mechanism for the system. Again no need to define an extra interface as Base already provides this.
I think the only thing we have to do in AtomsBase is to agree (and document) the names of the standard keys and their expected size / type of the returned fields.
In light of what I just said, let's brainstorm a little on what people would need / are already using (feel free to amend this post):
System
- ...
Atom
🤦♀️ that's why I couldn't find the atom-level one in an issue, because we already did it. Haha, my bad!
This approach sounds great. I suppose charge and dipole moment could be system-level for molecules? As far as atom-level properties, isotopic information might be useful. I'm not sure if we want to throw in everything we can imagine now, though, or do things more sparingly on an as-needed basis...
I think we should first collect the things someone has an actual use case for to not avoid clutter
A property which might be relevant would be the "Topology", which stores the bonding information of the system. In many of the formats used for classical MD calculations this information is also present in the trajectory files.
For example, The Frame type in Chemfiles.jl which is the analogue of an AbstractSystem, stores Topology in addition to positions, velocities and the unitcell.
Here are some examples of system level properties I've used recently:
- Total energy
- Kinetic energy
- Potential energy
- Free energy (of ion-electron system)
- Charge
- Dipole moment
- Polarisability
- Stress tensor
- Strain tensor
I think it will be very hard to anticipate what users would like to store - for example I'd often want to store multiple energies coming from DFT and a variety of more approximate models - so I'd advocate for making it as flexible as possible, but I agree there's nothing wrong with identifying and documenting some standard properties.
See my implementation of particle States in ACE.jl - that's exactly what it provides. It could potentially move here, or it could inspire a new and better implementation here.
Coming late to the discussion, I have just posted a comment on the Zulip chat about something related.
I think both the for System or for Atom, we could have the additional data represented as a typed generic field:
struct Atom{D, L<:Unitful.Length, V<:Unitful.Velocity, M<:Unitful.Mass, ExtraData}
position::SVector{D, L}
velocity::SVector{D, V}
atomic_symbol::Symbol
atomic_number::Int
atomic_mass::M
data::ExtraData
end
We can let the current interface for when ExtraData <: Dict, but with that we can also include type-stable data carrieres for anything the user wants. With the important advantage of being able to overload functions for custom Atom types, such as, for example, show. In my case, I want several additional fields (to represent all ATOM data of a PDB file, and want to print the atoms as:
julia> pdb = PDBTools.wget("1LBD")
julia> pdb
Array{Atoms,1} with 1870 atoms with fields:
index name resname chain resnum residue x y z beta occup model segname index_pdb
1 N SER A 225 1 45.228 84.358 70.638 67.05 1.00 1 - 1
2 CA SER A 225 1 46.080 83.165 70.327 68.73 1.00 1 - 2
3 C SER A 225 1 45.257 81.872 70.236 67.90 1.00 1 - 3
⋮
1868 OG1 THR A 462 238 -27.462 74.325 48.885 79.98 1.00 1 - 1868
1869 CG2 THR A 462 238 -27.063 71.965 49.222 78.62 1.00 1 - 1869
1870 OXT THR A 462 238 -25.379 71.816 51.613 84.35 1.00 1 - 1870
as for now I need to define a custom Atom structure to be able to overload show, but if I had a parametric type I could just extend AtomsBase.Atom.
I think that would not be breaking at all, and if people feel that is a good idea, I can try to implement it.
This is in my opinnion a good suggestion and I can see several cases where it would be useful.
In addition to this, it might be a good idea to change the type of atomic_symbol to a parameter. Because that would allow making the Atom type a bitstype, when using e.g. SVector{2,Char}. But this could have some other issues, so I am a bit unsure of it.
For exactly that reason we used a wrapped integer in JuLIP but allowed easy translation to a Symbol. With a bit of work the display could even show the symbol instead of the atomic number.
the type of
atomic_symbolto a parameter.
I would follow what Cortner suggested, just store the atomic number and make the symbol, or string atomic name, something to be get on the flight.
(although this is a separate issue, and would be breaking, better treat it separately)
The only issue with the last suggestion is that it requires a one to one mapping between species and atomic number, which is maybe ok for quantum mechanics but not for classical MD where there is often more than one type of each atom (even in QM there could be different pseudo potentials)
That right, one more reason to allow a typed extra data field. Then the atomic type information can be stored there in any format needed, and there are many formats possible (Ints, or strings, etc).
For instance someone could implement Atom{D,L,M,OPLSAA}, or Atom{D,L,M,CHARMM}, etc.
Then I'll promote again my implementation in ACE.jl of wrapped NamedTuples.
https://github.com/ACEsuit/ACE.jl/blob/main/src/states.jl
https://github.com/ACEsuit/ACE.jl/blob/main/test/test_states.jl
I did this in ExtXYZ.jl already. Could be moved here.
https://github.com/libAtoms/ExtXYZ.jl/blob/master/src/atoms.jl
The issue I see with that approach is that it does not enforce anything. It is almost as saying that the interface is just a commitment to the implementation of some functions. IMO the interface is useful if at all agree with at least some properties to be always present.
(I would say positions and atomic number - velocities maybe, but mass, symbol, name, are either redundant with the atomic number or have to be extended arbitrarily for new types).
But if all agree that the interface is only related to the implementation of a subtype of AbstractAtom and some getter and setter functions, there isn't need for this discussion, each package could implement the structure at will. In that case, the base interface should not depend on any type parameter or internal field of the Atom struct.
I thought we are talking about a prototype implementation. Indeed I don't think the interface can depend on any type parameters at all. Just on getter functions.
I guess if you want an abstract CHARMM then just implement a new function isCHARMM which defaults to false?
But what interoperability could we get from that interface, at all? That's what I miss.
Why then Atom here is exported at all?
But what interoperability could we get from that interface, at all?
If the getters are defined and documented in a public interface, that's all you need for interoperability I think? Is there something concrete that you can't do?
Why then Atom here is exported at all?
I think it is intended as a minimal prototype.
I wouldn't be against helping implement another publicly share concrete type say FlexibleAtom or something similar that has more functionality. The standard functionality could be the same as for Atom and hence guaranteed, but if could have arbitrary additional typed field. E.g. in a wrapped NamedTuple, which would also give you the type information you want.
But I wouldn't necessarily want to rely on that concrete implementation, and rather have documented getters that extract the information.
I think it is intended as a minimal prototype.
Ok, so from this point of view the Atom struct is a minimal prototype. Then, a few questions:
Is the Atom constructor part of the mandatory interface? The docs seem to suggest that.
If the Atom struct is not part of the expected interface, then IMO it should not be exported. And should be better provided as an example. In any case, the present discussion is then of course unnecessary, as arbitrary properties are already accepted by any implementation of a struct for which the list of getters is defined. Is that the correct way to interpret this? The use or not of a symbol to represent the atomic name is also irrelevant as is only the choice for the prototype exemplified.
In this case, wouldn't be sensible to have an AbstractAtom type, to at least carry the information that one expects that an atomic type defined in an independent package is a subtype of AtomsBase.AbstractAtom and, as such, is expected to satisfy the interface?
What you suggest makes sense in Java but not in Julia. Abstract super types are not needed to define an interface but are only needed to share methods. At the moment it is not clear what would be shared. Maybe in the future this will change.
What I can imagine is something like:
julia> abstract type AbstractAtom end
julia> charge(::AbstractAtom) = "not implemented"
charge (generic function with 1 method)
julia> atomic_number(::AbstractAtom) = "not implemented"
atomic_number (generic function with 1 method)
julia> struct Atom <: AbstractAtom
atomic_number::Int
charge::Float64
end
julia> a = Atom(1,1.0)
Atom(1, 1.0)
julia> charge(a)
"not implemented"
such that methods not implemented (but possibly expected) can be documented and provide sensible responses.
Why do you need this? The compiler will tell you that it isn't implemented.
But even if you insist you can just use ::Any
Well, it is a way to inform what the interface expects. And methods implemented across packages can be type annotated on the abstract type to inform that the function in case is expected to follow the interface.
I'm fine not using, but then I don't see even why a package would have to depend on AtomsBase at all.
And, back to the topic, I don't see the point of this issue in particular. The fact that this issue is open makes me believe that it is not clear to everyone what to expect from the interface.
We need to depend on AtomsBase to share the functions that we use to access information stored in Atoms or Structures ie systems of particles.
The interface is defined via documentation.
Do people here agree, at least, that this issue of arbitrary properties is not really fundamental, at least? Meaning, that if one implements these 6 getters:
https://juliamolsim.github.io/AtomsBase.jl/stable/apireference/#Species-/-atom-properties
the interface of the atomic properties is followed as expected?
Seems to me that there is a confusion in what is the interface expected, and what is the Atom structure which is defined as an example of an implementation following the interface, which stores arbitrary properties in a dict, but that's totally secondary.
In any case I don´t think the documentation is clear.
edit: no, I'm not sure about that. Those functions are defined as getters from an abstract system, not an atom. That's confusing. We want an interface to individual atoms or not? Is we do, the natural way (IMO) would be to have an AbstractAtom and the expected functions defined for it, as it is done for AbstractSystem. What is implemented as a concrete type of Atom, if is part of the expected interface, is very limiting (current issue among others), if it is not, it should be placed, IMO, separately, in a example package (or very clearly stated as an example).
-
Maybe the documentation can be improved.
-
Atomsis not part of the expected interface as far as I understand, but an implementation detail? I could be wrong.
I've said this many times in many discussions and I'll say it again : in the absence of multiple inheritence (but to some extent even with) any sort of AbstractSomethingOrOther limits you severely. Abstract supertypes are only needed to share methods, but it is not needed to document an interface.
👆 dropped in to say exactly this – we talked about having something like AbstractAtom a lot early on (as well as several other abstract types), but that ends up putting severe constraints on things because you can't inherit from multiple abstract types in Julia (honestly, I've had some second thoughts about even just having AbstractSystem, since it's already created a few challenges in implementation...)