AtomsBase.jl icon indicating copy to clipboard operation
AtomsBase.jl copied to clipboard

Interface Update v0.4.x

Open cortner opened this issue 1 year ago • 35 comments
trafficstars

The following document summarizes a proposed interface update for 0.4.x. Please provide comments by 8 July, after which I will start a new PR that implements this. If you plan to provide comments but after 8 July just let me know here please.

https://hackmd.io/@TQNVbm-8R6mxyDAnh7mzAA/HJgCF-xP0

Short summary

  • Rework boundary conditions interface, introduce a cell type, remove all usage of infinities. #99, #97
  • Introduce a "chemical element" / "atom id" type that is isbits and therefore computationally efficient but sufficiently flexible for most use cases; #49
  • Slim down the core interface, in particular make dynamic getters such as system[:position, i] part of an extended interface.
  • Introduce setters as a second extension to the interface.
  • Move the reference implementations to an unexported submodule.
  • #100, #28, #43, #78, #5

EDIT - just to be clear - we are only talking about 0.4.x. Further evolutions are likely. This need not be the final iteration.

cortner avatar Jul 01 '24 12:07 cortner

My sense is that the MAIN DECISION to be made is whether the reference implementation should become the de facto standard to be used by this community or just an example implementation that nobody should use. The current proposal is to demote it to an example implementation. Hearing opinions on this point would be particularly welcome - in favour or against either direction.

cortner avatar Jul 01 '24 12:07 cortner

I'll start by saying (again, but more publicly) a big THANK YOU 🙏 to @cortner for spearheading this!

As I've said privately, I'm generally in agreement with everything and have no major comments, certainly nothing blocking. Regarding the example vs. standard, I've started an attempt to organize some thoughts below (if others chime in I can attempt to summarize other thoughts in edits, too). Personally, I lean towards the "example implementation" option; however, I also think it's unlikely we'll be able to fully "enforce" either option (nor should we) and the likely steady-state will have components of both...

In favor of "de facto standard" / against "example implementation"

  • likely maximally simplifies interoperability
  • this approach will be familiar to folks coming from e.g. ASE

In favor of "example implementation" / against "de facto standard"

  • it is likely (inevitable?) that for reasons of performance/convenience, packages will utilize different internal representations of structure anyway; in a sense the whole point of a Julian interface is to enable this while still smoothing out interoperability challenges
  • (related to above point) this way probably makes it easier for packages to adopt AtomsBase support later on in their development

rkurchin avatar Jul 01 '24 13:07 rkurchin

Thank you for this. These updates looks good.

In the document, I was confused about a few things:

  • What does particle_id return? Is it supposed to return a Type like ChemicalElement or something else? While explaining this, another function named particle_type is being used which seems to return the particle type. Was this a typo? Is particle_type and particle_id the same?
  • atomic_number, atomic_symbol, atomic_mass are still part of the primary interface right?

rashidrafeek avatar Jul 01 '24 15:07 rashidrafeek

I saw that a plural form for each setter function is added. Wouldn't defining singular as well as plural form of interface functions lead to numerous functions in the namespace? There seems to be plural forms defined for the primary interface functions position, mass and particle_id as positions, masses and particle_ids. Shoudn't there also be, atomic_numbers, atomic_symbols etc. If the interface is extended later on, with charge, velocity, momentum etc., again plural forms for these would need to be added. Wouldn't this lead unnecessary pollution of namespace.

Isn't it better to just use the singular form for the system as well, similar to the current definition? Just a suggestion.

rashidrafeek avatar Jul 01 '24 15:07 rashidrafeek

Thanks for pointing out inconsistencies. I'll clear those up asap and then respond to the other point.

cortner avatar Jul 01 '24 18:07 cortner

  • What does particle_id return? Is it supposed to return a Type like ChemicalElement or something else?

yes. ChemicalElement is the strong recommendation, but it could be something else and this is up to the implementer.

While explaining this, another function named particle_type is being used which seems to return the particle type. Was this a typo? Is particle_type and particle_id the same?

I changed everything to _id. There was a discussion about these two names and I couldn't find it. I have a vague memory we may have said that _type interferes with the technical type in the Julia language hence we switched to _id. @mfherbst - I remember you proposed a change of name, can you remember?

  • atomic_number, atomic_symbol, atomic_mass are still part of the primary interface right?

yes. I propose to provide fallback implementations, as described in the document, with functions such as

atomic_number(...) = atomic_number(particle_id(...))
atomic_symbol(...) = atomic_symbol(particle_id(...))
atomic_mass(...) = mass(...)

atomic_mass could be deprecated (I have no view on this), the other two should probably stay.

cortner avatar Jul 02 '24 06:07 cortner

I saw that a plural form for each setter function is added. Wouldn't defining singular as well as plural form of interface functions lead to numerous functions in the namespace? There seems to be plural forms defined for the primary interface functions position, mass and particle_id as positions, masses and particle_ids. Shoudn't there also be, atomic_numbers, atomic_symbols etc. If the interface is extended later on, with charge, velocity, momentum etc., again plural forms for these would need to be added. Wouldn't this lead unnecessary pollution of namespace.

yes, but personally I don't see why this should be a problem. If I'm missing something please do point it out.

Isn't it better to just use the singular form for the system as well, similar to the current definition? Just a suggestion.

I personally find it very hard to read position(sys) and intuit that I am getting a list of positions rather than a single position. That said, this is a point where I feel I worked to convince others to agree with me rather than there was a general desire to change this.

I stand by my point and will continue to argue against allowing position(sys). However, I would also be happy with

position.(sys)
position(sys, :)     #  <=  this is maybe my favourite now.

Another advantage of positions, velocities etc (as Joe Greener pointed out) is that this makes the list of positions a system property but a single position a particle property.

For avoidance of any doubt: It is 100% ok to continue to debate this before making a final decision.

cortner avatar Jul 02 '24 07:07 cortner

I saw that a plural form for each setter function is added. Wouldn't defining singular as well as plural form of interface functions lead to numerous functions in the namespace? There seems to be plural forms defined for the primary interface functions position, mass and particle_id as positions, masses and particle_ids. Shoudn't there also be, atomic_numbers, atomic_symbols etc. If the interface is extended later on, with charge, velocity, momentum etc., again plural forms for these would need to be added. Wouldn't this lead unnecessary pollution of namespace.

yes, but personally I don't see why this should be a problem. If I'm missing something please do point it out.

The main problem that I feel is it doesn't feel natural while coding (maybe not a concrete reason). For example, I frequently interchange between using a subset of positions (currently position(sys, indices)) or all positions (currently position(sys)). Requiring to add an additional s just for the second case doesn't seem natural to me. Anyway, I am okay with either of the options.

I personally find it very hard to read position(sys) and intuit that I am getting a list of positions rather than a single position.

Can't the position be thought of as a single point in the configuration space, or for an arbitrary property, a single point in the N-dimensional space of whichever property we are looking at :)

rashidrafeek avatar Jul 02 '24 16:07 rashidrafeek

So it just seems to come down to different people having different mental pathways. That's always unfortunate and makes it difficult to decide. Maybe this change needs to be postponed.

I would love to hear more opinions from others.

cortner avatar Jul 02 '24 17:07 cortner

I had it in my head that position would always return one position and positions multiple, which would give a nice justification of different functions giving different return types. However as Rashid points out the position(sys, indices) case also returns multiple positions, so I can see some confusion arising if we introduce the plural case just for positions(sys).

The only firm opinion I have is that position.(sys) is probably not the way to go, as it makes it hard to dispatch with a custom system type.

jgreener64 avatar Jul 08 '24 12:07 jgreener64

Thanks everyone for the comments so far. As I start working on the PR we can still discuss. For now I won't make the singular/plural change yet.

cortner avatar Jul 08 '24 12:07 cortner

Let me start by also issuing a big thank you to @cortner. I think as part of this discussion my general idea of what AtomsBase is for has already sharpened a lot.

I am also in general agreement with the proposal modulo some of the details pointed out for discussion in the md document and above.

Example versus reference implementation

I think we as an ecosystem will long-term need a standard implementation to point people to when they want to expose their data to AtomsBase via conversion. To give one concrete use case example: in AtomsIO I have multiple use cases where the internal data representation from a package is not compatible with AtomsBase and the only way to integrate it with AtomsBase is a convert(AbstractSystem, data). However, I don't think it needs to be decided now which implementation this should be and it does not need to be the implementation in this package. Rather I think of the implementations here as references for others to get inspiration and test their code against.

Change of name particle_id

I remember you proposed a change of name, can you remember?

I think type sounds more physics-y than _id, that's probably why I proposed _type over _id. But honestly I don't have a strong opinion.

atomic_mass versus mass

I vote to deprecate atomic_mass.

singular versus plural

Regarding the different mental pictures between singular and plural. Maybe the compromise here is to make things explicit. In that sense I like the idea of only allowing position(sys, :) over position(sys) and positions(sys). If we feel this is too explicit it is always easier to add sugar than to remove it later (because people will start using all variants).

mfherbst avatar Jul 08 '24 14:07 mfherbst

long-term need a standard implementation ... I don't think it needs to be decided now which implementation this should be

I agree with this.

Change of name particle_id

I'll add a poll below.

singular versus plural

based on discussion so far I think what you write may be the best way forward. I will add another poll.

cortner avatar Jul 08 '24 15:07 cortner

POLL: particle_id vs particle_type to return the category (e.g. chemical element) of a particle.

particle_id : 👍 particle_type : 👎

cortner avatar Jul 08 '24 15:07 cortner

POLL: do you agree to deprecate atomic_mass (👍 vs 👎 )

cortner avatar Jul 08 '24 15:07 cortner

POLL: do you agree to

  • deprecate position(sys) and to enforce the use of position(sys, :)
  • not introduce positions(sys)

This this motion is defeated then this discussion will be postponed to v0.5.x.

cortner avatar Jul 08 '24 15:07 cortner

After using LAMMPS a lot, particle_id is a bit confusing for me because I would actually expect to return an integer, since that's how atom_id's are used in LAMMPS. So I prefer particle_type. There's a risk that someone might expect a DataType to be returned when using particle_type, but as long as the documentation is clear, this shouldn't be an issue.

As for atomic_mass I think it's fine to deprecate it in favor of mass for better generalization. However, I still think there should be a default fallback for ChemicalElement's, i.e. atomic_mass(p::ChemicalElement) = mass(p).

swyant avatar Jul 08 '24 18:07 swyant

For the position proposal, why not a slight variant to what you propose where:

  • deprecate position(sys) so that position always requires two arguments, i.e. position(sys,i) or positions(sys,:)
  • A default fallback positions(sys) = position(sys,:) that allows for a simple interface to the system-level property of all positions of a given system.

swyant avatar Jul 08 '24 18:07 swyant

Re I'd vs type. We don't have to rush this. Is there a better terminology? I think both are problematic with type maybe Margo ally preferrable?

  • species
  • category

?

cortner avatar Jul 08 '24 18:07 cortner

For the position proposal, why not a slight variant to what you propose where:

  • deprecate position(sys) so that position always requires two arguments, i.e. position(sys,i) or positions(sys,:)
  • A default fallback positions(sys) = position(sys,:) that allows for a simple interface to the system-level property of all positions of a given system.

Thanks for pointing this out. It is actually exactly what I wanted, but you are right I didn't highlight that positions is now just sugar. A few points made above suggest it isn't entirely obvious what the right approach is. There is some value in enforcing a uniform style.

cortner avatar Jul 08 '24 19:07 cortner

Is there a better terminology?

@niklasschmitz also suggested "particle_kind" as an option. Sounds like type but rings different bells for CS people.

There is some value in enforcing a uniform style.

Agree !

mfherbst avatar Jul 08 '24 21:07 mfherbst

Regarding id, type, kind, category, species: I think the term we choose has to capture the distinction between types of atoms rather than the distinctions between - say - atoms, electrons, etc.

cortner avatar Jul 09 '24 13:07 cortner

Re I'd vs type. We don't have to rush this. Is there a better terminology? I think both are problematic with type maybe Margo ally preferrable?

  • species
  • category

Doesn't species capture the functionality better? Currently, we have species_type which returns the type of atom/particle.

rashidrafeek avatar Jul 10 '24 06:07 rashidrafeek

I like species; one doesn't even need particle_species or species_type; just species by itself should be fine?

cortner avatar Jul 10 '24 06:07 cortner

For the chemists here : how does species or particle_species work in relation to the concept of "chemical species" (cf. discussions about how to define chemical element in a way that incorporates isotopes and various other things ....)

cortner avatar Jul 10 '24 08:07 cortner

how does species or particle_species work in relation to the concept of "chemical species"

I think for the standard use case it works fine. I think only for coarse graining and similar it can create conceptional clashes.

@jgreener64 might have some insight on the potential language in this case.

mfherbst avatar Jul 10 '24 09:07 mfherbst

I would personally be quite comfortable with "species" for a coarse-grained particle. But would be good to hear more views.

cortner avatar Jul 10 '24 13:07 cortner

species sounds okay to me, as does type. I agree that id might imply an integer index. Avoiding atom and chemical in function names would be sensible if we want to indicate that particles don't have to be atoms (for situations like coarse graining).

jgreener64 avatar Jul 10 '24 14:07 jgreener64

For the chemists here : how does species or particle_species work in relation to the concept of "chemical species" (cf. discussions about how to define chemical element in a way that incorporates isotopes and various other things ....)

It is a bit problematic. Chemistry definition can lead to confusion, as chemical species can mean more than just "atom type". So, I would prever particle_type over it.

tjjarvinen avatar Jul 11 '24 21:07 tjjarvinen

I don't see the issue - sounds consistent to me to use the term species here.

EDIT: @tjjarvinen -- would be great if you can substantiate your concern. I can then poll people to decide.

cortner avatar Jul 11 '24 22:07 cortner