Improvements to `species`
Create a species-type object that stores the indices and charges of the particles (charges are only needed for conductivity).
This is a prototype that might work:
import scipp as sc
species = sc.DataGroup({
    'jeff': sc.DataArray(
        data=sc.array(values=[[0, 1], [2, 3], [4, 5]], dims=['particles', 'atom']),
        coords={'charges': sc.ones(dims=['particles'], shape=(3,))}),
    'amy': sc.DataArray(
        data=sc.array(values=[[6, 7, 8], [9, 10, 11], [12, 13, 14]], dims=['particles', 'atom']),
        coords={'charges': sc.ones(dims=['particles'], shape=(3,)) * 2}),
})
If the user passes a string as the specie, kinisi should build this sort of object and work with it.
This would lead to a 2.1 release.
Waiting on #185
Breakdown:
- [ ] Species class
- [ ] Data group for different species
- [ ] Refactor of internals to compute the center of mass
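The center-of-mass refactor boils down to a mass-weighted average of each molecule's atom positions. A minimal plain-Python sketch, assuming a single frame of positions as (x, y, z) tuples, one shared list of per-atom masses for a molecule type, and no unwrapping of periodic images; `centers_of_mass` is a hypothetical name, not part of kinisi.

```python
from typing import Sequence

Vector = tuple[float, float, float]


def centers_of_mass(positions: Sequence[Vector],
                    molecules: Sequence[Sequence[int]],
                    masses: Sequence[float]) -> list[Vector]:
    """Mass-weighted center of each molecule in one frame.

    positions: per-atom (x, y, z) coordinates for a single frame.
    molecules: one list of atom indices per molecule, all with the same
        atom ordering so that `masses` lines up with every molecule.
    Hypothetical helper; periodic images are NOT unwrapped here.
    """
    total_mass = sum(masses)
    centers = []
    for indices in molecules:
        if len(indices) != len(masses):
            raise ValueError("each molecule needs one mass per atom")
        # Sum m_i * r_i over this molecule's atoms, one axis at a time.
        weighted = [sum(m * positions[i][axis] for m, i in zip(masses, indices))
                    for axis in range(3)]
        centers.append(tuple(w / total_mass for w in weighted))
    return centers
```

In the real refactor this would run over every frame (and likely operate on scipp arrays rather than tuples), but the weighting is the same.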
A few considerations that came to mind:
- We could call Species a ParticleGroup?
- How should we treat, e.g., dynamic charges that come from the trajectory data?
Issue with data availability.
We were thinking about these constructors:
Species.from_type("Li") # Lookup from database
Species.from_indices([0,1,2], ...) # Manual specification
Species.from_data_array(...) # Already-structured data
but the issue is that the internal state of these objects would differ a lot: for from_type we can't get the indices without going through the trajectory data, so we would have to store all these internals in some obfuscated way. An alternative could be to have different classes:
class ParticleGroup: ...
class AtomicSpecies:
    element: str
    charge: float | None
class MolecularSpecies:
    indices: list[list[int]]
    charges: float | list[float] | None
and internally convert them via AtomicSpecies(...).to_particle_group(data: ...) -> ParticleGroup, where data would be the trajectory data from which the indices for an AtomicSpecies can be obtained.
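The class-based alternative can be sketched with plain dataclasses. This is a minimal sketch under assumptions not in the discussion: a list of element symbols (one per atom) stands in for the trajectory data, a plain-Python ParticleGroup stands in for the scipp-based one, and that ParticleGroup stores a single total charge per particle (per-atom charges, if given, are summed).

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass
class ParticleGroup:
    """Hypothetical resolved form: each particle is a list of atom indices."""
    indices: list[list[int]]
    charge: Optional[float] = None  # assumed: total charge per particle


@dataclass
class AtomicSpecies:
    element: str
    charge: Optional[float] = None

    def to_particle_group(self, elements: Sequence[str]) -> ParticleGroup:
        # `elements` stands in for the trajectory data (one symbol per atom);
        # the indices can only be found once this data is available.
        indices = [[i] for i, el in enumerate(elements) if el == self.element]
        if not indices:
            raise ValueError(f"no atoms of element {self.element!r} found")
        return ParticleGroup(indices=indices, charge=self.charge)


@dataclass
class MolecularSpecies:
    indices: list[list[int]]
    charges: Union[float, list[float], None] = None

    def to_particle_group(self, elements: Sequence[str]) -> ParticleGroup:
        # Indices are explicit, so the data is only used for a range check;
        # the argument keeps one interface for both species classes.
        if any(i >= len(elements) for mol in self.indices for i in mol):
            raise IndexError("atom index out of range for the trajectory")
        n_atoms = len(self.indices[0])
        if any(len(mol) != n_atoms for mol in self.indices):
            raise ValueError("all molecules must have the same number of atoms")
        if self.charges is None or isinstance(self.charges, (int, float)):
            total = self.charges  # already a per-molecule charge (or absent)
        elif len(self.charges) == n_atoms:
            total = sum(self.charges)  # per-atom charges summed per molecule
        else:
            raise ValueError("need one charge per atom in each molecule")
        return ParticleGroup(indices=[list(m) for m in self.indices], charge=total)
```

Both classes expose the same to_particle_group method, so the rest of the code only ever has to deal with ParticleGroup.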
For the actual configuration setup, we would still have a union type, e.g. particle_groups = {"Li": AtomicSpecies("Li"), "CO3^2-": MolecularSpecies(indices=[[1, 2, 3, 4], ...], charges=-2), "H2O": ParticleGroup(...)}, where ParticleGroup could (or should) be a scipp object.
or, written out in full:
config = {
    "particle_groups": {
        "Li": AtomicSpecies("Li", charge=1.0),
        "CO3": MolecularSpecies(
            indices=[[1, 2, 3, 4], [5, 6, 7, 8]],  # 2 molecules
            masses=[12.01, 16.0, 16.0, 16.0],
            charges=[-2.0, -0.8, -0.6, -0.6],  # Per-atom
        ),
        "H2O": MolecularSpecies(
            indices=[[9, 10, 11], [12, 13, 14]],
            masses=[16.0, 1.008, 1.008],
            charges=None,
        ),
    }
}
I had it in my head that this:
Species.from_type("Li") # Lookup from database
Species.from_indices([0,1,2], ...) # Manual specification
Species.from_data_array(...) # Already-structured data
would be internal. So, if the user provided a string, kinisi would do the lookup (which is how it works currently). What is the reason not to use that approach?
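The internal-constructor approach can be sketched as follows, as a minimal illustration. Everything here is hypothetical: Species, resolve_species, and the list of element symbols standing in for the trajectory are assumptions. Because the constructors are only called by the parser, which already holds the trajectory, from_type can resolve the indices immediately, so every Species ends up with the same internal state.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class Species:
    """Uniform internal state: one tuple of atom indices per particle."""
    indices: tuple[tuple[int, ...], ...]

    @classmethod
    def from_type(cls, element: str, elements: Sequence[str]) -> "Species":
        # Lookup by element symbol; needs the trajectory's per-atom symbols.
        return cls(tuple((i,) for i, el in enumerate(elements) if el == element))

    @classmethod
    def from_indices(cls, indices: Sequence[Sequence[int]]) -> "Species":
        # Manual specification: one inner sequence per particle.
        return cls(tuple(tuple(mol) for mol in indices))


def resolve_species(value: Union[str, Sequence[Sequence[int]]],
                    elements: Sequence[str]) -> Species:
    """Hypothetical internal dispatch called by the parser, which already
    has the trajectory, so a string can be looked up straight away."""
    if isinstance(value, str):
        return Species.from_type(value, elements)
    return Species.from_indices(value)
```

The trade-off is that any extra per-species information (charges, masses) has to be threaded through these constructors rather than living on separate user-facing classes.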
What would that look like in the input dict?
The same as it is at the moment, but specie_indices would go away and one could pass the data array directly as the specie. This might be different from what @jd15489 had in mind, though.
Could you give a dict example?
From
molecules = [[288, 289, 290, 291, 292, 293],
             [284, 295, 296, 297, 298, 299]]
params = {
    'specie': None,
    'specie_indices': sc.array(dims=['particle', 'atoms in particle'], values=molecules, unit=sc.Unit('dimensionless')),
    ...
}
to
molecules = [[288, 289, 290, 291, 292, 293],
             [284, 295, 296, 297, 298, 299]]
params = {
    "particle_groups": {"mol": sc.array(dims=['particle', 'atoms in particle'], values=molecules, unit=sc.Unit('dimensionless'))},
    ...
}
OR
molecules = [[288, 289, 290, 291, 292, 293],
             [284, 295, 296, 297, 298, 299]]
params = {
    "particle_groups": {"mol": "Li"},  # Should the union str | sc.array be used to infer the type? That would make it implicit instead of explicit.
    ...
}
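Going from the first form to the second is mechanical, so legacy inputs could be translated automatically. A minimal sketch, assuming plain nested lists in place of sc.array, that 'specie' and 'specie_indices' are mutually exclusive (as in the example above), and using a hypothetical migrate_params helper with a default group name of "mol":

```python
from typing import Any


def migrate_params(params: dict[str, Any], group_name: str = "mol") -> dict[str, Any]:
    """Hypothetical helper: rewrite a legacy params dict that uses
    'specie' / 'specie_indices' into the proposed 'particle_groups' form.
    All other keys are copied through unchanged."""
    new = {k: v for k, v in params.items() if k not in ("specie", "specie_indices")}
    specie = params.get("specie")
    indices = params.get("specie_indices")
    if specie is not None and indices is not None:
        raise ValueError("pass either 'specie' or 'specie_indices', not both")
    if indices is not None:
        # Explicit indices become one named molecular group.
        new["particle_groups"] = {group_name: indices}
    elif specie is not None:
        # A string is kept as-is, so its meaning is inferred implicitly later.
        new["particle_groups"] = {specie: specie}
    else:
        raise ValueError("no species given")
    return new
```

Keeping the string untouched corresponds to the implicit option above; the explicit option would wrap it in something like AtomicSpecies instead.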
How would the charges be provided? Should "Li" also be an sc.array?
Small note on the suggested API above: the charges and masses should be sc.scalar or sc.array objects (a scalar for a single value, an array for per-atom values), e.g.
AtomicSpecies("Li", charge=sc.scalar(1, unit='charge'))
MolecularSpecies(..., charges=sc.array(...), masses=sc.array(...))
Having classes like AtomicSpecies would allow us to introduce further logic, which a union type cannot do.
For example, MoleculeSpeciesFromSmiles(smiles="CCO") could compute the diffusion for all molecules matching the SMILES string CCO, i.e. ethanol.
Ahhh, I wasn't thinking about charges. Sorry, not enough sleep evidently. Let me think about it this week.
I have been thinking about this.
My thought is that we should consolidate these arguments to Parser: coords, specie_indices, drift_indices and masses.
Instead, we would pass two lists of ParticleGroup objects: one for diffusion species and one for drift species.
We then change the Parser subclasses to handle these ParticleGroup objects.
For example, MDAnalysisParser should have methods for creating ParticleGroup objects from the universe and the user's input. This SubParser would also accept ParticleGroup objects that don't contain coords and pass them through, adding coords along the way.
This might be a substantial rewrite, but I think it adds flexibility and reduces the number of objects we have to manually pass between classes.
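The consolidated interface could look roughly like this. It is a minimal sketch with hypothetical names (a frozen-dataclass ParticleGroup, attach_coords, ListTrajectoryParser) and a list of frames standing in for an MDAnalysis universe; it only shows the shape of the idea: the base class receives the two lists of groups and each subclass fills in the coords, passing groups that already have them through unchanged.

```python
from dataclasses import dataclass, replace
from typing import Optional, Sequence

Coord = tuple[float, float, float]


@dataclass(frozen=True)
class ParticleGroup:
    name: str
    indices: tuple[tuple[int, ...], ...]
    # coords[frame][particle][atom] -> (x, y, z); None until a parser fills it.
    coords: Optional[tuple] = None


class Parser:
    """Hypothetical base: takes diffusion and drift groups instead of separate
    coords / specie_indices / drift_indices / masses arguments."""

    def __init__(self, particle_groups: Sequence[ParticleGroup],
                 drift_groups: Sequence[ParticleGroup]):
        self.particle_groups = [self.attach_coords(g) for g in particle_groups]
        self.drift_groups = [self.attach_coords(g) for g in drift_groups]

    def attach_coords(self, group: ParticleGroup) -> ParticleGroup:
        raise NotImplementedError


class ListTrajectoryParser(Parser):
    """Stand-in for e.g. MDAnalysisParser: the trajectory is a list of
    frames, each frame a list of per-atom (x, y, z) tuples."""

    def __init__(self, frames: Sequence[Sequence[Coord]],
                 particle_groups: Sequence[ParticleGroup],
                 drift_groups: Sequence[ParticleGroup]):
        self.frames = frames  # must be set before the base class uses it
        super().__init__(particle_groups, drift_groups)

    def attach_coords(self, group: ParticleGroup) -> ParticleGroup:
        if group.coords is not None:
            return group  # already complete: pass through unchanged
        coords = tuple(
            tuple(tuple(frame[i] for i in particle) for particle in group.indices)
            for frame in self.frames
        )
        return replace(group, coords=coords)
```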
Some notes on the above implementation (SubParser is the inheriting Parser, e.g., MDAnalysisParser) from a whiteboard conversation between @jd15489 and me.
Possible input types are:
- `str`
- `list` (backwards compatibility for `specie_indices`; both this list input and the `specie_indices` keyword would be immediately deprecated for removal in a future point release)
- new `ParticleGroup`, which is a superclass of `sc.DataGroup`
In the SubParser, these are constructed into a ParticleGroup; the SubParser has access to the trajectory, so this is fine. This approach would mean that a ParticleGroup could be constructed with a NumPy array as the coords, which is an outstanding issue.
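The normalization step in the SubParser might look like the following minimal sketch. The names are hypothetical, a frozen dataclass stands in for the scipp-based ParticleGroup, and a list of element symbols stands in for the trajectory; the list branch emits a DeprecationWarning to mirror the proposed deprecation.

```python
import warnings
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class ParticleGroup:
    # Hypothetical stand-in for the proposed scipp-based ParticleGroup.
    indices: tuple[tuple[int, ...], ...]


SpeciesInput = Union[str, list, ParticleGroup]


def to_particle_group(value: SpeciesInput, elements: Sequence[str]) -> ParticleGroup:
    """Normalize any accepted input into a ParticleGroup inside a SubParser;
    `elements` (one symbol per atom) stands in for the trajectory."""
    if isinstance(value, ParticleGroup):
        return value  # new-style input: already structured
    if isinstance(value, str):
        # Element lookup, as kinisi does today for a string specie.
        return ParticleGroup(tuple((i,) for i, el in enumerate(elements) if el == value))
    if isinstance(value, list):
        warnings.warn(
            "passing a list of indices is deprecated; use a ParticleGroup instead",
            DeprecationWarning, stacklevel=2)
        return ParticleGroup(tuple(tuple(mol) for mol in value))
    raise TypeError(f"unsupported species input: {type(value).__name__}")
```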
This ParticleGroup should be a sc.DataArray; if there are two molecule types with different numbers of atoms, these will be two ParticleGroups stored together in a sc.DataGroup.
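The extra DataGroup level is needed because a 2-D array requires every particle to have the same number of atoms. A minimal plain-Python sketch of splitting ragged molecule definitions into rectangular groups, using a hypothetical split_by_size helper and nested lists in place of scipp objects:

```python
from collections import defaultdict
from typing import Sequence


def split_by_size(molecules: Sequence[Sequence[int]]) -> dict[int, list[list[int]]]:
    """Group molecules (lists of atom indices) by their number of atoms.

    Each value is rectangular, so it could become a 2-D array with dims
    ('particle', 'atoms in particle'); the returned dict plays the role of
    the DataGroup holding one ParticleGroup per molecule size.
    """
    groups: dict[int, list[list[int]]] = defaultdict(list)
    for mol in molecules:
        groups[len(mol)].append(list(mol))
    return dict(groups)
```

Grouping by size alone would merge different molecule types that happen to have the same atom count, so in practice the groups would more likely be keyed by molecule type, with the equal-length requirement checked within each group.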