chemicaljson
chemicaljson copied to clipboard
Specification not in sync with avogadrolibs implementation
The specification provided for chemical JSON in this repository seems to behind the actually supported implementation in https://github.com/OpenChemistry/avogadrolibs/blob/master/avogadro/io/cjsonformat.cpp.
Where can implementors of chemical JSON find an up-to-date specification or is the implementation in avogadrolibs the source of truth for this format?
Great question - I saw your earlier tag. Partly, this has been due to lack of time. (I'm behind on writing documentation for Avogadro 2 as well. 😄 ) Partly, it's been not knowing the best mechanism to encode the spec (e.g., using something formal like Pydantic.)
A somewhat more readable "ground truth" is in https://github.com/OpenChemistry/avogadro-cclib/blob/main/cclibScript.py which I'm working hard to keep that up-to-date and merge into cclib
as well.
I've been discussing also with the cclib team on other attributes.
A few things that are definitely missing:
- total charge / total spin
- energies for multiple coordinate sets (e.g., conformers, IRC, optimization steps, etc.)
- Right now this is going in
"properties": { "energies": [ … ] }
which makes some sense.
- Right now this is going in
- other kinds of molecular / atomic / bond properties (e.g., electrostatics, tensors, …)
- calculation metadata (IIRC it's in the
avogadro-cclib
script, but not the C++ yet)
Right now, it supports IR / Raman, electronic, and NMR spectra, but only in a limited way because plotting has been somewhat complicated in Avogadro 2.
Suggestions / discussion are definitely welcome.
The format is used by other code, particularly the MongoChem web components, but these are mostly extensions or reverse-engineering the work done for the web components.
I would prefer a spec to implement against instead of another implementation, but I'll work with what is available.
total charge / total spin
Only half of the formats xtb
supports via mctc-lib
have this (Turbomole coords, CTFiles like mol/sdf, QCSchema JSON, QChem molfiles, Gaussian external inputs), this is why we have overwrites via cmd-args or .CHRG
/ .UHF
files, so not a big issue. Of course we loose the information when writing in the same format since xtb
preserves the input format.
using something formal like Pydantic
JSON Schema (either from the spec or as produced by Pydantic) has some annoying limitations, but overall my experience generating schemas and JSON blobs via Pydantic is great, and certainly better than hand-writing JSON.
I would prefer a spec to implement against instead of another implementation, but I'll work with what is available.
In my experience, both a spec, and an implementation and/or live examples are critical, but I understand completely.
I just discovered https://jsontopydantic.com which should help writing a more formal spec and Pydantic model. Given my workload this week and a conference next week, this may take a while, but I'm happy to answer questions in the meantime.
I wonder what is supposed to happen if a chemical JSON such as
{"chemical json": 2, "chemicalJson": 1}
is provided, should the reader reject it or pass it like currently done in Avogadro 2? This behaviour is somewhat inconsistent with
{"chemical json": 1, "chemicalJson": 2}
which is going to fail due to the version number specified.
The specs don't mention that there are alternative keys and how an implementation should deal with duplicated entries.
In general in JSON, non-matching keys are ignored. So a key of "chemical json" is irrelevant, because it's not "chemicalJson".
At the moment, there are not alternative keys and I'm not sure that would be a good idea.
At the moment, there are not alternative keys and I'm not sure that would be a good idea.
This is not true if I check the actual implementation at
https://github.com/OpenChemistry/avogadrolibs/blob/55dec1df36c574a456d18735032b8e29928a28b0/avogadro/io/cjsonformat.cpp#L95-L101
which allows both chemical json
and chemicalJson
as key for the version numbers, same for several other keys.
Dealing with spaces in JSON keys is more cumbersome in JavaScript as well: https://stackoverflow.com/a/4968448/131896
molecule.atoms
vs molecule["vibrational modes"]
This is not true if I check the actual implementation
I think that's due to some unfortunate backwards compatibility. It's a bad idea and I don't recommend it.
As I mentioned, Avogadro now always writes Cartesian coordinates and cell vectors even with periodic systems. (Very old versions would convert to fractional and write parameters .. but you drop useful information.)
Yes, it's why a spec is needed vs. an implementation that tries to ensure backwards compatibility with not-quite compatible files.
Yes, it's why a spec is needed vs. an implementation that tries to ensure backwards compatibility with not-quite compatible files.
This is good to know, since I got the impression that I have the choice as implementor which key I can write, but have to always support both variants when reading. So the camelCase ones are the correct ones? (I should go change this than in my implementation)
My general view on parsers is flexible on read, but correct on write. (This is motivated particularly by "PDB" files. 🙄
Let's also agree on properties => "totalCharge" and properties => "totalSpinMultiplicity" and I promise I'll get that into avogadrolibs
tonight.
Let's also agree on properties => "totalCharge" and properties => "totalSpinMultiplicity" and I promise I'll get that into
avogadrolibs
tonight.
totalCharge
and totalSpinMultiplicity
are fine with me.
My general view on parsers is flexible on read, but correct on write.
I have to disagree on this one, every flexibility allowed in the parser will get used at some point by somebody and is than expected to be supported for all future. Even worse, future implementations are expected to support the same extensions (like this SDF stuff https://github.com/grimme-lab/mctc-lib/issues/49). A parser should enforce the correctness of the format, rigorously.
Thanks for https://github.com/OpenChemistry/avogadrolibs/pull/1058, however this doesn't solve the issue that avogadrolibs implements something else than documented here.
I'm aware - I'm at a conference this week, so my time to write a spec is limited. I will certainly write up a more complete current spec in the next few weeks.
Fortunately, it was a bit easier to push things into Pydantic than I worried. Here's a start, which will turn into a pull request. https://gist.github.com/ghutchis/3a31e5b597dafb15de9ea048cad35268
Yes, some of these examples are really old and need to be removed.