core Electron configuration strings in ADF15 files can be identical for different energy levels

While trying to fix #257 I found a more serious bug.

When creating the repository for PEC and wavelengths from ADF15 files, the electron configuration strings from ADF15 comments are used to form a unique transition key, which then used in the PEC and wavelength repositories. However, in ADF15, different energy levels sometimes have identical electron configuration strings, so the transition keys created from these strings are not unique.

For example, see the comments for the pec96#n_vsu#n0.dat file, which is included in the populate() function. The transitions 5 ("12(2)1( 2.5)- 5(2)1( 2.5)") and 12 ("42(2)1( 2.5)- 5(2)1( 2.5)") will have identical keys because the electron configurations 12 and 42 are described with the same string: 2S2 2P2 3S1 2P2.5. Therefore, the spectral line related to the transition 5 will be missing in the repository.

I don't know if this happens due to an error in production of ADF15 files, but many ADF15 files are affected.

Even though I find using transition string as a key more correct, considering the circumstances we may switch to str(wavelength) as a key.

Jan 27 '21 21:01 vsnever

I propose the following API change to fix this:

Line(element, charge, transition) -> Line(element, charge, identifier). Argument identifier can be a transition tuple or an ADAS wavelength key. The ADAS wavelength key may not be equal to the actual wavelength, because the ADAS wavelengths are not accurate enough in some cases. If the wavelength key cannot be unequivocally identified by the transition then an error is thrown with a list of ADAS wavelength keys which corresponds to the given ADAS transition. Also, if the wavelength key is not found, the error message contains the closest available wavelength and respective transition.
AtomicData.impact_excitation_pec(ion, charge, transition) -> AtomicData.impact_excitation_pec(ion, charge, identifier) (the same for recombination_pec and beam_emission_pec). The behaviour is similar to the Line.
In the openadas.repository, all PECs are stored and accessed by ADAS wavelength keys (converted to nm for consistency with Cherab).
The wavelength repository includes adas_wavelength_key <--> wavelength, adas_wavelength_key -> transition and transition -> [adas_wavelength_key1, adas_wavelength_key2, ...] dictionaries. The adas_wavelength_key and wavelength values are equal by default, but the user can change this (for example for deuterium and tritium).

This is a major change, but I currently don't see any other way to fix this issue.

Update On the top level, the transition parameter is changed with the identifier and that's all. All functional changes are in the openadas module.

Jan 28 '21 16:01 vsnever

This is a very unfortunate bug and the API changes suggested are quite significant. I suggest we discuss this at the next Technical Management Committee meeting to choose a course of action.

Feb 14 '21 16:02 mattngc

This is a very unfortunate bug and the API changes suggested are quite significant. I suggest we discuss this at the next Technical Management Committee meeting to choose a course of action.

I was thinking of a different solution than using ADAS wavelengths as keys instead of transitions. We can replace all electron configurations from OpenADAS affected by this bug with the correct ones from the NIST database. However, for that we will need to manually create dictionaries with the correct electronic configurations.

Feb 14 '21 17:02 vsnever

Yeah, I'm more in favour of something like this. I'm not convinced wavelengths would be good transition keys because they are not guaranteed to be unique at a given wavelength resolution. I believe specifying the transitions through spectroscopic notations is clearer. In the long term, I envisioned we would eventually move to specialised transition objects that can validate themselves and handle more advanced notations.

Also, the philosophy of the populate() method was to read a basic set of ADF files to help users get started. Users were expected to curate their own atomic data sets and ADF files for more advanced cases. If it's only a small number of ADF files that are failing, we could remove them from the populate() method and shift the burden to the user to correct and import them manually.

Feb 14 '21 20:02 mattngc

This is not really an issue with cherab, it is an issue with the source data. The solution really should be to throw an error if duplicate keys are found within the same file, the user can deal with the malformed data. If we are being particularly helpful, we could add a "patch" system that addresses known issues with the source data in a consistent way, e.g. relabelling data where there is a known clash and upstream won't address it. The adf reading routines can refer to the patch table when reading data.

Feb 16 '21 01:02 CnlPepper

Wavelengths as keys is a non-starter. There are too many avenues for such a system to fail: different decimal places, wavelength values being revised over time etc...

Feb 16 '21 01:02 CnlPepper

If we are being particularly helpful, we could add a "patch" system that addresses known issues with the source data in a consistent way, e.g. relabelling data where there is a known clash and upstream won't address it. The adf reading routines can refer to the patch table when reading data.

OpenADAS files are not updated very often, I agree that this would be the best solution.

Feb 16 '21 09:02 vsnever