Identification list for species:

Open DetlevCM opened this issue 7 years ago • 3 comments

Chemical names are nice for humans, but less practical for computers. It would be nice if the cavities were accompanied by a list of CAS numbers (asked for by many journals nowadays too), to uniquely identify each compound.

There is an online tool available, and I have been running the names through it. (This is the one I used: http://cts.fiehnlab.ucdavis.edu/ ) I did not (!) verify every match of name to CAS number, but I suspect that a list with some errors is the lesser evil compared to no list at all. In addition, I determined that using the cavities with the "regular sigma range" is not possible for ions. Radicals as well as some heavier atoms also seem to cause issues.

Here is the 'result' of matching names to CAS numbers for the MOPAC cavities: POA1_working_compounds.txt POA1_ions_radicals_CAS_not_found.txt

DetlevCM avatar Jun 01 '18 06:06 DetlevCM

In the same effort, I just filtered the list for the GAMESS cavities:

list_HF.txt removed_HF.txt

Side note:

I have also identified an issue with a CAS number. The tool gave me water as '13670-17-2', which is heavy water. Normal water is '7732-18-5'.
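A quick sanity check that could be run over such a list is the CAS check digit. The sketch below is only an illustration (the function name `cas_checksum_ok` is made up here): it verifies that the last digit equals the weighted sum of the preceding digits modulo 10. Note that both numbers above pass this test, so it would only catch typos and malformed entries, not a valid CAS number assigned to the wrong compound.

```python
import re

# CAS numbers look like NNNNNNN-NN-N: 2 to 7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_checksum_ok(cas: str) -> bool:
    """Return True if the string is well formed and its check digit matches."""
    match = CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(d)
                for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == check


if __name__ == "__main__":
    for cas in ("7732-18-5", "13670-17-2", "7732-18-4"):
        print(cas, cas_checksum_ok(cas))
```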

There is also at least one duplicate in the GAMESS database: tetrachloroethylene and tetrachloroethene.
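Once every name has a CAS number, duplicates like this one can be found by grouping names by CAS. A minimal sketch, assuming the list is available as (name, CAS) pairs; the function name and the input layout are assumptions, and the CAS number 127-18-4 for tetrachloroethylene is filled in here for illustration, not taken from the attached lists:

```python
from collections import defaultdict


def find_duplicate_cas(entries):
    """Group compound names by CAS number and keep groups with more than one name."""
    by_cas = defaultdict(list)
    for name, cas in entries:
        by_cas[cas].append(name)
    return {cas: names for cas, names in by_cas.items() if len(names) > 1}


if __name__ == "__main__":
    # Illustrative entries only.
    entries = [
        ("tetrachloroethylene", "127-18-4"),
        ("tetrachloroethene", "127-18-4"),
        ("water", "7732-18-5"),
    ]
    for cas, names in find_duplicate_cas(entries).items():
        print(cas, names)
```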

DetlevCM avatar Jun 01 '18 07:06 DetlevCM

Thanks for the lists. We could try to improve on this in the future. Some points to keep in mind:

  • How to handle cations, anions and other possible intermediate radicals without CAS
  • How to handle multiple conformers of the same molecule (currently we are providing only one conformer)

Regarding the usual range of -0.025 to 0.025: exceeding it can happen. We are providing the 'raw' apparent surface charges here; after 'averaging' the surface charges, this is less likely to happen.

rpseng avatar Jun 01 '18 11:06 rpseng

Well, there is an extension to COSMO-SAC for electrolytes: https://pubs.acs.org/doi/abs/10.1021/ie100689g Though that isn't of interest to me at present, so I cannot comment on it any further. (And one more paper: https://www.sciencedirect.com/science/article/pii/S0378381218300347 )

In the case of the sigma profiles where the range was exceeded, this was for the averaged charge density. Expanding the range takes care of the problem, as would changing the parameterisation. But as mentioned above, I am at present not interested in ions, so it is easier for me to just remove them. (And place them in a dedicated list.) For all stable species, the range of the averaged sigma profile does not exceed the parameterisation range.
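The filtering itself is simple; roughly something like the sketch below. This is not my actual script: the function names and the input layout (a dict mapping compound name to its averaged sigma values) are assumptions, and the numbers in the example are made up.

```python
# Parameterisation range mentioned above.
SIGMA_MIN, SIGMA_MAX = -0.025, 0.025


def exceeds_sigma_range(sigmas, lo=SIGMA_MIN, hi=SIGMA_MAX):
    """True if any averaged sigma value lies outside [lo, hi]."""
    return any(s < lo or s > hi for s in sigmas)


def split_by_range(profiles):
    """Split compound names into (inside, outside) the parameterisation range."""
    inside, outside = [], []
    for name, sigmas in profiles.items():
        (outside if exceeds_sigma_range(sigmas) else inside).append(name)
    return inside, outside


if __name__ == "__main__":
    # Made-up values, for illustration only.
    profiles = {
        "stable_species": [-0.016, 0.000, 0.017],
        "some_anion": [-0.004, 0.012, 0.031],
    }
    working, removed = split_by_range(profiles)
    print("working:", working)
    print("removed:", removed)
```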

As to how to properly handle ions and conformers: I don't know. I do know that other people are also interested in conformers. Whether I will work with them in the future, I do not know. I would possibly suggest using a slightly more complex naming pattern: either using a "fake CAS number" to append information, say "-c00001", "-c00002" etc. and leave the treatment to code, or an additional column that provides an integer counter for the conformer number with identical CAS numbers. - I'm sure that different people will have different favoured approaches.
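To make the first option concrete, here is a minimal sketch of how code could build and take apart such a "fake CAS number". The helper names are hypothetical, and the five-digit zero padding just follows the "-c00001" example above:

```python
def conformer_id(cas, index):
    """Append a conformer counter to a CAS number, e.g. '7732-18-5-c00001'."""
    return f"{cas}-c{index:05d}"


def split_conformer_id(identifier):
    """Return (cas, conformer_index); the index is None for a plain CAS number."""
    base, sep, suffix = identifier.rpartition("-c")
    if sep and suffix.isdigit():
        return base, int(suffix)
    return identifier, None


if __name__ == "__main__":
    ident = conformer_id("7732-18-5", 2)
    print(ident)
    print(split_conformer_id(ident))
    print(split_conformer_id("7732-18-5"))
```

The additional-column approach avoids any string parsing, at the cost of changing the list format.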

DetlevCM avatar Jun 01 '18 13:06 DetlevCM