TensorMol

Basis adaptor and Data Repository

Open Dom1L opened this issue 8 years ago • 1 comment

What was the reason for choosing the 6-311G** basis set in combination with the wB97X-D functional? I would have thought that a basis set augmented with diffuse functions would do a better job when dealing with electrostatics or charged molecules in general. Was it just because your initial training set didn't contain any charged molecules, or was there another reason?

Greetings, Dominik

Dom1L, Dec 18 '17 09:12

Actually another good reason to use aug is to smooth out BSSE. The reasons we did it this way for 0.1:

  • Our training data has some zwitterionic but no ionic species. This was intentional, because the local PES of an N+1 or N-1 atom should not be the same as that of the neutral, i.e. the Behler graph needs some electronic configuration input beyond the geometry to treat charged species properly (imo). Correct solvation of charged species with electronic state variables is a goal for 0.2.
  • Generation of training data is a relatively low priority for us, and we got "locked-in" after amassing a fair amount of data in this basis.

I wouldn't call using 6-311g** an "issue" but there are two issues I would raise to this effect:

  • Developing an adaptor that allows training on mixed basis set data by adjusting for the atomization energies within each ab initio method.
  • Developing a procedure to share and distribute training data. It's pretty tragic that there are like 3-4 government-funded places to store molecular data in the US and EU, and all of them are useless for the purposes of sharing 10 GB datasets. They are only capable of showing you a drawing of aspirin and giving the HOMO-LUMO gap with 10 functionals (Yay funding agencies :P).
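
One way to picture the adaptor idea from the first bullet is the rough Python sketch below. It is only an illustration and not anything that exists in TensorMol: the names `ATOMIC_ENERGIES`, `atomization_energy` and `to_common_scale` are hypothetical, the method labels are placeholders, and the energies are made-up numbers rather than real calculations. It assumes that subtracting each method's own isolated-atom energies is enough to put the data on a shared scale, which may well be too crude in practice.

```python
from collections import Counter

# Hypothetical isolated-atom energies (hartree) for two levels of theory.
# Placeholder numbers for illustration only, not real calculations.
ATOMIC_ENERGIES = {
    "method_a": {"H": -0.5, "O": -75.0},
    "method_b": {"H": -0.4, "O": -74.0},
}


def atomization_energy(total_energy, symbols, method):
    """Subtract the isolated-atom energies of `method` from a total energy."""
    refs = ATOMIC_ENERGIES[method]
    counts = Counter(symbols)
    return total_energy - sum(n * refs[el] for el, n in counts.items())


def to_common_scale(records):
    """Map (symbols, total_energy, method) records to atomization energies."""
    return [atomization_energy(e, syms, m) for syms, e, m in records]


# The same toy molecule computed with two different (hypothetical) methods.
records = [
    (["O", "H", "H"], -76.3, "method_a"),
    (["O", "H", "H"], -75.1, "method_b"),
]
energies = to_common_scale(records)
```

With these toy numbers both records land on nearly the same atomization energy, which is the property a mixed-basis training set would need. Real data would still carry method-dependent differences in the binding energy itself, so this is at best a first-order correction.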

If you have any data you'd like to share, or need help training on your data, don't hesitate to reach out. It's non-trivial.

- John

jparkhill, Dec 18 '17 15:12