Equals sign in species labels breaks restart

Open jwallen opened this issue 14 years ago • 3 comments

When parsing a Chemkin-like reaction equation (e.g. when reading restart files, or appending "(+M)" to pressure-dependent reactions), we assume that the reactant and product labels do not contain equals signs, so that the only equals sign is the reaction arrow. This implies that we cannot allow equals signs in species labels from condition files or seed mechanisms, which is unfortunate because it would disallow the use of SMILES as labels. However, RMG does not currently check the species labels for forbidden characters like this when starting a job, so the problem only surfaces later, when trying to parse a restart file.

Possible solutions:

  • Forbid equals signs (and perhaps other such characters) in all species labels. This is probably not too difficult of a fix, but it removes quite a bit of flexibility (read: I like using SMILES).
  • Fix our restart file reading to handle equals signs in species labels. Since we have to specify all the species labels in another place, it should be possible to parse the species labels from the restart/Chemkin reaction equations. I don't think this would be too difficult either, but I'm not an expert in restart file reading/writing.
  • Change the reaction arrow to "<=>" instead of "=". This is probably not the best approach, since it means we can only use up to nine characters per species label (down from the current ten).
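
As a rough illustration of the second option, here is a minimal Python sketch (not RMG code) that uses the already-declared species labels to decide which "=" is the reaction arrow. The function name `split_reaction` and the example labels are hypothetical. It assumes a plain reversible equation with "=" as the arrow, no "(+M)" or stoichiometric coefficients, and labels that contain no "+".

```python
def split_reaction(equation, known_labels):
    """Split 'A+B=C+D' into (reactants, products) using declared labels.

    Each '=' is tried in turn as the reaction arrow; a candidate is kept
    only if every label on both sides is in `known_labels`.
    """
    equation = equation.replace(" ", "")
    parses = []
    for i, ch in enumerate(equation):
        if ch != "=":
            continue
        reactants = equation[:i].split("+")
        products = equation[i + 1:].split("+")
        if all(label in known_labels for label in reactants + products):
            parses.append((reactants, products))
    if len(parses) != 1:
        # Zero parses: unknown label. Several parses: ambiguous equation.
        raise ValueError(f"{len(parses)} valid parses for {equation!r}")
    return parses[0]


# Hypothetical SMILES-style labels, for illustration only.
labels = {"C=C", "[H]", "C[CH2]"}
print(split_reaction("C=C+[H]=C[CH2]", labels))
```

Raising on anything other than exactly one parse keeps the reader from silently picking the wrong arrow.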

(This is also an issue in RMG-Py, where we have been merrily using SMILES in our generated species labels since its inception. However, the issue is limited to reading and writing of Chemkin files, as restart is provided by another mechanism that doesn't break when this issue occurs. For Py I will probably just fix the Chemkin reader to handle equals signs in species labels.)

jwallen, Nov 22 '11 18:11

Bear in mind any restrictions on chemkin species:

CHEMKIN 3.7 manual's Rules for Species Data:

Species names are composed of up to 16-character upper- or lower- case symbols. The names cannot begin with the characters +, =, or a number; an ionic species name may end with one or more +'s or -'s.

For reaction data:

The reaction description can begin anywhere on the line. All blank spaces, except those between Arrhenius coefficients, are ignored. Each reaction description must have =, <=> or => between the last reactant and the first product.

Does that make this valid?

SPECIES
C C=C C+ E  
END
REACTIONS
C+C=C=C=C+C++E 1 1 1 ! (C is ionized to C+ and an electron by collision with C=C)
END

Now how would you parse it if I also had the species C= C+C C++ C=C=C C=C=C=C ?

My hunch says the restrictions must be tighter than the manual suggests.
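
To make the ambiguity concrete, here is a small Python sketch (an illustration, not part of RMG) that enumerates every way the equation above can be read against a given set of labels. The helper names `segmentations` and `all_parses` are made up for this example. It treats only "=" as the arrow and allows "+" and "=" inside labels.

```python
def segmentations(side, labels):
    """Return every way to read `side` as labels joined by '+'."""
    results = [[side]] if side in labels else []
    for i, ch in enumerate(side):
        # Treat this '+' as a separator only if what precedes it is a label.
        if ch == "+" and side[:i] in labels:
            for rest in segmentations(side[i + 1:], labels):
                results.append([side[:i]] + rest)
    return results


def all_parses(equation, labels):
    """Try each '=' as the arrow and collect every consistent reading."""
    equation = equation.replace(" ", "")
    parses = []
    for i, ch in enumerate(equation):
        if ch == "=":
            for left in segmentations(equation[:i], labels):
                for right in segmentations(equation[i + 1:], labels):
                    parses.append((left, right))
    return parses


equation = "C+C=C=C=C+C++E"
declared = {"C", "C=C", "C+", "E"}
extended = declared | {"C=", "C+C", "C++", "C=C=C", "C=C=C=C"}
print(len(all_parses(equation, declared)))
print(len(all_parses(equation, extended)))
```

With only the four declared species there is a single reading (C + C=C on the left, C=C + C+ + E on the right). Once the extra labels are added there is more than one, so the known labels alone no longer settle the parse.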

What are the benefits and costs of adhering to chemkin rules for our restart files? I think that if we are trying to make chemkin-compatible files, tight restrictions on species labels are probably not a problem (they're length-limited anyway).

rwest, Nov 22 '11 20:11

In that case, I think we should throw an exception and stop if a species label in a condition file or a seed mechanism contains an invalid character. We could just auto-generate a new name when this happens, but I think that breaks the spirit of putting the seed mechanism and condition file species into the model "as-is".

Incidentally, I had already dealt with this some time ago in RMG-Py while writing the Chemkin file read/write functionality there. In that case I use the regex [^A-Za-z0-9\-_]+ to check for invalid characters. Thus, even ethylene becomes C2H4(8) instead of C=C(8) in the Chemkin output. Furthermore, the SMILES is still printed in the Chemkin file as a comment, so we get the best of both worlds. Another reason to switch to Py I suppose.
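
A minimal sketch of that check, using the regex quoted above. The function name `chemkin_label` and the fall-back to the chemical formula are assumptions made for illustration; this is not the actual RMG-Py code.

```python
import re

# Regex from the comment above: anything other than letters, digits,
# '-' or '_' counts as an invalid character.
INVALID_CHARS = re.compile(r"[^A-Za-z0-9\-_]+")


def chemkin_label(label, formula, index):
    """Return a Chemkin-safe label of the form 'NAME(index)'.

    If `label` contains an invalid character (e.g. the '=' in a SMILES
    string), the formula is used in its place. This fall-back is an
    assumption for this sketch.
    """
    if INVALID_CHARS.search(label):
        label = formula
    return f"{label}({index})"


print(chemkin_label("C=C", "C2H4", 8))
print(chemkin_label("CH4", "CH4", 1))
```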

jwallen, Nov 22 '11 20:11

Being able to import "real world" chemkin files with minimal translation would be nice. Non-alphanumeric characters I have come across in a brief survey just now:

  • Princeton (Dryer/Dooley): *-()
  • JetSurF (Hai Wang): (),-
  • LLNL (Westbrook etc.): -()

So it would be good to allow those. Nobody is using =.
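
One way to combine this with the earlier suggestion to stop on invalid labels is sketched below in Python. The name `check_label` is hypothetical. The allowed set is the characters from the survey above plus letters, digits and underscore, and the leading-digit and 16-character rules are taken from the CHEMKIN manual excerpt quoted earlier. Trailing "+" for ionic species is deliberately not handled here.

```python
import re

# Letters, digits, underscore, plus the characters seen in the survey: * - ( ) ,
ALLOWED_LABEL = re.compile(r"[A-Za-z0-9_*(),\-]+")
MAX_LENGTH = 16  # limit quoted from the CHEMKIN manual excerpt


def check_label(label):
    """Raise ValueError if `label` breaks the rules assumed in this sketch."""
    if not ALLOWED_LABEL.fullmatch(label):
        raise ValueError(f"invalid character in species label {label!r}")
    if label[0].isdigit():
        raise ValueError(f"species label {label!r} begins with a number")
    if len(label) > MAX_LENGTH:
        raise ValueError(f"species label {label!r} exceeds {MAX_LENGTH} characters")


check_label("CH2(S)")       # passes silently
check_label("C6H5-CH3")     # passes silently
```

Calling `check_label("C=C")` raises a ValueError, which matches the "throw an exception and stop" behaviour proposed above.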

rwest, Nov 22 '11 21:11