plumed2 icon indicating copy to clipboard operation
plumed2 copied to clipboard

our XYZ parser read a box even if a box is not there

Open carlocamilloni opened this issue 2 years ago • 7 comments

@GiovanniBussi @gtribello @maxbonomi @Iximiel In many of our .xyz trajectories in the regtest we do not have a box in the second line but something like: "generated by VMD" 3 words that are then converted to numbers 👍 now I modified driver.cpp to use tools:convert but I think we should handle it better. cf. ea418fe3b4ae9dab05297449ebcc9483393225c6 in #990

carlocamilloni avatar Dec 01 '23 01:12 carlocamilloni

Could be better to set up a regex like ^(%f %f %f|%f %f %f %f %f %f %f %f %f) instead of counting the words? so that "generated by VMD" became a "no box" instead of an error?

And set up a make-style test that tries a complete set of combinations right and wrong to check that (to future-proof the thing)?

the "%f" in the example should be something like [0-9]+(\.[0-9]*|) but correct

Iximiel avatar Dec 01 '23 07:12 Iximiel

Setting up a regex is tricky for numbers in scientific notation (1e-2).

One option would be to check the value returned by sscanf. It it's different from 3 (or 9), we call plumed_error(). However, I think I slightly prefer @carlocamilloni 's solution because it allows to pass box elements such as 3.2*sin(pi/3) which might help describing non orthorombic boxes.

Finally, we might even decide to allow some special string. E.g., we could say generated by VMD means "no box", and check for these string before trying to interpret the numbers. Would this be better? But in Carlo's commit I see other strings (e.g. Atoms. Timestep: 0). I am not sure it makes sense to generalize to all these options.

We might even look for a standardized syntax (such as "extended XYZ", see e.g. here). If we do this I would keep the backward compatibility with 3 (or 9) numbers. I am not sure it's worth, it depends on how many software can handle this. VMD cannot apparently.

GiovanniBussi avatar Dec 01 '23 08:12 GiovanniBussi

At the moment I am fixing the few broken regtest and merging the pull request to reenable the intel CI. I think it would be better to rely on external libraries to read file (xyz, gro, etc), ideally shipping the code with plumed, and remove our own implementations. BTW we will need to implement an mmCIF reader soon

https://github.com/PDB-REDO/libcifpp https://mmcif.wwpdb.org/docs/sw-examples/cpp/html/index.html

carlocamilloni avatar Dec 01 '23 08:12 carlocamilloni

The problem with xyz is that, if there's such a library, it would break our long-running convention of using box vectors in the second line.

For other formats I totally agree. Implementing the gro reader was not trivial, I had to read the gromacs code to do it properly (I hope it's correct). It would be nice to have mmCIF I agree.

GiovanniBussi avatar Dec 01 '23 08:12 GiovanniBussi

if we find a stable library I would go for breaking our convention because this bug was tricky to detect, it must have been around for 8 years or so.

I have found this: http://chemfiles.org/chemfiles/latest/formats.html#list-of-supported-formats

carlocamilloni avatar Dec 01 '23 08:12 carlocamilloni

I think the bug was not in the convention, but in the implementation. If we assume that 3 numbers represent the box, the mistake is not checking that they are numbers...

GiovanniBussi avatar Dec 01 '23 10:12 GiovanniBussi

Yea I agree, anyway that chemfiles project may be a very good option

carlocamilloni avatar Dec 01 '23 16:12 carlocamilloni