rmsd
pdb coordinate reader: error: Parsing coordinates for the following line
    if x_column == None:
        try:
            # look for x column
            for i, x in enumerate(tokens):
                if "." in x and "." in tokens[i + 1] and "." in tokens[i + 2]:
                    x_column = i
                    break
        except IndexError:
            exit("error: Parsing coordinates for the following line: \n{0:s}".format(line))
If a pdb line looks like 'ATOM 383 C6 C B 122 -2.217 -2.542-103.749' (the y and z values run together), the code exits and the coordinates cannot be obtained.
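To make the failure concrete, here is a minimal sketch that re-implements only the token heuristic quoted above as a standalone function (`find_x_column` is a hypothetical name, not part of calculate_rmsd.py) and feeds it the reported line, once as posted and once with a space inserted between y and z for comparison.

```python
def find_x_column(tokens):
    """Return the index of the first token that, together with the next two
    tokens, contains a '.'; mirrors the heuristic quoted above."""
    for i, x in enumerate(tokens):
        if "." in x and "." in tokens[i + 1] and "." in tokens[i + 2]:
            return i
    return None


# Hypothetical corrected line (space restored) versus the line as reported.
spaced = "ATOM 383 C6 C B 122 -2.217 -2.542 -103.749".split()
fused = "ATOM 383 C6 C B 122 -2.217 -2.542-103.749".split()

print(find_x_column(spaced))  # 6: '-2.217', '-2.542', '-103.749'
print(fused[6:])              # ['-2.217', '-2.542-103.749']: only two tokens left
try:
    find_x_column(fused)
except IndexError:
    # In calculate_rmsd.py this is where exit("error: Parsing ...") is called.
    print("IndexError: ran past the end of the token list")
```

Whitespace splitting cannot recover a field boundary that contains no whitespace, so the fused y/z token always leaves the heuristic one token short.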
Probably the issue is the missing space between the y- and z-components of the coordinates. Instead of correcting this manually, such an omission may be fixed on the command line with openbabel (openbabel.org), following the pattern
babel -ipdb notworking.pdb -opdb now_working.pdb
where -ipdb defines the input format as .pdb and, similarly, -opdb specifies the output format as .pdb.
Depending on the version of calculate_rmsd.py, the .pdb generated by openbabel might not work well. In that case, if you do not need crystallographic information such as space group symmetry, you may be better off working with the least complex file type, .xyz. A call from the terminal following the pattern
obabel *.pdb -oxyz -m
will batch-convert all .pdb files in your directory into .xyz files.
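For reference, .xyz is about as simple as a structure format gets: an atom count, one free-text comment line, then one element symbol and three Cartesian coordinates per atom. The sketch below (`to_xyz` is a hypothetical helper, not part of openbabel or calculate_rmsd.py) writes that layout from data already in memory; openbabel's -oxyz output follows the same basic layout, though its exact spacing and comment line may differ.

```python
def to_xyz(elements, coords, comment=""):
    """Return a minimal XYZ-format string: atom count, one comment line, then
    one 'element x y z' line per atom (coordinates in angstrom)."""
    if len(elements) != len(coords):
        raise ValueError("elements and coords must have the same length")
    lines = [str(len(elements)), comment]
    for element, (x, y, z) in zip(elements, coords):
        lines.append(f"{element:<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    return "\n".join(lines) + "\n"


print(to_xyz(["C", "O"], [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)], "CO test"), end="")
```

Because nothing but element and position survives, the format sidesteps the column and dialect questions of .pdb, at the cost of losing residue, chain, and crystallographic information.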
Give this a try, and if it does not work, post again.
Norwid
(In reply to tccyl's original post of Wed, 05 Jun 2019 08:56:32 -0700.)
Hi @tccyl,
Where is the PDB file from? From rcsb.org?
I am not a heavy user of the .pdb file format. I've read the http://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#ATOM file format documentation, and it seems PDB is column-width based, not whitespace-separated (space.split) as currently implemented.
You are very welcome to make a pull request that fixes this parsing, including a .pdb file where rmsd fails.
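A minimal sketch of what "column-width based" means in practice: each field of an ATOM/HETATM record lives at fixed 1-based column positions (the ranges below are taken from the wwPDB format 3.3 ATOM description linked above; only a subset of fields is listed). `parse_atom_record` is a hypothetical helper, and the test record is a hypothetical full-width version of the reported line, with made-up occupancy and B-factor values, in which y and z still touch.

```python
# 1-based, inclusive column ranges for some ATOM/HETATM fields (wwPDB 3.3).
ATOM_COLUMNS = {
    "record": (1, 6),
    "serial": (7, 11),
    "name": (13, 16),
    "resName": (18, 20),
    "chainID": (22, 22),
    "resSeq": (23, 26),
    "x": (31, 38),
    "y": (39, 46),
    "z": (47, 54),
    "element": (77, 78),
}


def parse_atom_record(line):
    """Cut one ATOM/HETATM line into stripped string fields by column position."""
    return {key: line[start - 1:end].strip() for key, (start, end) in ATOM_COLUMNS.items()}


# Hypothetical 78-column record; columns 39-54 read "  -2.542-103.749".
record = "ATOM    383  C6    C B 122" + " " * 6 + "-2.217  -2.542-103.749  1.00  0.00" + " " * 11 + "C"
fields = parse_atom_record(record)
print(fields["x"], fields["y"], fields["z"])  # -2.217 -2.542 -103.749
```

Slicing by position never needs a separator, so a number that fills its whole 8-character field (such as -103.749) is read correctly even when it touches its neighbour.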
Dear Jimmy,
I sometimes use calculate_rmsd.py for data derived from single-crystal diffraction, natively deposited as .cif, a format openbabel reads as well. Here, I can second @tccyl, as well as the documentation in and around the script, that not all .pdb files are equally well suited to pass the Kabsch test, and I tentatively attribute some of the issues to differences in their formatting as well as in their content.
Converting .cif to .pdb with openbabel yields files generally unsuitable for calculate_rmsd.py, which is why I typically either
- convert them further to .xyz with openbabel, which then pass; or
- use Olex2 to write either .pdb or .xyz; both types work well with calculate_rmsd.py.
Because this was not perceived as an obstacle, I didn't spend additional time on this issue.
Some of the attached documentation may illustrate this experience.
Hi, nbehrnd, thanks for your nice suggestion. However, using openbabel to convert the pdb file still does not solve this issue because, as @charnley said, the pdb format is column-width based, not space.split as currently implemented. In pdb files from rcsb as well as from openbabel, the columns of the x, y, z coordinates are the same and follow this format:

    try:
        x = line[30:38]
        y = line[38:46]
        z = line[46:54]
        V.append(np.asarray([x, y, z], dtype=float))

Maybe the x, y, z coordinates can be obtained directly with the code above instead of by looking for x_column.
To answer @charnley's question: yes, it is from rcsb.org.
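The fixed-column slicing suggested above can be turned into a small self-contained reader. This is a sketch under stated assumptions, not the fix that was merged into calculate_rmsd.py: `read_pdb_coordinates` is a hypothetical name, only ATOM/HETATM records are read, and the element fallback to the first letter of the atom name is an assumption that fails for two-letter elements.

```python
import numpy as np


def read_pdb_coordinates(lines):
    """Return element symbols and an (N, 3) coordinate array from the ATOM and
    HETATM records, reading x, y, z by fixed column position."""
    elements, coords = [], []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        try:
            # Columns 31-38, 39-46, 47-54 (1-based) hold x, y, z.
            xyz = [float(line[30:38]), float(line[38:46]), float(line[46:54])]
        except ValueError:
            raise ValueError(
                "error: Parsing coordinates for the following line:\n" + line
            ) from None
        # Element symbol from columns 77-78; if blank, fall back to the first
        # letter of the atom name (columns 13-16). The fallback is an assumption
        # and is wrong for two-letter elements such as Cl or Fe.
        element = line[76:78].strip() or line[12:16].strip()[:1]
        elements.append(element)
        coords.append(xyz)
    return elements, np.asarray(coords, dtype=float).reshape(-1, 3)


# Hypothetical 78-column record in which y and z touch, as in the report.
record = "ATOM    383  C6    C B 122" + " " * 6 + "-2.217  -2.542-103.749  1.00  0.00" + " " * 11 + "C"
elements, xyz = read_pdb_coordinates(["HEADER    example", record, "END"])
print(elements, xyz)
```

Compared with the snippet above, the try block gets a matching except clause, so a malformed line still produces the script's familiar error message instead of a bare traceback.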
Hi @tccyl,
it seems my earlier reply by email didn't come through. Meanwhile, there has been some work on the script aimed at letting .pdb files written by the popular openbabel pass the Kabsch test, because the current version 1.3.2 (released in January 2019) does not work with .pdb files written by openbabel.
For a small test molecule (benzamide) consisting of C, H, N, and O, adding some keywords to the instructions in the script now allows such files to be processed successfully. The modified script is attached here and also submitted as pull request #58, including additional test data (.pdb newly written by openbabel) known to work. It is still labeled as version 1.3.2 (Jan 2019), awaiting action by Jimmy.
Meanwhile, give it a try; perhaps your (test) data reveal additional keywords that should be added, too. You are welcome to post your two files in question here. Be aware that there are multiple 'dialects' of .pdb files around, which contributes to the issues here (which is why .xyz represents a fallback, at some expense, of course).
@nbehrnd Thank you so much~ The two example files where rmsd failed are below: two_fragment_files.zip
Hi @tccyl,
in short, after passing the .pdb files through openbabel, the RMSD determined by calculate_rmsd.py is about 0.7983 for each variant of the Kabsch test. Below are both the copy of the script used and the documentation (two .zip).
The detailed story:
An initial inspection of the files in an editor revealed that both describe the same number of atoms per atom type. A subsequent check in Avogadro revealed that the mutual distances of these atoms exceed the van der Waals radii, so in this sense the atoms are not adjacent to each other.
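The two checks above were done by eye, in an editor and in Avogadro. A rough scripted equivalent might look like the sketch below; the helper names are hypothetical, the Bondi van der Waals radii are my choice of table, and treating "distance below the sum of the two radii" as "in contact" is an assumed criterion, not what Avogadro does internally.

```python
from collections import Counter
from itertools import combinations

import numpy as np

# Bondi van der Waals radii in angstrom, for the elements relevant here only.
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def same_composition(elements_a, elements_b):
    """True if both structures have the same number of atoms per element."""
    return Counter(elements_a) == Counter(elements_b)


def close_contacts(elements, coords):
    """Index pairs whose distance is below the sum of their vdW radii."""
    coords = np.asarray(coords, dtype=float)
    pairs = []
    for i, j in combinations(range(len(elements)), 2):
        distance = np.linalg.norm(coords[i] - coords[j])
        if distance < VDW_RADII[elements[i]] + VDW_RADII[elements[j]]:
            pairs.append((i, j))
    return pairs


elements = ["C", "O", "N"]
coords = [(0.0, 0.0, 0.0), (1.23, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(same_composition(elements, ["N", "C", "O"]))  # True
print(close_contacts(elements, coords))             # [(0, 1)]
```

An empty result from close_contacts for a fragment corresponds to the observation above: every atom sits farther from its neighbours than the sum of their van der Waals radii.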
In their original form, the two files are suitable for a Kabsch test neither with the current version of calculate_rmsd.py (1.3.2, January 2019) nor with my changes from last week.
I passed your .pdb files to openbabel (version 2.4.1, November 2018) to be rewritten:
babel -ipdb 4L81_10_CPCN.pdb -opdb 4L81_10_CPCN_babel.pdb
babel -ipdb 4L81_11_CPCN.pdb -opdb 4L81_11_CPCN_babel.pdb
In both instances, openbabel reported difficulties working with the original data. This suggests the export from the original source file should be revised, which obviously is not the topic of this thread. One of the error logs is included as error.log.
However, atom label, (x, y, z) coordinates, and atom type seem to carry over into the newly written .pdb, which indeed retains the missing space between the y- and z-components of the coordinates. This may be characteristic of working with protein data, as opposed to small-molecule data.
The openbabel-written .pdb files then passed each of the three variants of the Kabsch test with the --reorder option engaged (default/classical Kabsch test, --use-reflections, --use-reflections-keep-stereo), all with the same numerical RMSD of about 0.7983. For comparison, the .pdb files were converted with babel into .xyz; again, the Kabsch tests report an RMSD of about 0.7983.
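For readers new to the Kabsch test: the RMSD reported above is computed after the two structures are optimally superimposed. Below is a minimal NumPy sketch of the classical variant, assuming both coordinate sets already list the atoms in the same order (the job --reorder takes care of in calculate_rmsd.py, not shown here) and leaving out the reflection variants. `kabsch_rmsd` is a hypothetical name, not the script's own function.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets with the same atom order, after
    removing translation and the optimal proper rotation (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)          # move both centroids to the origin
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                     # 3 x 3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # -1 would mean a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # proper rotation taking P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


rng = np.random.default_rng(0)
P = rng.normal(size=(10, 3))
theta = np.radians(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([1.0, 2.0, 3.0])   # rotated and shifted copy of P
print(kabsch_rmsd(P, Q))  # close to 0, up to floating-point noise
```

The sign correction d keeps R a proper rotation; a mirror image of P therefore keeps a clearly non-zero RMSD, which is the situation the reflection options are meant to address.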
With the .xyz files in hand, it may be interesting to inspect the 'best alignment' of the two selected sets of atoms. Using 4L81_10_CPCN_babel.xyz as the fixed model_A, and 4L81_11_CPCN_babel.xyz as model_B to be aligned with respect to model_A, the new coordinates of model_B were harvested by
python3 calculate_rmsd.py --reorder -p 4L81_10_CPCN_babel.xyz 4L81_11_CPCN_babel.xyz > new_alignment_11.xyz
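Conceptually, the aligned model_B amounts to rotating B about its centroid with the Kabsch rotation and placing it onto A. A minimal sketch under the same same-atom-order assumption as any Kabsch step; `align_onto` is a hypothetical helper, and whether the output of calculate_rmsd.py's -p option is likewise translated onto A's centroid is not verified here.

```python
import numpy as np


def align_onto(A, B):
    """Return B after translating and rotating it onto A (same atom order),
    using the Kabsch rotation; A stays fixed."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    a0, b0 = A.mean(axis=0), B.mean(axis=0)
    H = (B - b0).T @ (A - a0)       # cross-covariance, B -> A
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate B about its own centroid, then place it on A's centroid.
    return (B - b0) @ R.T + a0


rng = np.random.default_rng(1)
A = rng.normal(size=(8, 3))
theta = np.radians(75.0)
Rx = np.array([[1.0, 0.0, 0.0],
               [0.0, np.cos(theta), -np.sin(theta)],
               [0.0, np.sin(theta), np.cos(theta)]])
B = A @ Rx.T + np.array([-4.0, 0.5, 2.0])   # moved copy of A
print(np.abs(align_onto(A, B) - A).max())   # close to 0
```

Writing the result with element symbols in XYZ layout gives a file that jmol can read next to model_A.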
Both model_A and the updated model_B (new_alignment_11.xyz) were read by jmol. Corresponding atoms were connected manually (the 'connect strut' instruction after selecting the atoms in question), with struts colored either red (model_A) or blue (new_alignment_11 / updated model_B) and labelled (model_A red, model_B blue), in an otherwise CPK color scheme. They were exported as a static .png and as an interactive .wrl (viewable, e.g., in view3dscene) to walk around the superposition.
The labeling in jmol's display of the superposition is worth a word:
Except for the two opposite termini, the atoms were labeled following the pattern C1/1.1 #1, where C1 stands for the first carbon atom in (1.1) the first model of the first file read. In the same way, 2.1 refers to the first model in the second file read by jmol. #1 then refers to the first atom read in this model, a count independent of the atom type or atom label met in the file read.
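If such labels are exported and later need to be paired up programmatically, their parts can be pulled apart with a regular expression. This is a hypothetical reader built only on the label shape described above (atom label, "/file.model", then "#" and the running atom number); it is not a jmol API, and the second example label is made up.

```python
import re

# Assumed label shape, e.g. "C1/1.1 #1": atom label, "/", file number, ".",
# model number, whitespace, "#", running atom number within the model.
LABEL = re.compile(r"^(?P<atom>[^/\s]+)/(?P<file>\d+)\.(?P<model>\d+)\s+#(?P<index>\d+)$")


def parse_jmol_label(text):
    """Split a label such as 'C1/1.1 #1' into (atom, file, model, index)."""
    match = LABEL.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized label: {text!r}")
    return match["atom"], int(match["file"]), int(match["model"]), int(match["index"])


print(parse_jmol_label("C1/1.1 #1"))   # ('C1', 1, 1, 1)
print(parse_jmol_label("N3/2.1 #14"))  # ('N3', 2, 1, 14)
```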