rmsd pdb coordinate reader: error: Parsing coordinates for the following line

Open tccyl opened this issue 5 years ago • 8 comments

    if x_column == None:
        try:
            # look for x column
            for i, x in enumerate(tokens):
                if "." in x and "." in tokens[i + 1] and "." in tokens[i + 2]:
                    x_column = i
                    break
        except IndexError:
            exit("error: Parsing coordinates for the following line: \n{0:s}".format(line))

If the pdb line is like 'ATOM 383 C6 C B 122 -2.217 -2.542-103.749' (the values of y and z are fused, with no space between them), the code exits and the coordinates cannot be read.
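A minimal sketch of why the lookup fails, assuming the line is split on whitespace before the snippet runs. The helper name `find_x_column` is hypothetical and not part of calculate_rmsd.py; it only mirrors the loop above.

```python
def find_x_column(tokens):
    """Return the index of the first of three consecutive tokens containing '.'.

    Hypothetical helper mirroring the lookup loop; it raises IndexError
    when fewer than three coordinate-like tokens follow each other.
    """
    for i, x in enumerate(tokens):
        if "." in x and "." in tokens[i + 1] and "." in tokens[i + 2]:
            return i
    return None


# Same record twice: once with a space between y and z, once with both fused.
separated = "ATOM 383 C6 C B 122 -2.217 -2.542 -103.749".split()
fused = "ATOM 383 C6 C B 122 -2.217 -2.542-103.749".split()

print("x column with separated values:", find_x_column(separated))

try:
    find_x_column(fused)
except IndexError:
    # '-2.542-103.749' is a single token, so tokens[i + 2] does not exist.
    print("fused y/z values: the lookup runs past the end of the token list")
```

With the fused values the line yields only two tokens containing a decimal point, so the third lookup indexes past the end of the list.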

Jun 05 '19 15:06 tccyl

The issue is probably the missing space between the y- and z-components of the coordinates. Instead of correcting this manually, such an omission may be corrected on the command line with openbabel (openbabel.org), following the pattern

babel -ipdb notworking.pdb -opdb now_working.pdb

where -ipdb defines the input format as .pdb and, similarly, -opdb specifies the output format as .pdb.

Depending on the version of calculate_rmsd.py, the *.pdb generated by openbabel might not work well. In that case, and if you do not need crystallographic information such as space group symmetry, you may be better off working with the least complex file type, .xyz. A call from the terminal following the pattern

obabel *.pdb -oxyz -m

will batch-convert all *.pdb files in your directory into .xyz files.
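To illustrate why .xyz is the least complex option, here is a minimal sketch of the file layout: an atom count, a comment line, then one whitespace-separated line per atom. The function name `to_xyz` and the sample values are illustrative assumptions, not output from openbabel.

```python
def to_xyz(atoms, comment=""):
    """Format (element, x, y, z) tuples as XYZ text.

    Layout: line 1 is the atom count, line 2 a free comment,
    then one 'element x y z' line per atom.
    """
    lines = [str(len(atoms)), comment]
    for element, x, y, z in atoms:
        lines.append("{:<2} {:12.6f} {:12.6f} {:12.6f}".format(element, x, y, z))
    return "\n".join(lines) + "\n"


# Illustrative values only; the first atom reuses the coordinates from the issue.
atoms = [("C", -2.217, -2.542, -103.749), ("N", 0.0, 1.25, 3.5)]
xyz_text = to_xyz(atoms, comment="illustrative coordinates")
print(xyz_text)
```

Because every field is separated by whitespace, a value such as -103.749 cannot fuse with its neighbour the way it can in a fixed-width .pdb record.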

Give this a try, and post again if it does not work.

Norwid

Jun 05 '19 16:06 nbehrnd

Hi @tccyl ,

Where is the PDB file from? From rcsb.org?

I am not a heavy user of the .pdb file format. I've read the file format documentation at http://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#ATOM and it seems PDB is based on fixed column widths, not on whitespace-separated fields as the current split-based implementation assumes.

You are very welcome to make a pull request that fixes this parsing, including a .pdb file where rmsd fails.

Jun 07 '19 07:06 charnley

Dear Jimmy,

I sometimes use calculate_rmsd.py for data derived from single-crystal diffraction, natively deposited as .cif, a format openbabel can read as well. I can second both @tccyl and the documentation in and around the script: not all .pdb files are equally well suited to pass the Kabsch test. I tentatively attribute some of the issues to differences in both formatting and content.

Converting .cif to .pdb with openbabel yields files that are generally unsuitable for calculate_rmsd.py, which is why I typically either

convert them further to .xyz with openbabel, which then pass successfully; or
use Olex2 to write either .pdb or .xyz; both types work well with calculate_rmsd.py. Because I did not perceive this as an obstacle, I didn't spend additional time on the issue.

Possibly the attached documentation illustrates the experience.

2019-Jun-07_calculate_rmsd_pdb_corrected.zip

Jun 07 '19 12:06 nbehrnd

Hi nbehrnd, thanks for your nice suggestion. However, converting the pdb file with openbabel still does not solve this issue because, as @charnley said, the pdb format is based on column widths and not on whitespace splitting as currently implemented. In pdb files from both rcsb and openbabel, the x, y, z coordinates occupy the same columns and follow the format:

    x = line[30:38]
    y = line[38:46]
    z = line[46:54]
    V.append(np.asarray([x, y, z], dtype=float))

Maybe the x, y, z coordinates could be read directly with the code above instead of by looking for x_column.
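A self-contained sketch of the fixed-column idea, assuming the ATOM record layout of the wwPDB format documentation linked earlier in the thread (x in columns 31-38, y in 39-46, z in 47-54). The names `parse_pdb_coordinates` and `make_atom_line` are hypothetical helpers for this demo, not functions of calculate_rmsd.py.

```python
def parse_pdb_coordinates(line):
    """Read x, y, z from the fixed columns of an ATOM/HETATM record.

    Columns 31-38, 39-46 and 47-54 (1-based) correspond to the
    0-based slices [30:38], [38:46] and [46:54].
    """
    return (float(line[30:38]), float(line[38:46]), float(line[46:54]))


def make_atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a fixed-width ATOM record up to column 54 (demo helper only)."""
    return "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   {:8.3f}{:8.3f}{:8.3f}".format(
        "ATOM", serial, name, "", res_name, chain, res_seq, "", x, y, z
    )


# The record from the issue: y and z touch because z fills all eight columns.
line = make_atom_line(383, " C6", "C", "B", 122, -2.217, -2.542, -103.749)
print(repr(line))
print(parse_pdb_coordinates(line))
```

Slicing by column does not depend on a separating space, so the fused y and z values are read correctly. A record shorter than 54 characters raises a ValueError from `float`, which a caller would still need to handle.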

Jun 14 '19 09:06 tccyl

@charnley Yes, the PDB file is from rcsb.org.

Jun 14 '19 09:06 tccyl

Hi @tccyl ,

it seems my earlier reply by email didn't get through. Meanwhile, there was some work on the script, aiming to enable .pdb files written by the popular openbabel to pass the Kabsch test, because the current version 1.3.2 (released in January 2019) does not work with .pdb files written by openbabel.

For a small test molecule (benzamide) consisting of C, H, N, and O, adding some keywords to the instructions in the script now allows it to work with such files successfully. It is deposited here and also submitted as pull request #58, including additional test data (.pdb newly written by openbabel) known to work, too. It is still labeled as version 1.3.2 (Jan 2019), awaiting action by Jimmy.

Meanwhile, give it a try; perhaps your (test) data reveal additional keywords that should be included, too. You are welcome to deposit your two files in question here. Note that there are multiple 'dialects' of .pdb files around, which contributes to the issues here (which is why .xyz is a fallback, at some expense, of course).

Jun 14 '19 14:06 nbehrnd

@nbehrnd Thank you so much~ The two example files where rmsd failed are below: two_fragment_files.zip

Jun 17 '19 03:06 tccyl

Hi @tccyl

in short, after passing the .pdb files through openbabel, the RMSD that calculate_rmsd.py determines for each variant of the Kabsch test is about 0.7983. Attached below are both the copy of the script used and the documentation (two .zip files).

The detailed story: An initial inspection of the files in an editor revealed that both describe the same number of atoms per atom type. A subsequent check in avogadro revealed that the mutual distances of these atoms exceed the van der Waals radii, so in this sense the atoms are not adjacent to each other. In their original form, the two files are suitable for a Kabsch test with neither the current version of calculate_rmsd.py (1.3.2, January 2019) nor my changes from last week.

I passed your .pdb files to openbabel (version 2.4.1, November 2018) to be rewritten:

babel -ipdb 4L81_10_CPCN.pdb -opdb 4L81_10_CPCN_babel.pdb
babel -ipdb 4L81_11_CPCN.pdb -opdb 4L81_11_CPCN_babel.pdb

In both instances, openbabel indicated difficulties working with the original data. This suggests the export from the original source file should be revised, which is not the topic of this thread. One of the error logs is included as error.log.

However, atom label, (x,y,z) and atom type seem to pass into the newly written .pdb, which still lacks the space between the y- and z-components of the coordinates. This may be characteristic of protein data, as opposed to small-molecule data.

[Image: diffView]

The openbabel-written .pdb files then passed smoothly each of the three variants of the Kabsch test with the --reorder option engaged (default / classical Kabsch test, --use-reflections, --use-reflections-keep-stereo), with the same numerical RMSD of about 0.7983. For comparison, the .pdb files were converted with babel into .xyz; again, the Kabsch tests report an RMSD of about 0.7983.
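For orientation, the classical Kabsch test can be sketched with NumPy as follows. This is a simplified illustration under stated assumptions, not the code of calculate_rmsd.py: both coordinate sets must already list their atoms in matching order (no --reorder step, no reflection handling), and the function name `kabsch_rmsd` and the test coordinates are made up for the demo.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimally rotating P onto Q.

    Assumes row i of P corresponds to row i of Q.
    """
    # Remove the translation by centering both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)

    # The SVD of the covariance matrix yields the optimal rotation.
    V, S, Wt = np.linalg.svd(P.T @ Q)

    # Flip one axis if needed so the result is a proper rotation (no mirroring).
    if np.linalg.det(V) * np.linalg.det(Wt) < 0.0:
        V[:, -1] = -V[:, -1]
    U = V @ Wt

    diff = P @ U - Q
    return np.sqrt((diff ** 2).sum() / len(P))


# Demo: a rotated and shifted copy of the same points should give an RMSD near 0.
Q = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
angle = np.radians(40.0)
R = np.array([[np.cos(angle), -np.sin(angle), 0.0],
              [np.sin(angle), np.cos(angle), 0.0],
              [0.0, 0.0, 1.0]])
P = Q @ R.T + np.array([1.0, -2.0, 0.5])
print(kabsch_rmsd(P, Q))
```

The value is independent of how either structure is positioned or oriented in space, but it does depend on the atom order, which is why the runs above used --reorder.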

With the .xyz files in hand, it may be interesting to inspect the 'best alignment' of the two selected sets of atoms. Using 4L81_10_CPCN_babel.xyz as fixed model_A and 4L81_11_CPCN_babel.xyz as model_B to be aligned with respect to model_A, the new coordinates of model_B were harvested by

python3 calculate_rmsd.py --reorder -p 4L81_10_CPCN_babel.xyz 4L81_11_CPCN_babel.xyz > new_alignment_11.xyz

Both model_A and the updated model_B (new_alignment_11.xyz) were read by jmol. Their corresponding selections of atoms were connected manually ('connect strut' instruction after selecting the atoms in question) with struts colored either red (model_A) or blue (new_alignment_11 / updated model_B) and labelled accordingly, in an otherwise cpk color scheme. They were exported as a static .png and an interactive .wrl (viewable e.g. with view3dscene) to walk around the superposition.

[Image: alignment]

The labeling in jmol's display of the superposition is worth a word: except for the two opposite termini, the atoms were labeled in a pattern of C1/1.1 #1, where C1 stands for the first carbon atom and 1.1 for the first model of the first file read. In the same way, 2.1 denotes the first model in the second file read by jmol. #1 then refers to the first atom read in this model, a count independent of the atom type or atom label in the file.

rmsd-babel_issue.zip reporting.zip

Jun 17 '19 10:06 nbehrnd
