Bug in PQR parser

Open VonBoss opened this issue 11 months ago • 12 comments

Summary

The ProDy PQR parser fails on PQR files that have no whitespace between some of their data fields. I ran into this when a coordinate is negative and has three digits before the decimal point, so that it fills its entire 8-character column.

Example: lines from a file that failed to parse

ATOM      1    C STP     1     -26.417  62.269-123.283    0.00     3.30
ATOM      2    O STP     1     -30.495  65.669-122.669    0.00     3.08
ATOM      3    O STP     1     -25.792  61.516-124.085    0.00     3.32
ATOM      4    O STP     1     -25.262  61.506-124.230    0.00     3.05
ATOM      5    O STP     1     -30.521  65.070-122.439    0.00     3.05

There is no whitespace between the Y and Z coordinates in this file, so splitting on whitespace yields a single token such as `62.269-123.283` and ProDy fails to parse it.

Fix

Currently, ProDy parses PQR files differently from PDB files: it splits each line on whitespace (prody/proteins/pdbfile.py).

  if not isPDB:
      fields = line.split()
  ......
  coordinates[acount, 0] = fields[6]
  coordinates[acount, 1] = fields[7]
  coordinates[acount, 2] = fields[8]

To avoid this, ProDy should read the coordinates of PQR files by column position, which conforms to the PQR file specification and matches other PQR parsers such as Biopython.

x = float(line[30:38])
y = float(line[38:46])
z = float(line[46:54])
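
As a quick check of the slices above, here is a minimal sketch that runs them on the first fpocket line from the example. `parse_pqr_coords` is a hypothetical helper name used only for illustration; it is not part of ProDy.

```python
def parse_pqr_coords(line):
    """Return (x, y, z) read from the fixed PDB-style coordinate columns."""
    # 1-based columns 31-38, 39-46 and 47-54 hold X, Y and Z.
    return float(line[30:38]), float(line[38:46]), float(line[46:54])


# First fpocket line from the example; Y and Z share one whitespace token.
fused = "ATOM      1    C STP     1     -26.417  62.269-123.283    0.00     3.30"

# Whitespace splitting cannot separate Y from Z ...
print(fused.split()[6])  # 62.269-123.283
# ... while column slicing returns three separate floats.
print(parse_pqr_coords(fused))
```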

Caveat

It appears that different programs write the charge and radius in different column positions. (This may be why ProDy parsed on whitespace in the first place.)

12345678901234567890123456789012345678901234567890123456789012345678901234
                Resn  Resi           X       Y       Z       Q       R

From https://pipe.rcc.fsu.edu/transcomp/PQRformat.htm
ATOM      1  N   VAL     1      16.783  48.812  26.447  0.0577   1.550
ATOM      2  H1  VAL     1      15.848  48.422  26.463  0.2272   1.200
ATOM      3  H2  VAL     1      16.734  49.803  26.251  0.2272   1.200
ATOM      4  H3  VAL     1      17.195  48.663  27.359  0.2272   1.200
ATOM      5  CA  VAL     1      17.591  48.101  25.416 -0.0054   1.700

From fpocket 4.0
ATOM      1    C STP     1     -26.417  62.269-123.283    0.00     3.30
ATOM      2    O STP     1     -30.495  65.669-122.669    0.00     3.08
ATOM      3    O STP     1     -25.792  61.516-124.085    0.00     3.32
ATOM      4    O STP     1     -25.262  61.506-124.230    0.00     3.05
ATOM      5    O STP     1     -30.521  65.070-122.439    0.00     3.05

From PyMOL 2.5
ATOM     29  P     G    11     -17.189  -6.642 -23.827  0.00000000   0.000
ATOM     30  C5'   G    11     -15.744  -5.986 -25.911  0.00000000   0.000
ATOM     31  O5'   G    11     -16.783  -5.642 -25.005  0.00000000   0.000
ATOM     32  C4'   G    11     -15.722  -5.012 -27.074  0.00000000   0.000
ATOM     33  O4'   G    11     -16.947  -5.133 -27.838  0.00000000   0.000
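
One possible compromise, sketched below and not tested against ProDy itself: read the coordinates by column, then split only the remainder of the line on whitespace to get the charge and radius. This assumes that charge and radius are the first two whitespace-separated tokens after column 54, which holds for the three samples above but may not hold for every writer. `parse_pqr_atom` is a hypothetical helper name.

```python
def parse_pqr_atom(line):
    """Hypothetical helper: coordinates by column, charge/radius by whitespace."""
    # Coordinates sit in the same fixed columns in all three samples.
    xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    # Assumption: charge and radius are the first two tokens after column 54.
    charge, radius = (float(tok) for tok in line[54:].split()[:2])
    return xyz, charge, radius


# One line from each of the three writers shown above.
examples = [
    "ATOM      1  N   VAL     1      16.783  48.812  26.447  0.0577   1.550",
    "ATOM      1    C STP     1     -26.417  62.269-123.283    0.00     3.30",
    "ATOM     29  P     G    11     -17.189  -6.642 -23.827  0.00000000   0.000",
]

for line in examples:
    print(parse_pqr_atom(line))
```

The design choice is that only the coordinate columns are treated as fixed, since those are the fields that can run together, while the trailing fields stay whitespace-delimited because their positions differ between writers.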

VonBoss, Mar 29 '24 16:03