`pdb_tidy` removes the `TER` record between chains and removes last `ENDMDL` in a multi-model PDB

Open rvhonorato opened this issue 1 year ago • 8 comments

Describe the bug

pdb_tidy removes the TER record between chains and removes the last ENDMDL in a multi-model PDB.

To Reproduce

  1. test.pdb
MODEL        1
ATOM      1    N THR A   1      17.047  14.099   3.625  1.00 13.79       N  
TER       2      THR A   1
ATOM      3    N THR B   1      11.047  11.099  11.625  0.00  0.00       N  
TER       4      THR B   1
ENDMDL
MODEL        2
ATOM      1   CA ARG A  10       8.496   4.609   8.837  1.00  3.38       C  
TER       2      ARG A  10
ATOM      3   CA ARG B  10      22.496  22.609  22.837  1.00  3.38       C  
TER       4      TPO B 197
HETATM    5    N TPO B 197      21.891   2.133 -14.748  1.00 38.81       N  
TER       6      TPO B 197
ENDMDL
  2. pdb_tidy test.pdb > tidy.pdb
$ cat tidy.pdb
MODEL        1
ATOM      1    N THR A   1      17.047  14.099   3.625  1.00 13.79       N
ATOM      3    N THR B   1      11.047  11.099  11.625  0.00  0.00       N
TER       4      THR B   1
ENDMDL
MODEL        2
ATOM      1   CA ARG A  10       8.496   4.609   8.837  1.00  3.38       C
TER       2      ARG A  10
ATOM      4   CA ARG B  10      22.496  22.609  22.837  1.00  3.38       C
TER       5      ARG B  10
HETATM    7    N TPO B 197      21.891   2.133 -14.748  1.00 38.81       N
END
diff test.pdb tidy.pdb
1,14c1,12
< MODEL        1
< ATOM      1    N THR A   1      17.047  14.099   3.625  1.00 13.79       N
< TER       2      THR A   1
< ATOM      3    N THR B   1      11.047  11.099  11.625  0.00  0.00       N
< TER       4      THR B   1
< ENDMDL
< MODEL        2
< ATOM      1   CA ARG A  10       8.496   4.609   8.837  1.00  3.38       C
< TER       2      ARG A  10
< ATOM      3   CA ARG B  10      22.496  22.609  22.837  1.00  3.38       C
< TER       4      TPO B 197
< HETATM    5    N TPO B 197      21.891   2.133 -14.748  1.00 38.81       N
< TER       6      TPO B 197
< ENDMDL
---
> MODEL        1
> ATOM      1    N THR A   1      17.047  14.099   3.625  1.00 13.79       N
> ATOM      3    N THR B   1      11.047  11.099  11.625  0.00  0.00       N
> TER       4      THR B   1
> ENDMDL
> MODEL        2
> ATOM      1   CA ARG A  10       8.496   4.609   8.837  1.00  3.38       C
> TER       2      ARG A  10
> ATOM      4   CA ARG B  10      22.496  22.609  22.837  1.00  3.38       C
> TER       5      ARG B  10
> HETATM    7    N TPO B 197      21.891   2.133 -14.748  1.00 38.81       N
> END

Expected behavior

The TER records between chains should be kept, and so should the last ENDMDL.
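
One way to check this expectation on any file is to count TER records per model and see whether each model is closed by ENDMDL. The following is only a minimal sketch and not part of pdb-tools; `summarize_models` and `TIDY_OUTPUT` are hypothetical names, and the data is the tidy.pdb content shown above.

```python
def summarize_models(pdb_text):
    """Return one dict per MODEL block: TER count and whether ENDMDL closed it."""
    summaries = []
    current = None  # summary of the model currently being read, if any
    for line in pdb_text.splitlines():
        record = line[:6].strip()  # record name lives in columns 1-6
        if record == "MODEL":
            serial = int(line[6:].split()[0])
            current = {"model": serial, "ter": 0, "endmdl": False}
            summaries.append(current)
        elif current is not None and record == "TER":
            current["ter"] += 1
        elif current is not None and record == "ENDMDL":
            current["endmdl"] = True
            current = None
    return summaries


# tidy.pdb as reported in this issue
TIDY_OUTPUT = """\
MODEL        1
ATOM      1    N THR A   1      17.047  14.099   3.625  1.00 13.79       N
ATOM      3    N THR B   1      11.047  11.099  11.625  0.00  0.00       N
TER       4      THR B   1
ENDMDL
MODEL        2
ATOM      1   CA ARG A  10       8.496   4.609   8.837  1.00  3.38       C
TER       2      ARG A  10
ATOM      4   CA ARG B  10      22.496  22.609  22.837  1.00  3.38       C
TER       5      ARG B  10
HETATM    7    N TPO B 197      21.891   2.133 -14.748  1.00 38.81       N
END
"""

for summary in summarize_models(TIDY_OUTPUT):
    print(summary)
# {'model': 1, 'ter': 1, 'endmdl': True}
# {'model': 2, 'ter': 2, 'endmdl': False}
```

Model 1 shows a single TER (the one after chain A is gone) and model 2 is never closed by ENDMDL, which matches the two symptoms described above.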

Desktop (please complete the following information):

Distributor ID: Ubuntu
Description:    Ubuntu 22.04.1 LTS
Release:        22.04
Codename:       jammy
$ python --version
Python 3.11.2
$ pip show pdb-tools
Name: pdb-tools
Version: 2.5.0
Summary: A swiss army knife for PDB files.
Home-page: http://bonvinlab.org/pdb-tools
Author: Joao Rodrigues
Author-email: [email protected]
License: Apache Software License, version 2
Location: /home/rodrigo/.pyenv/versions/3.11.2/lib/python3.11/site-packages
Requires:
Required-by:

rvhonorato avatar Mar 27 '23 14:03 rvhonorato

note:

If we repeat the first line to simulate having two atoms before the first TER, the TER is not removed.

The same does not happen with the HETATM entry.

joaomcteixeira avatar Mar 27 '23 14:03 joaomcteixeira

Thanks for the report @rvhonorato, we'll have a look.

JoaoRodrigues avatar Mar 27 '23 15:03 JoaoRodrigues

This is probably an edge case, since the test PDB is not realistic and it works for "real" structures. Still, it could be indicative of some underlying issue.

Let me know if there's any way I can help.

rvhonorato avatar Mar 27 '23 15:03 rvhonorato

I had a look at the format specification and it seems to hint that TER records do not apply after HETATM, only at the terminus of a (linked) chain. Checking a couple of random PDBs reinforces that:

  • https://files.rcsb.org/view/1CTF.pdb
  • https://files.rcsb.org/view/1brs.pdb

JoaoRodrigues avatar Mar 27 '23 15:03 JoaoRodrigues

It's indeed not very clear. Looking at https://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#TER

Every chain of ATOM/HETATM records presented on SEQRES records is terminated with a TER record.

and https://www.cgl.ucsf.edu/chimera/docs/UsersGuide/tutorials/pdbintro.html

indicates the end of a chain of residues. For example, a hemoglobin molecule consists of four subunit chains that are not connected. TER indicates the end of a chain and prevents the display of a connection to the next chain.

And deeper into the SEQRES record: https://www.wwpdb.org/documentation/file-format-content/format33/sect3.html#SEQRES

SEQRES records contain a listing of the consecutive chemical components covalently linked in a linear fashion to form a polymer. The chemical components included in this listing may be standard or modified amino acid and nucleic acid residues. It may also include other residues that are linked to the standard backbone in the polymer. Chemical components or groups covalently linked to side-chains (in peptides) or sugars and/or bases (in nucleic acid polymers) will not be listed here.

So that seems to imply to me that there is some relation between TER and SEQRES. Since the PDBs might not have this SEQRES to pull the limits from, it's probably OK to follow the convention of always having a TER between chains of ATOM records, and additionally a TER at chain breaks (non-continuous numbering in ATOM) using the strict option, which I think already exists, right?
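
To make the first half of that convention concrete, here is a rough sketch of where TER records would belong if only chain ID changes between ATOM records are considered. It is an illustration under that assumption, not how pdb_tidy is implemented; `chain_ends` and `EXAMPLE` are hypothetical names, HETATM records never trigger a TER here, and numbering gaps within a chain are deliberately ignored.

```python
def chain_ends(pdb_lines):
    """Return indices of the ATOM lines after which a TER would be placed.

    Assumption: a chain ends at the last ATOM record before the chain ID
    (column 22) changes, or before the model or file ends.
    """
    ends = []
    last_atom = None  # (line index, chain ID) of the most recent ATOM record
    for index, line in enumerate(pdb_lines):
        record = line[:6].strip()
        if record == "ATOM":
            chain = line[21]
            if last_atom is not None and last_atom[1] != chain:
                ends.append(last_atom[0])  # chain ID changed: previous chain ended
            last_atom = (index, chain)
        elif record in ("MODEL", "ENDMDL", "END") and last_atom is not None:
            ends.append(last_atom[0])  # model or file boundary closes the chain
            last_atom = None
    if last_atom is not None:
        ends.append(last_atom[0])  # file ended without END or ENDMDL
    return ends


# Model 2 of test.pdb from this issue, with its TER records stripped
EXAMPLE = [
    "MODEL        2",
    "ATOM      1   CA ARG A  10       8.496   4.609   8.837  1.00  3.38       C",
    "ATOM      3   CA ARG B  10      22.496  22.609  22.837  1.00  3.38       C",
    "HETATM    5    N TPO B 197      21.891   2.133 -14.748  1.00 38.81       N",
    "ENDMDL",
]

print(chain_ends(EXAMPLE))  # [1, 2]
```

Under this reading, both chain A and chain B get a TER after their single ATOM record, and nothing is placed after the HETATM line.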

rvhonorato avatar Mar 28 '23 08:03 rvhonorato

Yes - better too few than too many TER statements.

Adding a TER statement at any chain break (even within a chain) is a dangerous thing, since it implies there is a real end of the chain there, meaning some software will interpret it as a place where there should be charged termini.

and additionally a TER between chain breaks (non-continuous numbering in ATOM) using the strict options, which I think already exists, right?

amjjbonvin avatar Mar 28 '23 08:03 amjjbonvin

Software that interprets the PDB format should cross-reference the TER records with the SEQRES to decide whether it's a true break or not, but it's unlikely that this behaviour covers PDBs obtained from non-experimental methods. In that case, (older) tools might indeed just assume it's the OXT.

+1 for fewer TER records for the sake of compatibility, but the bug above is still relevant.

rvhonorato avatar Mar 28 '23 08:03 rvhonorato

Any news on this? Is it still relevant or implemented already?

amjjbonvin avatar Mar 06 '24 07:03 amjjbonvin