pdb-tools
`pdb_tidy` removes the `TER` record between chains and removes last `ENDMDL` in a multi-model PDB
Describe the bug
`pdb_tidy` removes the `TER` record between chains and removes the last `ENDMDL` in a multi-model PDB.
To Reproduce
- test.pdb
MODEL 1
ATOM 1 N THR A 1 17.047 14.099 3.625 1.00 13.79 N
TER 2 THR A 1
ATOM 3 N THR B 1 11.047 11.099 11.625 0.00 0.00 N
TER 4 THR B 1
ENDMDL
MODEL 2
ATOM 1 CA ARG A 10 8.496 4.609 8.837 1.00 3.38 C
TER 2 ARG A 10
ATOM 3 CA ARG B 10 22.496 22.609 22.837 1.00 3.38 C
TER 4 TPO B 197
HETATM 5 N TPO B 197 21.891 2.133 -14.748 1.00 38.81 N
TER 6 TPO B 197
ENDMDL
$ pdb_tidy test.pdb > tidy.pdb
$ cat tidy.pdb
MODEL 1
ATOM 1 N THR A 1 17.047 14.099 3.625 1.00 13.79 N
ATOM 3 N THR B 1 11.047 11.099 11.625 0.00 0.00 N
TER 4 THR B 1
ENDMDL
MODEL 2
ATOM 1 CA ARG A 10 8.496 4.609 8.837 1.00 3.38 C
TER 2 ARG A 10
ATOM 4 CA ARG B 10 22.496 22.609 22.837 1.00 3.38 C
TER 5 ARG B 10
HETATM 7 N TPO B 197 21.891 2.133 -14.748 1.00 38.81 N
END
$ diff test.pdb tidy.pdb
1,14c1,12
< MODEL 1
< ATOM 1 N THR A 1 17.047 14.099 3.625 1.00 13.79 N
< TER 2 THR A 1
< ATOM 3 N THR B 1 11.047 11.099 11.625 0.00 0.00 N
< TER 4 THR B 1
< ENDMDL
< MODEL 2
< ATOM 1 CA ARG A 10 8.496 4.609 8.837 1.00 3.38 C
< TER 2 ARG A 10
< ATOM 3 CA ARG B 10 22.496 22.609 22.837 1.00 3.38 C
< TER 4 TPO B 197
< HETATM 5 N TPO B 197 21.891 2.133 -14.748 1.00 38.81 N
< TER 6 TPO B 197
< ENDMDL
---
> MODEL 1
> ATOM 1 N THR A 1 17.047 14.099 3.625 1.00 13.79 N
> ATOM 3 N THR B 1 11.047 11.099 11.625 0.00 0.00 N
> TER 4 THR B 1
> ENDMDL
> MODEL 2
> ATOM 1 CA ARG A 10 8.496 4.609 8.837 1.00 3.38 C
> TER 2 ARG A 10
> ATOM 4 CA ARG B 10 22.496 22.609 22.837 1.00 3.38 C
> TER 5 ARG B 10
> HETATM 7 N TPO B 197 21.891 2.133 -14.748 1.00 38.81 N
> END
Expected behavior
The `TER` records between the chains and the last `ENDMDL` should be kept.
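A quick way to check this expectation without reading the diff by eye is to count `TER` and `ENDMDL` records per model before and after tidying. The sketch below is only an illustration: `count_records` and `lost_records` are hypothetical helper names, not part of pdb-tools, and the parser splits on whitespace so that it also works on the whitespace-collapsed listings pasted in this report.

```python
from collections import Counter

def count_records(pdb_text):
    """Count TER and ENDMDL records inside each MODEL block."""
    counts = {}
    model = None  # None while outside a MODEL ... ENDMDL block
    for line in pdb_text.splitlines():
        fields = line.split()
        record = fields[0] if fields else ""
        if record == "MODEL":
            model = int(fields[1])
            counts[model] = Counter()
        elif record in ("TER", "ENDMDL") and model is not None:
            counts[model][record] += 1
            if record == "ENDMDL":
                model = None
    return counts

def lost_records(before, after):
    """Return {model: Counter} of records present in `before` but missing in `after`."""
    counts_before, counts_after = count_records(before), count_records(after)
    lost = {}
    for model, counter in counts_before.items():
        # Counter subtraction keeps only positive differences
        diff = counter - counts_after.get(model, Counter())
        if diff:
            lost[model] = diff
    return lost

# Listings copied from this report (test.pdb and tidy.pdb)
BEFORE = """
MODEL 1
ATOM 1 N THR A 1 17.047 14.099 3.625 1.00 13.79 N
TER 2 THR A 1
ATOM 3 N THR B 1 11.047 11.099 11.625 0.00 0.00 N
TER 4 THR B 1
ENDMDL
MODEL 2
ATOM 1 CA ARG A 10 8.496 4.609 8.837 1.00 3.38 C
TER 2 ARG A 10
ATOM 3 CA ARG B 10 22.496 22.609 22.837 1.00 3.38 C
TER 4 TPO B 197
HETATM 5 N TPO B 197 21.891 2.133 -14.748 1.00 38.81 N
TER 6 TPO B 197
ENDMDL
"""

AFTER = """
MODEL 1
ATOM 1 N THR A 1 17.047 14.099 3.625 1.00 13.79 N
ATOM 3 N THR B 1 11.047 11.099 11.625 0.00 0.00 N
TER 4 THR B 1
ENDMDL
MODEL 2
ATOM 1 CA ARG A 10 8.496 4.609 8.837 1.00 3.38 C
TER 2 ARG A 10
ATOM 4 CA ARG B 10 22.496 22.609 22.837 1.00 3.38 C
TER 5 ARG B 10
HETATM 7 N TPO B 197 21.891 2.133 -14.748 1.00 38.81 N
END
"""

print(lost_records(BEFORE, AFTER))
```

On the listings above, this flags one lost `TER` in model 1, and one lost `TER` plus the missing `ENDMDL` in model 2.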
Desktop (please complete the following information):
Distributor ID: Ubuntu
Description: Ubuntu 22.04.1 LTS
Release: 22.04
Codename: jammy
$ python --version
Python 3.11.2
$ pip show pdb-tools
Name: pdb-tools
Version: 2.5.0
Summary: A swiss army knife for PDB files.
Home-page: http://bonvinlab.org/pdb-tools
Author: Joao Rodrigues
Author-email: [email protected]
License: Apache Software License, version 2
Location: /home/rodrigo/.pyenv/versions/3.11.2/lib/python3.11/site-packages
Requires:
Required-by:
Note: If we repeat the first line to simulate having two atoms before the first `TER`, the `TER` is not removed. The same does not happen with the `HETATM` entry.
Thanks for the report @rvhonorato, we'll have a look.
This is probably an edge case, since the test PDB is not realistic and it works for "real" structures. Still, it could be indicative of some underlying issue.
Let me know if there's any way I can help.
I had a look at the format specification and it seems to hint that `TER` statements do not apply after `HETATM`, only at the terminus of a (linked) chain. Checking a couple of random PDBs reinforces that:
- https://files.rcsb.org/view/1CTF.pdb
- https://files.rcsb.org/view/1brs.pdb
It's indeed not very clear. Looking at https://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#TER
> Every chain of ATOM/HETATM records presented on SEQRES records is terminated with a TER record.
and https://www.cgl.ucsf.edu/chimera/docs/UsersGuide/tutorials/pdbintro.html
> indicates the end of a chain of residues. For example, a hemoglobin molecule consists of four subunit chains that are not connected. TER indicates the end of a chain and prevents the display of a connection to the next chain.
And deeper into the `SEQRES` record: https://www.wwpdb.org/documentation/file-format-content/format33/sect3.html#SEQRES
> SEQRES records contain a listing of the consecutive chemical components covalently linked in a linear fashion to form a polymer. The chemical components included in this listing may be standard or modified amino acid and nucleic acid residues. It may also include other residues that are linked to the standard backbone in the polymer. Chemical components or groups covalently linked to side-chains (in peptides) or sugars and/or bases (in nucleic acid polymers) will not be listed here.
So that seems to imply to me that there is some relation between `TER` and `SEQRES`. Since the PDBs might not have this `SEQRES` to pull the limits from, it's probably OK to follow the convention of always having `TER` between chains of `ATOM` and additionally a `TER` between chain breaks (non-continuous numbering in `ATOM`) using the strict options, which I think already exists, right?
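The first half of that convention (a `TER` wherever a run of `ATOM` records changes chain or ends, and none after `HETATM`) can be sketched as below. This is an illustration under stated assumptions, not how `pdb_tidy` is implemented: `add_ter_between_chains`, `ter_line` and `atom_line` are hypothetical names, input lines are assumed to be proper fixed-width PDB records (chain ID in column 22), existing `TER` records are discarded and re-derived, and serial numbers restart at each `MODEL`.

```python
def atom_line(serial, name, resname, chain, resseq, record="ATOM  "):
    """Build a fixed-width coordinate record (demo helper, dummy coordinates)."""
    return "{}{:>5d} {:<4s} {:>3s} {:1s}{:>4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}".format(
        record, serial, name, resname, chain, resseq, 0.0, 0.0, 0.0, 1.0, 0.0)

def ter_line(serial, atom):
    """Build a TER record closing the chain that ends at ATOM record `atom`."""
    # Columns 18-27 (resName, chainID, resSeq, iCode) are copied from the atom
    return "TER   {:>5d}      {}".format(serial, atom[17:27]).rstrip()

def add_ter_between_chains(lines):
    out = []
    serial = 0
    prev_atom = None  # last ATOM record of the chain currently open
    for line in lines:
        record = line[:6]
        if record.startswith("TER"):
            continue  # drop existing TER records; they are re-derived below
        # Close the open chain when the chain ID changes or the ATOM run ends
        if prev_atom is not None and (record != "ATOM  " or line[21] != prev_atom[21]):
            serial += 1
            out.append(ter_line(serial, prev_atom))
            prev_atom = None
        if record.startswith("MODEL"):
            serial = 0  # assumption: numbering restarts in every model
        if record in ("ATOM  ", "HETATM"):
            serial += 1
            line = record + "{:>5d}".format(serial) + line[11:]
        if record == "ATOM  ":
            prev_atom = line
        out.append(line)
    if prev_atom is not None:
        out.append(ter_line(serial + 1, prev_atom))
    return out

example = [
    atom_line(1, " N", "THR", "A", 1),
    atom_line(2, " N", "THR", "B", 1),
    atom_line(3, " N", "TPO", "B", 197, record="HETATM"),
]
for tidy_line in add_ter_between_chains(example):
    print(tidy_line)
```

Each `ATOM` chain gets exactly one closing `TER`, and the trailing `HETATM` gets none, which matches what the two RCSB entries linked above appear to do.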
Yes - better too few than too many TER statements.
Adding a TER statement at any chain break (even within a chain) is a dangerous thing, since it implies there is a real end of the chain there, meaning some software will interpret it as a charged terminus.
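One cautious alternative is to report numbering gaps and leave the decision to the user instead of writing a `TER` at each one. The sketch below assumes fixed-width `ATOM` records (chain ID in column 22, residue number in columns 23-26); `find_numbering_gaps` and `atom_line` are hypothetical names used only for this illustration, and a gap in numbering does not by itself prove a physical break in the chain.

```python
def atom_line(serial, name, resname, chain, resseq):
    """Build a fixed-width ATOM record (demo helper, dummy coordinates)."""
    return "ATOM  {:>5d} {:<4s} {:>3s} {:1s}{:>4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}".format(
        serial, name, resname, chain, resseq, 0.0, 0.0, 0.0, 1.0, 0.0)

def find_numbering_gaps(lines):
    """Return (chain, previous_resseq, next_resseq) for each jump in ATOM numbering."""
    gaps = []
    last_seen = {}  # chain ID -> last residue number seen in this model
    for line in lines:
        if line.startswith("MODEL"):
            last_seen.clear()  # each model is checked independently
        if not line.startswith("ATOM  "):
            continue
        chain, resseq = line[21], int(line[22:26])
        prev = last_seen.get(chain)
        if prev is not None and resseq - prev > 1:
            gaps.append((chain, prev, resseq))
        last_seen[chain] = resseq
    return gaps

example = [
    atom_line(1, " CA", "ALA", "A", 1),
    atom_line(2, " CA", "GLY", "A", 2),
    atom_line(3, " CA", "ARG", "A", 5),   # residues 3 and 4 are missing
    atom_line(4, " CA", "ARG", "B", 10),
    atom_line(5, " CA", "THR", "B", 11),
]
print(find_numbering_gaps(example))
```

Reporting only keeps the file free of extra `TER` records, so downstream tools are not led to assume a charged terminus at every gap.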
> and additionally a TER between chain breaks (non-continuous numbering in ATOM) using the strict options, which I think already exists, right?
Software that interprets the PDB format should cross-relate the `TER` records and the `SEQRES` to decide whether it's a true break or not, but it's unlikely that this behaviour covers PDBs obtained from non-experimental methods. In that case, (older) tools might indeed just assume it's the `OXT`.
+1 for fewer `TER` records for the sake of compatibility, but the bug above is still relevant.
Any news on this? Is it still relevant or implemented already?