atomium
PDB format output with numbers as chain ID
Hi. I am using atomium to extract molecules from mmCIF files and write them into PDB format. Generally, this works really well, but I encountered an issue with structures where the chain ID is a number instead of a letter.
Expected behaviour
The chain ID should not be written as part of the residue number, but only in the column reserved for the chain ID.
Actual behaviour
When the chain ID is a number, it is written into the PDB string twice (once as the chain ID and once as part of the residue number). The resulting lines are too wide for the PDB specification's fixed-column layout and are parsed incorrectly by many other programs.
Example code to reproduce
import atomium
cif = atomium.fetch("6L4T")
lig = [l for l in cif.model.ligands() if l.id == "13.308"][0]
print(atomium.pdb.structure_to_pdb_string(lig))
Output (truncated):
HETATM20582 NB KC1 1313308 208.930 314.544 325.109 1.00 90.18 N
HETATM20583 ND KC1 1313308 205.979 312.067 326.352 1.00 90.18 N
HETATM20584 C1A KC1 1313308 208.131 312.489 328.676 1.00 90.18 C
HETATM20585 C1B KC1 1313308 209.880 315.122 325.835 1.00 90.18 C
HETATM20586 C1C KC1 1313308 206.761 314.055 322.987 1.00 90.18 C
HETATM20587 C1D KC1 1313308 204.767 311.511 325.824 1.00 90.18 C
Note that the chain ID ("13") is written twice.
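For reference, here is a minimal sketch of how a HETATM record is laid out under the fixed-column PDB format: the chain ID occupies column 22 only, and the residue number occupies columns 23 to 26. The function `format_hetatm` is a hypothetical helper written for illustration and is not part of atomium. Because column 22 holds a single character, a two-character chain ID such as "13" cannot be stored as-is, so the sketch uses a placeholder chain ID "A".

```python
def format_hetatm(serial, atom_name, res_name, chain_id, res_seq,
                  x, y, z, occupancy, b_factor, element):
    """Format one HETATM record using the fixed PDB column layout.

    Hypothetical helper for illustration; not atomium's implementation.
    """
    if len(chain_id) != 1:
        # Column 22 is the only column reserved for the chain ID.
        raise ValueError("a PDB chain ID must be exactly one character")
    # By convention, names of one-letter elements start in column 14.
    if len(atom_name) < 4 and len(element) == 1:
        atom_name = " " + atom_name
    return (
        # cols 1-6 record, 7-11 serial, 13-16 atom name, 18-20 residue name,
        # 22 chain ID, 23-26 residue number, 27-30 blank (no insertion code)
        f"HETATM{serial:>5} {atom_name:<4} {res_name:>3} {chain_id}{res_seq:>4}    "
        # cols 31-54 coordinates, 55-60 occupancy, 61-66 temperature factor
        f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{b_factor:6.2f}"
        # cols 67-76 blank, 77-78 element, 79-80 charge (left blank here)
        f"          {element:>2}  "
    )


# atomium-style ID "13.308": chain "13", residue number 308.
chain, _, number = "13.308".partition(".")
line = format_hetatm(20582, "NB", "KC1", "A", int(number),
                     208.930, 314.544, 325.109, 1.00, 90.18, "N")
print(line)
```

In a correctly formatted record, the residue number field would contain only 308, and the whole line would be exactly 80 characters wide.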
Python Version/Operating System
I am using atomium 1.0.11 (from conda-forge) on Python 3.10 / Linux
Thanks in advance for your support, and thanks for publishing atomium as open-source :-)
Thanks for flagging this - atomium 2.0.0 is nearing completion, so I will fix this issue for that release (likely next month) if it isn't already fixed there. I've overhauled the way saving is done generally.
Ok. Thanks for the info, I'm looking forward to the new version. Until then, I can work around it.
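One possible workaround (a sketch only, and not necessarily what was done here) is to map every multi-character chain ID to an unused single character before writing PDB output. The function `single_char_chain_ids` below is a hypothetical helper in plain Python. Applying the mapping to the actual structure depends on atomium's API and is deliberately left out.

```python
import string


def single_char_chain_ids(chain_ids):
    """Map each chain ID to a one-character ID usable in PDB column 22.

    Hypothetical helper: IDs that are already one character are kept,
    longer ones get the next unused letter or digit.
    """
    pool = iter(string.ascii_uppercase + string.ascii_lowercase + string.digits)
    taken = {cid for cid in chain_ids if len(cid) == 1}
    mapping = {}
    for cid in chain_ids:
        if cid in mapping:
            continue
        if len(cid) == 1:
            mapping[cid] = cid
            continue
        replacement = next((c for c in pool if c not in taken), None)
        if replacement is None:
            raise ValueError("more chains than available single-character IDs")
        mapping[cid] = replacement
    return mapping


print(single_char_chain_ids(["A", "13", "B", "14", "13"]))
# {'A': 'A', '13': 'C', 'B': 'B', '14': 'D'}
```

With 62 candidate characters, this only works for structures with at most 62 chains. Larger structures cannot be represented with one-character chain IDs at all.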