foldcomp
Error compressing `PDB`
Hello,
I was trying to compress PDB and I constantly get the same error.
I tried changing all extensions from .ent to .pdb and rewriting the PDB files using ProDy, so that everything unnecessary is removed from the PDB itself.
Compressing files in correct_pdb using 32 threads
Output directory: pdb_foldcomp
terminate called after throwing an instance of 'std::out_of_range'
what(): map::at
Aborted (core dumped)
If I try per-file compression, it only writes a single file and quits. It would also be nice to see what file is being processed, in case it's an error with pdb contents.
Cheers, V
Thanks for the feedback. I'll implement a verbosity option that logs errors together with the name of the file being processed. Since foldcomp was initially designed to handle predicted structures without discontinuities, we haven't checked all the possible error cases that occur in real data. To find the cause of the error, it would be helpful if you could share the preprocessing script you use to handle the PDB files.
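Until such an option exists, one way to find the offending file is to run the compressor on one file at a time and record which runs exit with an error. The following is only a sketch: `find_failing_files` and `build_cmd` are hypothetical names, and the single-file invocation shown in the usage note (`foldcomp compress <input> <output>`) is an assumption that should be checked against the installed version.

```python
import subprocess
from pathlib import Path


def find_failing_files(pdb_dir, build_cmd, pattern="*.pdb"):
    """Run one command per matching file and collect the runs that fail.

    build_cmd is a caller-supplied function (hypothetical helper) that
    turns a Path into the argument list to execute for that file.
    Returns a list of (file name, return code, stderr text) tuples.
    """
    failures = []
    for path in sorted(Path(pdb_dir).glob(pattern)):
        result = subprocess.run(build_cmd(path), capture_output=True, text=True)
        # A crash such as "Aborted (core dumped)" shows up as a non-zero
        # return code (negative on POSIX when the process dies from a signal).
        if result.returncode != 0:
            failures.append((path.name, result.returncode, result.stderr.strip()))
    return failures
```

Assuming the single-file command form mentioned above and an existing output directory, this could be called as `find_failing_files("correct_pdb", lambda p: ["foldcomp", "compress", str(p), str(Path("pdb_foldcomp") / (p.stem + ".fcz"))])`. Each entry in the returned list names a file whose run failed, along with whatever the process wrote to stderr.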
Thanks for the answer, I would be really grateful for help, and I think having a foldcomp db of experimental structures is going to be awesome! I tried different possibilities; here is a snippet for my test data (https://www.rcsb.org/structure/7db5):
from pathlib import Path

from prody import parsePDBStream, writePDB

file = "databases/pdb_structures/7db5.pdb"
outfolder = "."

file = Path(file)
filename = file.name
outfolder = Path(outfolder)
outfile = outfolder / filename

# parse the full structure once to list its chains
with open(str(file)) as f:
    pdb = parsePDBStream(f)

# get only the first chain of the pdb file
# (str(chain) looks like "Chain A", so the second token is the chain ID)
first_chain = [str(chain_id).split()[1] for chain_id in pdb.iterChains()][0]
with open(str(file)) as f:
    pdb = parsePDBStream(f, chain=first_chain)
writePDB(str(outfile), pdb)

# overwrite the first line in the outfile
with open(str(outfile), "r") as f:
    lines = f.readlines()

# adding a TITLE, replacing a REMARK
lines[0] = "TITLE " + filename.split(".")[0] + "\n"
with open(str(outfile), "w") as f:
    f.writelines(lines)
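Since the reply above notes that foldcomp was designed for structures without discontinuities, it may help to check whether the rewritten file still has gaps in residue numbering or residue names outside the 20 standard amino acids. The sketch below reads the fixed-width ATOM columns directly. `scan_pdb_lines` is a hypothetical helper, and whether either condition actually triggers the `map::at` error is an assumption, not something confirmed in this thread.

```python
STANDARD_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def scan_pdb_lines(lines):
    """Return (gaps, unusual) for the ATOM records in an iterable of PDB lines.

    gaps    : list of (chain, previous residue number, next residue number)
              where numbering jumps by more than 1 within the same chain
    unusual : sorted list of residue names not in STANDARD_RESIDUES
    """
    gaps = []
    unusual = set()
    prev = None  # (chain, residue number) of the previous ATOM record
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        res_name = line[17:20].strip()  # PDB columns 18-20
        chain = line[21]                # PDB column 22
        res_seq = int(line[22:26])      # PDB columns 23-26
        if res_name not in STANDARD_RESIDUES:
            unusual.add(res_name)
        if prev is not None and prev[0] == chain and res_seq - prev[1] > 1:
            gaps.append((chain, prev[1], res_seq))
        prev = (chain, res_seq)
    return gaps, sorted(unusual)
```

For example, `with open("7db5.pdb") as f: gaps, unusual = scan_pdb_lines(f)` would report any numbering jumps and non-standard residue names in the file written by the snippet above. Insertion codes and HETATM records are ignored in this sketch.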