foldcomp

Error compressing `PDB`

Open · valentynbez opened this issue 1 year ago · 2 comments

Hello,

I was trying to compress the PDB, and I keep getting the same error. I tried changing all file extensions from .ent to .pdb and rewriting the PDB files with ProDy, so that everything unnecessary is stripped from each file.

```
Compressing files in correct_pdb using 32 threads
Output directory: pdb_foldcomp
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
Aborted (core dumped)
```

If I compress files one at a time, it writes only a single file and then exits. It would also help to print which file is being processed, in case the error comes from the contents of a particular PDB file.
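Until foldcomp logs file names itself, a minimal sketch for locating the offending file is to run one foldcomp process per file, print each name before compressing it, and collect the files whose process fails. It assumes the CLI accepts a single-file form, `foldcomp compress <input.pdb> <output.fcz>` (check `foldcomp --help` for your version); the helper names `foldcomp_cmd` and `compress_one_by_one` are hypothetical. An uncaught `std::out_of_range` aborts the process, which `subprocess` reports as a non-zero return code (negative on POSIX when the process is killed by a signal), so one crashing file does not stop the loop.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable, List, Sequence


def foldcomp_cmd(src: Path, dst: Path) -> Sequence[str]:
    # Assumed single-file CLI form; adjust if your foldcomp version differs.
    return ["foldcomp", "compress", str(src), str(dst)]


def compress_one_by_one(
    pdb_dir: Path,
    out_dir: Path,
    build_cmd: Callable[[Path, Path], Sequence[str]] = foldcomp_cmd,
) -> List[Path]:
    """Compress each *.pdb file in its own process; return the files that failed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    failed: List[Path] = []
    for src in sorted(pdb_dir.glob("*.pdb")):
        dst = out_dir / (src.stem + ".fcz")
        print(f"compressing {src.name}", file=sys.stderr)
        result = subprocess.run(build_cmd(src, dst), capture_output=True, text=True)
        # A crash (e.g. SIGABRT after an uncaught C++ exception) gives a non-zero code.
        if result.returncode != 0:
            print(f"FAILED {src.name}: {result.stderr.strip()}", file=sys.stderr)
            failed.append(src)
    return failed
```

For example, `compress_one_by_one(Path("correct_pdb"), Path("pdb_foldcomp"))` would report each file as it goes and return the list of files that crashed. This is much slower than the 32-thread directory mode, so it is meant for diagnosis rather than for building the final database.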

Cheers, V

valentynbez · May 29 '23 19:05

Thanks for the feedback. I'll add a verbosity option that logs errors together with the name of the file being processed. Foldcomp was initially designed for predicted structures without discontinuities, so we haven't checked every error case that can arise from experimental data. To track down the cause of this error, it would help if you could share the script you used to preprocess the PDB files.
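Because foldcomp was built for predicted structures without discontinuities, one way to narrow down candidate files is to scan them for chain breaks before compressing. The sketch below is a hypothetical stdlib-only check (`scan_pdb` is not part of foldcomp) that reads the fixed-column ATOM records and reports gaps in residue numbering and residues missing an N, CA or C atom. Whether these features are what triggers the `map::at` error is an assumption: `std::map::at` throws `std::out_of_range` whenever a key is absent, so the failing lookup could also involve something else, such as an unexpected residue or atom name. Numbering gaps are also only a proxy for physical chain breaks.

```python
from typing import Dict, List, Set, Tuple

BACKBONE = {"N", "CA", "C"}


def scan_pdb(text: str) -> List[str]:
    """Report residue-number gaps and incomplete backbones in PDB ATOM records."""
    # Key: (chain ID, residue number, insertion code), in file order.
    residues: Dict[Tuple[str, int, str], Set[str]] = {}
    order: List[Tuple[str, int, str]] = []
    for line in text.splitlines():
        if not line.startswith("ATOM"):
            continue
        # Fixed PDB columns: chain 22, resSeq 23-26, iCode 27, atom name 13-16.
        key = (line[21:22], int(line[22:26]), line[26:27].strip())
        if key not in residues:
            residues[key] = set()
            order.append(key)
        residues[key].add(line[12:16].strip())

    problems: List[str] = []
    # Consecutive residues of the same chain whose numbers jump by more than one.
    for prev, cur in zip(order, order[1:]):
        if prev[0] == cur[0] and cur[1] - prev[1] > 1:
            problems.append(f"chain {cur[0]}: gap between residues {prev[1]} and {cur[1]}")
    # Residues lacking one of the backbone atoms.
    for key in order:
        missing = BACKBONE - residues[key]
        if missing:
            problems.append(f"chain {key[0]} residue {key[1]}{key[2]}: missing {sorted(missing)}")
    return problems
```

Running `scan_pdb(Path(p).read_text())` over the preprocessed files and comparing the flagged files with those that crash would show whether discontinuities are the likely culprit.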

khb7840 · May 31 '23 02:05

Thanks for the answer! I'd be really grateful for help, and I think a foldcomp database of experimental structures would be awesome. I tried several approaches; here is a snippet for my test structure (https://www.rcsb.org/structure/7db5):

```python
from pathlib import Path

from prody import parsePDBStream, writePDB

file = Path("databases/pdb_structures/7db5.pdb")
outfolder = Path(".")
outfile = outfolder / file.name

# parse once to find the ID of the first chain
with open(file) as f:
    pdb = parsePDBStream(f)
first_chain = list(pdb.iterChains())[0].getChid()

# re-parse, keeping only that chain, and write it out
with open(file) as f:
    pdb = parsePDBStream(f, chain=first_chain)
writePDB(str(outfile), pdb)

# read the written file back in
with open(outfile) as f:
    lines = f.readlines()

# replace the first line (a REMARK) with a TITLE record
lines[0] = "TITLE     " + file.stem + "\n"
with open(outfile, "w") as f:
    f.writelines(lines)
```
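To rule out the ProDy round trip as the source of the problem, a dependency-free variant of the same preprocessing could serve as a comparison. This is a sketch, not the poster's script: the function name `first_chain_atoms` is hypothetical, and it goes further than the snippet by keeping only the first model and the first alternate location and by dropping HETATM records, on the assumption that these features of experimental entries are the ones most likely to differ from predicted structures.

```python
from typing import List, Optional


def first_chain_atoms(pdb_text: str, title: str) -> str:
    """Return a minimal PDB: a TITLE line plus the ATOM records of the first chain."""
    out: List[str] = [f"TITLE     {title}"]
    chain: Optional[str] = None
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # keep only the first model of multi-model (e.g. NMR) entries
        if not line.startswith("ATOM"):
            continue  # drops headers and HETATM (ligands, waters, modified residues)
        if line[16:17] not in (" ", "A"):
            continue  # altLoc column 17: keep blank or the first alternate location
        if chain is None:
            chain = line[21:22]  # chain ID (column 22) of the first ATOM record
        if line[21:22] == chain:
            out.append(line)
    out += ["TER", "END"]
    return "\n".join(out) + "\n"
```

With `src = Path("databases/pdb_structures/7db5.pdb")`, the call would be `Path(src.name).write_text(first_chain_atoms(src.read_text(), src.stem))`. Note that dropping HETATM also removes modified residues such as selenomethionine, which can itself introduce gaps in the chain.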

valentynbez · May 31 '23 08:05