Mismatch between the 3Di sequence inferred by the Foldseek command and the one from the mini3di repo

Open KatarinaYuan opened this issue 5 months ago • 2 comments

Expected Behavior

I am trying to convert PDB structures into 3Di sequences. For mini3di (https://github.com/althonos/mini3di/), I used:

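pdb_path = "1xso.cif"
# mini3di
from Bio import PDB
# Pick the Biopython parser that matches the file format.
if pdb_path.endswith(".pdb"):
    parser = PDB.PDBParser(QUIET=True)
else:
    parser = PDB.MMCIFParser(QUIET=True)
structure = parser.get_structure("test", pdb_path)
# self.tokenizer_encoder (the mini3di encoder object) and chain_id (the chain to encode)
# are defined outside this snippet.
# Encode the chosen chain of the first model into 3Di states, then into a 3Di string.
states = self.tokenizer_encoder.encode_chain(structure[0][chain_id])
seq_mini3di = self.tokenizer_encoder.build_sequence(states)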

For Foldseek, I used the command suggested in issue #314.

Current Behavior

mini3di produces "DKKKWWKDFPDPKTKIKIWDDDDLFKIKIWMKIFQADFDKKWKWWACAQDCPVTVVVSHFGAAPPDFWDFAQPDPRHGLTGDFIFGDDPRMTTDMDIHNSAGCDDPNRQQRIKMFIANAGQCGLPPPDPVSRGTSPRDDTRIMTGMHGDD"

Foldseek produces "DKKKWWKDFPDPKTKIKIWDDDDLFKIKIWMKIFQADFDKKWKWWACAQDCPVHVVVSHFGAAPPDFWDFAQPDPRHGLTGDFIFGDDPRMTTDMDIHNSAGCDDPNRQQRIKMFIANAGQCGLPPPDPVSRGTSPRDDTRIMTGMHDDD"

The two sequences have the same length but differ at a few residues.
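To make the mismatch concrete, the two strings can be compared column by column. The following is a minimal sketch in plain Python; the helper name `diff_positions` is hypothetical (it is not part of Foldseek or mini3di), and it assumes the two sequences are already the same length, so no alignment is performed.

```python
def diff_positions(seq_a, seq_b):
    """Return (1-based position, char_a, char_b) for every mismatching column."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences differ in length; align them first")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


# The two 3Di strings quoted in this issue.
seq_mini3di = "DKKKWWKDFPDPKTKIKIWDDDDLFKIKIWMKIFQADFDKKWKWWACAQDCPVTVVVSHFGAAPPDFWDFAQPDPRHGLTGDFIFGDDPRMTTDMDIHNSAGCDDPNRQQRIKMFIANAGQCGLPPPDPVSRGTSPRDDTRIMTGMHGDD"
seq_foldseek = "DKKKWWKDFPDPKTKIKIWDDDDLFKIKIWMKIFQADFDKKWKWWACAQDCPVHVVVSHFGAAPPDFWDFAQPDPRHGLTGDFIFGDDPRMTTDMDIHNSAGCDDPNRQQRIKMFIANAGQCGLPPPDPVSRGTSPRDDTRIMTGMHDDD"

for pos, a, b in diff_positions(seq_mini3di, seq_foldseek):
    print(f"position {pos}: mini3di={a} foldseek={b}")
```

For the sequences above, this reports two mismatches out of 150 positions: position 54 (T in mini3di, H in Foldseek) and position 148 (G in mini3di, D in Foldseek), counting from 1.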

Environment

I used foldseek==9-427df8a (the latest) and mini3di==0.1.1.

Thanks for your help.

KatarinaYuan, Aug 27 '24 21:08