Mismatch between 3Di sequences inferred by the Foldseek command and by the mini3di repo
Expected Behavior
I am trying to convert PDB structures into 3Di sequences. For mini3di (https://github.com/althonos/mini3di/), I used:
pdb_path = "1xso.cif"
# mini3di
from Bio import PDB
if pdb_path.endswith(".pdb"):
    parser = PDB.PDBParser(QUIET=True)
else:
    parser = PDB.MMCIFParser(QUIET=True)
structure = parser.get_structure("test", pdb_path)
states = self.tokenizer_encoder.encode_chain(structure[0][chain_id])
seq_mini3di = self.tokenizer_encoder.build_sequence(states)
For Foldseek, I used the command suggested in issue #314.
Current Behavior
mini3di produces "DKKKWWKDFPDPKTKIKIWDDDDLFKIKIWMKIFQADFDKKWKWWACAQDCPVTVVVSHFGAAPPDFWDFAQPDPRHGLTGDFIFGDDPRMTTDMDIHNSAGCDDPNRQQRIKMFIANAGQCGLPPPDPVSRGTSPRDDTRIMTGMHGDD"
and Foldseek produces "DKKKWWKDFPDPKTKIKIWDDDDLFKIKIWMKIFQADFDKKWKWWACAQDCPVHVVVSHFGAAPPDFWDFAQPDPRHGLTGDFIFGDDPRMTTDMDIHNSAGCDDPNRQQRIKMFIANAGQCGLPPPDPVSRGTSPRDDTRIMTGMHDDD"
The two resulting sequences are not identical: they differ at a few residues.
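To make the mismatch easier to pin down, here is a minimal sketch that lists the positions at which the two strings differ. The helper name `diff_positions` is hypothetical (it is not part of mini3di or Foldseek), and it assumes both sequences cover the same residues in the same order, so that a position-by-position comparison is meaningful.

```python
def diff_positions(seq_a, seq_b):
    """Return (1-based position, state in seq_a, state in seq_b) for every mismatch.

    Hypothetical helper for this report; assumes the two 3Di strings are
    aligned residue-by-residue (same length, same residue order).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError(
            f"length mismatch: {len(seq_a)} vs {len(seq_b)}; "
            "the sequences may not cover the same residues"
        )
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# The two sequences reported above (mini3di first, Foldseek second).
seq_mini3di = "DKKKWWKDFPDPKTKIKIWDDDDLFKIKIWMKIFQADFDKKWKWWACAQDCPVTVVVSHFGAAPPDFWDFAQPDPRHGLTGDFIFGDDPRMTTDMDIHNSAGCDDPNRQQRIKMFIANAGQCGLPPPDPVSRGTSPRDDTRIMTGMHGDD"
seq_foldseek = "DKKKWWKDFPDPKTKIKIWDDDDLFKIKIWMKIFQADFDKKWKWWACAQDCPVHVVVSHFGAAPPDFWDFAQPDPRHGLTGDFIFGDDPRMTTDMDIHNSAGCDDPNRQQRIKMFIANAGQCGLPPPDPVSRGTSPRDDTRIMTGMHDDD"

for pos, state_mini3di, state_foldseek in diff_positions(seq_mini3di, seq_foldseek):
    print(f"position {pos}: mini3di={state_mini3di} foldseek={state_foldseek}")
```

The reported positions are 1-based indices into the 3Di strings, not PDB residue numbers, so they would still need to be mapped back to the residues of the chain.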
Environment
I used foldseek==9-427df8a (the latest) and mini3di==0.1.1.
Thanks for your help.