esm
Residue annotation always returns zero tensor
Hi, I'm trying to run through the residue annotation pipeline. I noticed that in encode_decode.py we have:
ra_tokens = residue_annotations_tokenizer.tokenize(
    {
        "interpro_site_descriptions": descriptions,
        "interpro_site_starts": starts,
        "interpro_site_ends": ends,
    },
    sequence=sequence,
    fail_on_mismatch=True,
)
However, when I look into residue_annotations_tokenizer.tokenize(), I found that it always returns a full sequence of pad tokens if the input is missing the field interpro_site_residues:
if any(
    sample.get(field) is None
    for field in [
        "interpro_site_descriptions",
        "interpro_site_starts",
        "interpro_site_ends",
        "interpro_site_residues",
    ]
):
    return ["<pad>"] * seqlen
This is exactly the case in encode_decode.py, which does not pass interpro_site_residues, so all residue annotations end up as zeros. Is this intentional, or am I missing something?
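To make the failure mode concrete, here is a minimal, self-contained sketch that reproduces only the guard quoted above. It is not the ESM implementation: guarded_tokenize, missing_fields, and the "<annotated>" placeholder are hypothetical names for illustration, and the sample values are made up.

```python
# Hypothetical stand-in for the early-return guard in
# residue_annotations_tokenizer.tokenize(); NOT the actual ESM code.
from typing import Any, Dict, List

REQUIRED_FIELDS = [
    "interpro_site_descriptions",
    "interpro_site_starts",
    "interpro_site_ends",
    "interpro_site_residues",
]


def guarded_tokenize(sample: Dict[str, Any], seqlen: int) -> List[str]:
    """Mimic the guard: if any required field is absent or None, emit all pads."""
    if any(sample.get(field) is None for field in REQUIRED_FIELDS):
        return ["<pad>"] * seqlen
    # Hypothetical placeholder for the real tokenization path (not reproduced).
    return ["<annotated>"] * seqlen


def missing_fields(sample: Dict[str, Any]) -> List[str]:
    """List the required fields whose absence triggers the all-pad return."""
    return [field for field in REQUIRED_FIELDS if sample.get(field) is None]


# Same keys as the dict built in encode_decode.py: only three of the four fields.
sample = {
    "interpro_site_descriptions": ["example site"],  # illustrative value
    "interpro_site_starts": [1],
    "interpro_site_ends": [3],
}
print(missing_fields(sample))        # ['interpro_site_residues']
print(guarded_tokenize(sample, 5))   # ['<pad>', '<pad>', '<pad>', '<pad>', '<pad>']
```

With the three-field dict from encode_decode.py, the sketch reports interpro_site_residues as missing and returns only pad tokens, which matches the behavior described above.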