Several problems with ESM-C
Hello everybody! Today I'm trying to test ESM-C, but I'm having a hard time due to several problems:
- Tokenizer initialization requires accepting the ESM3 license agreement. Solved by logging in to the Hugging Face Hub.
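For reference, a minimal sketch of checking the login step programmatically (assuming the standard huggingface_hub API; creating a token at huggingface.co/settings/tokens is a prerequisite):

```python
from huggingface_hub import get_token

# The ESM3/ESM-C tokenizer files are gated, so a Hub token must be available
# before loading the model. get_token() returns None when no token is stored.
if get_token() is None:
    print("Not logged in: run `huggingface-cli login` "
          "or call huggingface_hub.login() first")
else:
    print("Hub token found")
```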
- Encoding sequences fails because ESMC.tokenizer has no mask_token set:
sequence = sequence.replace(C.MASK_STR_SHORT, sequence_tokenizer.mask_token)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: replace() argument 2 must be str, not None
Solved by calling the tokenizer manually:
seq = 'AAAAAAAAAA'
res = client.tokenizer(seq, add_special_tokens=True)
ids = torch.tensor(res['input_ids'], dtype=torch.int64).to('cuda')
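To make this workaround reusable, here is a small sketch wrapping it in a helper. The stub below only stands in for client.tokenizer so the snippet is self-contained; the real ids come from the ESM-C vocabulary, and the id values shown are purely illustrative:

```python
import torch

def encode_sequence(tokenizer, seq, device="cpu"):
    """Tokenize `seq` with a Hugging Face-style tokenizer callable and
    return a 1-D int64 tensor of token ids on `device`.
    (Hypothetical helper wrapping the manual-tokenization workaround.)"""
    res = tokenizer(seq, add_special_tokens=True)
    return torch.tensor(res["input_ids"], dtype=torch.int64).to(device)

# Stub standing in for client.tokenizer: BOS + one id per residue + EOS.
stub = lambda s, add_special_tokens=True: {"input_ids": [0] + [5] * len(s) + [2]}

ids = encode_sequence(stub, "AAAAAAAAAA")
print(ids.shape)  # torch.Size([12])
```

With the real model, `encode_sequence(client.tokenizer, seq, device="cuda")` would reproduce the three lines above.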
- Then I called the forward method of the ESMC class, passing the ids tensor, but I got a dimension-mismatch error inside the rotary embedding:
esm/layers/rotary.py:54, in apply_rotary_emb_torch(x, cos, sin, interleaved, _inplace)
50 cos = repeat(cos, "s d -> s 1 (2 d)")
51 sin = repeat(sin, "s d -> s 1 (2 d)")
52 return torch.cat(
53 [
---> 54 x[..., :ro_dim] * cos + rotate_half(x[..., :ro_dim], interleaved) * sin,
55 x[..., ro_dim:],
56 ],
57 dim=-1,
58 )
RuntimeError: The size of tensor a (12) must match the size of tensor b (15) at non-singleton dimension 0
I wasn't able to fix this one.
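One common cause of this kind of shape mismatch is passing a 1-D ids tensor where the model expects a (batch, seq_len) input; the 12 in the error message matches the 12 tokens of the example sequence (BOS + 10 residues + EOS), which hints the sequence dimension may have landed where the batch dimension was expected. This is only a guess at the cause, but adding a batch dimension is cheap to try:

```python
import torch

# 'AAAAAAAAAA' tokenized with special tokens: BOS + 10 residues + EOS = 12 ids.
# The id values below are illustrative, not the real ESM-C vocabulary ids.
input_ids = [0] + [5] * 10 + [2]

# A 1-D tensor of shape (12,) ...
ids = torch.tensor(input_ids, dtype=torch.int64)

# ... becomes shape (1, 12) with an explicit batch dimension, which is the
# layout transformer forward passes usually expect.
batched = ids.unsqueeze(0)
print(batched.shape)  # torch.Size([1, 12])
```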
Also, I would like to know whether you plan an integration with the Transformers library, which would make fine-tuning the model easier.