
Several problems with ESM-C

Open · j3rk0 opened this issue 2 months ago • 2 comments

Hello everybody! Today I'm trying to test ESM-C, but I'm having a hard time due to several problems:
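For context, here is roughly how I set up the client (a minimal sketch following the README; the checkpoint name is just illustrative):

from esm.models.esmc import ESMC

# Load an ESM-C checkpoint and move it to the GPU.
# "esmc_300m" is only an example checkpoint name.
client = ESMC.from_pretrained("esmc_300m").to("cuda")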

  • Tokenizer initialization requires the ESM3 agreement; solved by logging in to the Hugging Face Hub (e.g. with huggingface-cli login).
  • Encoding sequences fails because the mask_token attribute of ESMC.tokenizer is None:
    sequence = sequence.replace(C.MASK_STR_SHORT, sequence_tokenizer.mask_token)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: replace() argument 2 must be str, not None

I worked around it by calling the tokenizer manually:

import torch

seq = "AAAAAAAAAA"
# Tokenize manually instead of going through client.encode()
res = client.tokenizer(seq, add_special_tokens=True)
ids = torch.tensor(res["input_ids"], dtype=torch.int64).to("cuda")
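
Another idea I considered but have not verified: explicitly setting the missing mask token so the built-in encoding path works again (this assumes "<mask>" is actually the mask symbol in the ESM-C vocabulary):

# Untested guess: give the tokenizer an explicit mask token so that
# sequence.replace(C.MASK_STR_SHORT, tokenizer.mask_token) no longer receives None.
# Whether "<mask>" is the right symbol for this vocabulary is an assumption.
if client.tokenizer.mask_token is None:
    client.tokenizer.mask_token = "<mask>"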
  • Then I called the forward method of the ESMC class, passing the ids tensor, but I got a dimension-mismatch error inside the rotary embedding:
esm/layers/rotary.py:54, in apply_rotary_emb_torch(x, cos, sin, interleaved, _inplace)
     50 cos = repeat(cos, "s d -> s 1 (2 d)")
     51 sin = repeat(sin, "s d -> s 1 (2 d)")
     52 return torch.cat(
     53     [
---> 54         x[..., :ro_dim] * cos + rotate_half(x[..., :ro_dim], interleaved) * sin,
     55         x[..., ro_dim:],
     56     ],
     57     dim=-1,
     58 )

RuntimeError: The size of tensor a (12) must match the size of tensor b (15) at non-singleton dimension 0

I wasn't able to fix this one; the call and my best guess at a workaround are sketched below.
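
For reference, this is roughly the call that triggers the error, plus the only remaining idea I have (a minimal sketch; whether forward actually expects a leading batch dimension is an assumption on my part):

# Sketch of the failing call. The size mismatch in the rotary embedding makes me
# suspect a shape problem, so adding a batch dimension is an untested guess.
out = client(ids)                  # raises the RuntimeError above
# out = client(ids.unsqueeze(0))   # untested: pass a (batch, length) tensor instead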

Also, I would like to know whether you plan an integration with the Transformers library to enable easier fine-tuning of the model.

j3rk0 · Dec 06 '24, 16:12