
Compiler error when running Messenger env

realjoenguyen opened this issue · 4 comments

Hi,

max_len, used in the block marked with >>>> below (inside def _embed), hasn't been declared anywhere. I am assuming it should be self.max_token_seqlen?

The code is from here:


  def _embed(self, string):
    if f"{string}_{self.max_token_seqlen}" not in self.token_cache \
        or string not in self.embed_cache:
      print(string)
      tokens = self.tokenizer(string, return_tensors="pt",
                              add_special_tokens=True)  # add </s> separators
      import torch
      with torch.no_grad():
        # (seq, dim)
        embeds = self.encoder(**tokens).last_hidden_state.squeeze(0)
      self.embed_cache[string] = embeds.cpu().numpy()

>>>>>
      self.token_cache[f"{string}_{max_len}"] = {
        k: v.squeeze(0).cpu().numpy() for k, v in tokens
      }
>>>>>

    return (
      self.embed_cache[string],
      self.token_cache[f"{string}_{self.max_token_seqlen}"]
    )

realjoenguyen · Sep 19, 2023

Sorry about that, fixed!

FYI I haven't tested this code path for a while — the default setting should use the cached embeddings from the messenger_embeds.pkl file, which will be much faster than embedding the sentences online in the environment.
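For anyone reading along, here is a minimal sketch of what a precomputed-embedding lookup can look like. The messenger_embeds.pkl name comes from the comment above; the dictionary layout and the class/method names are assumptions for illustration, not the actual dynalang code:

  import pickle

  class CachedSentenceEmbedder:
    """Illustrative only: serve precomputed sentence embeddings
    instead of running the language-model encoder online."""

    def __init__(self, path="messenger_embeds.pkl"):
      # Assumed layout: {sentence: np.ndarray of shape (seq, dim)}
      with open(path, "rb") as f:
        self.embed_cache = pickle.load(f)

    def embed(self, sentence):
      # No tokenizer / encoder call needed for precomputed sentences.
      return self.embed_cache[sentence]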

jlin816 · Sep 20, 2023

Thank you!

Also, regarding tokens in

      self.token_cache[f"{string}_{max_len}"] = {
        k: v.squeeze(0).cpu().numpy() for k, v in tokens
      }

I assume it should be tokens.items()?

realjoenguyen · Sep 20, 2023

Yep! Also fixed now.
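For reference, with both fixes applied the cached-token block would presumably read as follows (a sketch assuming self.max_token_seqlen is the intended max length, as suggested above):

      self.token_cache[f"{string}_{self.max_token_seqlen}"] = {
        k: v.squeeze(0).cpu().numpy() for k, v in tokens.items()
      }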

jlin816 · Sep 21, 2023

Also, is there any possibility that you could publish the trained weights? Thank you for your help!

realjoenguyen · Sep 21, 2023