excord
About the placeholder token <PH>
Hello,
I noticed that you add a placeholder token <PH> to the tokenizer using the following code:
tokenizer._add_tokens(["<PH>"], special_tokens=True)
tokenizer.placeholder_token = "<PH>"
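Below is a minimal pure-Python sketch of why the new token's id falls outside the pretrained vocabulary. `ToyTokenizer` and its four-token vocabulary are hypothetical; the only assumption carried over from Hugging Face tokenizers is that added tokens are appended after the existing vocabulary, so the first added token receives an id equal to the old vocabulary size.

```python
class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer's token <-> id mapping (not a real library class)."""

    def __init__(self, tokens):
        self.token_to_id = {tok: i for i, tok in enumerate(tokens)}

    def __len__(self):
        return len(self.token_to_id)

    def add_tokens(self, new_tokens):
        """Append unseen tokens at the end of the vocabulary; return how many were added."""
        added = 0
        for tok in new_tokens:
            if tok not in self.token_to_id:
                self.token_to_id[tok] = len(self.token_to_id)
                added += 1
        return added

    def convert_tokens_to_ids(self, token):
        return self.token_to_id[token]


pretrained_vocab = ["[PAD]", "[CLS]", "[SEP]", "hello"]  # hypothetical pretrained vocabulary
tokenizer = ToyTokenizer(pretrained_vocab)
num_pretrained = len(tokenizer)  # number of rows in the pretrained embedding table
tokenizer.add_tokens(["<PH>"])
ph_id = tokenizer.convert_tokens_to_ids("<PH>")
print(ph_id, num_pretrained)  # 4 4: the new id is one past the last pretrained row
```

Because valid row indices of the pretrained table run from 0 to `num_pretrained - 1`, an id equal to `num_pretrained` has no embedding until the table is enlarged.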
The placeholder token is then used in the following code:
encoded_ph = tokenizer.convert_tokens_to_ids(tokenizer.placeholder_token)
if len(truncated_rewrite) > len(truncated_query):
    truncated_query += [encoded_ph] * (len(truncated_rewrite) - len(truncated_query))
else:
    truncated_rewrite += [encoded_ph] * (len(truncated_query) - len(truncated_rewrite))
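The padding step above can be read as a small standalone function. This is a sketch only: the name `pad_to_same_length` and the placeholder id 99 are hypothetical, and unlike the snippet above it returns new lists instead of extending them in place.

```python
def pad_to_same_length(query_ids, rewrite_ids, ph_id):
    """Right-pad the shorter of two id lists with ph_id so both end up the same length.

    Returns new lists; the inputs are not modified.
    """
    diff = len(rewrite_ids) - len(query_ids)
    if diff > 0:
        # the rewrite is longer: pad the query
        return query_ids + [ph_id] * diff, list(rewrite_ids)
    # the query is longer or equal: pad the rewrite (by zero items when equal)
    return list(query_ids), rewrite_ids + [ph_id] * (-diff)


query, rewrite = pad_to_same_length([101, 7, 8], [101, 7, 8, 9, 10], ph_id=99)
print(query)    # [101, 7, 8, 99, 99]
print(rewrite)  # [101, 7, 8, 9, 10]
```

After padding, position i of the query lines up with position i of the rewrite, which is presumably why a dedicated token is used instead of an ordinary vocabulary word.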
However, the index of this placeholder token exceeds the size of the pretrained vocabulary, so the token has no row in the embedding table. How can this be solved? Should the placeholder token be replaced with an existing token from the vocabulary? If so, which one?
Hello, xyltt, thanks for the detailed question.
As you said, I added the special token to the tokenizer's vocabulary, and the model learns an embedding for it during training once the embedding table has been resized with:
model.resize_token_embeddings(len(tokenizer))
You can refer to this line. Note that I didn't call it with the older version of the Transformers library (I'm not sure exactly why, but it worked without issues); it is required in current versions.
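Conceptually, resizing keeps every pretrained row and appends trainable rows for the added tokens. The pure-Python sketch below illustrates that idea; `resize_embeddings` is a hypothetical helper, not the Transformers implementation, and drawing new rows from N(0, 0.02²) is an assumption for this sketch (the library's own initialization of new rows has varied across versions).

```python
import random


def resize_embeddings(table, new_num_tokens, rng):
    """Return a copy of `table` grown (or truncated) to new_num_tokens rows.

    Existing rows are copied unchanged. New rows are drawn from N(0, 0.02^2),
    which is an assumption of this sketch, not necessarily a library's behaviour.
    """
    dim = len(table[0])
    resized = [row[:] for row in table[:new_num_tokens]]
    while len(resized) < new_num_tokens:
        resized.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    return resized


rng = random.Random(0)
pretrained = [[rng.gauss(0.0, 0.02) for _ in range(3)] for _ in range(4)]  # 4 tokens, dim 3
ph_id = 4  # hypothetical id of the added token: one past the last pretrained row

try:
    pretrained[ph_id]
except IndexError:
    print("lookup fails before resizing")

resized = resize_embeddings(pretrained, new_num_tokens=5, rng=rng)
print(len(resized), resized[:4] == pretrained)  # 5 True
```

The new row starts out meaningless and only acquires a useful representation through gradient updates during fine-tuning, which matches the explanation above.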
Thanks for your attention to our work.
Thanks for your reply!
I also have some questions about the CoQA dataset. Is the released code also applicable to CoQA? I noticed that "class_num" for CoQA differs from "class_num" for QuAC. What should "class_num" be for CoQA? Also, the fourth label (index 3) is ignored when computing "class_loss", as in the following code:
else:  # coqa
    class_loss_fct = CrossEntropyLoss(ignore_index=3)
    class_loss = class_loss_fct(class_logits, is_impossible)
Why is this label ignored, and what does it represent?
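What class 3 means in this repository is for the authors to answer, but what `ignore_index=3` does mechanically can be shown. In `torch.nn.CrossEntropyLoss`, examples whose target equals `ignore_index` contribute nothing to the loss or the gradient, and with the default `reduction="mean"` they are also excluded from the denominator. The pure-Python sketch below reproduces that behaviour for a hypothetical 4-class problem; the function name and the logits are illustrative only.

```python
import math


def cross_entropy(logits, targets, ignore_index=-100):
    """Mean cross-entropy over examples whose target != ignore_index.

    Mirrors reduction="mean" in torch.nn.CrossEntropyLoss: ignored examples
    are skipped in both the sum and the count.
    """
    total, count = 0.0, 0
    for row, target in zip(logits, targets):
        if target == ignore_index:
            continue
        # log-sum-exp computed stably by subtracting the max logit
        m = max(row)
        log_norm = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_norm - row[target]  # equals -log_softmax(row)[target]
        count += 1
    return total / count if count else float("nan")


logits = [[2.0, 0.5, 0.1, 0.3],
          [0.2, 0.1, 0.0, 3.0],
          [0.1, 1.5, 0.2, 0.0]]
targets = [0, 3, 1]  # the second example carries the ignored label 3

with_ignore = cross_entropy(logits, targets, ignore_index=3)
only_kept = cross_entropy([logits[0], logits[2]], [0, 1])
print(math.isclose(with_ignore, only_kept))  # True
```

In other words, examples labelled 3 are simply dropped from the classification loss, so the classifier head receives no training signal for that label.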
I also noticed that "class_logits" is not used during inference on QuAC. Is "class_logits" used during inference on CoQA?