
Adding custom tokens

Open ajaybabu20 opened this issue 3 years ago • 0 comments

Hey guys! I had fun reading the paper, and thanks for open-sourcing the model.

In the paper, you mention that [COL] and [VAL] are special tokens indicating the start of attribute names and values, respectively. I take this to mean that [COL] and [VAL] should be added to the tokenizer as special tokens. However, in the repo (https://github.com/megagonlabs/ditto/blob/master/ditto_light/dataset.py#L12), they are not added as special tokens to the vocabulary of the pre-trained tokenizer.

Any reason why?
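
To make the concern concrete, here is a toy sketch of what can happen to bracketed markers when they are not registered as special tokens. It is only an illustration: `toy_tokenize` is a hypothetical stand-in written for this example (it is neither the ditto code nor a real pre-trained tokenizer), and the sample record is made up. The exact sub-pieces a real tokenizer produces will differ.

```python
import re

# Made-up serialized record in the [COL]/[VAL] style described in the paper.
text = "[COL] title [VAL] instant immersion spanish [COL] price [VAL] 36.11"


def toy_tokenize(text, special_tokens=()):
    """Hypothetical toy tokenizer: lowercases and splits off punctuation,
    but keeps any registered special token intact."""
    if special_tokens:
        # The capture group makes re.split keep the special tokens as pieces.
        pattern = "(" + "|".join(re.escape(t) for t in special_tokens) + ")"
        pieces = re.split(pattern, text)
    else:
        pieces = [text]

    tokens = []
    for piece in pieces:
        if piece in special_tokens:
            # Registered markers pass through as single tokens.
            tokens.append(piece)
        else:
            # Everything else is lowercased and split on punctuation.
            tokens.extend(re.findall(r"\w+|[^\w\s]", piece.lower()))
    return tokens


without_special = toy_tokenize(text)
with_special = toy_tokenize(text, special_tokens=("[COL]", "[VAL]"))

print(without_special[:4])  # ['[', 'col', ']', 'title']
print(with_special[:4])     # ['[COL]', 'title', '[VAL]', 'instant']
```

With a Hugging Face tokenizer, the usual way to register such markers would be `tokenizer.add_special_tokens({"additional_special_tokens": ["[COL]", "[VAL]"]})` followed by `model.resize_token_embeddings(len(tokenizer))`, so that the new tokens get their own embeddings. Whether ditto needs this, or intentionally relies on the sub-piece split, is what I am asking.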

ajaybabu20 · Oct 28 '22 14:10