Better Documentation for `merges_path` in `CLIPTokenizer`

Open MohamedAliRashad opened this issue 2 years ago • 2 comments

📚 Documentation

Description: I couldn't work out what I need to pass to the class for it to work, and there is no documentation on what `merges_path` is or what the file should contain (I am not specialized in NLP, so my question may be a little naive).

All that is required is an example of what I need to provide to the class for it to work.

MohamedAliRashad avatar May 07 '22 06:05 MohamedAliRashad

This issue thread does a good job of explaining the purpose of the merge file (https://github.com/huggingface/transformers/issues/4777#issuecomment-646989260).
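To make that concrete, here is a minimal sketch of what a BPE merges file typically looks like and how its rules are applied to a word. It assumes the common GPT-2/CLIP-style layout: a first line that is skipped as a header, then one merge rule per line (two symbols separated by a space), with earlier rules taking priority. The toy rules and the helper names `load_merges` and `bpe` are made up for illustration and are not torchtext's implementation; a real tokenizer also does lowercasing, byte-level mapping, and regex pre-tokenization, all omitted here.

```python
import os
import tempfile

# Toy merges file (made up for illustration). The first line is a header and
# is skipped; every other line is one merge rule: two symbols and a space.
# Rules nearer the top of the file are applied first.
TOY_MERGES = "#version: 0.2\nl o\nlo w</w>\ne r</w>\n"


def load_merges(path):
    """Read a merges file into {(left, right): rank}; lower rank wins."""
    with open(path, encoding="utf-8") as f:
        lines = f.read().split("\n")[1:]  # drop the header line
    ranks = {}
    for line in lines:
        if line.strip():
            left, right = line.split()
            ranks[(left, right)] = len(ranks)
    return ranks


def bpe(word, ranks):
    """Apply merge rules to one word, CLIP-style (end-of-word marker </w>)."""
    if not word:
        return []
    # Start from single characters; the last one carries the </w> marker.
    symbols = list(word[:-1]) + [word[-1] + "</w>"]
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no applicable rule is left
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Write the toy file to disk, as a real merges file would be, then load it.
with tempfile.TemporaryDirectory() as tmp:
    merges_path = os.path.join(tmp, "toy_merges.bpe")
    with open(merges_path, "w", encoding="utf-8") as f:
        f.write(TOY_MERGES)
    ranks = load_merges(merges_path)

print(bpe("low", ranks))    # ['low</w>']
print(bpe("lower", ranks))  # ['lo', 'w', 'er</w>']
```

In practice you would not write this file by hand: the merges file is produced when the BPE vocabulary is trained and is distributed alongside the pretrained model, and `merges_path` is the path to that file.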

cc @parmeet @abhinavarora do you think it would be useful to explain the merges_path param a bit better in the CLIP tokenizer doc strings or potentially provide a link to some resource such as the GH issue above to better understand what this file contains?

Nayef211 avatar May 09 '22 20:05 Nayef211

> do you think it would be useful to explain the merges_path param a bit better in the CLIP tokenizer doc strings or potentially provide a link to some resource such as the GH issue above to better understand what this file contains?

Thanks @Nayef211. Yes, I think our docs are minimal and we could definitely improve them here. One thing I am thinking about is adding a tutorial on tokenizers. We have a few of them now, and providing examples of their usage along with a clear description of the inputs should help users.

parmeet avatar May 16 '22 13:05 parmeet