Better Documentation for `merges_path` in `CLIPTokenizer`
📚 Documentation
Description
I didn't understand what I need to provide to the class for it to work, and there is no documentation on what `merges_path` is
or what the file should contain (I'm not an NLP specialist, so my question may be a little naive).
All that's needed is an example of what I have to provide to the class for it to work.
This issue thread does a good job of explaining the purpose of the merge file (https://github.com/huggingface/transformers/issues/4777#issuecomment-646989260).
cc @parmeet @abhinavarora Do you think it would be useful to explain the `merges_path` param a bit better in the CLIP tokenizer docstrings, or to link to a resource such as the GH issue above that explains what this file contains?
Thanks @Nayef211. Yes, our docs are minimal and we can definitely improve them here. One thing I am considering is adding a tutorial on tokenizers. We now have several of them, and providing usage examples along with a clear description of their inputs should help users.
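To make the merges file a bit more concrete, here is a minimal sketch, assuming the GPT-2/CLIP-style layout: a plain-text file with an optional header line (for example `#version: 0.2`; headers vary between files), then one pair of symbols per line, separated by a space. Line order is priority: pairs nearer the top are merged first. The toy merges below are made up for illustration, CLIP's real file has tens of thousands of lines, and `load_merge_ranks`/`bpe` are hypothetical helpers, not torchtext's implementation. CLIP marks the end of a word with `</w>`, which the sketch mimics.

```python
# Hypothetical toy merges file: a header, then one "left right" pair per line.
TOY_MERGES = """#version: 0.2
l o
lo w</w>
e r</w>
"""


def load_merge_ranks(text):
    """Map each merge pair to its rank (lower rank = applied first)."""
    ranks = {}
    for line in text.splitlines():
        # Skip blank lines and the assumed "#version" header.
        if not line or line.startswith("#"):
            continue
        left, right = line.split()
        ranks[(left, right)] = len(ranks)
    return ranks


def bpe(word, ranks):
    """Split a word into characters, then repeatedly apply the best-ranked merge."""
    if not word:
        return []
    # Start from single characters; the last one carries the end-of-word marker.
    symbols = list(word[:-1]) + [word[-1] + "</w>"]
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no adjacent pair appears in the merges file
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


ranks = load_merge_ranks(TOY_MERGES)
print(bpe("low", ranks))    # ['low</w>']
print(bpe("lower", ranks))  # ['lo', 'w', 'er</w>']
```

In other words, the merges file only tells the tokenizer *how to split* text into subword tokens. Turning those tokens into integer IDs is the job of a separate vocabulary (encoder) file.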