
Separate object construction from file reading

Open erip opened this issue 3 years ago • 2 comments

🚀 Feature

Tracking issue for this discussion. File reading should probably be handled in a classmethod, so that users don't have to write files to disk just to construct certain objects.

One example is the CLIPTokenizer, whose constructor currently accepts file paths. Ideally a user could provide their own merges directly; to keep the convenience of reading from a file, we could add from_pretrained or similar:

tokenizer = CLIPTokenizer.from_pretrained(encoder_json_path, vocab_bpe_path)
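To make the idea concrete, here is a minimal sketch of the pattern, not torchtext's actual API. `MergesTokenizer` and `from_file` are hypothetical names, and the file layout (one header line followed by one merge per line) is an assumption for illustration only.

```python
from pathlib import Path
from typing import List, Union


class MergesTokenizer:
    """Hypothetical stand-in for a tokenizer such as CLIPTokenizer."""

    def __init__(self, merges: List[str]) -> None:
        # The constructor takes in-memory data only and never touches the filesystem.
        self.merges = list(merges)

    @classmethod
    def from_file(cls, merges_path: Union[str, Path]) -> "MergesTokenizer":
        # File reading lives here, so the file-based convenience is preserved.
        with open(merges_path, encoding="utf-8") as f:
            lines = f.read().split("\n")
        # Assumption: the first line is a header; blank lines are ignored.
        return cls([line for line in lines[1:] if line])
```

With this split, `MergesTokenizer(["a b", "ab c"])` works without any file on disk, while `MergesTokenizer.from_file(path)` covers the existing file-based workflow.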

erip, Feb 22 '22 17:02

Thanks @erip for creating this issue. Would be great if you can also provide some mock-code/proposals above to elaborate on the idea and give direction to the discussion/requested feature :)

parmeet, Feb 22 '22 17:02

Ideally one could do something like this to avoid having to deal with differing file paths on different computers:

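import urllib.request
import torchtext
url_path = ...  # Download URL for bpe merges (not given here).
# urlopen returns a readable response; drop the first line of the merges file.
clip_merges = urllib.request.urlopen(url_path).read().decode("utf-8").split("\n")[1:]
clip_tokenizer = torchtext.transforms.CLIPTokenizer(merges_path=clip_merges, num_merges=49152-256-2+1)  # proposed: merges passed directly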

ProGamerGov, Feb 23 '22 16:02