Sylvain Gugger
This is not a prioritized feature as you can already use TPUs for generation in Flax and TensorFlow. Since you can easily convert a model from one framework to the...
The above PR has been merged, so this should be solved :-)
Could you fix the conflicts with main first please?
@jianan-gu You do not need write access to make a rebase/merge with the main branch.
The `parallelize` API is going to be deprecated in the coming days. The way to parallelize the model is now:

```py
model = AutoModelForXxx.from_pretrained(checkpoint, device_map="auto")
```

or passing an explicit...
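For the explicit form, a `device_map` is just a dict mapping module names to devices. A minimal sketch, assuming a GPT-2-style module layout — the layer names and device indices below are illustrative, not from the original comment:

```python
# Hypothetical explicit device_map splitting a model across two GPUs.
# Module names are illustrative (GPT-2-style); real names depend on the
# checkpoint's architecture.
device_map = {
    "transformer.wte": 0,   # token embeddings on GPU 0
    "transformer.h.0": 0,   # first transformer block on GPU 0
    "transformer.h.1": 1,   # second transformer block on GPU 1
    "transformer.ln_f": 1,  # final layer norm on GPU 1
    "lm_head": 1,           # language-modeling head on GPU 1
}

# It would then be passed in place of "auto":
# model = AutoModelForXxx.from_pretrained(checkpoint, device_map=device_map)
```

Every submodule not listed explicitly must be covered by a parent entry, otherwise loading fails with an error listing the unassigned modules.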
It's the same naive model parallelism, and this is all for inference only, where the speed gain is going to be minimal. For training we recommend the use of DeepSpeed.
From what I gather of the `mup` repository, it's not general enough (yet?) to be integrated into Accelerate as it seems to be very targeted toward Transformer models, whereas Accelerate...
You're right, I should have said that the adaptations you mention seem very targeted toward Transformer models (in particular point 3 above).
You can create a randomly initialized model with `AutoModel.from_config`, with the config pulled with `AutoConfig.from_pretrained`:

```py
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained(checkpoint_name)
model = AutoModel.from_config(config)
```

As for...
> I had to add an argument _configuration_file to the model init but this is only required in transformers post v16.0 (inclusive), works without in v15.0 (think this is related...