How to load the adapter model after fine-tuning with distributed (multi-GPU) training?
Environment info
- adapter-transformers version: 3.0.0
- Platform: Linux-4.14.252-131.483.amzn1.x86_64-x86_64-with-glibc2.10
- Python version: 3.8.13
- PyTorch version (GPU?): 1.10.2+cu102 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script? yes
- Using distributed or parallel set-up in script? yes
Details
I have loaded the pre-trained adapter model using:
adapter_name = model.load_adapter('sst-2@ukp', config='pfeiffer')
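For context, the full setup around that call looks roughly like the sketch below; the train_adapter()/set_active_adapters() steps are the standard adapter-transformers calls, though the exact script may differ:

from transformers import AutoAdapterModel

# Load the base model, attach the pre-trained adapter, and train only the adapter.
model = AutoAdapterModel.from_pretrained('sshleifer/distilbart-xsum-12-6')
adapter_name = model.load_adapter('sst-2@ukp', config='pfeiffer')
model.train_adapter(adapter_name)        # freezes the base model weights
model.set_active_adapters(adapter_name)  # routes forward passes through the adapter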
Then I fine-tune the BART model (sshleifer/distilbart-xsum-12-6) with the adapter on 8 GPUs, launched with the following command:
python -m torch.distributed.launch --nproc_per_node=8 run_summarization.py \
However, when I use trainer.save_model(), no config.json file is saved in the output directory, regardless of whether I run in distributed fashion or not. I can use the following commands to save the model IF I am not running in distributed fashion:
# save model
model.save_pretrained('./path/to/model/directory/')
# save adapter
model.save_adapter('./path/to/adapter/directory/', 'sst-2')
So the question is: what is the right way to save the model along with the adapter, and subsequently load it for inference, when the model is trained in a distributed fashion? Something along the lines of the following snippet:
# load model
model = AutoAdapterModel.from_pretrained('./path/to/model/directory/')
model.load_adapter('./path/to/adapter/directory/')
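One detail worth noting here: after load_adapter(), the adapter still has to be activated before inference. A minimal sketch, assuming load_adapter() returns the adapter name as in adapter-transformers:

adapter_name = model.load_adapter('./path/to/adapter/directory/')
model.set_active_adapters(adapter_name)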
Sorry for specifically pinging you, but you might be the best ones to answer. Thanks! @JoPfeiff @calpt @sgugger
Just another ping here, since I still have not had any luck saving and loading the adapter model like the non-adapter models when training in distributed mode. @sgugger @JoPfeiff @calpt
Hey @tanyaroosta, if you're using our AdapterTrainer class, it will only ever save and load the adapter weights after training, never the (frozen) pre-trained weights. Beyond this saving/loading logic, AdapterTrainer is not much different from the original Trainer class built into HuggingFace Transformers. So, if you need to save the full model instead of just the adapters, you can simply switch to the latter trainer class.
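As a minimal sketch of that swap (the import path is per adapter-transformers 3.x, and model, training_args, and train_dataset are assumed to be defined elsewhere in your script):

from transformers import Trainer
from transformers.adapters import AdapterTrainer

# AdapterTrainer: save_model() writes only the adapter weights
trainer = AdapterTrainer(model=model, args=training_args, train_dataset=train_dataset)

# Trainer: save_model() writes the full model
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)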
Also note that when you call model.save_pretrained() on a model with adapters, it will save the full model along with the adapters (in the same file). Thus, you don't need to save adapters separately in this case.
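A minimal round-trip sketch of that behavior (paths are illustrative):

# Save the full model together with its adapters ...
model.save_pretrained('./path/to/model/directory/')
# ... and restore it; the adapter may still need to be re-activated afterwards
# via set_active_adapters().
model = AutoAdapterModel.from_pretrained('./path/to/model/directory/')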
Hope this helps.
Working with @tanyaroosta on this. Writing up our resolution in case someone else runs into the same issue.
Per the above suggestion, we tried switching back to the Trainer class instead of AdapterTrainer, but we were unable to get the adapter layer working when loading the saved model. Although the adapter shows up in the config this way, when we try to load and set the active adapter, we get an error saying no adapter was found.
We also tried model.save_pretrained(), but as mentioned in the original post, we are training in distributed mode, and the model.save_pretrained() call throws an error because all processes try to save the same files. This approach likely works on a single GPU, but we did not test it, as it would slow down our training too much.
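For anyone who wants to try it anyway, a minimal sketch (untested in our setup) would be to restrict the save to a single process using torch.distributed:

import torch.distributed as dist

# Only the rank-0 process writes the checkpoint; the other processes skip the save.
if not dist.is_initialized() or dist.get_rank() == 0:
    model.save_pretrained('./path/to/model/directory/')
    model.save_adapter('./path/to/adapter/directory/', 'sst-2')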
The approach that worked for us is to use the AdapterTrainer class and call trainer.save_model() once training completes. Then, to load the saved model, we start from the same base model (downloaded at the beginning of training) and its config, and replicate any changes made during training, e.g. resizing the token embeddings. We can then load and set the saved adapter.
The code we use to load the model and adapter from the saved files looks like this:
from transformers import AutoConfig, AutoModelForSeq2SeqLM, AutoTokenizer

name = <model_name from hub or whatever source>

# the tokenizer is loaded from the training output directory
tokenizer = AutoTokenizer.from_pretrained(<path to model output directory>)

# the base model and its config come from the original source
config = AutoConfig.from_pretrained(name)
# replace AutoModelForSeq2SeqLM with whatever model type you are using
model = AutoModelForSeq2SeqLM.from_pretrained(
    name,
    from_tf=False,
    config=config,
)

# replicate any changes made to the starting model in the training script
model.resize_token_embeddings(len(tokenizer))

# load and activate the saved adapter
model.load_adapter(<path to adapter save directory, within model output directory>, model_name=name)
model.set_active_adapters(<name of adapter>)
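As a quick check that the adapter is actually active after loading, a short generation pass (the input text and generation parameters below are illustrative):

inputs = tokenizer("Paste a long article to summarize here.", return_tensors="pt")
summary_ids = model.generate(**inputs, num_beams=4, max_length=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))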