
How to load the adapter model after fine-tuning using multi-threading?

tanyaroosta opened this issue 3 years ago · 3 comments

Environment info

  • adapter-transformers version: 3.0.0
  • Platform: Linux-4.14.252-131.483.amzn1.x86_64-x86_64-with-glibc2.10
  • Python version: 3.8.13
  • PyTorch version (GPU?): 1.10.2+cu102 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script? yes
  • Using distributed or parallel set-up in script? yes

Details

I have loaded the pre-trained adapter model using: adapter_name = model.load_adapter('sst-2@ukp', config='pfeiffer')
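For context, the adapter loading setup looks roughly like this (simplified sketch; the activation and freezing calls stand in for whatever the actual training script does):

from transformers import AutoAdapterModel

# base model checkpoint (the one fine-tuned below)
model = AutoAdapterModel.from_pretrained('sshleifer/distilbart-xsum-12-6')

# load the pre-trained adapter from the Hub
adapter_name = model.load_adapter('sst-2@ukp', config='pfeiffer')

# activate the adapter and freeze the base model so only the adapter is trained
model.set_active_adapters(adapter_name)
model.train_adapter(adapter_name)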

Then I fine-tune the BART model (sshleifer/distilbart-xsum-12-6) with the adapter on 8 GPUs, launched with the following command snippet:

python -m torch.distributed.launch --nproc_per_node=8 run_summarization.py \

However, when I use trainer.save_model(), no config.json file is saved in the output directory, regardless of whether I run in distributed mode or not. I can use the following commands to save the model IF I am not running in distributed mode:

# save model
model.save_pretrained('./path/to/model/directory/')
# save adapter
model.save_adapter('./path/to/adapter/directory/', 'sst-2')

So the question is: what is the right way to save the model along with the adapter, and subsequently load it for inference, when the model is trained in a distributed fashion? Something along the lines of the following snippet:

# load model
model = AutoAdapterModel.from_pretrained('./path/to/model/directory/')
model.load_adapter('./path/to/adapter/directory/')

Sorry for specifically pinging you, but you might be the best ones to answer. Thanks! @JoPfeiff @calpt @sgugger

tanyaroosta · May 02 '22 17:05

Just another ping here, since I still have not had any luck saving and loading the adapter model the way I can with non-adapter models when training in distributed mode. @sgugger @JoPfeiff @calpt

tanyaroosta · May 04 '22 17:05

Hey @tanyaroosta, if you're using our AdapterTrainer class, it will only ever save & load the adapter weights after training, never the (frozen) pre-trained weights. Beyond this saving/loading logic, AdapterTrainer is not much different from the original Trainer class built into HuggingFace Transformers. So, if you need to save the full model instead of just the adapters, you might simply switch to the latter trainer class.
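Schematically, the switch looks like this (a rough sketch; model, dataset and argument names are placeholders, and the import path is the one used in adapter-transformers 3.x):

from transformers import TrainingArguments, Trainer
from transformers.adapters import AdapterTrainer

training_args = TrainingArguments(output_dir='./output')

# saves & restores only the adapter weights
trainer = AdapterTrainer(model=model, args=training_args, train_dataset=train_dataset)

# saves the full model (including adapter weights) via trainer.save_model()
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)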

Also note that when you call model.save_pretrained() on a model with adapters, it will save the full model along with the adapters (in the same file). Thus, you don't need to save adapters separately in this case.

Hope this helps.

calpt · May 10 '22 08:05

Working with @tanyaroosta on this. Writing up our resolution in case someone else runs into the same issue. Per the above suggestion, we tried switching back to the Trainer class instead of AdapterTrainer, but we were unable to get the adapter layer working when loading the saved model. Although the adapter shows up in the config this way, when we try to load and set the active adapter, we get an error saying no adapter was found.

We also tried model.save_pretrained(), but as mentioned in the original post, we are training in distributed mode, and the model.save_pretrained() call throws an error because every process tries to save the same files. This approach likely works when using just 1 GPU, but we did not test it as that would slow down our training too much.
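A possible workaround we did not try would be to guard the save so that only the main process writes, e.g.:

# untested sketch: only the main process saves
if trainer.is_world_process_zero():
    model.save_pretrained('./path/to/model/directory/')
    model.save_adapter('./path/to/adapter/directory/', 'sst-2')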

The approach that worked for us is to use the AdapterTrainer class and call trainer.save_model() once training completes. Then, to reload the model, we start from the same base model and config (downloaded at the beginning of our training), replicate any changes made to it, e.g. resizing the token embeddings, and finally load and set the saved adapter.
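In outline, the training side is just the standard AdapterTrainer flow (simplified; the constructor argument names are placeholders for whatever the training script passes):

from transformers.adapters import AdapterTrainer

trainer = AdapterTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()

# writes the adapter weights and adapter config into the output directory
trainer.save_model()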

The approach we took to load the model and adapter from the saved files looks like this:

# imports from adapter-transformers (installed as `transformers`)
from transformers import AutoConfig, AutoModelForSeq2SeqLM, AutoTokenizer

name = <model_name from hub or whatever source>

# the tokenizer was saved into the training output directory
tokenizer = AutoTokenizer.from_pretrained(<path to model output directory>)
config = AutoConfig.from_pretrained(name)

# replace AutoModelForSeq2SeqLM with whatever model class you are using
model = AutoModelForSeq2SeqLM.from_pretrained(
    name,
    from_tf=False,
    config=config
)

# replicate any changes made to the starting model in the training script
model.resize_token_embeddings(len(tokenizer))

# load and set the adapter
model.load_adapter(<path to adapter save directory - within model output directory>, model_name=name)
model.set_active_adapters(<name of adapter>)
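One small simplification (not in our original script): load_adapter returns the name of the loaded adapter, so its return value can be passed straight to set_active_adapters instead of hard-coding the adapter name.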

scotteggs · May 11 '22 18:05

This issue has been automatically marked as stale because it has been without activity for 90 days. This issue will be closed in 14 days unless you comment or remove the stale label.

adapter-hub-bert · Oct 14 '22 06:10

This issue was closed because it was stale for 14 days without any activity.

adapter-hub-bert · Oct 29 '22 06:10