collaborative-attention
Code for Multi-Head Attention: Collaborate Instead of Concatenate
Thank you, but I have a question about `swap_to_collaborative(collab_model, BERTCollaborativeAdapter, dim_shared_query_key=128)`. I am also using the bert768 model. Is `dim_shared_query_key` where you set the size you want to compress to? However, the...
How do I apply this to BERT? Can you provide the code for using BERT as in the paper?
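The two questions above both concern `swap_to_collaborative(collab_model, BERTCollaborativeAdapter, dim_shared_query_key=128)`, where `dim_shared_query_key` sets the dimension of the query/key space shared across heads (128 here, versus 12 heads × 64 = 768 in stock BERT-base). As a rough illustration of what that shared space means, here is a minimal NumPy sketch of collaborative attention scoring: all heads share one query and one key projection, and each head reweights the shared dimensions with a per-head mixing vector. All names and shapes here are chosen for illustration; this is not the repository's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, d_model = 4, 768      # BERT-base hidden size
n_heads, d_shared = 12, 128    # shared query/key dim (dim_shared_query_key)

x = rng.standard_normal((seq_len, d_model))

# Shared projections: ONE W_Q and ONE W_K for all heads (d_model x d_shared),
# instead of 12 separate 64-dim per-head projections concatenated to 768.
W_Q = rng.standard_normal((d_model, d_shared)) / np.sqrt(d_model)
W_K = rng.standard_normal((d_model, d_shared)) / np.sqrt(d_model)

# Per-head mixing vectors m_i: head i scores with (Q diag(m_i)) K^T,
# so heads collaborate through the same d_shared key/query dimensions.
M = rng.standard_normal((n_heads, d_shared))

Q = x @ W_Q                    # (seq_len, d_shared)
K = x @ W_K                    # (seq_len, d_shared)

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

# scores[h, t, s] = sum_d M[h, d] * Q[t, d] * K[s, d]
scores = np.einsum("hd,td,sd->hts", M, Q, K) / np.sqrt(d_shared)
attn = softmax(scores, axis=-1)  # (n_heads, seq_len, seq_len)
```

Shrinking `d_shared` below `n_heads * head_dim` is what compresses the model: the heads then share a smaller pool of query/key dimensions instead of each owning a disjoint slice.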
wandb: Waiting for W&B process to finish, PID 3667
Traceback (most recent call last):
  File "run_glue.py", line 365, in <module>
    main()
  File "run_glue.py", line 207, in main
    train_dataset = GlueDataset(data_args, tokenizer=tokenizer)...
Can you guys add the number of parameters each model (in the results table) is using?
Hi @jbcdnr, @martinjaggi, thanks for this work, it's quite intuitive and easy to understand :) When I try to run the script provided, I get the following error...
Bumps [transformers](https://github.com/huggingface/transformers) from 2.11.0 to 4.30.0. Release notes Sourced from transformers's releases. v4.30.0: 100k, Agents improvements, Safetensors core dependency, Swiftformer, Autoformer, MobileViTv2, timm-as-a-backbone 100k Transformers has just reached 100k stars...