Fine-tuning Albert large
I fine-tuned ALBERT base on my task but didn't get the desired accuracy. Now that I am trying to fine-tune ALBERT large, I get this error: "Resource exhausted: OOM when allocating tensor with shape[8,512,16,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc"
I used a single GPU with 12 GB of memory, and also one with 16 GB (two different attempts). Interestingly, I can fine-tune BERT base on the single GPU with 12 GB of memory.
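For what it's worth, a quick back-of-the-envelope on that tensor shape (reading [8, 512, 16, 64] as batch x seq_len x num_heads x head_dim, which is an assumption on my part) shows why shrinking the batch size or max sequence length is the usual first fix:

```python
# Rough size of one such activation tensor in float32 (4 bytes per element);
# many of these are kept alive per layer during training, plus the
# [batch, heads, seq, seq] attention scores that grow quadratically in seq_len.
def tensor_mib(batch, seq_len, num_heads=16, head_dim=64, bytes_per_elem=4):
    return batch * seq_len * num_heads * head_dim * bytes_per_elem / 2**20

print(tensor_mib(8, 512))   # 16.0 MiB at the failing batch/sequence size
print(tensor_mib(4, 128))   # 2.0 MiB with batch 4 and seq_len 128
```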
If I'm not mistaken, ALBERT large requires more memory than BERT base; its requirements are probably closer to those of BERT large.
Thank you brightmart! I experienced the same thing, but even ALBERT xlarge has a smaller model size than BERT base. It seems it is computationally more expensive, yet the final model size is smaller.
It is more expensive computationally because it has a bigger architecture (more transformer layers, a larger hidden size, etc.), and therefore more computation. It has fewer parameters because it shares parameters across layers and decouples the embedding size from the hidden size, which reduces the embedding size tremendously and consequently the parameter count and model size.
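To make the embedding-factorization point concrete, here is a tiny sketch (the vocab/hidden/embedding numbers are illustrative ALBERT-large-style values, not quoted from this thread):

```python
# Parameter count of the token-embedding table, coupled vs. factorized.
vocab_size, hidden_size, embed_size = 30000, 1024, 128

# BERT-style: embeddings map straight to the hidden size.
coupled = vocab_size * hidden_size                               # ~30.7M params

# ALBERT-style: small embedding, then a projection up to the hidden size.
factorized = vocab_size * embed_size + embed_size * hidden_size  # ~4.0M params

print(coupled, factorized)  # roughly an 8x reduction in embedding parameters
```

Cross-layer parameter sharing then removes the per-layer copies of the transformer weights, which is why the checkpoint is much smaller even though the forward/backward pass is not cheaper.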
If you use version 2, the memory consumption should be much smaller and you may be able to finetune the models.
I was using version 2, but I haven't succeeded in fine-tuning ALBERT large.
Not sure why, but it seems these models are tagged as not fine-tunable.
@dhruvsakalley that was due to a bug in the TF-Hub UI, which should be resolved now. Also, please switch to the "/3" path for TF-Hub modules (see Jan 7 update in readme for details).
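For anyone else landing here, a minimal sketch of loading a "/3" module with the TF1-style hub API looks roughly like this (the exact handle and signature names are assumptions based on the usual BERT/ALBERT hub-module conventions, so check them against the readme):

```python
import tensorflow as tf
import tensorflow_hub as hub

# trainable=True is what makes the module fine-tunable.
albert_module = hub.Module(
    "https://tfhub.dev/google/albert_large/3",  # note the "/3" path
    trainable=True)

input_ids = tf.placeholder(tf.int32, [None, 512])
input_mask = tf.placeholder(tf.int32, [None, 512])
segment_ids = tf.placeholder(tf.int32, [None, 512])

outputs = albert_module(
    dict(input_ids=input_ids, input_mask=input_mask, segment_ids=segment_ids),
    signature="tokens", as_dict=True)

pooled_output = outputs["pooled_output"]      # [batch, hidden], for classification heads
sequence_output = outputs["sequence_output"]  # [batch, seq_len, hidden]
```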
I managed to fine-tune albert_large and got a better result than with albert_base. However, the xlarge model yields unreasonably worse results. I suspect it's due to the hyperparameters or to the way I loaded the weights.