
Fine-tuning Albert large

Open · SohaKhazaeli opened this issue on Nov 10, 2019 · 8 comments

I fine-tuned ALBERT base on my task but didn't get the desired accuracy. Now that I am trying to fine-tune ALBERT large, I get this error: "Resource exhausted: OOM when allocating tensor with shape[8,512,16,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc"

I used a single GPU with 12 GB of memory, and also one with 16 GB (two separate attempts). Interestingly, I can fine-tune BERT base on the single GPU with 12 GB of memory.
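For context, a quick back-of-the-envelope on that tensor (my own sketch; I'm reading the shape as [batch, seq_len, num_heads, head_dim], which matches ALBERT large's 16 heads of size 64):

```python
# Decode the OOM tensor shape [8, 512, 16, 64]:
batch, seq_len, heads, head_dim = 8, 512, 16, 64
bytes_per_float = 4

per_tensor = batch * seq_len * heads * head_dim * bytes_per_float
print(f"{per_tensor / 2**20:.0f} MiB per activation tensor")  # 16 MiB

# The attention-score matrices are [batch, heads, seq_len, seq_len],
# which is much bigger at this sequence length:
scores = batch * heads * seq_len * seq_len * bytes_per_float
print(f"{scores / 2**20:.0f} MiB per attention-score tensor")  # 128 MiB
```

With activations like these kept for backprop across all 24 layers, a 12-16 GB card fills up quickly; lowering the training batch size or the max sequence length is the usual first workaround.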

SohaKhazaeli avatar Nov 10 '19 19:11 SohaKhazaeli

If I am not mistaken, ALBERT large requires more memory than BERT base; the requirement could be similar to BERT large's.

brightmart avatar Nov 12 '19 06:11 brightmart

Thank you brightmart! I experienced the same thing, but even ALBERT xlarge has a model size smaller than BERT base. It seems it is computationally more expensive, yet the final model size is smaller.

SohaKhazaeli avatar Nov 12 '19 15:11 SohaKhazaeli

It is relatively more expensive computationally because it has a bigger architecture (more transformer layers, a larger hidden size, etc.), and therefore more computation. It has fewer parameters because it shares parameters across layers and decouples the embedding size from the hidden size, which reduces the embedding size tremendously and consequently the parameter count/model size.
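To make that concrete, here is a rough parameter-count sketch (my own illustrative arithmetic; the V/H/E/L values approximate the published "large" configs, and biases/LayerNorm are ignored):

```python
V = 30_000  # vocabulary size
H = 1024    # hidden size ("large" models)
E = 128     # ALBERT's factorized embedding size
L = 24      # transformer layers ("large" models)

def per_layer_params(h):
    # Self-attention projections (Q, K, V, output: 4 * h^2) plus the
    # feed-forward block (h -> 4h -> h: 8 * h^2).
    return 4 * h * h + 8 * h * h

# BERT-style: full V x H embedding table, L independent layers.
bert_like = V * H + L * per_layer_params(H)

# ALBERT-style: factorized embedding (V x E, then an E x H projection)
# and a single layer's weights shared across all L layers.
albert_like = V * E + E * H + per_layer_params(H)

print(f"BERT-like:   ~{bert_like / 1e6:.0f}M parameters")    # ~333M
print(f"ALBERT-like: ~{albert_like / 1e6:.0f}M parameters")  # ~17M
```

Note that sharing shrinks the stored weights but not the activations: the shared layer still runs 24 times at hidden size 1024, so training memory is closer to BERT large's, which is consistent with the OOM above.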

riturajkunwar avatar Nov 12 '19 22:11 riturajkunwar

If you use version 2, the memory consumption should be much smaller and you may be able to fine-tune the models.

lanzhzh avatar Nov 15 '19 09:11 lanzhzh

I was using version 2, but I haven't succeeded in fine-tuning ALBERT large.

SohaKhazaeli avatar Nov 15 '19 13:11 SohaKhazaeli

Not sure why, but it seems these models are tagged as not fine-tunable: [screenshot of the TF-Hub model page]

dhruvsakalley avatar Nov 24 '19 15:11 dhruvsakalley

@dhruvsakalley that was due to a bug in the TF-Hub UI, which should be resolved now. Also, please switch to the "/3" path for the TF-Hub modules (see the Jan 7 update in the README for details).
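For anyone landing here, loading the "/3" module looks roughly like this (a minimal TF1-style sketch; I'm assuming the albert_large handle and the BERT-style "tokens" signature that these modules follow):

```python
import tensorflow as tf  # TF 1.x
import tensorflow_hub as hub

# Note the "/3" version suffix on the handle.
albert = hub.Module("https://tfhub.dev/google/albert_large/3",
                    trainable=True)

input_ids = tf.placeholder(tf.int32, [None, None])
input_mask = tf.placeholder(tf.int32, [None, None])
segment_ids = tf.placeholder(tf.int32, [None, None])

outputs = albert(
    dict(input_ids=input_ids,
         input_mask=input_mask,
         segment_ids=segment_ids),
    signature="tokens",
    as_dict=True)

pooled = outputs["pooled_output"]      # [batch, hidden]
sequence = outputs["sequence_output"]  # [batch, seq_len, hidden]
```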

0x0539 avatar Jan 08 '20 22:01 0x0539

I managed to fine-tune albert_large and got a better result than albert_base. However, xlarge yields an unreasonably worse result. I'm not sure whether it's due to the hyperparameters or to the way I loaded the weights.

hankcs avatar Jan 23 '20 19:01 hankcs