
Resource exhausted

Open rajae-Bens opened this issue 3 years ago • 5 comments

Hi,

First, thank you for sharing your code with us.

I am trying to further pre-train a BERT model on my own corpus on a Colab GPU, but I am getting a "resource exhausted" error. Can someone tell me how to fix this?

Also, what is the expected output of this further pre-training? Is it the BERT TensorFlow files that we can use for fine-tuning (checkpoint, config, and vocab)?

Thank you

rajae-Bens avatar Dec 23 '20 08:12 rajae-Bens

Hey, I encountered the same issue. Were you able to resolve it? I keep getting this: `OP_REQUIRES failed at cwise_ops_common.cc:70 : Resource exhausted: OOM when allocating tensor with shape[768,768]`. I already reduced the batch size to 3, but it didn't work.

chen3082 avatar Jan 28 '21 10:01 chen3082

> Hi,
>
> First, thank you for sharing your code with us.
>
> I am trying to further pre-train a BERT model on my own corpus on a Colab GPU, but I am getting a "resource exhausted" error. Can someone tell me how to fix this?
>
> Also, what is the expected output of this further pre-training? Is it the BERT TensorFlow files that we can use for fine-tuning (checkpoint, config, and vocab)?
>
> Thank you

Sorry for the late answer! I am not very familiar with TensorFlow, but here are some suggestions:

  1. Check the version of TensorFlow and make sure it is 1.1x.
  2. If you hit OOM problems, please reduce your batch size or your max sequence length. The official BERT repo provides an example.
  3. We do not have resources for fine-tuning with TensorFlow; you can check the official BERT repo if you want.
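To make the scaling in point 2 concrete, here is a small standalone sketch. It is not from this repo: the function names and constants are illustrative, and the batch-size table is the one suggested for BERT-Base on a 12 GB GPU in the "Out-of-memory issues" section of the official google-research/bert README.

```python
# Hedged sketch (illustrative names, not part of BERT4doc): how activation
# memory scales with batch size and sequence length for a BERT-Base-like
# model, plus the max batch sizes suggested for a 12 GB GPU in the
# official BERT README.

# seq_length -> max batch size for BERT-Base on a 12 GB GPU
# (google-research/bert README, "Out-of-memory issues").
MAX_BATCH_12GB = {64: 64, 128: 32, 256: 16, 320: 14, 384: 12, 512: 6}

def activation_scale(batch_size, seq_len, hidden=768, layers=12):
    """Relative activation footprint: self-attention scores are
    O(batch * seq_len^2) per layer, feed-forward activations are
    O(batch * seq_len * hidden) per layer."""
    attention = batch_size * seq_len ** 2 * layers
    feed_forward = batch_size * seq_len * hidden * layers
    return attention + feed_forward

def suggested_batch(seq_len):
    """Largest README-suggested batch size whose bucket covers seq_len."""
    for max_len in sorted(MAX_BATCH_12GB):
        if seq_len <= max_len:
            return MAX_BATCH_12GB[max_len]
    return None  # longer than 512 is not supported by stock BERT

# Halving the batch size halves the activation footprint; shortening
# sequences helps even more because of the seq_len^2 attention term.
print(activation_scale(32, 128) / activation_scale(16, 128))  # → 2.0
print(suggested_batch(128))                                   # → 32
```

The point of the sketch: batch size buys only linear savings, while max sequence length also shrinks the quadratic attention term, so it is often the more effective knob.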

xuyige avatar Feb 19 '21 19:02 xuyige

> Hey, I encountered the same issue. Were you able to resolve it? I keep getting this: `OP_REQUIRES failed at cwise_ops_common.cc:70 : Resource exhausted: OOM when allocating tensor with shape[768,768]`. I already reduced the batch size to 3, but it didn't work.

Sorry for the late answer! If you have OOM problems, please reduce your batch size and max sequence length. The official BERT repo provides an example for a 12 GB GPU.
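As a concrete sketch of that advice, an invocation with a reduced batch size and sequence length might look like the following. The flag names come from the official google-research/bert `run_pretraining.py`; all paths are placeholders, and the exact values are assumptions to adapt to your GPU.

```shell
# Hedged example: smaller --train_batch_size and --max_seq_length to fit
# a 12 GB GPU. Paths are placeholders. --max_predictions_per_seq must
# match the value used when running create_pretraining_data.py.
python run_pretraining.py \
  --input_file=/path/to/tf_examples.tfrecord \
  --output_dir=/path/to/pretraining_output \
  --do_train=True \
  --bert_config_file=/path/to/bert_config.json \
  --init_checkpoint=/path/to/bert_model.ckpt \
  --train_batch_size=16 \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --num_train_steps=10000 \
  --learning_rate=2e-5
```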

xuyige avatar Feb 19 '21 20:02 xuyige

Hi,

Thank you for answering.

I reduced train_batch_size to 8 and max_seq_length to 40, but I still get the "resource exhausted" error.

I am running the code on a Colab GPU with 12 GB of RAM. Any ideas, please?

Thank you

rajae-Bens avatar Mar 31 '21 08:03 rajae-Bens

> Hi,
>
> Thank you for answering.
>
> I reduced train_batch_size to 8 and max_seq_length to 40, but I still get the "resource exhausted" error.
>
> I am running the code on a Colab GPU with 12 GB of RAM. Any ideas, please?
>
> Thank you

From your description: does your model contain some other NN modules? Is your Colab GPU shared with others? Do you have enough CPU resources (e.g., could it be CPU OOM rather than GPU OOM)?
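One hedged way to check the CPU-OOM possibility (standard library only, not part of this repo) is to log the process's peak host-RAM usage and compare it against the machine's total RAM:

```python
# Hedged diagnostic sketch: if the "resource exhausted" error is really
# host-RAM pressure rather than GPU memory, the process's peak resident
# set size will be close to the machine's total RAM. Unix-only (the
# resource module is unavailable on Windows).
import resource

def peak_rss_mb():
    """Peak resident set size of this process, in megabytes.
    On Linux, ru_maxrss is reported in kilobytes."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

print(f"peak host RAM used: {peak_rss_mb():.1f} MB")
```

If this number approaches the VM's total RAM while the GPU (per `nvidia-smi`) still has free memory, the bottleneck is the CPU side, e.g. loading the whole corpus at once.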

xuyige avatar Mar 31 '21 16:03 xuyige