
Out of memory for different architectures (PyTorch 0.3)

Open skx6 opened this issue 5 years ago • 7 comments

I got different architectures by running train_search.py multiple times. Sometimes it showed "out of memory".

skx6 avatar Jun 28 '19 13:06 skx6

I use different architectures to train a classification model. Sometimes it shows "out of memory".

Margrate avatar Jun 29 '19 06:06 Margrate

Hello @SongKaixiang @Margrate !

Congratulations! You actually found a weakness of DARTS. Many follow-up works, for example ProxylessNAS, pointed out that although DARTS is much faster than its predecessors, its search space is very memory-consuming: the DARTS search model is roughly k times larger than a normal DNN, where k is the number of operations in MixedOp.
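To make the k-times point concrete, here is a minimal, simplified sketch of a mixed edge in PyTorch (not the repo's exact code): every candidate op runs on every forward pass, and each op's activations must be kept for backprop, so one edge costs roughly k times the memory of a single op.

```python
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """Simplified mixed edge: `candidates` is a list of k candidate nn.Modules."""
    def __init__(self, candidates):
        super(MixedOp, self).__init__()
        self.ops = nn.ModuleList(candidates)

    def forward(self, x, alpha):
        # alpha: architecture logits for this edge, shape (k,)
        weights = F.softmax(alpha, dim=-1)
        # Weighted sum over ALL k candidate ops: every op's output (and its
        # intermediate activations) stays in memory for the backward pass,
        # so one edge costs roughly k times the memory of a single op.
        return sum(w * op(x) for w, op in zip(weights, self.ops))
```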

For your questions, I have two suggestions: i) if the model itself takes up all the memory, try a smaller DARTS model (e.g. search with 5 cells rather than 8, or 3 nodes rather than 4); ii) if there is still some memory left after loading DARTS onto the GPU(s), try a smaller batch size and crop size.

GL,

Catosine avatar Jul 26 '19 03:07 Catosine

I improved the code to make it compatible with PyTorch 1.1 while allowing multi-GPU training for both the RNN and CNN experiments. You can refer to: https://github.com/alphadl/darts.pytorch1.1

alphadl avatar Jul 30 '19 12:07 alphadl


Hello, thanks for your suggestions @Catosine. Could you explain more about why it takes up a lot of memory? For example, in my case, some configs are as follows:

  • image size: 224*224
  • #nodes in one cell: 4
  • layers: 6

The code only runs when I set the batch size to 2; otherwise it throws an out-of-memory error. I think the model is not large, if anything it is small.
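As a rough, illustrative back-of-envelope for this config (the edge and op counts below are assumptions based on the standard DARTS search space of 14 edges per cell and 8 candidate ops, and the estimate ignores reduction-cell downsampling, channel growth, convolution workspaces, and gradients):

```python
# Rough, illustrative estimate only -- all counts are assumptions based on
# the standard DARTS search space, not measured values.
batch, H, W, C = 2, 224, 224, 16           # batch size 2, 224x224 input, 16 init channels
edges_per_cell, cells, ops_per_edge = 14, 6, 8
bytes_per_map = batch * H * W * C * 4       # one op's float32 output feature map
total = bytes_per_map * ops_per_edge * edges_per_cell * cells
print(f"~{total / 2**30:.1f} GiB just for the mixed-op outputs")  # ~4 GiB
```

Even this crude count lands around 4 GiB before gradients and convolution workspaces, which is one way to see why a batch size of 2 at 224x224 can already push an 11 GB card over the edge.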

marsggbo avatar Sep 21 '19 09:09 marsggbo


Similar condition.

The image size is set to (224, 224) in train_search.py, but it still returns an 'out of memory' message immediately, even when I set the layers to 4 and the batch size to 1.

Running environment: Python 2.7, PyTorch 0.3.1.post2, CUDA 9.0.

PS: I am using a single 2080 Ti with a bit over 11 GB of memory.

rrryan2016 avatar Jan 18 '21 03:01 rrryan2016


Hello @rrryan2016,

Thank you for your question. In my case, I was using one V100 with 32 GB of memory. Unfortunately, DARTS is very memory-consuming when searching for architectures, so you may want to try a smaller batch size and a smaller block structure, or even reduce the number of candidate operations, because in the search phase the model is K times larger than the final model (where K is the number of candidate operations per layer).
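If you want to shrink the search network itself rather than just the batch size, one option is to build it with fewer initial channels, fewer cells, and fewer nodes per cell. This is only a sketch: the import path and constructor signature are assumed from the stock cnn/model_search.py, so verify them against your copy.

```python
import torch.nn as nn
# Assumed import path from the stock repo layout (cnn/model_search.py); verify locally.
from model_search import Network

criterion = nn.CrossEntropyLoss().cuda()
# Assumed signature: Network(C, num_classes, layers, criterion, steps=4, multiplier=4, ...)
# 8 init channels, 10 classes, 5 cells, and 3 nodes per cell instead of the defaults.
model = Network(8, 10, 5, criterion, steps=3, multiplier=3).cuda()
```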

Good luck and have fun:)

PF

Catosine avatar Jan 18 '21 04:01 Catosine

Or even reduce the number of candidate operations, because in the search phase the model is K times larger than the final model (where K is the number of candidate operations per layer).

Thx for your kind reply. Really helpful.

rrryan2016 avatar Jan 18 '21 04:01 rrryan2016