[BUG] OOM error occurs during training

Open shanyun123456 opened this issue 4 years ago • 7 comments

Description

Hi @soumyadeepdey, when I use your code to train a model on a GPU, I always hit an OOM (out-of-memory) error. The batch size is 1 and the other parameters are unchanged. Please take a look, thanks.

In which platform does it happen?

Linux, GPU

How do we replicate the issue?

Running the command python3 sample_train.py reproduces the issue.

Expected behavior (i.e. solution)

Other Comment

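[image: Screenshot 2021-10-14 2:18:37 PM] https://user-images.githubusercontent.com/88327139/137262946-737e0f53-b5f6-44b7-832b-0952a573c60a.png

[image: Screenshot 2021-10-14 2:19:49 PM] https://user-images.githubusercontent.com/88327139/137262993-ca07462d-b265-4146-a7aa-65ecfea1972b.png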

shanyun123456 avatar Oct 14 '21 08:10 shanyun123456

What is the memory size of the GPU you are using?
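
One way to answer this is to ask nvidia-smi directly, which also shows whether another process is already holding memory on the card. The following is a minimal sketch, assuming the NVIDIA driver tools are installed and nvidia-smi is on the PATH; the helper names parse_gpu_memory and query_gpu_memory are hypothetical and not part of the repository.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'name, total, used' rows (values in MiB) as printed by nvidia-smi."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, total, used = [field.strip() for field in line.split(",")]
        gpus.append({"name": name, "total_mib": int(total), "used_mib": int(used)})
    return gpus


def query_gpu_memory():
    """Ask nvidia-smi for per-GPU memory; needs the NVIDIA driver tools."""
    output = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=name,memory.total,memory.used",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return parse_gpu_memory(output)


if __name__ == "__main__":
    # Print one dict per GPU, e.g. to paste into the issue.
    for gpu in query_gpu_memory():
        print(gpu)
```

If used_mib is already close to total_mib before training starts, the OOM is more likely caused by another process on the same GPU than by the training script itself.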

soumyadeepdey avatar Oct 16 '21 08:10 soumyadeepdey

Hi, my GPU has about 29 GB of memory. It's a V100.

shanyun123456 avatar Oct 16 '21 14:10 shanyun123456

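Hello, I used the same GPU but never hit an OOM error.

However, you can add these lines to your code, before TensorFlow initializes the GPU, to reduce the memory footprint. They make TensorFlow allocate GPU memory on demand instead of reserving it all at startup.

    import os
    import tensorflow as tf

    os.environ["CUDA_VISIBLE_DEVICES"] = '0'
    gpu_devices = tf.config.experimental.list_physical_devices('GPU')
    tf.config.experimental.set_memory_growth(gpu_devices[0], True)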
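
A slightly more defensive variant of the snippet above might look like the sketch below. It assumes TensorFlow 2.x; the helper names select_gpu and enable_memory_growth are hypothetical and not part of sample_train.py. It covers every visible GPU and reports the error TensorFlow raises if the GPU was already initialized before the call.

```python
import os


def select_gpu(index="0"):
    """Restrict this process to one GPU; call before TensorFlow touches CUDA."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)


def enable_memory_growth():
    """Enable on-demand GPU memory allocation; return the configured GPU names."""
    # Imported lazily so that select_gpu() can run before TensorFlow loads.
    import tensorflow as tf

    configured = []
    for gpu in tf.config.list_physical_devices("GPU"):
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
            configured.append(gpu.name)
        except RuntimeError as err:
            # Memory growth cannot be changed once the GPU is initialized.
            print(f"Could not set memory growth for {gpu.name}: {err}")
    return configured


if __name__ == "__main__":
    select_gpu("0")
    print("Memory growth enabled on:", enable_memory_growth())
```

Memory growth changes when memory is reserved, not how much the model needs at its peak, so it mainly helps when the GPU is shared or when TensorFlow's up-front reservation is the problem.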

Thanks, Soumyadeep

soumyadeepdey avatar Oct 19 '21 06:10 soumyadeepdey

I ran into the same issue.

dongcin avatar Mar 28 '22 08:03 dongcin

What input image size and batch size are you using?

soumyadeepdey avatar Mar 28 '22 13:03 soumyadeepdey