add chap4-multilayer-perceptrons of Paddle

Open zbp-xxxp opened this issue 2 years ago • 14 comments

Hello, we added PaddlePaddle-based code for chapter 4: multilayer-perceptrons.

zbp-xxxp avatar Mar 30 '22 13:03 zbp-xxxp

@cheungdaven it seems that paddle is not able to find a CUDA device. There is a similar issue in #1103.

AnirudhDagar avatar Mar 30 '22 15:03 AnirudhDagar

What is the current CUDA version on the server? @AnirudhDagar

cheungdaven avatar Mar 30 '22 16:03 cheungdaven

We have both CUDA 11.2 and CUDA 10.2.

AnirudhDagar avatar Mar 30 '22 16:03 AnirudhDagar

Hello, do you need a PaddlePaddle engineer to help solve the problem?

tngt avatar Apr 06 '22 01:04 tngt

Hi @tngt, I am looking into this issue. Can we discuss this in the Slack channel?

cheungdaven avatar Apr 06 '22 02:04 cheungdaven

OK, I'm online.

tngt avatar Apr 06 '22 02:04 tngt

@d2l-bot please rebuild!

AnirudhDagar avatar Apr 06 '22 12:04 AnirudhDagar

Hi @tngt,

@cheungdaven and I looked into the issue. The problem is that d2lbook assigns GPUs only to the notebooks that actually need them, and it looks for various patterns to decide whether a notebook needs a GPU. For example, a notebook like https://github.com/d2l-ai/d2l-zh/blob/master/chapter_preliminaries/calculus.md doesn't really require a GPU. But paddlepaddle, when installed for CUDA, always tries to run everything on GPUs. So when d2lbook encounters such a notebook for paddle and doesn't make the CUDA devices visible, we end up with these CUDA errors.
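
As a rough illustration of the behaviour described above, the sketch below scans a notebook's source for GPU-related patterns and exposes a GPU only when one matches. The pattern list and the `needs_gpu` and `build_env` helpers are hypothetical and are not d2lbook's actual implementation; `CUDA_VISIBLE_DEVICES` is the standard CUDA environment variable that controls which devices a process can see.

```python
import re

# Hypothetical patterns; d2lbook's real list is not shown in this thread.
GPU_PATTERNS = [r"\btry_gpu\(", r"\.cuda\(", r"\bgpu\("]


def needs_gpu(source: str) -> bool:
    """Return True if any GPU-related pattern appears in the notebook source."""
    return any(re.search(p, source) for p in GPU_PATTERNS)


def build_env(source: str, base_env: dict, gpu_id: int = 0) -> dict:
    """Return a copy of base_env that exposes a GPU only when one is needed."""
    env = dict(base_env)
    # An empty CUDA_VISIBLE_DEVICES hides all GPUs from the process.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id) if needs_gpu(source) else ""
    return env
```

Under this sketch, a notebook such as calculus.md would run with no visible GPUs, which is the situation in which the CUDA build of paddle reportedly fails.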

There are a few possible solutions:

  1. Change d2lbook so that paddle can find GPUs for all notebooks. This would solve the problem but may not be the best use of resources or parallelization when building in CI.
  2. Add `paddle.device.set_device('cpu')` after the `import paddle` line in all notebooks where a GPU is not required. This is the better solution.

Ideally (as with all other frameworks), if no GPUs are found, paddle should automatically run everything on the CPU. Is that possible? For now @cheungdaven installed paddlepaddle-gpu==2.2.2.post112, and with that version the fallback does not happen automatically.
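
The fallback asked for above could look roughly like the following sketch. `pick_device` and `configure_paddle` are hypothetical helper names. `paddle.device.set_device` comes from this thread, while `paddle.device.is_compiled_with_cuda` and `paddle.device.cuda.device_count` are assumed to be available in paddle 2.x. Whether querying them avoids the CUDA errors seen with paddlepaddle-gpu==2.2.2.post112 is untested.

```python
def pick_device(compiled_with_cuda: bool, gpu_count: int) -> str:
    """Choose 'gpu' only when the build supports CUDA and a GPU is visible."""
    return "gpu" if compiled_with_cuda and gpu_count > 0 else "cpu"


def configure_paddle() -> str:
    """Apply the choice to paddle (hypothetical helper; needs paddle installed)."""
    import paddle

    has_cuda = paddle.device.is_compiled_with_cuda()
    # Only query the GPU count on CUDA builds.
    count = paddle.device.cuda.device_count() if has_cuda else 0
    device = pick_device(has_cuda, count)
    paddle.device.set_device(device)
    return device
```

Keeping the decision in `pick_device` separate from the paddle calls makes the logic easy to check without a GPU or a paddle installation.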

cc @astonzhang

AnirudhDagar avatar Apr 08 '22 19:04 AnirudhDagar

I tested the second solution this afternoon, and it no longer works. I still got the same CUDA issue after setting the device to cpu. So I think the first solution is the only feasible one so far. You can double-check. @AnirudhDagar

cheungdaven avatar Apr 09 '22 01:04 cheungdaven

Sorry, I missed this info. We discussed this on Slack with @cheungdaven, and it seems he is using a different method now.

tngt avatar Apr 12 '22 09:04 tngt

@tngt just give me a moment, I'm consolidating everything (will probably make changes to the d2lbook package) and will comment on the PRs with the steps ahead. Just wanted to let you know so that you can utilise your time in a better way. Thanks for being patient :))

AnirudhDagar avatar Apr 12 '22 09:04 AnirudhDagar

Thanks. I would like to know how long it will take and what the paddle team needs to do. I see that chapter 2 has been merged; should we continue opening PRs for the other chapters?

tngt avatar Apr 14 '22 02:04 tngt

Hi @tngt, sorry if I didn't make it clear after my review. Yes, now that we have fixed the CUDA issue within d2lbook, please continue with other chapters. They should be fine. I'll also do a cursory review of the other open PRs.

This is the expected workflow for raising the PR:

  1. Add paddle support for a chapter by modifying the source markdown files and testing them locally.
  2. Run `d2lbook build lib` (you may need to install it with `pip install git+https://github.com/d2l-ai/d2l-book`) to save all the functions marked with `#@save` into `d2l/paddle.py`. Before sending the PR, please also commit these lib changes to the `paddle.py` file.
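
To illustrate what step 2 collects, here is a minimal sketch that pulls code blocks containing the `#@save` marker out of a markdown source. It is not d2lbook's implementation; the `extract_saved_blocks` helper and the assumption that code blocks are fenced with triple backticks are for illustration only.

```python
import re

# Built indirectly so this example can itself sit inside a fenced block.
FENCE = "`" * 3


def extract_saved_blocks(markdown: str) -> list:
    """Return the bodies of fenced code blocks that contain the #@save marker."""
    # Skip the rest of the opening fence line, then capture up to the next fence.
    pattern = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)
    return [body.strip() for body in pattern.findall(markdown) if "#@save" in body]
```

The real tool writes the collected functions into the library file; this sketch only shows the selection step.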

Feel free to follow the steps in https://github.com/d2l-ai/d2l-en/blob/master/CONTRIBUTING.md for a detailed d2l development workflow.

AnirudhDagar avatar Apr 14 '22 21:04 AnirudhDagar

d2l lib changes are not reflected in this PR. Please run d2lbook build lib and commit the changes. You'll need to rebase this PR against the paddle branch before that. Thanks @zbp-xxxp :))

We'll have to wait for #1119 before merging this anyway. I guess you can wait until #1119 is merged and then make the requested changes.

Copy that. Thank you too ~

zbp-xxxp avatar Apr 15 '22 03:04 zbp-xxxp

Thanks. Closing this PR per https://github.com/d2l-ai/d2l-zh/pull/1186#issuecomment-1211490440

astonzhang avatar Aug 11 '22 04:08 astonzhang