d2l-zh

Add PaddlePaddle Implementation

Open · astonzhang opened this issue 2 years ago · 21 comments

astonzhang, Aug 26 '22

Initiated evaluation from scratch: http://ci.d2l.ai/blue/organizations/jenkins/d2l-zh/detail/PR-1198/1/pipeline http://ci.d2l.ai/blue/organizations/jenkins/d2l-zh/detail/PR-1198/2/pipeline/ http://ci.d2l.ai/blue/organizations/jenkins/d2l-zh/detail/PR-1198/3/pipeline/

astonzhang, Aug 26 '22

Job d2l-zh/PR-1198/3 is complete. Check the results at http://preview.d2l.ai/d2l-zh/PR-1198/

d2l-bot, Aug 27 '22

LGTM

[Screenshot, 2022-08-27 4:28:26 PM]

Per http://ci.d2l.ai/blue/organizations/jenkins/d2l-zh/detail/PR-1198/3/pipeline/, it looks like the Paddle evaluation runs significantly longer (1h47m) than the implementations in other frameworks (e.g., PyTorch: 38m). Could you look into the complete log (click "Show Complete Log") and identify the individual notebooks that take longer to evaluate? Can you improve their efficiency?

For example:

  1. Why is there this error?

[Screenshot of the error, 2022-08-27 4:31:31 PM]

  2. In the Paddle complete log:
[d2lbook:resource.py:L223] INFO   Task "Evaluating ./chapter_computer-vision/image-augmentation.md" on CPU [1], GPU [1, 0] is finished in 00:12:19

In the PyTorch complete log:

[d2lbook:resource.py:L223] INFO   Task "Evaluating ./chapter_computer-vision/image-augmentation.md" on CPU [3], GPU [1, 3] is finished in 00:02:36

There can be more examples like the above.
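To find such notebooks systematically rather than one at a time, the two complete logs can be compared line by line. The sketch below assumes every evaluation line follows the `Task "Evaluating <path>" ... is finished in HH:MM:SS` format quoted above; the helper names `parse_durations` and `slowest_relative` are hypothetical and not part of d2lbook. It ranks notebooks by how many times slower they are in the first log than in the baseline log.

```python
import re
from datetime import timedelta

# Assumed d2lbook log format, e.g.:
# ... Task "Evaluating ./chapter_x/y.md" on CPU [1], GPU [1, 0] is finished in 00:12:19
TASK_RE = re.compile(
    r'Task "Evaluating (?P<path>[^"]+)".*?is finished in '
    r'(?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2})'
)


def parse_durations(log_text):
    """Map each evaluated notebook path to its wall-clock duration."""
    durations = {}
    for line in log_text.splitlines():
        match = TASK_RE.search(line)
        if match:
            durations[match["path"]] = timedelta(
                hours=int(match["h"]),
                minutes=int(match["m"]),
                seconds=int(match["s"]),
            )
    return durations


def slowest_relative(slow_log, baseline_log, top=10):
    """Notebooks found in both logs, sorted by slow/baseline time ratio (largest first)."""
    slow, base = parse_durations(slow_log), parse_durations(baseline_log)
    rows = []
    for path in slow.keys() & base.keys():
        if base[path].total_seconds() > 0:
            # timedelta / timedelta yields a float ratio
            rows.append((path, slow[path], base[path], slow[path] / base[path]))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows[:top]


# Usage with the two lines quoted above (in practice, paste the full logs).
paddle_log = ('[d2lbook:resource.py:L223] INFO   Task "Evaluating '
              './chapter_computer-vision/image-augmentation.md" on CPU [1], '
              'GPU [1, 0] is finished in 00:12:19')
pytorch_log = ('[d2lbook:resource.py:L223] INFO   Task "Evaluating '
               './chapter_computer-vision/image-augmentation.md" on CPU [3], '
               'GPU [1, 3] is finished in 00:02:36')

for path, slow, base, ratio in slowest_relative(paddle_log, pytorch_log):
    print(f"{path}: {slow} vs {base} ({ratio:.1f}x)")
# prints: ./chapter_computer-vision/image-augmentation.md: 0:12:19 vs 0:02:36 (4.7x)
```

Sorting by ratio rather than absolute time highlights notebooks that are slow specifically in one framework, not just notebooks that are long everywhere.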

astonzhang, Aug 27 '22

Job d2l-zh/PR-1198/4 is complete. Check the results at http://preview.d2l.ai/d2l-zh/PR-1198/

d2l-bot, Sep 06 '22

Job d2l-zh/PR-1198/5 is complete. Check the results at http://preview.d2l.ai/d2l-zh/PR-1198/

d2l-bot, Sep 07 '22

Job d2l-zh/PR-1198/6 is complete. Check the results at http://preview.d2l.ai/d2l-zh/PR-1198/

d2l-bot, Sep 07 '22

Hi @astonzhang, how can I make the CI rebuild the whole project? It seems that the CI only builds the files I modified.

w5688414, Sep 07 '22

@w5688414 We started a CI build from scratch.

cheungdaven, Sep 07 '22

@d2l-bot please rebuild

w5688414, Sep 08 '22

@d2l-bot please rebuild

w5688414, Sep 08 '22

Job d2l-zh/PR-1198/9 is complete. Check the results at http://preview.d2l.ai/d2l-zh/PR-1198/

d2l-bot, Sep 08 '22

Hi @cheungdaven, could you please rebuild the whole project? It seems that I can't rerun the whole project myself.

w5688414, Sep 08 '22

@d2l-bot please rebuild

w5688414, Sep 09 '22

Job d2l-zh/PR-1198/10 is complete. Check the results at http://preview.d2l.ai/d2l-zh/PR-1198/

d2l-bot, Sep 09 '22

Hi @astonzhang, I removed multi-GPU training because Paddle doesn't support multi-GPU training in notebooks, so it now takes about 50 minutes to build the Paddle version of the d2l notebooks.

[image]

What's more, I can't reproduce the following error on my local machine, and it doesn't seem to affect the build. Is this a serious problem? Can you give me some suggestions?

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
No stack trace in paddle, may be caused by external reasons.

----------------------
Error Message Summary:
----------------------
FatalError: `Termination signal` is detected by the operating system.
  [TimeInfo: *** Aborted at 1662692867 (unix time) try "date -d @1662692867" if you are using GNU date ***]
  [SignalInfo: *** SIGTERM (@0x3e900017699) received by PID 95900 (TID 0x7f2bcf693080) from PID 95897 ***]
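The two bracketed lines in this report can be decoded without GNU `date`. The sketch below converts the Unix timestamp to UTC (the Python equivalent of `date -d @1662692867` with `TZ=UTC`) and splits the `@0x3e900017699` value into its high and low 32 bits. The low 32 bits equal the sending PID 95897 printed on the same line; reading the high bits as the sender's user ID is an assumption. Under that reading, the SIGTERM came from another process rather than from a crash inside Paddle, which is consistent with "may be caused by external reasons". The helper names are hypothetical.

```python
from datetime import datetime, timezone


def decode_abort_time(unix_seconds):
    """Python equivalent of `date -d @<seconds>`, rendered in UTC."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


def split_sender(value):
    """Split a 64-bit value into (high 32 bits, low 32 bits)."""
    return value >> 32, value & 0xFFFFFFFF


print(decode_abort_time(1662692867).isoformat())  # 2022-09-09T03:07:47+00:00

high, low = split_sender(0x3E900017699)
# low matches "from PID 95897"; high is assumed to be the sender's UID
print(high, low)  # 1001 95897
```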

w5688414, Sep 09 '22

Can you follow https://github.com/d2l-ai/d2l-zh/blob/paddle/Jenkinsfile and run

pip install git+https://github.com/d2l-ai/d2l-book
d2lbook build eval --tab paddle

to try reproducing the error?

astonzhang, Sep 09 '22

@xiaotinghe Can you review this PR and make sure it doesn't affect the content within the scope of our forthcoming publication?

astonzhang, Sep 09 '22

@d2l-bot please rebuild

w5688414, Sep 11 '22

Hi @astonzhang, I reproduced the error and suppressed it by calling paddle.disable_signal_handler(). Can you review this PR again?

[image]
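Where exactly the PR places the `paddle.disable_signal_handler()` call is not shown in this thread. A minimal sketch, assuming it runs once near the top of a notebook or of the shared `d2l` Paddle module, is below; the wrapper `disable_paddle_signal_handler` is hypothetical and simply skips the call when Paddle is not installed. Note that resetting Paddle's handlers only removes the C++ "FatalError" report; an external SIGTERM still terminates the process with the default action.

```python
import importlib.util


def disable_paddle_signal_handler():
    """Reset the signal handlers Paddle installs at the C++ level, if Paddle is available.

    Returns True if the call was made, False if Paddle is not installed.
    """
    if importlib.util.find_spec("paddle") is None:
        return False
    import paddle

    # Paddle registers its own handlers to print C++ tracebacks on signals;
    # this call removes them, so SIGTERM falls back to the default behavior.
    paddle.disable_signal_handler()
    return True


# Call once, before any training code runs.
disable_paddle_signal_handler()
```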

w5688414, Sep 11 '22

Job d2l-zh/PR-1198/12 is complete. Check the results at http://preview.d2l.ai/d2l-zh/PR-1198/

d2l-bot, Sep 11 '22

Job d2l-zh/PR-1198/11 is complete. Check the results at http://preview.d2l.ai/d2l-zh/PR-1198/

d2l-bot, Sep 11 '22

@astonzhang @cheungdaven @xiaotinghe Hi, thank you very much for reviewing our submission. Could you let us know how the review is progressing? Looking forward to your reply 🤗

tngt, Sep 29 '22

Job d2l-zh/PR-1198/13 is complete. Check the results at http://preview.d2l.ai/d2l-zh/PR-1198/

d2l-bot, Nov 08 '22

@d2l-bot please rebuild

w5688414, Nov 09 '22

Job d2l-zh/PR-1198/18 is complete. Check the results at http://preview.d2l.ai/d2l-zh/PR-1198/

d2l-bot, Nov 11 '22

Job d2l-zh/PR-1198/20 is complete. Check the results at http://preview.d2l.ai/d2l-zh/PR-1198/

d2l-bot, Nov 12 '22

Job d2l-zh/PR-1198/21 is complete. Check the results at http://preview.d2l.ai/d2l-zh/PR-1198/

d2l-bot, Nov 12 '22

Job d2l-zh/PR-1198/22 is complete. Check the results at http://preview.d2l.ai/d2l-zh/PR-1198/

d2l-bot, Nov 12 '22

Job d2l-zh/PR-1198/23 is complete. Check the results at http://preview.d2l.ai/d2l-zh/PR-1198/

d2l-bot, Nov 12 '22

@d2l-bot please rebuild.

w5688414, Nov 12 '22