
[Tutorial] add KoBERT tutorial

Open jamiekang opened this issue 4 years ago • 67 comments

Added kobert_naver_movie for the KoBERT tutorial.

Description

(Brief description on what this PR is about)

Checklist

Essentials

  • [ ] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • [ ] Changes are complete (i.e. I finished coding on this PR)
  • [ ] All changes have test coverage
  • [ ] Code is well-documented

Changes

  • [ ] Feature1, tests, (and when applicable, API doc)
  • [ ] Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

cc @dmlc/gluon-nlp-team

jamiekang avatar May 12 '20 09:05 jamiekang

Hello Jiyang, would you please merge the latest commit to your branch for this pull request? #1229 This fixes a cpu-unittest timing restriction which is preventing your commit from being built. Thanks!

chenw23 avatar May 13 '20 23:05 chenw23

Hello, my branch is v0.9.x and I pushed the latest commit to that branch. I don't have any other branches. Can you tell me what further steps are required? Thanks.

jamiekang avatar May 14 '20 00:05 jamiekang

Hello, this commit was merged into the v0.9.x branch yesterday, and your pull request was opened 2 days ago. So maybe your pull request does not include this commit?

chenw23 avatar May 14 '20 00:05 chenw23

Sorry, but is there an actual need to merge into v0.9.x (the release branch) rather than master (the development branch)? I am noticing the gpu-doc failures. The master branch has some new features that might improve the stability of the doc build and help us debug errors.

chenw23 avatar May 14 '20 01:05 chenw23

Hello, I think you need to change the pull request target branch to dmlc:master. Currently you are still targeting dmlc:v0.9.x. Thanks!

chenw23 avatar May 14 '20 02:05 chenw23

Hello Jiyang, one of the tests is failing with an unclear error. Please be patient while we work on a fix.

@leezu The gpu doc build cannot pass doctest, but the error seems to be caused by a connection error. Maybe we need to make some changes elsewhere?

chenw23 avatar May 14 '20 09:05 chenw23

any update?

jamiekang avatar May 27 '20 21:05 jamiekang

Hello Jiyang, would you please merge the latest master branch into your pull request, especially to include #1236, so that gpu-doc can pass? Thanks!

chenw23 avatar May 30 '20 03:05 chenw23

Is this okay?

Merge pull request #1 from dmlc/master … 19c4045

jamiekang avatar Jun 02 '20 00:06 jamiekang

yes

avinashsai avatar Jun 02 '20 08:06 avinashsai

@leezu any idea why the err log is missing?

fatal error: An error occurred (404) when calling the HeadObject operation: Key "batch/PR-1230/14/docs/examples/sentiment_analysis/kobert_naver_movie.stderr.log" does not exist

I think this is because ci/batch/submit-job.py failed. That failure is caused by the failure of ci/batch/docker/gluon_nlp_job.sh, which in turn is caused by the failure of docs/md2ipynb.py.

So the root cause is that the conversion of the newly added md file to an ipynb file did not succeed.

chenw23 avatar Jun 05 '20 02:06 chenw23

@jamiekang I have investigated the CI failure and found that it is caused by executing the Python code in your kobert_naver_movie.md file. Please see the log below:

ubuntu@ip-172-31-9-37:~$ python3 /home/ubuntu/jamiekang/gluon-nlp/docs/md2ipynb.py /home/ubuntu/jamiekang/gluon-nlp/docs/examples/sentiment_analysis/kobert_naver_movie.md
Traceback (most recent call last):
  File "/home/ubuntu/jamiekang/gluon-nlp/docs/md2ipynb.py", line 36, in <module>
    notedown.run(notebook, timeout)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/notedown/notedown.py", line 48, in run
    notebook, resources = executor.preprocess(notebook, resources={})
  File "/usr/local/lib/python3.6/dist-packages/nbconvert/preprocessors/execute.py", line 405, in preprocess
    nb, resources = super(ExecutePreprocessor, self).preprocess(nb, resources)
  File "/usr/local/lib/python3.6/dist-packages/nbconvert/preprocessors/base.py", line 69, in preprocess
    nb.cells[index], resources = self.preprocess_cell(cell, resources, index)
  File "/usr/local/lib/python3.6/dist-packages/nbconvert/preprocessors/execute.py", line 438, in preprocess_cell
    reply, outputs = self.run_cell(cell, cell_index, store_history)
  File "/usr/local/lib/python3.6/dist-packages/nbconvert/preprocessors/execute.py", line 571, in run_cell
    if self._passed_deadline(deadline):
  File "/usr/local/lib/python3.6/dist-packages/nbconvert/preprocessors/execute.py", line 541, in _passed_deadline
    self._handle_timeout()
  File "/usr/local/lib/python3.6/dist-packages/nbconvert/preprocessors/execute.py", line 504, in _handle_timeout
    raise TimeoutError("Cell execution timed out")
TimeoutError: Cell execution timed out

So can you successfully execute the code in the kobert_naver_movie.md file locally? If you can successfully execute the code locally, can you successfully build the website as instructed by this section in the configuration page?
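To check locally whether the conversion fits a given time budget, one option is to wrap the command in a deadline. The following is a minimal sketch using only the Python standard library; `run_with_deadline` is a hypothetical helper, not part of gluon-nlp. It applies a single deadline to the whole command, whereas the traceback above shows nbconvert enforcing the timeout per cell, so it is only a rough approximation.

```python
import subprocess
import sys
import time


def run_with_deadline(cmd, timeout_sec):
    """Run cmd and report whether it finished before timeout_sec elapsed.

    Hypothetical helper for illustration. Returns (finished, elapsed_seconds).
    A non-zero exit status raises subprocess.CalledProcessError.
    """
    start = time.perf_counter()
    try:
        # subprocess.run kills the child process if the timeout expires.
        subprocess.run(cmd, check=True, timeout=timeout_sec)
        finished = True
    except subprocess.TimeoutExpired:
        finished = False
    return finished, time.perf_counter() - start


# Quick demonstration with a trivial command that finishes well within the deadline.
finished, elapsed = run_with_deadline([sys.executable, "-c", "print('ok')"], timeout_sec=30)
print(f"finished={finished} elapsed={elapsed:.2f}s")
```

For the case in this thread, `cmd` would be the md2ipynb.py invocation from the log above, and `timeout_sec` the limit configured for the doc build.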

chenw23 avatar Jun 18 '20 15:06 chenw23

Since I am seeing timeout errors: how long does this single file take to run on your test machine, and what is the configuration of that machine? Have you tried it on a g4dn.xlarge server?

chenw23 avatar Jun 18 '20 15:06 chenw23

I ran the notebook on a g4dn.xlarge instance. Currently the notebook runs 5 epochs and takes about 5 hours. Is this too long? I can modify it to run only 1 epoch (about 1 hour), which should be fine for the purposes of a tutorial. Would that be OK?

jamiekang avatar Jun 19 '20 05:06 jamiekang

I think it still cannot be built with our current settings, even if your code only needs 1 hour.

The setting is here:

https://github.com/dmlc/gluon-nlp/blob/master/docs/md2ipynb.py#L15

and now we set

timeout = 40 * 60

which is 40 minutes.

@eric-haibin-lin @leezu What do you think? Do you think we should lift the timing restrictions?
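The arithmetic behind this can be sketched as follows. The 40-minute timeout and the runtime estimates (about 1 hour for 1 epoch, about 5 hours for 5 epochs) come from this thread; `fits_timeout`, `suggested_timeout_sec`, and the 1.5x headroom factor are hypothetical choices for illustration only. The sketch also assumes the training runs inside a single notebook cell, since the traceback shows the timeout being applied per cell.

```python
import math

# Value quoted above from docs/md2ipynb.py: 40 minutes, in seconds.
TIMEOUT_SEC = 40 * 60

# Runtime estimates reported in this thread for a g4dn.xlarge instance.
ONE_EPOCH_SEC = 1 * 60 * 60
FIVE_EPOCHS_SEC = 5 * 60 * 60


def fits_timeout(runtime_sec, timeout_sec=TIMEOUT_SEC):
    """Return True if the estimated runtime fits within the timeout."""
    return runtime_sec <= timeout_sec


def suggested_timeout_sec(runtime_sec, headroom=1.5):
    """Hypothetical rule of thumb: runtime times a headroom factor,
    rounded up to a whole number of minutes."""
    return math.ceil(runtime_sec * headroom / 60) * 60


for label, runtime in [("1 epoch", ONE_EPOCH_SEC), ("5 epochs", FIVE_EPOCHS_SEC)]:
    print(label, "fits:", fits_timeout(runtime),
          "suggested timeout (min):", suggested_timeout_sec(runtime) // 60)
```

Under these assumptions, neither the 1-epoch nor the 5-epoch run fits within the current limit, which is why the limit itself has to change.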

chenw23 avatar Jun 19 '20 05:06 chenw23

@StrayBird-ATSH yes, let's lift the limit given that the process is parallelized.

szha avatar Jun 19 '20 22:06 szha

@jamiekang Can you decrease the number of epochs and then change the timeout on the line I pointed to above to a value that is not too long but still lets your code finish?

chenw23 avatar Jun 20 '20 03:06 chenw23

I decreased the number of epochs to 1. But where should I change the timeout setting?

jamiekang avatar Jun 23 '20 02:06 jamiekang

No worries, I have already made the change. Let's wait for #1252.

chenw23 avatar Jun 23 '20 04:06 chenw23

Hi, what can we do next?

jamiekang avatar Jun 24 '20 22:06 jamiekang

My recent commit to the master branch is failing CI. Let me take some time to look into it and fix it.

chenw23 avatar Jun 24 '20 23:06 chenw23

I think it's fine now. Would you please merge the latest master branch into this pull request so as to get my updates? Thanks!

chenw23 avatar Jun 25 '20 23:06 chenw23

For the current failure, it seems we have hit something very similar to #1236. Let me dig into the cause and fix it.

chenw23 avatar Jun 29 '20 03:06 chenw23

I just uploaded a modified version. (f873a1e) My GitHub Desktop also added 40d73b9 which I'm not aware of.

jamiekang avatar Jul 02 '20 22:07 jamiekang

Hello Jiyang, I ran your code and got the following output on a g4dn.xlarge machine:

=== Finished evaluation in 2983.597782 sec

which is approximately 50 minutes. I think your code is fine since I got the ipynb output.

chenw23 avatar Jul 05 '20 14:07 chenw23

@jamiekang Hello Jiyang, I think the preparation work is done and merged into the master branch. Please merge the latest master branch into your pull request to get #1255 and #1256. I hope this will help your pull request pass CI. Thanks!

chenw23 avatar Jul 07 '20 22:07 chenw23

Copying the error log here. It seems to be related to the md file format. Still investigating the cause:

[2020-07-10T04:35:27.900Z] Warning, treated as error:
[2020-07-10T04:35:27.900Z] /var/lib/jenkins/gluon-nlp-cpu-py3-master/docs/examples/sentiment_analysis/kobert_naver_movie.ipynb:Could not lex literal_block as "python". Highlighting skipped.
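One hypothetical way to narrow this down is to scan the source md file for code blocks tagged as Python that are not plain Python. The sketch below is an assumption-laden diagnostic, not the project's tooling: it assumes blocks are fenced with three backticks and tagged either `python` or `{.python ...}`, it treats lines starting with `!` or `%` as shell or IPython magics, and it uses `ast.parse` as a stricter stand-in for the lexer that produced the warning, so it may flag blocks the doc build would accept.

```python
import ast
import re

# Build the fence marker programmatically so it can be embedded safely here.
TICKS = "`" * 3

# Assumed fence styles: a "python" tag or a "{.python ...}" attribute tag.
FENCE = re.compile(
    r"^" + TICKS + r"(?:python|\{\.python[^}]*\})[ \t]*\n(.*?)^" + TICKS + r"[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def find_suspect_blocks(markdown_text):
    """Return (block_index, reason) pairs for python-tagged blocks
    that contain magics or fail to parse as Python."""
    suspects = []
    for index, match in enumerate(FENCE.finditer(markdown_text)):
        source = match.group(1)
        # Heuristic: a line starting with "!" or "%" is likely a shell/IPython magic.
        magics = [ln.strip() for ln in source.splitlines()
                  if ln.lstrip().startswith(("!", "%"))]
        if magics:
            suspects.append((index, "magic line: " + magics[0]))
            continue
        try:
            ast.parse(source)
        except SyntaxError as err:
            suspects.append((index, "syntax error: " + str(err.msg)))
    return suspects


# Made-up sample document: one clean block, one with a magic, one broken.
sample = "\n".join([
    "Intro text",
    TICKS + "{.python .input}",
    "import math",
    "print(math.pi)",
    TICKS,
    "",
    TICKS + "{.python .input}",
    "!pip install some-package",
    TICKS,
    "",
    TICKS + "python",
    "def broken(:",
    TICKS,
])
print(find_suspect_blocks(sample))
```

Running the same function on the text of kobert_naver_movie.md would list candidate blocks to inspect by hand.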

chenw23 avatar Jul 10 '20 07:07 chenw23

Let me know if you figure out what the root cause is. Thanks.

jamiekang avatar Jul 10 '20 09:07 jamiekang

Any update?

jamiekang avatar Jul 15 '20 01:07 jamiekang

Anything to help?

jamiekang avatar Jul 22 '20 23:07 jamiekang