QG not working properly for other languages.

Open mirfan899 opened this issue 2 years ago • 10 comments

Hi, I tried to generate questions for Arabic and Urdu, and it seems even the small model cannot fit into memory. It runs for a long time and then the runtime crashes most of the time; it has worked only a few times.

import text2text as t2t
t2t.Transformer.PRETRAINED_TRANSLATOR = "facebook/m2m100_418M" #Remove this line for the larger model
h = t2t.Handler(["حکومت اور کالعدم تحریک طالبان پاکستان کی جانب سے مذاکرات میں کسی بھی پیش رفت کے بارے میں آگاہ نہیں کیا جا رہا اور استفسار کے باوجود متعلقہ وزرا خاموشی اختیار کیے ہوئے ہیں۔"], src_lang="ur")
h.tokenize()
h.question()

Here is the crash log:



Timestamp | Level | Message
-- | -- | --
Dec 4, 2021, 1:19:02 PM | WARNING | WARNING:root:kernel b089d5ac-c179-45fc-aae7-a1cd3fc13344 restarted
Dec 4, 2021, 1:19:02 PM | INFO | KernelRestarter: restarting kernel (1/5), keep random ports
Dec 4, 2021, 1:08:56 PM | WARNING | tcmalloc: large alloc 1242218496 bytes == 0x556f760fc000 @ 0x7f07e19221e7 0x556f0fd30f98 0x556f0fcfbe27 0x556f0fcfde20 0x556f0fcff2ed 0x556f0fdf0e1d 0x556f0fd72e99 0x556f0fc3fd14 0x556f0fdf0f31 0x556f0fe1e849 0x556f0fd6ea7d 0x556f0fd6d9ee 0x556f0fd0148c 0x556f0fd01698 0x556f0fd6ffe4 0x556f0fdf1c66 0x556f0fd6edaf 0x556f0fd6d9ee 0x556f0fd00bda 0x556f0fd6e915 0x556f0fd6d9ee 0x556f0fd0148c 0x556f0fd01698 0x556f0fd6ffe4 0x556f0fd6d9ee 0x556f0fd0148c 0x556f0fd01698 0x556f0fd6ffe4 0x556f0fd6d9ee 0x556f0fd00bda 0x556f0fd6ec0d
Dec 4, 2021, 1:08:01 PM | INFO | Adapting to protocol v5.1 for kernel b089d5ac-c179-45fc-aae7-a1cd3fc13344
Dec 4, 2021, 1:07:59 PM | INFO | Kernel started: b089d5ac-c179-45fc-aae7-a1cd3fc13344
Dec 4, 2021, 1:03:31 PM | INFO | Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
Dec 4, 2021, 1:03:31 PM | INFO | http://172.28.0.12:9000/
Dec 4, 2021, 1:03:31 PM | INFO | The Jupyter Notebook is running at:
Dec 4, 2021, 1:03:31 PM | INFO | 0 active kernels
Dec 4, 2021, 1:03:31 PM | INFO | Serving notebooks from local directory: /

When it does not crash, QG takes almost 1 minute to generate a question.

mirfan899 avatar Dec 04 '21 08:12 mirfan899

Thanks for reporting this issue. What system or platform are you running text2text on? I ran the following in the colab notebook without issues. Each prediction took maybe 30 seconds at most.

import text2text as t2t
t2t.Transformer.PRETRAINED_TRANSLATOR = "facebook/m2m100_418M" #Remove this line for the larger model
h = t2t.Handler(["حکومت اور کالعدم تحریک طالبان پاکستان کی جانب سے مذاکرات میں کسی بھی پیش رفت کے بارے میں آگاہ نہیں کیا جا رہا اور استفسار کے باوجود متعلقہ وزرا خاموشی اختیار کیے ہوئے ہیں۔"], src_lang="ur")
h.question()
h.question()
h.question()

Note that it is not necessary to call h.tokenize() before h.question().

We are also researching ways to reduce the memory consumption and improve speed. If you are interested in learning more and possibly contributing, take a look at https://github.com/artitw/text2text/issues/27

artitw avatar Dec 04 '21 20:12 artitw

I'm running the colab example.

mirfan899 avatar Dec 05 '21 05:12 mirfan899

What is your use-case application? Perhaps there is a way to deal with the memory issues and achieve what you are trying to do.

artitw avatar Dec 06 '21 00:12 artitw

I want to generate questions for different languages. Then I will train the QA model for those languages.

mirfan899 avatar Dec 06 '21 08:12 mirfan899

Then perhaps a short-term solution would be to keep appending results to a file and re-run the script every time it crashes. You could also try clearing the memory after every iteration using Python's del statement to avoid the OOM crashes.
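A rough sketch of that workaround follows. It is not part of text2text: the names run_resumable, load_done, and the generate callable are hypothetical, and the JSON Lines results file is just one convenient format. Whether del plus gc.collect() actually prevents the OOM crash here has not been verified.

```python
import gc
import json
import os
from typing import Callable, Iterable


def load_done(path: str) -> set:
    """Return the set of input texts already recorded in the results file."""
    done = set()
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    done.add(json.loads(line)["text"])
                except (json.JSONDecodeError, KeyError):
                    # Skip lines that cannot be parsed (for example, a partial write).
                    continue
    return done


def run_resumable(texts: Iterable[str], generate: Callable[[str], object], path: str) -> int:
    """Process texts one at a time, appending each result to a JSON Lines file.

    `generate` is a hypothetical stand-in for the question-generation call.
    Returns the number of texts processed in this run.
    """
    done = load_done(path)
    processed = 0
    for text in texts:
        if text in done:
            continue  # already finished in an earlier run
        result = generate(text)
        # Append immediately so a later crash does not lose this result.
        with open(path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"text": text, "result": result}, ensure_ascii=False) + "\n")
        done.add(text)
        processed += 1
        # Drop the reference and ask the garbage collector to reclaim memory.
        del result
        gc.collect()
    return processed
```

With text2text, generate might be something like lambda text: t2t.Handler([text], src_lang="ur").question(), following the snippet earlier in this thread; that its return value can be written to JSON as-is is an assumption. After a crash, re-running the same script skips every text already present in the results file.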

For the long term, let me know if you are interested in contributing to improvements to address this issue.

artitw avatar Dec 07 '21 04:12 artitw

Okay sure.

mirfan899 avatar Dec 07 '21 04:12 mirfan899

For the latter, please comment on https://github.com/artitw/text2text/issues/27 and ask @johnanisere about helping out

artitw avatar Dec 07 '21 05:12 artitw

@mirfan899 can you try updating to the latest release to see if it is any faster? There was a GPU fix recently: https://github.com/artitw/text2text/pull/31

artitw avatar Feb 17 '22 03:02 artitw

Okay sure.

mirfan899 avatar Feb 17 '22 06:02 mirfan899