
Excessive memory usage during ensemble decoding

Open kellymarchisio opened this issue 5 years ago • 4 comments

We're ensemble decoding with 4 models using this command:

$MARIAN/build/marian-decoder \
      -c $configs \
      -m $models -d $GPU \
      --mini-batch 16 --maxi-batch 100 --maxi-batch-sort src -w 6500 \
      --n-best --beam-size 12 \
       < $test_file.bpe.$SRC > $output.$year.nbest.0

The following is a screenshot of memory usage up to about line 4800 of the WMT19 test set: (screenshot: marian_start_45)

After about 4800 lines, memory usage skyrockets, as seen below: (screenshot: marian_dead2)

Do you know why this is happening and how we can fix it? We're using [marian] Marian v1.7.6 02f4af4e 2018-12-12 18:51:10 -0800.

kellymarchisio avatar Jun 13 '19 21:06 kellymarchisio

Hi, can you repeat this with current master? Is that the test set with the test suites?

emjotde avatar Jun 14 '19 03:06 emjotde

It is the test set with the test suites. There has been only one code change since the version we have installed, and that was to update an acknowledgement. I'll try on master this afternoon nonetheless.

kellymarchisio avatar Jun 14 '19 15:06 kellymarchisio

I've tried it with the current master -- same issue. I finally got it to translate by splitting the file into individual lines.

Looks like line 4977 in the test set is causing our problem.

Bruchsal, Ettlingen, Freudenstadt, Karlsruhe-Stadt, Leonberg, Mühlacker, Singen, Tuttlingen, Cham, Fürstenfeldbruck, München-Körperschaften, München II, IV und München V, Starnberg, Hamburg-Altona, Hamburg-Hansa, Hamburg-Mitte, Hamburg-Nord und Hamburg für Großunternehmen, Hannover-Nord, Winsen-Luhe, Bonn-Innenstadt, Köln-Mitte, Köln-Porz, Köln-Süd, Sankt Augustin, Pirmasens, Bautzen, Chemnitz-Süd, Dresden II, Freiberg, Leipzig I und Zschopau.
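A quick way to look for candidates like this one without decoding line by line is to rank the input lines by token count. The following is only a sketch: it assumes that unusually long segments are the ones worth checking first, and the function name `longest_lines` is made up for illustration, not part of Marian.

```python
from typing import Iterable, List, Tuple


def longest_lines(lines: Iterable[str], top_k: int = 5) -> List[Tuple[int, int]]:
    """Return (line_number, token_count) pairs for the top_k longest lines.

    Line numbers are 1-indexed; tokens are whitespace-separated, which
    matches BPE-segmented input such as the file fed to the decoder.
    """
    counts = [(i, len(line.split())) for i, line in enumerate(lines, start=1)]
    # Longest first; ties are broken by the lower line number.
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    return counts[:top_k]


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python longest_lines.py < test_file.bpe.de
    for line_no, n_tokens in longest_lines(sys.stdin, top_k=10):
        print(f"line {line_no}: {n_tokens} tokens")
```

Length alone does not prove that a line is the culprit, so the flagged lines would still need to be decoded in isolation to confirm.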

Our ensemble is making very long, unlikely predictions for it. Top prediction:

Bru@@ ch@@ sal , E@@ tt@@ lingen , Fre@@ u@@ den@@ stadt , Karl@@ s@@ ruhe @-@ Stadt , Leon@@ berg , Müh@@ l@@ acker , S@@ ingen , T@@ utt@@ lingen , Cha@@ m , Für@@ sten@@ fel@@ d@@ bru@@ ck , München @-@ K@@ örperschaften , München II , IV und München V , Star@@ n@@ berg , Hamburg @-@ Al@@ ton@@ a , Hamburg @-@ H@@ ansa , Hamburg @-@ Mitte , Hamburg @-@ Nord und Hamburg für Großunternehmen , H@@ anno@@ ver @-@ Nord , W@@ ins@@ en @-@ Lu@@ he , Köln @-@ Mitte , Köln @-@ Inn@@ enst@@ adt , Köln , Köln @-@ Mitte , Hamburg @-@ Nord , Hamburg @-@ Mitte , Hamburg @-@ Nord , Hamburg @-@ Nord , Hamburg @-@ Nord , Hamburg @-@ Nord , Hamburg @-@ Mitte , Hamburg @-@ Nord , Hamburg @-@ Nord , Hamburg @-@ Mitte , Hamburg @-@ Mitte , Hamburg @-@ Mitte , Hamburg @-@ Nord , Hamburg @-@ Nord , Hamburg @-@ Mitte , Hamburg @-@ Nord , Hamburg @-@ Nord , Hamburg @-@ Nord , Hamburg @-@ Mitte , Hamburg @-@ Nord , Hamburg @-@ Nord , Hamburg @-@ Nord , Hamburg @-@ Nord , Hamburg @-@ Nord , Hamburg @-@ Nord , Berlin @-@ Nord , Köln @-@ Nord , Köln @-@ Nord , Köln @-@ Mitte , Köln @-@ Nord , Köln @-@ Mitte , Köln @-@ Mitte , Köln , Köln , Köln @-@ Mitte , Berlin , Köln @-@ Mitte , Köln @-@ Mitte , Köln , Köln , Berlin , Berlin , Köln , Köln @-@ Mitte , Köln , Köln @-@ Mitte , Köln @-@ Mitte , Köln @-@ Mitte , Berlin , Hamburg @-@ Mitte , Hamburg @-@ Mitte , Berlin , Köln @-@ Mitte , Hamburg @-@ Mitte , Köln @-@ Mitte , Hamburg @-@ Mitte , Hamburg @-@ Mitte , Köln @-@ Mitte , Köln , Köln , Köln , Köln , Köln , Berlin , Berlin , Berlin , Berlin , Berlin , Berlin , Berlin , Berlin , Berlin , Berlin , Hamburg @-@ Mitte , Hamburg @-@ Mitte , Frankfurt ||| F0= -689.575 F1= -600.15 F2= -621.736 F3= -717.398 ||| -76.6552
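Degenerate hypotheses like the one above can be flagged automatically by counting how often the most frequent n-gram repeats. This is a minimal sketch under the assumption that the hypothesis text has already been separated from the `|||`-delimited feature and score fields; `repetition_stats` is a hypothetical helper, not a Marian tool.

```python
from collections import Counter
from typing import Tuple


def repetition_stats(hypothesis: str, n: int = 3) -> Tuple[int, int]:
    """Return (token_count, max_ngram_count) for a whitespace-tokenized hypothesis.

    max_ngram_count is the number of occurrences of the most frequent
    n-gram, or 0 if the hypothesis has fewer than n tokens.
    """
    tokens = hypothesis.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return len(tokens), 0
    most_common_count = Counter(ngrams).most_common(1)[0][1]
    return len(tokens), most_common_count


if __name__ == "__main__":
    sample = "Hamburg @-@ Nord , Hamburg @-@ Nord , Hamburg @-@ Mitte"
    length, repeats = repetition_stats(sample, n=3)
    print(f"{length} tokens, most frequent trigram occurs {repeats} times")
```

A high repeat count combined with an output much longer than the source is a reasonable heuristic for spotting looping output, though any threshold would have to be tuned on real data.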

When translating this single line in its own file, sometimes it works nearly immediately, and sometimes I see this large alloc: (screenshot: Screenshot 2019-06-14 at 14 22 10)

kellymarchisio avatar Jun 14 '19 16:06 kellymarchisio

The large alloc is not that large, though. Can you make the setup available for testing? I am mostly curious about the regular CPU RAM usage.

emjotde avatar Jun 15 '19 01:06 emjotde