marian
Using as little CPU RAM and as few cores as possible when decoding
Is there an option to build the Marian binary such that it will use as little CPU RAM and as few cores as possible?
It looks like it uses anywhere from 1–8 GB of CPU RAM and 1–8 cores when decoding, and deciding on the optimal VM size is pretty hard when there's no control over how many cores or how much RAM Marian uses.
Or is there an option to use a specified amount of CPU RAM when decoding? Is --cpu-threads 0 the only way now?
Can you elaborate? And please post a typical command line you use.
--cpu-threads 0 is basically the default; when you use GPU decoding it shouldn't do anything.
In model.npz.decoder.yml:
models:
- model.npz
vocabs:
- vocab.src.spm
- vocab.trg.spm
beam-size: 6
normalize: 0.6
word-penalty: 0
mini-batch: 16
maxi-batch: 100
maxi-batch-sort: src
relative-paths: false
On CLI, with GPU we usually do:
~/marian-dev/build/marian-decoder \
-c model.npz.decoder.yml -m model-r0.npz \
-v vocab.src.spm vocab.trg.spm \
--mini-batch 1 --maxi-batch 100 \
-d 0 1 2 3
When we use GPU decoding as above, somehow CPU and RAM usage runs wild and I have no idea how much it will use, causing jobs to get killed when RAM is exhausted.
On CLI, with CPU, we usually do:
~/marian-dev/build/marian-decoder \
-c model.npz.decoder.yml -m model-r0.npz \
-v vocab.src.spm vocab.trg.spm \
--mini-batch 1 --maxi-batch 100 --num-devices 0 \
--cpu-threads 32
So do I understand your question correctly: you would like to have predictable CPU RAM/core usage during GPU decoding?
(For CPU decoding, I think setting --cpu-threads and --workspace should help.)
--workspace limits the GPU RAM (sort of). But does --workspace work for CPU RAM too?
I think it does for the numeric tensors flowing through the network. It does not cover the C++-side state structures used by the decoder, e.g. the arrays of active beams and the traceback.
Actually, --workspace does determine the per-thread workspace size during CPU decoding. We have ways to use memory-mapped models and a shared, or rather re-usable, workspace for CPU decoding, but that is currently not exposed anywhere as an API. We use that here:
https://github.com/marian-nmt/marian/blob/02f4af4eeefa79a24cd52d279a5d4d374423d631/src/microsoft/quicksand.h#L50
Note how we only pass a pointer and size to the decoder. The buffer has to be created outside Marian.
For memory-mapping you can again just point a pointer to a memory-mapped model.bin file (you have to do the mapping yourself) and pass that pointer: https://github.com/marian-nmt/marian/blob/02f4af4eeefa79a24cd52d279a5d4d374423d631/src/microsoft/quicksand.cpp#L77
Binary files that can be mapped can be constructed from model.npz via marian-conv:
./marian-conv --from model.npz --to model.bin
These *.bin files can just be mmapped.
@emjotde memory-mapping models and reusing workspaces sound interesting. Is my understanding correct that this is currently not directly supported by marian-decoder?