marian
Using as little CPU RAM and as few cores as possible when decoding
Is there an option to build the Marian binary such that it will use as little CPU RAM and as few cores as possible?
It looks like it uses anywhere from 1–8 GB of CPU RAM and 1–8 cores when decoding, and deciding on the optimal VM size is pretty hard when there's no control over how many cores or how much RAM Marian uses.
Or is there an option to use a specified amount of CPU RAM when decoding? Is --cpu-threads 0 the only way now?
Can you elaborate? And please post a typical command line you use.
--cpu-threads 0 is basically the default; when you use GPU decoding it shouldn't do anything.
In model.npz.decoder.yml:
models:
- model.npz
vocabs:
- vocab.src.spm
- vocab.trg.spm
beam-size: 6
normalize: 0.6
word-penalty: 0
mini-batch: 16
maxi-batch: 100
maxi-batch-sort: src
relative-paths: false
On CLI, with GPU we usually do:
~/marian-dev/build/marian-decoder \
-c model.npz.decoder.yml -m model-r0.npz \
-v vocab.src.spm vocab.trg.spm \
--mini-batch 1 --maxi-batch 100 \
-d 0 1 2 3
When we use GPU decoding as above, somehow CPU and RAM usage runs wild and I have no idea how much it will use, causing jobs to get killed when RAM is exhausted.
On CLI, with CPU, we usually do:
~/marian-dev/build/marian-decoder \
-c model.npz.decoder.yml -m model-r0.npz \
-v vocab.src.spm vocab.trg.spm \
--mini-batch 1 --maxi-batch 100 --num-devices 0 \
--cpu-threads 32
So do I understand your question correctly: you would like to have predictable CPU RAM/core usage during GPU decoding?
(For CPU decoding, I think setting --cpu-threads and --workspace should help.)
--workspace limits the GPU RAM (sort of). But does --workspace work for CPU RAM too?
I think it does for the numeric tensors flowing through the network. It does not cover the C++-side state structures used by the decoder, e.g. the arrays of active beams and the traceback.
Actually, --workspace does determine the per-thread workspace size during CPU decoding. We have ways to use memory-mapped models and a shared, or rather re-usable, workspace for CPU decoding, but that is currently not exposed anywhere as an API. We use that here:
https://github.com/marian-nmt/marian/blob/02f4af4eeefa79a24cd52d279a5d4d374423d631/src/microsoft/quicksand.h#L50
Note how we only pass a pointer and size to the decoder. The buffer has to be created outside Marian.
For memory-mapping you can again just point a pointer to a memory-mapped model.bin file (you have to do the mapping yourself) and pass that pointer: https://github.com/marian-nmt/marian/blob/02f4af4eeefa79a24cd52d279a5d4d374423d631/src/microsoft/quicksand.cpp#L77
Binary files that can be mapped can be constructed from model.npz via marian-conv:
./marian-conv --from model.npz --to model.bin
These *.bin files can just be mmapped.
@emjotde memory-mapping models and reusing workspaces sound interesting. Is my understanding correct that this is currently not directly supported by marian-decoder?