tcmalloc: large alloc 2818572288 bytes == 0x33daa000 @
```
[2019-12-06 00:18:07] [data] Loading vocabulary from JSON/Yaml file /191206/source_vocab.yml
[2019-12-06 00:18:08] [data] Setting vocabulary size for input 0 to 328116
[2019-12-06 00:18:08] [data] Loading vocabulary from JSON/Yaml file /191206/target_vocab.yml
[2019-12-06 00:18:09] [data] Setting vocabulary size for input 1 to 225581
[2019-12-06 00:18:10] [memory] Extending reserved space to 2048 MB (device gpu0)
[2019-12-06 00:18:10] Training started
[2019-12-06 00:18:10] [data] Shuffling files
[2019-12-06 00:18:10] [data] Done reading 1754741 sentences
[2019-12-06 00:18:14] [data] Done shuffling 1754741 sentences to temp files
[2019-12-06 00:18:14] [memory] Reserving 1615 MB, device gpu0
[2019-12-06 00:18:15] [memory] Reserving 1615 MB, device gpu0
tcmalloc: large alloc 2147483648 bytes == 0x33daa000 @
tcmalloc: large alloc 2281701376 bytes == 0x33daa000 @
tcmalloc: large alloc 2415919104 bytes == 0x33daa000 @
tcmalloc: large alloc 2550136832 bytes == 0x33daa000 @
tcmalloc: large alloc 2684354560 bytes == 0x33daa000 @
tcmalloc: large alloc 2818572288 bytes == 0x33daa000 @
tcmalloc: large alloc 2952790016 bytes == 0x33daa000 @
tcmalloc: large alloc 3087007744 bytes == 0x33daa000 @
tcmalloc: large alloc 3221225472 bytes == 0x33daa000 @
tcmalloc: large alloc 3355443200 bytes == 0x33daa000 @
tcmalloc: large alloc 3489660928 bytes == 0x33daa000 @
[2019-12-06 00:18:34] [memory] Reserving 3231 MB, device gpu0
tcmalloc: large alloc 4026531840 bytes == 0x33daa000 @
tcmalloc: large alloc 4429185024 bytes == 0x33daa000 @
[2019-12-06 00:18:54] Error: CUDA error 2 'out of memory' - /marian/src/tensors/gpu/device.cu:32: cudaMalloc(&data_, size)
[2019-12-06 00:18:54] Error: Aborted from virtual void marian::gpu::Device::reserve(size_t) in /marian/src/tensors/gpu/device.cu:32
[CALL STACK]
[0xb70bb7]
[0x5d028c]
[0x66a074]
```
This is a French model I trained. Memory usage increases significantly during training, resulting in out-of-memory errors. A German model trained on the same amount of data does not have this problem. What is the cause of this problem?
Thanks
Could you provide the command/config you use? More details would be helpful, e.g. what model do you use and how large is it? What is your workspace? Do you train with `--mini-batch-fit`?
Is this the only process running on the GPU?
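For reference, a minimal sketch of those two options (the file names and the workspace size are only placeholders, not a recommendation for your setup):

```
# --workspace is the memory (in MB) Marian pre-allocates per device;
# --mini-batch-fit sizes mini-batches dynamically so they fit into that workspace.
./build/marian \
  --train-sets corpus.en corpus.de \
  --vocabs vocab.en.yml vocab.de.yml \
  --model model.npz \
  --devices 0 \
  --workspace 9000 \
  --mini-batch-fit
```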
Hi snukky,
Thank you for your support.
```
./build/marian --train-sets /Marian/1_ForTrain/TagRemoved_Source_Train_2560035.tok.en /Marian/1_ForTrain/TagRemoved_Target_Train_2560035.tok.de --vocabs /Marian/source_vocab.yml /Marian/target_vocab.yml --model /Marian/pre-train_model.npz --devices 0 --dim-emb 500 --after-epochs 13 --max-length 70 --max-length-crop
```
The training file has 2560035 lines. I used a GTX 1080 Ti to train this engine.
Your vocabularies are huge, is that planned? Normally we would use something 10x smaller. This explains your model size, due to the embedding matrices (rough numbers below):
```
[2019-12-06 00:18:08] [data] Setting vocabulary size for input 0 to 328116
[2019-12-06 00:18:09] [data] Setting vocabulary size for input 1 to 225581
```
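As a rough back-of-envelope check (my own numbers, assuming fp32 parameters and the `--dim-emb 500` from your command), each embedding matrix alone is already several hundred MB, before gradients and the Adam moments multiply that further:

```
# size of one fp32 embedding matrix: vocab_size * dim_emb * 4 bytes, shown in MB
echo "source embeddings: $(( 328116 * 500 * 4 / 1024 / 1024 )) MB"   # ~625 MB
echo "target embeddings: $(( 225581 * 500 * 4 / 1024 / 1024 )) MB"   # ~430 MB
```

The output layer typically adds another matrix of the target-vocabulary size on top of that.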
Hi emjotde,
I built the vocabularies with the `./build/marian-vocab` command. Commas and periods are not split off from the words.
You need to tokenize your data first; I also recommend using subword segmentation. Look at these examples (and the minimal BPE sketch below the links):
- https://github.com/marian-nmt/marian-examples/tree/336740065d9c23e53e912a1befff18981d9d27ab/training-basics
- https://github.com/marian-nmt/marian-examples/tree/336740065d9c23e53e912a1befff18981d9d27ab/training-basics-sentencepiece
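For illustration, a minimal BPE sketch roughly along the lines of the training-basics example above; the file names and the merge count (32000) are placeholders, not your actual data:

```
pip install subword-nmt
# learn a joint BPE model on the tokenized training data
cat corpus.tok.en corpus.tok.de | subword-nmt learn-bpe -s 32000 > bpe.codes
# apply it to both sides, then build the vocabularies from the segmented text
subword-nmt apply-bpe -c bpe.codes < corpus.tok.en > corpus.bpe.en
subword-nmt apply-bpe -c bpe.codes < corpus.tok.de > corpus.bpe.de
./build/marian-vocab < corpus.bpe.en > source_vocab.yml
./build/marian-vocab < corpus.bpe.de > target_vocab.yml
```

With subword units the vocabularies end up with tens of thousands of entries instead of hundreds of thousands.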
Hi emjotde,
As you said, I have already tokenized the data. Via the command `./build/marian-vocab < /Marian/1_ForTrain/TagRemoved_Source_Train_3318741.tok.en > /Marian/source_vocab.yml`, the resulting vocabulary still contains the symbols attached to the words.
Subword segmentation is your best bet here. See the provided examples for either BPE or SentencePiece. Also, your tokenizer doesn't seem to be particularly good if it kept those words together.
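If it helps, a typical tokenization step (similar to what the linked examples do with the Moses scripts) looks roughly like this; the path to mosesdecoder is a placeholder:

```
# normalize punctuation and tokenize with the Moses scripts
cat raw.en \
  | perl mosesdecoder/scripts/tokenizer/normalize-punctuation.perl -l en \
  | perl mosesdecoder/scripts/tokenizer/tokenizer.perl -l en \
  > corpus.tok.en
```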
To complement Marcin's response, replacing
`--vocabs /Marian/source_vocab.yml /Marian/target_vocab.yml`
with
`--vocabs /Marian/source_vocab.spm /Marian/target_vocab.spm`
should solve the issue, but following the examples mentioned above will allow for better understanding of data pre-processing for NMT.
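A sketch of what that could look like in the training command, assuming Marian was built with SentencePiece support (`-DUSE_SENTENCEPIECE=on`); the `.spm` models are then trained from the data automatically if they do not exist yet, and the vocabulary sizes below are only illustrative:

```
./build/marian \
  --train-sets /Marian/1_ForTrain/TagRemoved_Source_Train_2560035.tok.en \
               /Marian/1_ForTrain/TagRemoved_Target_Train_2560035.tok.de \
  --vocabs /Marian/source_vocab.spm /Marian/target_vocab.spm \
  --dim-vocabs 32000 32000 \
  --model /Marian/model.npz \
  --devices 0 --dim-emb 500 --max-length 70 --max-length-crop
```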
I personally fixed the `tcmalloc: large alloc ...` messages by updating CUDA.
Make sure to completely remove previous installations.
Installation instructions can be found here: https://askubuntu.com/questions/799184/how-can-i-install-cuda-on-ubuntu-16-04
Hm. The `tcmalloc: large alloc ...` thing isn't really anything that needs to be fixed. It is just an unnecessary log message by Google's libtcmalloc whenever it allocates a larger (actually not that large) chunk of memory. It can be relatively safely ignored. It should also not go away from updating CUDA; these are rather unrelated.
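If the log lines themselves are distracting, gperftools' tcmalloc has, as far as I remember, an environment variable that raises the reporting threshold, e.g.:

```
# report large allocations only above ~8 GB; this only silences the messages,
# it does not change any allocation behaviour
TCMALLOC_LARGE_ALLOC_REPORT_THRESHOLD=8589934592 ./build/marian -c config.yml
```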
In my case, it's not the log text that bothers me. I had a memory crash during training: memory usage increases suddenly between epochs while I have enough GPU memory available.
I use a GTX 1080 Ti with 11 GB of memory. I allocate a 1 GB workspace and it still crashes. Before crashing it shows me the `tcmalloc: large alloc ...` messages.
The only fix that worked for me was updating CUDA.
Hi! I've run into this problem. I've observed that decreasing `--mini-batch` kind of mitigates the problem. But why does this problem happen? Why doesn't the memory usage stop increasing? Does Marian apply some kind of cache, or is the problem just related to the fr-en model?
My command is:
```
/home/cgarcia/Documentos/experiment_crawling/marian/marian-dev/build/marian-decoder \
  -c /home/cgarcia/Documentos/experiment_crawling/marian/students/fren/fren.student.tiny11/config.intgemm8bitalpha.yml \
  --quiet --max-length-crop --cpu-threads 64 --mini-batch 8
```
UPDATE:
It seems that if `--cpu-threads` is decreased, it kind of mitigates the problem too.
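For what it's worth, a sketch combining both mitigations on top of the command above (the values are only what I would try first, not tested numbers):

```
/home/cgarcia/Documentos/experiment_crawling/marian/marian-dev/build/marian-decoder \
  -c /home/cgarcia/Documentos/experiment_crawling/marian/students/fren/fren.student.tiny11/config.intgemm8bitalpha.yml \
  --quiet --max-length-crop --cpu-threads 16 --mini-batch 8
```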