Results 13 comments of Hieu Hoang

Out of interest, do you know if you're likely to get overflows when using fp16, and if you're doing anything about it?

Feedback from my own work with fp16 in Amun: when running on a P100 (Wilkes) it gives about a 20% speedup over using fp32. Most of the speedup is in...

Probably cos it's set up to read text.

You're right, it's always been opened as binary. Maybe some other bug in the previous code. I see no reason why it shouldn't work now.

You might want to take a look at Amun's nth_element.cu, which has been changed to use the actual (target) vocab size. There are also Amun regression tests to make sure that...

There is some truth in this religion ![image](https://user-images.githubusercontent.com/691732/36370406-b0483aea-1556-11e8-9890-71c87712ec4c.png) Also, make sure --maxi-batch is a multiple of the mini-batch size, and add --maxi-batch-sort src.
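For concreteness, a hedged sketch of the flags being discussed. The model path, vocab paths, and the specific sizes are placeholders, not from the comment, and note that (as pointed out below) Marian and Amun define --maxi-batch differently, so the numbers are only illustrative of the "multiple of mini-batch" advice:

```shell
# Illustrative invocation only; paths and sizes are made up.
# --maxi-batch is chosen as a multiple of --mini-batch, and
# --maxi-batch-sort src sorts the read-ahead buffer by source length.
marian-decoder -m model.npz -v vocab.src.yml vocab.trg.yml \
    --mini-batch 50 --maxi-batch 1000 --maxi-batch-sort src < input.txt
```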

Ah, there's a difference between Marian's and Amun's maxi-batch definitions then. Marian's is better imo.

@emjotde, are you using --maxi-batch to buffer input during training too? Otherwise I see no point in not sorting maxi-batches.

I've seen unit tests break on Moses when the Boost library is non-standard. However, KenLM is building fine on my Ubuntu 16.04 with Boost version 1.58.0.1ubuntu1. Try uninstalling and deleting every...

Just had the same problem and debugged it. You need to put torch.cuda.set_device(rank) before dist.init_process_group().
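A minimal sketch of the fix being described. The function name setup_ddp and the choice of the "nccl" backend are assumptions for illustration, not from the original comment; the one point taken from the comment is the ordering of the two calls:

```python
import torch
import torch.distributed as dist


def setup_ddp(rank: int, world_size: int) -> None:
    """Hypothetical per-worker init helper (name is an assumption).

    Pins the process to its own GPU *before* initializing the process
    group; doing it the other way round is the bug described above
    (e.g. every rank can end up on cuda:0, or NCCL hangs).
    """
    # Must come first: bind this rank to its GPU.
    torch.cuda.set_device(rank)
    # Only then create the process group.
    dist.init_process_group(
        backend="nccl",  # typical backend for multi-GPU training
        rank=rank,
        world_size=world_size,
    )
```

Typically each spawned worker (e.g. via torch.multiprocessing.spawn) would call this once with its own rank before building the model and wrapping it in DistributedDataParallel.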