Marcin Junczys-Dowmunt

Results: 258 comments by Marcin Junczys-Dowmunt

Hi, it might also not be worth it. If I am not mistaken, float16 is artificially capped on gaming hardware, e.g. the GTX 1080, to laughable performance, about 30x slower...

Oh. In that case carry on :)

Interesting. Thing is, it should not be faster. F16 arithmetic is severely capped. We once benchmarked cuBLAS hgemm vs. sgemm on a GTX 1080; it was slower by a factor of...

Yeah, maybe on the CPU as well? Are float16 operations faster on our CPUs?

I would say two GPUs are preferable to one. With synchronous SGD the RAM in the two cards basically adds up in terms of batch size (not model size though)...

In my research branch, which is not properly merged yet. I can point you to the very experimental code. In hindsight, and after more experiments, I cannot currently confirm that...

I just got a couple of Voltas to play around with. So this is the next big work item.

This is quite exploratory. I plan to have fp16 after Christmas, but don't take my word for it. I first need to learn how that works :)

Hi, the problem here depends on how you have implemented your document-level system. With current Marian I would say there are two ways to achieve that out-of-the-box with no to...

I also do not recommend using NCCL with a number of GPUs that's not a power of 2. While you may get a small performance improvement going from, say,...