Nikolay Bogoychev

Results 75 comments of Nikolay Bogoychev

> I guess having it doesn't hurt. Did you have a use case? We're expecting some hardware that isn't made by NVIDIA.

@emjotde yeah I looked at it, that's what I expect to be more efficient, but due to differences between MPI and NCCL, implementing it is a bit more complicated and...

The MPI communicator offers only a subset of the functionality that NCCL does. When (eventually) we want to allow multiple threads per process, the implementation of the two big collective operation functions...

@ugermann, this is practically impossible to achieve due to poor compiler support. Here's a subset of bugs related to multiversioning, from GCC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90129 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90260 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57378 (especially annoying since it's more than...

@ykim362 in the case of FBGEMM https://github.com/marian-nmt/FBGEMM/blob/84e66a976046180187724aff60a236c5378fde7c/src/Utils.cc#L184 you have an if statement which is evaluated every time the function is executed. You can argue that the compiler should inline this...

@milipili in general I agree with you, but it is possible to avoid function multiversioning and use a function pointer initialized at runtime, based on the appropriate architecture. It just requires...
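
A minimal sketch of the function-pointer approach, assuming GCC or Clang on x86 (where `__builtin_cpu_init` and `__builtin_cpu_supports` are available). The names `sum_generic`, `sum_avx2`, `choose_sum` and `sum_dispatch` are hypothetical and not taken from Marian, FBGEMM or intgemm; the "AVX2" kernel is only a placeholder with the same result as the generic one. The point is that the CPU check runs once at start-up rather than on every call.

```cpp
#include <cstddef>

// Hypothetical baseline kernel: plain scalar loop, runs on any CPU.
static long long sum_generic(const int* data, std::size_t n) {
  long long total = 0;
  for (std::size_t i = 0; i < n; ++i) total += data[i];
  return total;
}

// Hypothetical stand-in for an AVX2 kernel. In real code this would live in
// a translation unit compiled with -mavx2 and use intrinsics; here it just
// forwards to the generic loop so the sketch stays portable.
static long long sum_avx2(const int* data, std::size_t n) {
  return sum_generic(data, n);
}

using SumFn = long long (*)(const int*, std::size_t);

// Pick the implementation once, based on what the running CPU supports.
static SumFn choose_sum() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();  // required before __builtin_cpu_supports in static init
  if (__builtin_cpu_supports("avx2")) return &sum_avx2;
#endif
  return &sum_generic;   // fallback for older CPUs and other compilers/targets
}

// Initialized once at program start; every later call is a plain indirect
// call with no per-call architecture check.
static const SumFn sum_dispatch = choose_sum();
```

Callers then write `sum_dispatch(ptr, n)` and never see which kernel was selected. The cost is one indirect call, which the compiler cannot inline, so this suits kernels that do a non-trivial amount of work per call.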

This is a new feature in CUDA 10.1, which was released long after this code was written. I think it's worth investigating.

Knights Landing doesn't support the AVX-512 instructions needed for int16. We need to fall back to AVX2 here. We have proper guards for that in intgemm.

KNL is essentially an array of Intel Atom processors. Single-threaded performance is abysmal for any workload. I don't think Marian on KNL can ever be useful.

@emjotde there is proper OpenMP support, but as you know, OpenMP parallelism is not always helpful. We simply don't have big enough matrices to take advantage of it. @kpu it's...