Kenneth Heafield comments

Results: 290 comments of Kenneth Heafield

Put your company logo on Marian website if you use Marian

Sure, just tell us your list of ~sales prospects~ clients.

How does providing a vocabulary affect training BPE models?

Can we add the word itself as a second sort criterion (or even just make the sort stable) to make it deterministic?
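A minimal sketch of that tie-breaking idea, in Python rather than Marian's actual vocabulary code. The function name `sorted_vocab` and the toy counts are hypothetical, chosen only for illustration. Sorting on the pair (negative count, word) makes the order independent of the order in which words were first seen.

```python
from collections import Counter


def sorted_vocab(counts):
    """Order words by descending count, breaking ties by the word itself.

    The second key component makes the result deterministic even when
    several words share the same frequency.
    """
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data: "a" and "b" tie at 2, "c" and "d" tie at 1.
counts = Counter("b a c a b d".split())
vocab = sorted_vocab(counts)

# The same counts inserted in a different order give the same result.
shuffled = Counter("d c b b a a".split())
vocab_shuffled = sorted_vocab(shuffled)
```

A stable sort alone would only preserve input order among ties, so the output would still depend on how the counts were collected. The explicit secondary key removes that dependence.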

Cublas Error: 13

Seen on the brand-new 3090s.
```
[2021-04-27 18:45:38] Error: Cublas Error: 13 - /home/heafield/marian-dev/src/tensors/gpu/prod.cpp:118: cublasGemmEx(handle, transa, transb, m, n, k, alpha, A, CUDA_R_32F, lda, B, CUDA_R_32F, ldb, beta, C, CUDA_R_32F,...
```

quantize setting as the doc said but lead to "skipping *-th update due to loss being nan" for all train data input

That makes an FP32 model that's ready to be 8-bit quantized. The next step is to binarize it. https://github.com/browsermt/students/tree/master/train-student Note that, due to stubbornness in marian-nmt/marian-dev#762, you won't get the best 8-bit...

quantize setting as the doc said but lead to "skipping *-th update due to loss being nan" for all train data input

There is documentation at https://github.com/browsermt/students/tree/master/train-student ; if it's unclear feel free to file an issue against that repo.

how does the configuration parameter '--quantize-bits' work?

The short answer is that `quantize-bits` doesn't work when you train a model from scratch. I think it's an interesting research question to see if one could fully train a model...

Consider model variance in bootstrap resampling test

> I agree, but at the same time I think any statistical testing is probably better than none. I think a statistical test that always claims significance (and bootstrap does...

Consider oneDNN instead of MKL for SGEMM

I've only bothered to measure AVX512 but we should check. Paging @sidkashyap.

Consider oneDNN instead of MKL for SGEMM

MKL is blocking Wikipedia from deploying Marian because it is closed source.

Fetch Content for MKL?

The trick is that all that registration just leads to a constant URL with no authentication: http://registrationcenter-download.intel.com/akdlm/irc_nas/tec/16849/l_mkl_2020.2.254.tgz The "customizable" package is at http://registrationcenter-download.intel.com/akdlm/irc_nas/tec/16849/l_mkl_2020.2.254_online.tgz
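Because the archive sits at a fixed URL, a build script could in principle download it directly. The following is a rough Python sketch under that assumption, not the CMake FetchContent setup the issue asks about. The `fetch` helper and its `opener` parameter are hypothetical, and the URL may no longer be served.

```python
import shutil
import urllib.request
from pathlib import Path

# URL quoted in the comment above; availability is not guaranteed.
MKL_URL = (
    "http://registrationcenter-download.intel.com/"
    "akdlm/irc_nas/tec/16849/l_mkl_2020.2.254.tgz"
)


def fetch(url, dest, opener=urllib.request.urlopen):
    """Download url to dest unless dest already exists; return the path.

    `opener` is a hypothetical hook so the function can be exercised
    without network access.
    """
    dest = Path(dest)
    if dest.exists():
        return dest  # Reuse a previous download.
    with opener(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # Stream to disk in chunks.
    return dest
```

Calling `fetch(MKL_URL, "l_mkl_2020.2.254.tgz")` would attempt the real download. No credentials are passed because, per the comment, none are required.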