Maximiliana Behnke
Hi, I'm trying to run BlocksparseMatMul with feature_axis=1 on a GeForce RTX 2080 Ti, which supports fp16, but I get: `Gated blocksparse matmul currently only supported on fp16 tensorcores. [[node BlocksparseMatMul_000000...
### Bug description I'm trying to generate a node with a normal distribution, but it fails on both GPU and CPU. ``` [2021-06-03 13:53:11] Error: Curand error 105 - ./marian-pruned/src/tensors/rand.cpp:106: curandGenerateNormal(generator_,...
## 🐛 Bug I'm trying to open and investigate the NLLB MoE model (405 GB), but I can't load it into torch. Smaller dense models seem to load fine, can access the checkpoint's...
Hi, have you guys considered adding support for Mixture-of-Experts models? They're usually quite hefty in terms of size and would be a great opportunity to have them offload parameters...