Teven
Update: this version is ~4 times slower with the change from column format to row format, compared to a hacky version that just gets the column for a certain feature....
Hey Maruf, sorry, not yet. I'm a bit swamped at the moment, and the priority has switched to doing some additional cleaning of OSCAR-ml ourselves before launching anything on it. Maybe @ibeltagy can review...
Hey! The number without embeddings is actually just that: the number of non-embedding parameters, not the number of unique parameters. This is the relevant number to estimate the loss...
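To make the distinction concrete, here is a minimal sketch of counting non-embedding parameters. The parameter names, the toy shapes, and the name-matching rule are all hypothetical and only for illustration; a real model would need its own rule for spotting embedding matrices.

```python
from math import prod


def count_parameters(shapes, embedding_markers=("embed",)):
    """Return (total, non_embedding) parameter counts.

    `shapes` maps a parameter name to its shape tuple. Any parameter whose
    name contains one of `embedding_markers` is treated as an embedding.
    Both the naming scheme and the markers are assumptions of this sketch.
    """
    total = 0
    non_embedding = 0
    for name, shape in shapes.items():
        n = prod(shape)
        total += n
        if not any(marker in name for marker in embedding_markers):
            non_embedding += n
    return total, non_embedding


# Toy single-layer transformer with d_model = 64 and vocabulary size 1000
# (biases and layer norms left out to keep the arithmetic short).
toy_shapes = {
    "embed_tokens.weight": (1000, 64),
    "layers.0.attn.qkv.weight": (192, 64),
    "layers.0.attn.out.weight": (64, 64),
    "layers.0.mlp.up.weight": (256, 64),
    "layers.0.mlp.down.weight": (64, 256),
}

total, non_embedding = count_parameters(toy_shapes)
print(total, non_embedding)
```

In this toy case the non-embedding count equals 12 * n_layer * d_model^2, the usual approximation for a transformer whose feed-forward width is 4 * d_model.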
Ah yes, saw it on the other issue then forgot about it - I can take a look at the end of this week.
We did use mc4 for early multilingual experiments before switching to OSCAR - let's keep the code for future reference. Thanks for catching this!
Seeing this with A100 / CUDA 11.5 / faiss-gpu=1.7.2
Is there perhaps some way to compile without `doxygen`?
`swig -version` returns 4.0.2, but maybe there's a conflicting installation issue. How can one remove the -doxygen flag? Is it something to edit in the code, or a flag...
Passing `broadcast_buffers=False` to `DistributedDataParallel` fixed this for me. I've opened a PR at #24326 to surface that argument to the Trainer user.
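For reference, a minimal sketch of that workaround outside the Trainer. It assumes a single-process `gloo` group on CPU so it can run standalone; the toy model and the helper name `wrap_without_buffer_broadcast` are hypothetical, and a real job would get its rank and world size from the launcher.

```python
import os
import tempfile

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel


def wrap_without_buffer_broadcast(model):
    # broadcast_buffers=False stops DDP from syncing module buffers
    # (e.g. BatchNorm running statistics) from rank 0 on each forward pass.
    return DistributedDataParallel(model, broadcast_buffers=False)


# Single-process group, just enough to construct DDP for this demo.
init_file = os.path.join(tempfile.mkdtemp(), "ddp_init")
dist.init_process_group(
    backend="gloo",
    init_method=f"file://{init_file}",
    rank=0,
    world_size=1,
)

# Toy model that owns buffers (BatchNorm running mean and variance).
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.BatchNorm1d(4))
ddp_model = wrap_without_buffer_broadcast(model)
out = ddp_model(torch.randn(2, 4))

dist.destroy_process_group()
```

With buffer broadcasting off, each rank keeps its own buffer values, so this is only appropriate when the buffers don't need to stay identical across processes.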
Hey @tianyil1, this looks like a different issue to me, and I'm not seeing it in my case. If you send your file here, it could be easier to run it...