Marcin Junczys-Dowmunt
Hi, these look like different issues. We don't validate the UTF-8 correctness of your corpus, so the YAML files may be invalid. Here I would say having clean corpora is on...
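Since the toolkit does not validate UTF-8 itself, a quick check on the user side could look roughly like the sketch below. It assumes a plain-text corpus with one sentence per line; `find_invalid_utf8_lines` is a hypothetical helper written for illustration, not something the toolkit provides.

```python
def find_invalid_utf8_lines(path):
    """Return the 1-based numbers of lines whose bytes are not valid UTF-8."""
    bad_lines = []
    # Read raw bytes so that decoding errors can be attributed to single lines.
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                bad_lines.append(number)
    return bad_lines
```

Calling `find_invalid_utf8_lines("corpus.txt")` (the file name is only an example) returns an empty list for a clean file; any reported lines can then be removed or repaired on both sides of a parallel corpus so the sentence pairs stay aligned.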
Cool, should be easy to check?
Is the better performance expected on any architecture, or is it AVX512-specific?
Hi, we need the FAISS support internally, but we can make it depend on finding MKL only?
@ykim362 very good point! All DNN methods with retrieval would rely on it.
Hm. Based on the training log I would still say there is an issue with the corpus; the training cost indicates it stops learning. Can you try with a random...
1) Unless there is something wonky going on I would not blame the SPM, but who knows. I would rather suspect that at some point you have misalignments in there,...
Do we know if the current SentencePiece version is backwards compatible with our frozen fork?
Looks like this is going to be surprisingly unproblematic. I was expecting more issues.
Yes, there are always plans and very little time :) I want to do it eventually, but may not be able to promise anything.