Mark Franey
It appears that the pipeline only supports training a joint BPE model, but separate source and target BPE vocabularies sometimes work better.
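To make the joint-versus-separate distinction concrete, here is a minimal from-scratch BPE learner in the style of Sennrich et al. It is an illustration only and assumes nothing about the pipeline's actual tokenizer. `learn_bpe` is a hypothetical helper, and the example sentences are invented. A joint model learns one merge list from both sides of the corpus; separate models learn one list per language.

```python
from collections import Counter

def learn_bpe(sentences, num_merges):
    """Learn up to num_merges BPE merge operations from whitespace-tokenized text."""
    # Each word is a tuple of characters plus an end-of-word marker, weighted by frequency.
    vocab = Counter()
    for line in sentences:
        for word in line.split():
            vocab[tuple(word) + ("</w>",)] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole vocabulary.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)

        # Apply the chosen merge to every word.
        merged_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged_vocab[tuple(out)] += freq
        vocab = merged_vocab
    return merges

src = ["the house is small", "the house is old"]
tgt = ["das haus ist klein", "das haus ist alt"]

joint_merges = learn_bpe(src + tgt, 10)  # one model shared by both sides
src_merges = learn_bpe(src, 10)          # source-only model
tgt_merges = learn_bpe(tgt, 10)          # target-only model
```

Even in this toy case, the most frequent pair differs by side: `('e', '</w>')` for the source and `('s', '</w>')` for the target. With separate models, each language gets its own merge budget instead of competing with the other side for merges.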
Epsilon sampling, which discards every token whose probability falls below a fixed threshold ε before sampling, is a compelling alternative or complement to top_p and top_k sampling and would make a good addition to CTranslate2: https://arxiv.org/abs/2305.09860
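Below is a minimal, framework-agnostic sketch of epsilon sampling over a single step's logits. It is not CTranslate2 code. The function names `epsilon_truncate` and `epsilon_sample` are hypothetical. As a safeguard, the sketch always keeps the argmax token so the truncated set can never be empty.

```python
import math
import random

def epsilon_truncate(logits, epsilon):
    """Return (token_id, prob) pairs that survive epsilon truncation, renormalized."""
    # Softmax with max-subtraction for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Drop tokens with probability below epsilon; keep the argmax so the set is never empty.
    best = max(range(len(probs)), key=probs.__getitem__)
    kept = [(i, p) for i, p in enumerate(probs) if p >= epsilon or i == best]
    mass = sum(p for _, p in kept)
    return [(i, p / mass) for i, p in kept]

def epsilon_sample(logits, epsilon, rng):
    """Draw one token id from the epsilon-truncated, renormalized distribution."""
    kept = epsilon_truncate(logits, epsilon)
    ids = [i for i, _ in kept]
    weights = [p for _, p in kept]
    return rng.choices(ids, weights=weights, k=1)[0]

rng = random.Random(0)
logits = [math.log(p) for p in (0.5, 0.3, 0.15, 0.05)]
print(epsilon_truncate(logits, epsilon=0.1))  # token 3 (p = 0.05) is dropped
print(epsilon_sample(logits, epsilon=0.1, rng=rng))
```

Top-k keeps a fixed number of tokens and top-p keeps a fixed share of probability mass. Epsilon sampling instead uses an absolute probability cutoff, so the number of surviving tokens adapts to how peaked each step's distribution is.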
The wiki recommends a batch size of 128 for 'stable training'. It would be helpful to have the option to accumulate gradients so that bicleaner-ai training with larger...
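The following sketch illustrates gradient accumulation on a toy linear model with a mean-squared-error loss. It does not use bicleaner-ai's code or any deep-learning framework API, and `mse_grad` and `accumulated_sgd_step` are hypothetical helpers. The idea is to compute gradients on several small micro-batches, average them, and apply a single parameter update. When all micro-batches have the same size, this matches one update on the full effective batch (here 4 × 32 = 128).

```python
import random

def mse_grad(w, b, xs, ys):
    """Gradient of the mean squared error of y_hat = w * x + b over one batch."""
    n = len(xs)
    gw = gb = 0.0
    for x, y in zip(xs, ys):
        err = (w * x + b) - y
        gw += 2.0 * err * x / n
        gb += 2.0 * err / n
    return gw, gb

def accumulated_sgd_step(w, b, xs, ys, micro_batch, lr):
    """One SGD update over the whole effective batch xs, computed micro_batch examples at a time."""
    if len(xs) % micro_batch:
        raise ValueError("effective batch size must be a multiple of micro_batch")
    steps = len(xs) // micro_batch
    acc_w = acc_b = 0.0
    for k in range(steps):
        chunk = slice(k * micro_batch, (k + 1) * micro_batch)
        gw, gb = mse_grad(w, b, xs[chunk], ys[chunk])
        # Average over micro-batches so the total equals the full-batch mean-loss gradient.
        acc_w += gw / steps
        acc_b += gb / steps
    # Parameters are updated once, after all micro-batches have been processed.
    return w - lr * acc_w, b - lr * acc_b

rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(128)]
ys = [3.0 * x + 0.5 for x in xs]

# Effective batch of 128 built from 4 micro-batches of 32.
w, b = accumulated_sgd_step(0.0, 0.0, xs, ys, micro_batch=32, lr=0.1)
```

Peak memory scales with the micro-batch rather than the effective batch. The equivalence holds for mean-reduced losses without batch-dependent layers such as batch normalization.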