Guillaume Klein
Did you also consider training a Transformer model with a reduced number of decoder layers? CTranslate2 can run models with a different number of encoder and decoder layers.
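For context, an asymmetric encoder/decoder depth can be set directly in the training configuration. A minimal sketch, assuming the `enc_layers`/`dec_layers` options of an OpenNMT-py 2.x YAML config (other trainers use different option names):

```yaml
# Sketch of an OpenNMT-py training config with a reduced decoder depth.
# Option names assume OpenNMT-py 2.x; the layer counts are illustrative.
enc_layers: 6
dec_layers: 3
```

After conversion, CTranslate2 loads the resulting model as-is, since it does not require the encoder and decoder to have the same number of layers.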
#901 adds new methods that implement this approach.
For reference, I tried a quick integration of cuBLASLt but I found the performance to be worse than the current implementation, even with fused bias and ReLU. To be investigated...
This should be easy to add. Do you think this feature is useful right now even though GPT-J and GPT Neo are not yet supported?
Hi, what custom model architecture are you referring to? Can you post the Fairseq options that were used? If the model architecture is close to a standard Transformer, we may...
There are two difficulties in fully integrating this model:
* it uses convolutional layers, but our library currently does not have such primitives (they would need to be implemented)
* the library...
Hi, currently this is not possible. Can you describe these adapter layers in a bit more detail? Is there a public paper or implementation for reference?
Hi, currently there is no converter for Tensor2Tensor, but it should not be too complicated to add one. Contributions are welcome.
A converter should be added to load and register weights from a T2T checkpoint. You can explore how existing converters are defined: https://github.com/OpenNMT/CTranslate2/tree/master/python/ctranslate2/converters
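To give a rough idea of the weight-registration step such a converter would perform, here is a minimal, self-contained sketch: it groups checkpoint variables by encoder/decoder layer so they could then be assigned to the matching slots of a model spec. The variable name patterns below are illustrative assumptions, not the actual Tensor2Tensor or CTranslate2 naming schemes; check the existing converters linked above for the real structure.

```python
import re

# Assumed T2T-style variable naming; purely illustrative.
_T2T_VARIABLE = re.compile(r"transformer/body/(encoder|decoder)/layer_(\d+)/(.+)")


def map_t2t_variable(name):
    """Return an assumed (side, layer_index, suffix) triple, or None for
    variables that do not belong to an encoder/decoder layer."""
    match = _T2T_VARIABLE.match(name)
    if match is None:
        return None
    side, layer, suffix = match.groups()
    return side, int(layer), suffix


def register_weights(variables):
    """Group checkpoint variables by (side, layer index) so a converter
    could assign them to the corresponding layer of a model spec."""
    layers = {}
    for name, value in variables.items():
        mapped = map_t2t_variable(name)
        if mapped is None:
            continue  # skip non-layer variables (embeddings, global step, ...)
        side, index, suffix = mapped
        layers.setdefault((side, index), {})[suffix] = value
    return layers
```

A real converter would additionally build a `TransformerSpec`, copy each grouped tensor into the spec, and handle embeddings and layer normalization, following the pattern of the converters in the repository.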
CTranslate2 now provides Python wheels for Windows. https://github.com/OpenNMT/CTranslate2/releases/tag/v2.8.0