FasterTransformer
gptneox & gptj int8 quantization & share context
Support gptneox int8 & share context. gptj int8 currently throws an exception when running in int8 mode and still needs to be fixed.