David MacLeod
Results: 6 comments
Agreed that this would be very useful.
> Can you share a code snippet you used for loading GPT? Also, DS-inference currently uses special fp16 CUDA kernels for inference, which is not the case for int8. int8...
@yaozhewei any news on this?
Thanks @yaozhewei! Do you know whether there is a rough timeline for this, e.g. 1 month, 6 months, 1 year? It would be very useful to know, as we'd like...
Have there been any developments here? If I were to contribute this change, would it be considered? Would an environment variable or a CLI arg be more appropriate for disabling...
Any updates on this? Thanks.