How to load a model in half precision (e.g. float16), since I only have limited GPU memory?
Description of the bug
I cannot load the model in half precision, and I haven't figured out how to move the model to the CPU or GPU.
To Reproduce
Run the gpt-j-6B model as in the demo, using the local Hugging Face method.
Expected behavior
The model returns a response.
Error Logs/Screenshots
requests.exceptions.HTTPError: {'message': '"LayerNormKernelImpl" not implemented for 'Half''}
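This error usually means the float16 model is executing on the CPU: in older PyTorch builds the CPU LayerNorm kernel has no Half implementation, while the CUDA kernel does. A minimal sketch, assuming plain PyTorch, is to cast to float16 only when a CUDA GPU is present and otherwise stay in float32. The toy `nn.Sequential` below is a hypothetical stand-in for GPT-J, and the helper names `pick_device_and_dtype` and `prepare` are illustrative only, not part of manifest or transformers. When loading through Hugging Face transformers, passing `torch_dtype=torch.float16` to `from_pretrained` loads the weights in half precision, but the model still has to be moved to the GPU (e.g. with `.to("cuda")`) before running it.

```python
import torch
from torch import nn


def pick_device_and_dtype():
    """Use float16 on a CUDA GPU; fall back to float32 on the CPU.

    Assumption: on older PyTorch versions a float16 LayerNorm on the CPU
    raises '"LayerNormKernelImpl" not implemented for 'Half'', so half
    precision is only used when the model can actually run on the GPU.
    """
    if torch.cuda.is_available():
        return torch.device("cuda"), torch.float16
    return torch.device("cpu"), torch.float32


def prepare(model: nn.Module) -> nn.Module:
    """Hypothetical helper: move the model and cast its floating-point weights."""
    device, dtype = pick_device_and_dtype()
    # Module.to() moves parameters/buffers to `device` and casts the
    # floating-point ones to `dtype` in a single call.
    return model.to(device=device, dtype=dtype).eval()


# Hypothetical stand-in for a transformer block: LayerNorm followed by Linear.
toy = prepare(nn.Sequential(nn.LayerNorm(16), nn.Linear(16, 4)))

# Inputs must live on the same device and use the same dtype as the weights.
param = next(toy.parameters())
x = torch.randn(2, 16, device=param.device, dtype=param.dtype)

with torch.no_grad():
    y = toy(x)

print(y.device, y.dtype)
```

The design choice here is to tie the dtype to the device: float16 roughly halves GPU memory for the weights, while the CPU fallback avoids the missing Half LayerNorm kernel.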
Environment (please complete the following information)
- OS: [e.g. Ubuntu 20.04]
Thanks in advance.