Subham Kumar comments




Subham Kumar

AWQ support

What's the current best option for running inference on this 4-bit fine-tuned model with vLLM? Is it to convert the model to 16-bit first and then run inference?