aihenry comments

Results: 4 comments by aihenry

How to increase inference speed on CPU?

For your reference, I am running the code below on my i5 PC without a GPU, and it is fast enough :)

modelInUse = "codellama-13b-instruct.ggmlv3.Q4_1.bin"
config = {'max_new_tokens': 1024, 'repetition_penalty': 1.1, 'temperature': 0.1, 'top_k':...
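Since the snippet above is cut off, here is a minimal sketch of what a CPU-only setup might look like. It reuses only the values visible in the comment. The `threads` and `gpu_layers` entries, the helper names `build_cpu_config` and `load_model`, and the use of `AutoModelForCausalLM.from_pretrained` with `model_type="llama"` are my assumptions, not necessarily what the comment used.

```python
import os


def build_cpu_config(max_new_tokens=1024, threads=None):
    """Return generation settings for CPU-only inference (hypothetical helper)."""
    if threads is None:
        # Assumption: one thread per logical core is a reasonable starting point.
        threads = os.cpu_count() or 1
    return {
        "max_new_tokens": max_new_tokens,
        "repetition_penalty": 1.1,
        "temperature": 0.1,
        "threads": threads,  # assumption: not shown in the original comment
        "gpu_layers": 0,     # assumption: keep every layer on the CPU
    }


def load_model(model_path, config):
    """Load a local GGML model file (hypothetical helper).

    ctransformers is imported lazily so the config helper above
    can be used without the library installed.
    """
    from ctransformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_path, model_type="llama", **config
    )


# Usage, assuming the model file sits in the working directory:
# llm = load_model("codellama-13b-instruct.ggmlv3.Q4_1.bin", build_cpu_config())
# print(llm("Write a Python function that reverses a string."))
```

On a CPU-only machine the thread count and the quantization level of the model file tend to matter most for speed, so it is worth trying a few `threads` values rather than trusting the default.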

Everything OK? Abandoned?

@TheBloke 🥇 💯 👍 @marella 🥇 💯 👍 Both of you are my heroes! I learned LLM application design and integration from your LLMs and the ctransformers lib!

How to handle the token limitation for an LLM response?

config = {'max_new_tokens': 2048, 'context_length': 8192, #
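The two settings in that config interact: the prompt and the generated tokens have to share one context window, so a long prompt leaves less room for the response. A minimal sketch of that arithmetic follows. The helper name `clamp_max_new_tokens` is hypothetical, and the assumption is that you already know the prompt's token count (for example from the model's tokenizer).

```python
def clamp_max_new_tokens(prompt_tokens, context_length=8192, requested=2048):
    """Return how many new tokens can be generated without overflowing the context.

    Hypothetical helper: the prompt and the response must fit together
    inside `context_length` tokens.
    """
    if prompt_tokens >= context_length:
        raise ValueError("prompt alone fills the context window; shorten it")
    remaining = context_length - prompt_tokens
    return min(requested, remaining)


# A short prompt leaves the full requested budget available.
short_budget = clamp_max_new_tokens(prompt_tokens=100)

# A long prompt forces a smaller response budget.
long_budget = clamp_max_new_tokens(prompt_tokens=7000)
```

If the clamped budget is too small for a useful answer, the usual options are to trim or summarize the prompt, or to use a model that supports a larger context window.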

FileNotFoundError: Could not find module '...ctransformers\lib\cuda\ctransformers.dll' (or one of its dependencies).

@marella Thank you for your hints. After reinstalling these, it works fine:

pip install ctransformers[cuda]
pip install nvidia-cublas-cu11
pip install nvidia-cuda-runtime-cu11
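Before reinstalling, it can help to see which of those packages the current environment actually has. The sketch below uses only the standard library. The helper name `missing_packages` is hypothetical, and it only checks that the distributions are installed, not that the CUDA DLLs load correctly.

```python
from importlib import metadata

# Distribution names taken from the pip commands in the comment.
REQUIRED = ["ctransformers", "nvidia-cublas-cu11", "nvidia-cuda-runtime-cu11"]


def missing_packages(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Usage:
# for name in missing_packages(REQUIRED):
#     print(f"not installed: {name}")
```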