Timon Käch

72 comments by Timon Käch

Yeah, AITemplate seems good too.

Thank you very much! Did I read correctly that I have to wait 20 minutes for inference?

Splitting half onto the GPU and the other half into RAM doesn't work? Because GPT-J was quite fast (1-2 tokens/sec).
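For reference, a minimal sketch of what GPU/RAM splitting can look like with the Hugging Face transformers + accelerate stack (`device_map` plus `max_memory`); the model name and memory caps below are placeholder assumptions, not taken from this thread.

```python
# Sketch: put as many layers as fit on the GPU, spill the rest into system RAM.
# Model name and memory limits are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6b"  # assumed checkpoint, adjust to your setup

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",                         # place layers on GPU first
    max_memory={0: "10GiB", "cpu": "24GiB"},   # overflow goes to CPU RAM
    torch_dtype=torch.float16,
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```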

Closing, as text-generation-webui does all of this and even GPTQ 4-bit.

Hello @anzz1, I also have an i5-13600K and I think I could speed up the generation with this code. Since I'm a beginner, where do I have to put...

Doesn't this just remove the error? Emojis still don't work.
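For context, the emoji breakage usually comes from multi-byte UTF-8 characters being split across streamed token chunks, so decoding each chunk on its own fails. Below is a small illustrative sketch of that failure mode using Python's incremental decoder; the byte chunks are made up, and this is not the fix discussed in the thread.

```python
# Multi-byte characters can arrive split across chunks; an incremental decoder
# buffers the partial bytes instead of raising or emitting replacement chars.
import codecs

decoder = codecs.getincrementaldecoder("utf-8")()

# "👍" is 4 bytes (f0 9f 91 8d); pretend the stream emits it in two pieces.
chunks = [b"thumbs up \xf0\x9f", b"\x91\x8d done"]

text = ""
for chunk in chunks:
    text += decoder.decode(chunk)    # incomplete sequences are held internally
text += decoder.decode(b"", final=True)
print(text)  # -> "thumbs up 👍 done"
```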

Can we install the older version again until this is fixed? How?

Just checked it. llama.cpp is 40 ms per token for me and the Python bindings are 200 ms per token, so they're much slower. Sadly, downgrading to version 0.1.27 is still...
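For reference, a rough way to reproduce the per-token number with the llama-cpp-python bindings; the model path and parameters are assumptions, and this times a whole completion rather than instrumenting each token.

```python
# Rough ms-per-token measurement for the Python bindings. Model path,
# thread count, and prompt are placeholder assumptions.
import time
from llama_cpp import Llama

llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin", n_threads=6)

prompt = "The quick brown fox"
start = time.perf_counter()
result = llm(prompt, max_tokens=64)
elapsed = time.perf_counter() - start

n_tokens = result["usage"]["completion_tokens"]
print(f"{elapsed / n_tokens * 1000:.1f} ms per token over {n_tokens} tokens")
```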

I would also like to know how to do this. I have 2x 3060 12 GB, so I could load the 13B model, but it doesn't seem to be implemented.
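For reference, a hypothetical sketch of splitting a 13B model across two 12 GB GPUs with Hugging Face accelerate's `device_map`; this is not the loader this issue is about, and the model id, 8-bit quantization, and memory caps are assumptions.

```python
# Sketch: shard a 13B model across two 12 GB GPUs. In fp16 a 13B model
# (~26 GB) slightly overflows 2x12 GB, so this loads 8-bit weights via
# bitsandbytes. Model id and memory caps are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huggyllama/llama-13b"  # placeholder 13B checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,                     # ~13 GB of weights instead of ~26 GB
    device_map="auto",                     # split layers across both GPUs
    max_memory={0: "11GiB", 1: "11GiB"},   # leave headroom on each 3060
)

inputs = tokenizer("Once upon a time", return_tensors="pt").to(0)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```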