FNsi

Results: 68 comments by FNsi

Thanks for doing that! I successfully quantised the 30B model to q4_1; here's my test. > apx run ./main -m models/30B/ggml-model-q4_1.bin -n 4096 --temp 0.7 --top_p 0.5 --repeat_penalty 1.17647 -c 4096...

Sorry, I just found the mistake I made. I have already changed the line to the q4_1 bin and resubmitted the current response. > Thank you! > > > > @FNsi,...

Sorry, but does anyone know how to merge the LoRA into the raw model?
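For context, merging a LoRA adapter means folding its low-rank update back into the base weights: W' = W + (alpha / r) * B @ A. The sketch below is a toy, pure-Python illustration of that arithmetic, assuming illustrative names (`base`, `lora_A`, `lora_B`, `alpha`, `r`); it is not llama.cpp's or PEFT's actual merge code.

```python
# Hedged sketch: fold a LoRA update into a base weight matrix.
# LoRA stores the update as a low-rank product B @ A, scaled by alpha / r.
# All names here are illustrative, not any library's real API.

def merge_lora(base, lora_A, lora_B, alpha, r):
    """Return base + (alpha / r) * (B @ A), using plain Python lists.

    base:   rows x cols weight matrix
    lora_A: r x cols ("down" projection)
    lora_B: rows x r  ("up" projection)
    """
    scale = alpha / r
    rows, cols = len(base), len(base[0])
    rank = len(lora_A)
    merged = [row[:] for row in base]  # copy so the base stays untouched
    for i in range(rows):
        for j in range(cols):
            update = sum(lora_B[i][k] * lora_A[k][j] for k in range(rank))
            merged[i][j] += scale * update
    return merged

# Tiny example: 2x2 identity base, rank-1 adapter.
base = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]    # r x cols
B = [[1.0], [0.5]]  # rows x r
merged = merge_lora(base, A, B, alpha=1.0, r=1)
```

After merging, the adapter files are no longer needed at inference time, since the update lives inside the dense weights.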

I found that sometimes the AI thinks it is chatting in a forum, so the user can get kicked out of the chat 😂

Seems the `-n` flag makes a difference. `-n 100000` no longer works 🤔. But after changing `-n` to 4096 it still doesn't work...

> Seems the `-n` flag makes a difference. > `-n 100000` no longer works 🤔. > After changing `-n` to 4096 it still doesn't work. Rolled back to my backup...

> some of the last commits changed/touched how memory is handled. > > also there is `-c` you can set up to 2048. Always has been. `-c` works fine for me even with 5000+...

> Kinda similar case here, although I'm unsure which specific commit caused the performance loss > > > > After swapping out the old exe for the new one, I went from 207...

> @FNsi please try again with latest master. Disabling BLAS makes it work, though there's still a little performance loss anyway. A guess from me: is it because BLAS tries...

> It's not the 4-bits - it does not work with F16 either. > > I am almost sure that this F16 BLAS call is somehow wrong: > > >...
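The comment above suspects the F16 BLAS path: in that path, half-precision weights must be promoted to float32 before an sgemm-style call, and the promotion itself is lossy relative to full precision. The toy sketch below, a pure-Python assumption-laden illustration (not ggml code), uses `struct`'s IEEE-754 half format (`'e'`) to show the promotion step and the small error it introduces.

```python
# Hedged sketch: promote f16-stored weights to f32 before a matmul,
# mimicking the shape of an F16-weights / sgemm path. Toy code only.
import struct

def f16_round_trip(x):
    """Round a Python float through IEEE-754 half precision (lossy)."""
    return struct.unpack('e', struct.pack('e', x))[0]

def matmul_f16_weights(w_f16, x):
    """Multiply a weight matrix (stored as f16) by an f32 vector,
    promoting each weight to f32 at load time."""
    return [sum(f16_round_trip(w) * v for w, v in zip(row, x))
            for row in w_f16]

w = [[0.1, 0.2], [0.3, 0.4]]  # conceptually stored as f16 in the model file
x = [1.0, 1.0]
y = matmul_f16_weights(w, x)  # close to [0.3, 0.7], but not exact
```

The point is only that the promotion is a distinct, error-introducing step; a bug in where or how it happens would affect F16 and quantized paths alike, matching the observation that F16 fails too.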