Results: 131 comments by setzer22

Same here :+1: What I would do is enable this by default and just have a flag to disable it.
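For the CLI side, a minimal sketch of that pattern, assuming clap with the derive feature (the flag name `no-feature` is purely illustrative, since the feature isn't named here):

```rust
use clap::Parser;

/// Hypothetical args: the behaviour is on by default, and only an
/// explicit `--no-feature` flag (name illustrative) turns it off.
#[derive(Parser, Debug)]
struct Args {
    /// Disable the behaviour that is otherwise enabled by default.
    #[arg(long)]
    no_feature: bool,
}

fn main() {
    let args = Args::parse();
    let enabled = !args.no_feature;
    println!("feature enabled: {enabled}");
}
```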

As mentioned in the Discord conversation, the real challenge here is extending the context window beyond the current cap of 2048 tokens. But in the meantime, a chat application with...
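To illustrate staying under that cap, here's a minimal sketch (not the repo's actual code) of the rolling token window a chat frontend could keep over the conversation history:

```rust
/// Minimal sketch: keep only the most recent `max_ctx` tokens so the
/// prompt fed to the model never exceeds the context cap (2048 today).
fn truncate_to_window(tokens: &[u32], max_ctx: usize) -> &[u32] {
    let start = tokens.len().saturating_sub(max_ctx);
    &tokens[start..]
}

fn main() {
    let history: Vec<u32> = (0..3000).collect();
    let window = truncate_to_window(&history, 2048);
    assert_eq!(window.len(), 2048);
    assert_eq!(window[0], 952); // the oldest tokens were dropped
}
```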

Yup, I don't see any problems here (other than that this just hasn't been implemented yet) :smile: This might require some careful handling of the underlying ggml context. Make sure a...

I'd say this is in scope for the project, but unfortunately I don't have enough time to tackle it :sweat_smile: PRs welcome from anyone who wants to take on the task!

I'd say being able to infer past the EOT token is a feature some might want, even if it's just to run an experiment and see what would happen. But I'm OK...
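For anyone curious, a minimal sketch of what "inferring past EOT" amounts to: mask the EOT logit before sampling so that token can never be picked (the token id below is hypothetical; the real one comes from the model's vocabulary):

```rust
/// Set the end-of-text logit to -inf so sampling can never select it,
/// forcing generation to continue past where the model wanted to stop.
fn suppress_eot(logits: &mut [f32], eot_token_id: usize) {
    logits[eot_token_id] = f32::NEG_INFINITY;
}

fn main() {
    let mut logits = vec![0.1, 2.3, -0.5, 1.7];
    let eot_token_id = 1; // hypothetical id, for illustration only
    suppress_eot(&mut logits, eot_token_id);
    assert!(logits[eot_token_id].is_infinite());
}
```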

We already do! The alpaca-lora weights (converted to ggml format) are compatible with the implementation in this repo. If you go to our Discord server (see the link in the README) we...

> Namely the model context does not seem to be reset between requests

The best way to handle this is to create one new `InferenceSession` per request. An inference session...
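A minimal sketch of the session-per-request pattern; `Model`, `InferenceSession`, and `start_session` here are stand-ins modelled on the crate's naming rather than its exact signatures:

```rust
// Stand-in types: modelled on the crate's naming, not its exact API.
struct Model;
struct InferenceSession {
    tokens: Vec<u32>, // all per-request context lives in the session
}

impl Model {
    fn start_session(&self) -> InferenceSession {
        InferenceSession { tokens: Vec::new() }
    }
}

fn handle_request(model: &Model, prompt: &str) {
    // A fresh session per request means nothing from a previous
    // request's context can leak into this one.
    let mut session = model.start_session();
    // ... tokenize `prompt`, feed it, sample from `session` ...
    session.tokens.extend(prompt.bytes().map(u32::from)); // placeholder
}

fn main() {
    let model = Model;
    handle_request(&model, "first request");
    handle_request(&model, "second request"); // starts with a clean context
}
```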

Alright, I made a first attempt, but couldn't get it working. Here's what I tried:

1. Pulled the https://github.com/huggingface/transformers/ repository.
2. Installed torch using `pip install torch`.
3. ...

Apart from my initial exploration, I also realized the `tokenizers` crate brings in [a ton of dependencies](https://github.com/huggingface/tokenizers/blob/main/tokenizers/Cargo.toml), and requires OpenSSL libraries to be installed in order to build. I don't think all this is...

Hi @Narsil! Thanks a lot :) We're evaluating the best route to integrate this, and I have a few questions if you don't mind:

- We are considering a...