Jiaping(JP) Zhang
@baskaryan @leo-gan Okay, I have addressed the feedback and added a unit test. Hopefully we can get this merged soon, since it's low-hanging fruit but a useful feature.
Can you please review again? Thanks! @baskaryan @leo-gan
When is this PR going to be merged? It would be nice to have a local version for that.
I used the `mpt-7b-chat` model and specified `n_ctx=4096`, but still got the error:

```python
llm = GPT4All(
    model='../models/ggml-mpt-7b-chat.bin',
    verbose=False,
    temp=0,
    top_p=0.95,
    top_k=40,
    repeat_penalty=1.1,
    n_ctx=4096,
    callback_manager=stream_manager,
)
```

Error log:

```python
...
```
Yeah, it seems the `n_ctx` argument is not being passed downstream properly, so the mpt-7b model can't be used with a longer context window. Hope...
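A minimal sketch of how one might check whether `n_ctx` actually reaches the underlying model object; `client` is the field LangChain's wrapper stores the gpt4all model in, but the `n_ctx` attribute name on that client is an assumption here and may differ across gpt4all binding versions:

```python
from langchain.llms import GPT4All

# Model path taken from the comment above.
llm = GPT4All(model='../models/ggml-mpt-7b-chat.bin', n_ctx=4096)

# Inspect the wrapped gpt4all client to see whether n_ctx was forwarded.
# The "n_ctx" attribute name on the client is an assumption, not a
# verified part of the gpt4all API.
print(getattr(llm.client, "n_ctx", "n_ctx was not forwarded to the client"))
```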
But with the flash attention approach implemented in the MPT model, as the paper claims, shouldn't it be possible to process sequences longer than the context size during inference...
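For context, the length-extrapolation claim for MPT comes from ALiBi rather than from flash attention itself: flash attention reduces the memory cost of attention, while ALiBi adds a linear distance penalty to the attention logits, which is what lets the model attend beyond the training context. A sketch of the ALiBi bias, following the ALiBi paper (the slope values shown are illustrative):

```latex
% ALiBi: the attention logit for query position i and key position j
% gets a head-specific linear penalty proportional to their distance.
\[
\mathrm{score}(q_i, k_j) = \frac{q_i \cdot k_j}{\sqrt{d}} - m \,(i - j),
\qquad j \le i,
\]
% where m is a fixed per-head slope, e.g. a geometric sequence such as
% 1/2, 1/4, \dots, 1/2^h for h heads.
```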
Again I had to implement workarounds to launch the Docker containers; I created two PRs. But after that, it seems all the containers launched and the indexing step ran...
Is this ready to get merged? Interested in testing it out.
Nice UI update. It would be good to fix the conflicts and get it merged.