Teknium
Are you using mixtral 8x7b or mistral 7b?

> Hello,
>
> After a while of chatting with the nous-hermes2, around 4-10 questions, it stops generating output randomly mid-sentence. I...
> Weird, can you run with `-v` and post the output

```
pip install flash-attn -v
Using pip 22.0.2 from /usr/lib/python3/dist-packages/pip (python 3.10)
Defaulting to user installation because normal site-packages...
```
> We should have prebuilt wheels for this setting (torch 2.0 cuda 11.8) that setup.py automatically downloads, and nvcc should not be necessary. Are you installing from source or from...
This worked. Should I close the issue or wait until resolved properly?
> I think @tridao fixed this in [0c04943](https://github.com/Dao-AILab/flash-attention/commit/0c04943fa226ee13762039a86ee4360536c09c5b). Can you try `pip install -U flash-attn` now?

```
pip install -U flash-attn
Collecting flash-attn
  Downloading flash_attn-2.1.2.post3.tar.gz (2.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 MB 20.2...
```
> > > I think @tridao fixed this in [0c04943](https://github.com/Dao-AILab/flash-attention/commit/0c04943fa226ee13762039a86ee4360536c09c5b). Can you try `pip install -U flash-attn` now?
> >
> > ```
> > Collecting flash-attn
> ...
Sorry, I found the dataset on your Hugging Face. I looked it over, though, and the dataset format might be concerning. I may be ignorant, but if trained on the ShareGPT...
It would be wise, imo, to alter the Vicuna pipeline being used to simply throw away the sequences that get split off, or, if needed, to throw out all convos...
I also think that, since a lot of datasets are doing this, it likely has something to do with the Vicuna "random stopping" issues.
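The filtering approach suggested above could be sketched roughly like this. This is not the actual Vicuna preprocessing code; the ShareGPT-style turn format is taken from common convention, and the whitespace token counter is a placeholder for the model's real tokenizer. The idea is just to drop any conversation that would not fit in the context window, rather than splitting it and leaving a fragment that ends mid-sentence:

```python
# Hedged sketch: filter a ShareGPT-style dataset (a list of conversations,
# each a list of {"from": ..., "value": ...} turns) by total length,
# instead of splitting long conversations into chunks. Splitting can leave
# a trailing fragment that ends mid-sentence, which may teach the model to
# stop generating abruptly. All names here are illustrative assumptions.

MAX_TOKENS = 2048  # assumed context length for the fine-tune


def count_tokens(text: str) -> int:
    # Placeholder token count; a real pipeline would use the
    # model's tokenizer (e.g. a SentencePiece/BPE tokenizer).
    return len(text.split())


def conversation_length(convo: list[dict]) -> int:
    # Total token count across every turn in the conversation.
    return sum(count_tokens(turn["value"]) for turn in convo)


def filter_conversations(convos: list[list[dict]],
                         max_tokens: int = MAX_TOKENS) -> list[list[dict]]:
    # Keep only conversations that fit whole; never emit partial splits.
    return [c for c in convos if conversation_length(c) <= max_tokens]
```

The trade-off is losing some long conversations entirely, but the surviving examples all end at a natural stopping point, which is the property the "random stopping" complaint suggests is missing.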
> At present, there are some works to clean the ShareGPT dataset, and we will continue to pay attention to it.

Can you link any of those?