Slow output due to regeneration of input prompt
To give an example: if I input a chunk of paragraphs as the prompt with an instruction to summarize the text, the output response appears to regenerate the input prompt from the start rather than continuing straight from it (see the reference video below). This seems like unusual behavior compared to my experience with other language model UIs (e.g. oobabooga).
Can this be improved?
https://user-images.githubusercontent.com/62466671/228490563-b6e92c44-d0a2-4f89-adc5-adb1e060a8c2.mp4
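For what it's worth, a possible explanation: an autoregressive model has to process every prompt token before it can generate anything, and some backends echo the prompt while (or after) evaluating it, so the prompt appearing in the output stream doesn't necessarily mean it is being re-sampled token by token. Below is a minimal sketch of this distinction using Hugging Face transformers and gpt2 (an illustration only, not dalai's actual llama.cpp backend): `generate()` returns the prompt tokens followed by the new tokens, so a UI that decodes the whole sequence prints the prompt back first, while a UI that slices the prompt off shows only the continuation.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Summarize the following text: ..."
inputs = tokenizer(prompt, return_tensors="pt")

# generate() returns prompt token ids followed by newly sampled ids
output_ids = model.generate(**inputs, max_new_tokens=50)

# Decoding the full sequence echoes the prompt first,
# which looks like the behavior in the video:
print(tokenizer.decode(output_ids[0]))

# Slicing off the prompt tokens prints only the continuation,
# which is how UIs like oobabooga typically display responses:
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens))
```

If that's what is happening here, the fix would be on the display side (don't echo the prompt), though the time spent evaluating a long prompt before the first new token is unavoidable.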
I also noticed this. Perhaps this is simply how these models work, and ChatGPT only seems faster at this step because of the sheer scale of the servers it runs on?