Junbum Lee
The current demo parameters were chosen somewhat arbitrarily as values that happen to work reasonably well. Only two options are set:
- temperature 0.5
- top_p 0.95
(For a chatbot, a lower temperature tends to help a bit.)
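For reference, a minimal sketch of a generation call using just these two options, assuming the Hugging Face transformers API (the checkpoint name and prompt are placeholders, not the actual demo code):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; the demo's actual model may differ.
tokenizer = AutoTokenizer.from_pretrained("beomi/KoAlpaca-Polyglot-5.8B")
model = AutoModelForCausalLM.from_pretrained("beomi/KoAlpaca-Polyglot-5.8B")

inputs = tokenizer("...", return_tensors="pt")  # placeholder prompt
outputs = model.generate(
    **inputs,
    do_sample=True,     # sampling must be enabled for temperature/top_p to apply
    temperature=0.5,    # lower temperature tends to help chatbot-style replies
    top_p=0.95,
    max_new_tokens=256,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```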
Closing this now that it has been released publicly at https://github.com/Beomi/KoAlpaca/tree/main/webui.
TODO: create the header
TODO: replace hardcoded values -> load them from a JSON file, envs.json
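A minimal sketch of that change (the keys below are hypothetical; only the envs.json file name comes from the note above):

```python
import json

# Load settings from envs.json instead of hardcoding them in the source.
with open("envs.json") as f:
    envs = json.load(f)

API_URL = envs["api_url"]        # hypothetical key, previously hardcoded
MODEL_NAME = envs["model_name"]  # hypothetical key, previously hardcoded
```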
This repo is under development, so please wait :)
But you can try this repo now by following the README guide! The paper isn't fully implemented yet, but it's worth a try 👍
Did you train the model? Just loading it with the gate (without training) will output random tokens.
Oh, I think LoRA is not compatible with this: the model has to get a chance to learn how to use the long-term memory, but if you initialize with LoRA, ...
Oh, your loss seems pretty high. If I were you, I would wait until the loss reaches ~4; an LM train loss above 4 typically produces output closer to random tokens than to fluent generation.
Since Infini-attention uses segmentation, which mainly aims to reduce memory usage and computational cost to O(N), if you use a very long sequence such as seq len = ...
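To illustrate the point, here is a rough sketch of the segment-wise loop that yields the linear cost, with a simplified memory read/write standing in for the paper's actual linear-attention update (an illustration, not the repo's implementation):

```python
import torch

def infini_style_forward(x, segment_len, attn_block):
    """x: (batch, seq_len, dim); attn_block maps (batch, s, dim) -> (batch, s, dim)."""
    batch, seq_len, dim = x.shape
    memory = torch.zeros(batch, dim, dim)  # fixed-size compressive memory
    outs = []
    for start in range(0, seq_len, segment_len):
        seg = x[:, start:start + segment_len]
        local = attn_block(seg)  # attention is quadratic only within the segment
        # Simplified memory read: retrieve context carried over from past segments.
        mem_read = torch.einsum("bsd,bde->bse", seg, memory)
        outs.append(local + mem_read)
        # Simplified memory write: fold this segment into the fixed-size state.
        memory = memory + torch.einsum("bsd,bse->bde", seg, seg)
    return torch.cat(outs, dim=1)
```

Because the memory stays a fixed (dim x dim) matrix, the cost grows linearly with the number of segments instead of quadratically with the full sequence length.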