Flash Li

2 issues by Flash Li

### Description
Inference speed is too slow and grows with max_new_tokens. For example, with max_new_tokens=1000, generation takes almost 30–40 s on an A100.

### Background
After loading a...
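The reported timing is consistent with autoregressive decoding, where each new token requires one forward pass, so wall-clock time scales roughly linearly with max_new_tokens. A minimal back-of-envelope sketch (the per-token throughput figures below are inferred from the reported numbers, not measured):

```python
# Autoregressive decoding emits one token per forward pass, so decode
# time is approximately max_new_tokens / (tokens per second).
def estimated_latency_s(max_new_tokens: int, tokens_per_second: float) -> float:
    """Rough decode-time estimate, ignoring prefill and sampling overhead."""
    return max_new_tokens / tokens_per_second

# The reported 30-40 s for max_new_tokens=1000 implies a decode
# throughput of roughly 25-33 tokens/s on that A100 setup.
low = 1000 / 40    # 25 tokens/s
high = 1000 / 30   # ~33 tokens/s
print(f"implied throughput: {low:.0f}-{high:.0f} tokens/s")
```

Under this model, halving max_new_tokens should roughly halve latency, which matches the issue's observation that speed is tied to the requested output length.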

Hello! I am a student working on a graphical-programming chatbot and am currently facing challenges in **implementing a chatbot specifically for graphical programming languages (Blockly)**. I would like to ask how...