
Does it support batch inference?

Open · SkyAndCloud opened this issue 1 year ago · 1 comment

Hi guys, thank you for this excellent work! It seems that this code does not take the attention_mask into account during ChunkLlama inference. Does this code support batch inference?

SkyAndCloud, May 23 '24 09:05

Yes! We use flash_attn_func for better efficiency and simplicity. Changing to flash_attn_varlen_func should not be difficult. If you encounter any difficulties, please feel free to leave a comment.

ChenxinAn-fdu, May 23 '24 13:05
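
For anyone attempting the switch suggested above, here is a rough sketch of the bookkeeping it involves. It is not code from the ChunkLlama repository. flash_attn_varlen_func takes the real (non-padding) tokens of all sequences packed into one dimension, plus cumulative sequence lengths (its cu_seqlens_q / cu_seqlens_k arguments) and the longest sequence length (max_seqlen_q / max_seqlen_k). The helper name varlen_metadata is hypothetical, and plain Python lists stand in for tensors so the logic can be checked without a GPU; in practice the offsets would be an int32 tensor on the same device as q, k and v.

```python
from itertools import accumulate


def varlen_metadata(attention_mask):
    """Derive packing metadata from a padded batch mask.

    attention_mask: list of equal-length rows of 0/1, where 1 marks a real token.
    Returns (indices, cu_seqlens, max_seqlen):
      indices    - positions of real tokens in the flattened (batch * width) layout
      cu_seqlens - cumulative sequence lengths, starting at 0, length batch + 1
      max_seqlen - length of the longest sequence in the batch
    """
    width = len(attention_mask[0])
    # Number of real tokens per sequence.
    seqlens = [sum(row) for row in attention_mask]
    # Sequence i occupies packed rows cu_seqlens[i] .. cu_seqlens[i + 1] - 1.
    cu_seqlens = [0] + list(accumulate(seqlens))
    # Flat positions of real tokens; works for left or right padding.
    indices = [
        b * width + t
        for b, row in enumerate(attention_mask)
        for t, keep in enumerate(row)
        if keep
    ]
    return indices, cu_seqlens, max(seqlens)


# Two right-padded sequences of lengths 3 and 2.
mask = [[1, 1, 1, 0],
        [1, 1, 0, 0]]
indices, cu_seqlens, max_seqlen = varlen_metadata(mask)
print(indices)     # [0, 1, 2, 4, 5]
print(cu_seqlens)  # [0, 3, 5]
print(max_seqlen)  # 3
```

The indices would be used to gather q, k and v from shape (batch * seqlen, nheads, headdim) into the packed (total_tokens, nheads, headdim) layout before the call, and to scatter the output back into the padded layout afterwards. How this interacts with ChunkLlama's chunked attention has not been verified here.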