Nikhil Gupta
Hello, I am trying to build bazel/example:main with the NDK r25 toolchain for Android API level 31. I am using Bazel version 6.0.0. When I try to compile...
Hello @alankelly @wei-v-wang, how can we fix this issue if we are sticking to Ubuntu 16 and GCC 5.4.0? I have tried #define _POSIX_C_SOURCE 199309L as suggested by...
Hello, I am sorry if the question is very basic, but I need a little help over here. Can't we just skip the attention processing and continue from here for the input prompt...
Hello, I am trying to run an LLM on an S24 Ultra device with 12 GB of RAM. My LLM has a large embedding size of 160984 × 2048. The fp32 file...
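The memory pressure described above can be estimated with quick arithmetic. This is a rough sketch, assuming a single dense, unsharded fp32 embedding table with the 160984 × 2048 shape quoted in the post:

```python
# Rough memory estimate for the embedding table alone
# (assumption: dense fp32 table, no quantization or sharding).
vocab_size = 160984   # embedding rows, from the post
hidden_dim = 2048     # embedding columns, from the post
bytes_per_fp32 = 4

table_bytes = vocab_size * hidden_dim * bytes_per_fp32
print(f"{table_bytes / 2**30:.2f} GiB")  # prints "1.23 GiB"
```

So the embedding weights alone consume roughly 1.23 GiB before any other layers, activations, or KV cache are accounted for, which explains why a 12 GB device is tight for this model at fp32.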
Hello @wangzhaode, I would like to report an issue which I recently discovered. I can see that if a SentencePiece model is used for the tokenizer (like Llama 2),...
Hello, I am trying to add support for models with GQA, e.g. TinyLlama. The indicator for grouped-query attention is num_key_value_heads < num_attention_heads in the config.json file. For TinyLlama...
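The detection rule above can be sketched in a few lines. The field names follow Hugging Face-style config.json conventions as quoted in the post; the head counts in the example fragment are illustrative TinyLlama-like values, not read from any real file:

```python
import json

def uses_gqa(cfg: dict) -> bool:
    """Grouped-query attention: fewer key/value heads than query heads."""
    n_heads = cfg["num_attention_heads"]
    # If the field is absent, the model is plain multi-head attention,
    # i.e. every query head has its own KV head.
    n_kv = cfg.get("num_key_value_heads", n_heads)
    return n_kv < n_heads

# Illustrative config fragment (values assumed for demonstration).
config = json.loads('{"num_attention_heads": 32, "num_key_value_heads": 4}')
print(uses_gqa(config))  # True: the KV heads are shared across query heads
```

A config with num_key_value_heads equal to num_attention_heads (or missing entirely) would return False, so the same check also distinguishes standard multi-head attention.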