mlc-llm
Could you provide some tips about adding support for BloomZ?
I've run Vicuna-7B successfully on an Android device.
I'm trying to run the https://huggingface.co/bigscience/bloomz model on my device. Could you provide some tips about adding support for BloomZ? Are there any videos about mlc? I've never used TVM before; I've only looked at it for about half an hour. It seems it can optimize models and make tensor execution easier. For example, for LLaMA it creates four functions like these:

- `create_encoding_func(bb, config)`
- `create_decoding_func(bb, config)`
- `create_kv_cache_func(bb, config)`
- `create_softmax_func(bb, config)`

Why do we need `kv_cache` and `softmax` here? These are very basic questions, and I need some more time to study the code. I hope to get some guidelines for adding a new model based on transformers, such as BloomZ.
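My rough understanding of the KV cache so far (a minimal plain-Python sketch of the general idea, not mlc-llm's actual code; the names here are made up) is that the encoding/prefill pass computes keys and values for the whole prompt once, and each decoding step only computes them for the newest token while reusing the cached ones:

```python
# Hypothetical sketch of why autoregressive decoding keeps a KV cache:
# without it, every decode step would recompute attention keys/values
# for all previous tokens from scratch.

class KVCache:
    def __init__(self):
        self.keys = []    # one cached key per processed token
        self.values = []  # one cached value per processed token

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

def prefill(prompt_tokens, cache):
    # "encoding" pass: compute and cache K/V for every prompt token once
    for t in prompt_tokens:
        cache.append(("K", t), ("V", t))

def decode_step(new_token, cache):
    # "decoding" pass: only the new token's K/V is computed;
    # attention reads the full cache.keys / cache.values lists
    cache.append(("K", new_token), ("V", new_token))
    return len(cache.keys)

cache = KVCache()
prefill([1, 2, 3], cache)   # prompt of 3 tokens
n = decode_step(4, cache)   # one newly generated token
print(n)                    # cache now holds K/V for 4 tokens -> prints 4
```

Is that roughly what `create_kv_cache_func` sets up, with `create_softmax_func` then turning the final logits into a sampling distribution?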
Thanks a lot!