Junru Shao
Would you like to remove `dist/prebuilt` and try again?
My personal experience is that using 64-bit relocations is fine on x86-64, so I am in favor of such a change :-)
We are using GPUs on Android. CPUs, as indicated in this thread, are likely too slow to support an LLM meaningfully.
CC: @tqchen @CharlieFRuan @davidpissarra
There are a couple of failed cases throughout my experiments:

```json
{
  "destination": "{username}/{model_id}-{quantization}-MLC",
  "default_quantization": ["q3f16_1", "q4f16_1", "q4f32_1"],
  "tasks": [
    {"model_id": "llama2_7b_chat_uncensored", "model": "https://huggingface.co/georgesung/llama2_7b_chat_uncensored", "context_window_size": 4096, "conv_template": "llama-default"},
    {"model_id": "open_llama_3b", ...
```
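For context, my reading is that the `destination` field is a template expanded once per (task, quantization) pair. A minimal sketch of that expansion, assuming this interpretation of the config (the helper below is illustrative, not mlc-llm code):

```python
# Hypothetical helper illustrating how the "destination" template above
# expands into one HF repo name per (model, quantization) pair; this is
# an assumption about the config's semantics, not part of mlc-llm.
from itertools import product

def expand_destinations(config: dict, username: str) -> list[str]:
    return [
        config["destination"].format(
            username=username,
            model_id=task["model_id"],
            quantization=quant,
        )
        for task, quant in product(config["tasks"], config["default_quantization"])
    ]

# e.g. expand_destinations(cfg, "mlc-ai")
# -> ["mlc-ai/llama2_7b_chat_uncensored-q3f16_1-MLC", ...]
```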
@LeshengJin has been working closely with me on this direction, and he found that:

> I tested all models uploaded. Most of the models worked well, but the following models...
Thanks for getting back to me so quickly @CharlieFRuan!

> For WizardLM-7B-V1.0 and WizardLM-30B-V1.0, their weights on HF https://github.com/mlc-ai/mlc-llm/pull/489; and I think they are somewhat obsolete already (they are pre-llama2)...
> I would love to help push on this front, but I am afraid that I do not have much bandwidth before mid-January... I am prioritizing some effort on the web-llm...
@Hzfengsy Yeah definitely! Would you like to enhance these lines: https://github.com/apache/tvm/blob/main/python/tvm/target/detect_target.py#L77
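For reference, a minimal sketch of what such a detection rule could look like, built on TVM's public Device and Target APIs (the helper name and attribute set here are assumptions for illustration, not the actual contents of detect_target.py):

```python
# Hypothetical sketch of a device-to-target detection rule; the helper
# name is illustrative, but the Device/Target attributes used are TVM's.
import tvm
from tvm.target import Target

def detect_cuda_target(dev: tvm.runtime.Device) -> Target:
    # Query hardware attributes exposed on the Device object,
    # e.g. compute_version == "8.6" on an RTX 30-series GPU.
    major, minor = dev.compute_version.split(".")
    return Target(
        {
            "kind": "cuda",
            "arch": f"sm_{major}{minor}",
            "max_shared_memory_per_block": dev.max_shared_memory_per_block,
            "max_threads_per_block": dev.max_threads_per_block,
            "thread_warp_size": dev.warp_size,
        }
    )

# Example: build a Target for the first CUDA GPU.
# target = detect_cuda_target(tvm.cuda(0))
```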
Contributions are very welcome if you are interested!