Siyuan Feng comments

Results: 99 comments of Siyuan Feng

[BUG][Runtime] `from __future__ import annotations` breaks tvm's type annotation

> @Hzfengsy are you sure? I see even Python 3.14 has `from __future__ import annotations`

You are right, `from __future__ import annotations` will exist. But my opinion is that we...
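
The underlying mechanism can be reproduced in plain Python, with no TVM involved. Under PEP 563 (`from __future__ import annotations`), annotations are stored as strings rather than evaluated objects, so any code that reads `__annotations__` directly and expects type objects receives strings instead. This sketch compiles the same hypothetical function `scale` with and without that future flag. It does not show how TVM itself reads annotations; treating it as representative of TVM's behavior is an assumption.

```python
import __future__
import typing

# Hypothetical example function; any annotated function behaves the same way.
SRC = "def scale(x: int) -> float:\n    return float(x) * 2\n"


def load_function(src: str, postponed: bool):
    """Compile `src` with or without PEP 563 semantics and return `scale`."""
    # compiler_flag is the flag that `from __future__ import annotations` turns on.
    flags = __future__.annotations.compiler_flag if postponed else 0
    # dont_inherit=True keeps this module's own future flags out of the compile.
    code = compile(src, "<sketch>", "exec", flags=flags, dont_inherit=True)
    namespace: dict = {}
    exec(code, namespace)
    return namespace["scale"]


eager = load_function(SRC, postponed=False)
postponed = load_function(SRC, postponed=True)

print(eager.__annotations__)      # type objects: int and float
print(postponed.__annotations__)  # plain strings: 'int' and 'float'
# typing.get_type_hints evaluates the strings against the function's globals.
print(typing.get_type_hints(postponed))
```

Resolving annotations through `typing.get_type_hints` (or `inspect.get_annotations(..., eval_str=True)` on Python 3.10+) instead of reading `__annotations__` directly is one common way to handle both cases.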

[Relax] Expose BlockBuilder's Analyzer instance in Python

cc @tqchen

[Trying to compile my fine-tuned llama3 LLM using MLC-LLM but keep running into this]

`mlc_llm.build` is a deprecated interface; please refer to the latest docs: https://llm.mlc.ai/docs/compilation/convert_weights.html and https://llm.mlc.ai/docs/compilation/compile_models.html

Error with LLVM Configuration on Windows for GPU Inference in mlc-llm

> Error: Using LLVM 19.1.1 with -mcpu=apple-latest is not valid in -mtriple=arm64-apple-macos, using default -mcpu=generic.

I wonder why you are using `apple` and `macos` as `-mcpu` and `-mtriple`. What's your...

[Relax] Fix a bug that occurred due to shape inference not handling static dim vs symbolic dim

I disagree with the premise that (2, 3) + (2, n) can be statically inferred to have the shape (2, 3). The validity of this operation depends on the runtime...
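
The disagreement can be made concrete with a simplified model. Assuming NumPy-style broadcasting and representing a symbolic dim as a string (both simplifications, not Relax's actual shape-inference code), pairing the static extent `3` with the symbolic `n` gives 3 only when `n` is 1 or 3 at runtime; any other value makes the operation invalid. This sketch therefore returns the candidate shape together with the condition a runtime check would still have to enforce. The helpers `broadcast_dim` and `broadcast_shape` are hypothetical.

```python
from typing import List, Optional, Tuple, Union

# Assumption: a static dim is an int, a symbolic dim is its name as a str (e.g. "n").
Dim = Union[int, str]


def broadcast_dim(a: Dim, b: Dim) -> Tuple[Optional[Dim], Optional[str]]:
    """Broadcast one axis with NumPy-style rules.

    Returns (output_dim, runtime_condition). The condition is None when the
    result holds for every runtime value; otherwise it must be checked at runtime.
    """
    if a == b:
        return a, None
    if a == 1:
        return b, None
    if b == 1:
        return a, None
    if isinstance(a, int) and isinstance(b, int):
        raise ValueError(f"incompatible static dims {a} and {b}")
    if isinstance(a, int) or isinstance(b, int):
        static, symbol = (a, b) if isinstance(a, int) else (b, a)
        # Valid only if the symbol is 1 (broadcast) or equal to the static extent.
        return static, f"{symbol} in (1, {static})"
    # Two distinct symbols: neither the output dim nor validity is known statically.
    return None, f"{a} == {b} or {a} == 1 or {b} == 1"


def broadcast_shape(
    x: Tuple[Dim, ...], y: Tuple[Dim, ...]
) -> Tuple[Tuple[Optional[Dim], ...], List[str]]:
    """Broadcast two shapes of equal rank (this sketch does not pad ranks)."""
    if len(x) != len(y):
        raise ValueError("this sketch only handles shapes of equal rank")
    dims: List[Optional[Dim]] = []
    conditions: List[str] = []
    for a, b in zip(x, y):
        dim, cond = broadcast_dim(a, b)
        dims.append(dim)
        if cond is not None:
            conditions.append(cond)
    return tuple(dims), conditions


shape, conditions = broadcast_shape((2, 3), (2, "n"))
print(shape)       # (2, 3)
print(conditions)  # ['n in (1, 3)']
```

Returning the condition alongside the shape, rather than the bare shape `(2, 3)`, preserves the fact that the operation may still fail for some runtime values of `n`.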

[Question] While running the mlc-llm app on Android, prefill is sometimes very slow.

MLC leverages the Adreno GPU rather than the Qualcomm CPU or NPU to run the model. Here are my suggestions:

1. Try `q4f16_0` instead of `q4f16_1`, as `_0` will have better prefill...

[Bug] How to accurately measure the real memory usage on Android?

A few hundred MB must be the CPU memory usage. However, the model is stored in GPU memory, so an OS-level memory command is not enough to measure it.

[Question] How to estimate the VRAM the model takes at runtime?

It's not an accurate value; it is just a simple check to see whether the model can run on the device.
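
As a rough illustration of this kind of simple check, and not the formula MLC actually uses, runtime VRAM can be approximated as quantized weight storage plus the KV cache. The function names and the example configuration are hypothetical. The estimate ignores activations, temporary buffers, and quantization metadata such as per-group scales, which is one reason a number like this is not accurate.

```python
def weight_bytes(num_params: int, bits_per_weight: float) -> float:
    """Memory for quantized weights, ignoring per-group scales and padding."""
    return num_params * bits_per_weight / 8


def kv_cache_bytes(num_layers: int, context_len: int, num_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: one K and one V tensor per layer (hence the factor 2)."""
    return 2 * num_layers * context_len * num_kv_heads * head_dim * bytes_per_elem


def rough_vram_bytes(num_params: int, bits_per_weight: float, num_layers: int,
                     context_len: int, num_kv_heads: int, head_dim: int) -> float:
    """Back-of-the-envelope total; activations and workspace are not counted."""
    return (weight_bytes(num_params, bits_per_weight)
            + kv_cache_bytes(num_layers, context_len, num_kv_heads, head_dim))


# Hypothetical model configuration, chosen only to illustrate the arithmetic.
total = rough_vram_bytes(num_params=1_000_000_000, bits_per_weight=4,
                         num_layers=24, context_len=4096,
                         num_kv_heads=8, head_dim=128)
print(f"{total / 2**30:.2f} GiB")
```

Comparing such an estimate against the device's available GPU memory answers only the coarse question "can this model plausibly run here?", which matches the purpose described above.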

[Bug] fine-tuned model deployed with webllm not working

Could you please check whether you can run the original Qwen2-0.5B? Also, can you run your fine-tuned model on other devices, e.g. CUDA?