Siyuan Feng

99 comments by Siyuan Feng

> @Hzfengsy are you sure? I see even python 3.14 has `from __future__ import annotations`

You are right, `from __future__ import annotations` will exist. But my opinion is that we...

`mlc_llm.build` is a deprecated interface. Please refer to the latest docs:
https://llm.mlc.ai/docs/compilation/convert_weights.html
https://llm.mlc.ai/docs/compilation/compile_models.html

> Error: Using LLVM 19.1.1 with -mcpu=apple-latest is not valid in -mtriple=arm64-apple-macos, using default -mcpu=generic.

I wonder why you are using apple and macos as mcpu and mtriple. What's your...

I disagree with the premise that (2, 3) + (2, n) can be statically inferred to have the shape (2, 3). The validity of this operation depends on the runtime...
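To illustrate the point, here is a minimal sketch in plain Python, assuming NumPy-style broadcasting rules. The helpers `broadcast_shape` and `try_broadcast` are hypothetical names written for this example and are not part of any library discussed here. The sketch shows that whether (2, 3) + (2, n) is valid at all depends on the concrete value of `n`.

```python
def broadcast_shape(a, b):
    """Broadcast two concrete shapes NumPy-style, or raise ValueError.

    Hypothetical helper for illustration only.
    """
    result = []
    # Walk the axes from the trailing one; missing leading axes count as 1.
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"cannot broadcast {a} with {b}")
    return tuple(reversed(result))


def try_broadcast(n):
    """Return the shape of (2, 3) + (2, n), or None if the shapes are incompatible."""
    try:
        return broadcast_shape((2, 3), (2, n))
    except ValueError:
        return None


if __name__ == "__main__":
    for n in (1, 3, 4):
        print(n, try_broadcast(n))
```

Under these rules the result is (2, 3) only when `n` is 1 or 3. For any other value, such as 4, the operation is invalid, and that cannot be known until `n` is known.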

MLC uses the Adreno GPU rather than the Qualcomm CPU or NPU to run the model. Here are my suggestions: 1. Try `q4f16_0` instead of `q4f16_1`, as `_0` will have better prefill...

The few hundred MB must be the CPU memory usage. The model itself is stored in GPU memory, so OS-level memory commands are not enough to measure it.

It's not an accurate value. It is only a simple check to see whether the model can run on the device.

Could you please check whether you can run the original Qwen2-0.5B? Also, can you run your fine-tuned model on other devices, e.g. CUDA?