web-stable-diffusion
Can I auto-tune SD models myself?
with tvm.transform.PassContext(opt_level=3):
    ex = relax.build(mod_deploy, args.target)
args.target: "cuda"
pip install -I mlc_ai_nightly_cu121 -f https://mlc.ai/wheels
But I get the errors below:
Traceback (most recent call last):
File "web-stable-diffusion/build.py", line 184, in B
is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable A
is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable matmul
is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable matmul
is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable matmul
is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
File "/workspace/tvm/src/tir/analysis/verify_memory.cc", line 205
RuntimeError: Memory verification failed with the following errors:
from tvm.script import tir as T
@T.prim_func
def matmul20(A: T.Buffer((T.int64(2), T.int64(256), T.int64(1280)), "float32"), B: T.Buffer((T.int64(1280), T.int64(1280)), "float32"), matmul: T.Buffer((T.int64(2), T.int64(256), T.int64(1280)), "float32")):
    T.func_attr({"global_symbol": "matmul20", "op_pattern": 4, "target": T.target({"arch": "sm_86", "host": {"keys": ["cpu"], "kind": "llvm", "tag": ""}, "keys": ["cuda", "gpu"], "kind": "cuda", "max_num_threads": 1024, "tag": "", "thread_warp_size": 32}), "tir.noalias": T.bool(True)})
    for i0, i1, i2, k in T.grid(2, 256, 1280, 1280):
        cse_var_2: T.int32 = i0 * 327680 + i1 * 1280
        cse_var_1: T.int32 = cse_var_2 + i2
        matmul_1 = T.Buffer((T.int64(655360),), data=matmul.data)
        if k == 0:
            matmul_1[cse_var_1] = T.float32(0)
        A_1 = T.Buffer((T.int64(655360),), data=A.data)
        B_1 = T.Buffer((T.int64(1638400),), data=B.data)
        matmul_1[cse_var_1] = matmul_1[cse_var_1] + A_1[cse_var_2 + k] * B_1[k * 1280 + i2]
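For reference, the matmul20 function dumped above is a plain serial loop nest over flattened buffers, with no thread or block bindings. That matches the "not contained in a thread environment" wording in the error. The following is only a minimal pure-Python sketch of the same index arithmetic, with the shapes shrunk so it runs instantly. The helper name matmul_flat and the small sizes are assumptions made for illustration; it does not use TVM and is not a fix for the build.

```python
def matmul_flat(a, b, batch, rows, n):
    """Mirror the dumped loop nest on flat Python lists.

    a:   flat list of length batch * rows * n  (A in the dump, shape [batch, rows, n])
    b:   flat list of length n * n             (B in the dump, shape [n, n])
    out: flat list of length batch * rows * n  (matmul in the dump)

    In the dump, batch = 2, rows = 256, n = 1280, so rows * n = 327680.
    """
    out = [0.0] * (batch * rows * n)
    for i0 in range(batch):
        for i1 in range(rows):
            for i2 in range(n):
                for k in range(n):
                    # Same offsets as cse_var_2 and cse_var_1 in the dump.
                    cse_var_2 = i0 * rows * n + i1 * n
                    cse_var_1 = cse_var_2 + i2
                    if k == 0:
                        out[cse_var_1] = 0.0
                    out[cse_var_1] += a[cse_var_2 + k] * b[k * n + i2]
    return out


if __name__ == "__main__":
    batch, rows, n = 2, 2, 3  # illustrative sizes only
    a = [float(v) for v in range(batch * rows * n)]
    # Flat n x n identity matrix, so the product should reproduce `a`.
    identity = [1.0 if r == c else 0.0 for r in range(n) for c in range(n)]
    assert matmul_flat(a, identity, batch, rows, n) == a
    print("flat-index matmul matches the input under the identity matrix")
```

Every iteration here runs on a single sequential loop, which is fine on a CPU. On a CUDA target the loops would normally need to be mapped onto GPU threads (for example by a tuned or default schedule) before the build step.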
Could your team share a tuning script so that I can tune the SD models myself?
Something like mlc_llm/tuning.py from this commit: https://github.com/mlc-ai/mlc-llm/commit/8aeb3dfe9ff07b04331cc0ed6fdc7c3ee384e382#diff-643d01e2455cf9344c3c81c40c42c8d6aad9cd7ad207aa72712c0b1556c2d014
@felixslu any solution to this problem? I get the same error.