felixslu

10 issues by felixslu

## 🐛 Bug ## To Reproduce 1. `python3 build.py --hf-path databricks/dolly-v2-3b --quantization q3f16_0` **(It is OK!)** 2. Build the CLI: `cd build`, `cmake ..`, `make` ## Expected behavior [...

bug

## 🐛 Bug ## To Reproduce 1. Compile the model **(It is OK!)**: `python3 build.py --hf-path databricks/dolly-v2-3b --quantization q3f16_0` 2. Compile `mlc_chat_cli` **(It is OK!)**: `cd build`, `cmake ..`, `make` 3. Run `mlc_chat_cli`...

bug

## 🚀 Feature Support Stable Diffusion models! SnapFusion models can run on a phone within 2 seconds ([paper](https://arxiv.org/pdf/2306.00980.pdf)). ## Motivation Text-to-image diffusion models can create stunning images from natural language...

feature request

After quantization finishes, the scale and offset of `UniformAffineQuantizer` are tensors. How can I convert them to scalar values so I can generate the quantization encodings? @yhhhli For example: Encoding:{ bitwidth:...
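Below is a minimal sketch of one way to do the conversion. It assumes the quantizer's scale and offset are PyTorch tensors or NumPy arrays, holding either a single element (per-tensor quantization) or a 1-D vector (per-channel quantization). The helper name `quantizer_params_to_encoding` and the output keys are hypothetical, so adapt them to the encoding format you actually need.

```python
def _to_python(value):
    """Flatten a tensor/array (or a plain number/list) into a list of Python numbers."""
    if hasattr(value, "detach"):       # torch.Tensor / nn.Parameter: drop autograd, move to CPU
        value = value.detach().cpu()
    if hasattr(value, "reshape"):      # torch.Tensor or numpy.ndarray
        return value.reshape(-1).tolist()
    if isinstance(value, (list, tuple)):
        return [float(v) for v in value]
    return [float(value)]


def _unwrap(values):
    """Return a bare scalar for per-tensor parameters, a list for per-channel ones."""
    return values[0] if len(values) == 1 else values


def quantizer_params_to_encoding(scale, offset, bitwidth):
    """Hypothetical helper: build a plain-Python encoding dict from quantizer parameters."""
    scales = [float(s) for s in _to_python(scale)]
    offsets = [int(round(o)) for o in _to_python(offset)]  # offsets are integer grid points
    return {
        "bitwidth": int(bitwidth),
        "scale": _unwrap(scales),
        "offset": _unwrap(offsets),
    }
```

For example, `quantizer_params_to_encoding(q.delta, q.zero_point, q.n_bits)` would work if your quantizer object stores its parameters under those names. That is an assumption, so check your own code. The result contains only Python floats and ints, which means it can be passed straight to `json.dump`.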

# Background: The [performance doc](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/performance.md) reports that LLaMA 7B in FP16 on an A100, with batch size 256, input_len 128 and output_len 128, reaches a throughput of 5,353 tok/s/GPU. # Problem: on...
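Here is a quick back-of-the-envelope check of what the quoted number implies. It assumes, without confirmation from the excerpt above, that tok/s/GPU counts only generated tokens, and that every one of the 256 sequences in a batch produces all 128 output tokens.

```python
# Figures quoted from the TensorRT-LLM performance doc (LLaMA 7B, FP16, A100).
batch_size = 256
output_len = 128
throughput_tok_per_s = 5353  # tok/s/GPU

# Assumption: throughput counts generated (output) tokens only.
tokens_per_batch = batch_size * output_len  # 32768 tokens
seconds_per_batch = tokens_per_batch / throughput_tok_per_s

print(f"{tokens_per_batch} tokens -> {seconds_per_batch:.2f} s per batch")
```

Under that assumption, one full batch should take roughly 6.12 s end to end. This gives a concrete target to compare a local measurement against.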

triaged
Triton backend

GPU: NVIDIA RTX 3090 Ti. 1. First, I used the log DB in the repo, and it took 3.7 s to get the result. 2. Then, I tried tuning it myself using...

When I use the fp16 version, I get an error on my 3090 Ti: `pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", revision='fp16', torch_dtype=torch.float16, local_files_only=True)` gives `RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'`

`with tvm.transform.PassContext(opt_level=3): ex = relax.build(mod_deploy, args.target)` with `args.target` set to `"cuda"`, after installing with `pip install -I mlc_ai_nightly_cu121 -f https://mlc.ai/wheels`, but I get the errors below! Traceback (most recent call last): File "web-stable-diffusion/build.py", line 184, in build(mod, ARGS)...

Hi, MLC team! How can I quantize SD models with TVM Unity to 3-4-bit, int8, and fp16 precision?
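The sketch below does not show TVM Unity's own quantization passes. It is a generic illustration of what 3-4-bit or int8 weight quantization involves: symmetric per-group uniform quantization in plain Python. The function names, the group size of 32 and the symmetric scheme are all assumptions chosen for illustration. They are not MLC's actual q3f16/q4f16 format.

```python
def quantize_group(weights, bits):
    """Symmetric uniform quantization of one weight group to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1                      # 3 for 3-bit, 7 for 4-bit, 127 for int8
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # one floating-point scale per group
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize_group(q, scale):
    """Recover approximate weights from the integers and the group scale."""
    return [v * scale for v in q]


def quantize(weights, bits=4, group_size=32):
    """Split a flat weight list into groups and quantize each group independently."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]
```

With fewer bits the rounding step (`scale`) gets wider relative to the weights. Smaller groups give each group its own scale, which keeps the per-weight error bounded by `scale / 2` within that group.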

### 1. Questions As we know, SD v1.5 has about 1 billion parameters, and its peak GPU memory is about 4 GB at fp32 precision. So, the memory at int4 precision...
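If you count weights only, memory scales linearly with the number of bits per parameter. The rough sketch below assumes exactly 1e9 parameters and decimal gigabytes. It ignores activations, runtime overhead, and the per-group scales that low-bit schemes also store, so real peak memory will be higher than these figures.

```python
# Weight-only memory estimate; activations, workspace and quantization scales are ignored.
NUM_PARAMS = 1_000_000_000  # "about 1 billion" parameters (assumed exact here)

BITS_PER_PARAM = {"fp32": 32, "fp16": 16, "int8": 8, "int4": 4}


def weight_memory_gb(num_params, bits):
    """Bytes needed to store the weights, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


for name, bits in BITS_PER_PARAM.items():
    print(f"{name}: {weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

By this estimate the weights alone drop from 4.0 GB at fp32 to 0.5 GB at int4, an 8x reduction.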