felixslu issues

Results: 10 issues

[Bug] make[2]: *** [tokenizers/CMakeFiles/tokenizers_c.dir/build.make:71: tokenizers/release/libtokenizers_c.a] Error 1

🐛 Bug. To Reproduce: 1. `python3 build.py --hf-path databricks/dolly-v2-3b --quantization q3f16_0` **(It is OK!)** 2. Build the CLI: `cd build`, `cmake ..`, `make`. Expected behavior: [...

bug

[Bug] mlc-llm/cpp/conv_templates.cc:156: Unknown conversation template: dolly

🐛 Bug. To Reproduce: 1. Compile the model **(It is OK!)**: `python3 build.py --hf-path databricks/dolly-v2-3b --quantization q3f16_0` 2. Compile mlc_chat_cli **(It is OK!)**: `cd build`, `cmake ..`, `make` 3. Run mlc_chat_cli...

bug

[Feature Request] Is there any plan to support running Stable Diffusion in this project?

🚀 Feature: Support Stable Diffusion models! SnapFusion models can run on a phone within 2 seconds (https://arxiv.org/pdf/2306.00980.pdf). Motivation: Text-to-image diffusion models can create stunning images from natural language...

feature request

How can I get the scale and offset in scalar form?

After I finished quantization, the scale and offset of UniformAffineQuantizer are tensors. How can I convert them to scalars so I can generate a quantization encoding? @yhhhli For example: Encoding:{ bitwidth:...
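A minimal sketch of the conversion, assuming per-tensor (not per-channel) quantization with the asymmetric unsigned scheme `q = clamp(round(x / scale) + zero_point, 0, 2**bitwidth - 1)`. `to_scalar` and `make_encoding` are hypothetical helpers; the output keys only loosely mirror the truncated `Encoding:{ bitwidth:...` format, and the sign convention used for `offset` is an assumption to check against the target tool. A one-element PyTorch tensor or NumPy array converts with `.item()`; a per-channel scale has many elements and cannot be collapsed to one scalar this way (`.item()` raises).

```python
import math


def to_scalar(value):
    """Convert a one-element tensor/array (anything with .item()) or a number to float."""
    # torch.Tensor.item() and numpy.ndarray.item() return a Python scalar for
    # one-element containers; plain ints and floats are converted directly.
    if hasattr(value, "item"):
        return float(value.item())
    return float(value)


def make_encoding(scale, zero_point, bitwidth):
    """Build a per-tensor encoding dict from an asymmetric uniform quantizer.

    Assumes q = clamp(round(x / scale) + zero_point, 0, 2**bitwidth - 1), so the
    representable float range is [(0 - zp) * scale, (qmax - zp) * scale].
    """
    s = to_scalar(scale)
    zp = to_scalar(zero_point)
    if s <= 0 or not math.isfinite(s):
        raise ValueError("scale must be a positive finite number")
    qmax = 2 ** bitwidth - 1
    return {
        "bitwidth": bitwidth,
        "scale": s,
        # Hypothetical convention: offset stored as the negated zero point.
        # Verify the sign your encoding format expects.
        "offset": -int(round(zp)),
        "min": (0 - zp) * s,
        "max": (qmax - zp) * s,
    }
```

For example, `make_encoding(0.1, 128, 8)` gives an offset of -128 and a float range of roughly [-12.8, 12.7].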

Cannot reach the throughput value described in your performance doc for FP16 LLaMA-7B

Background: the performance doc (https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/performance.md) states that LLaMA-7B in FP16 with batch size 256, input_len 128, and output_len 128 on an A100 reaches a throughput of 5,353 tok/s/GPU. Problem: on...

triaged

Triton backend
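To compare a local run against the quoted figure, it helps to make the arithmetic explicit. A minimal sketch, assuming throughput counts generated (output) tokens per second per GPU and that the measured latency covers the whole batch; whether the doc also counts input tokens is an assumption to verify against its methodology. `throughput_tok_s_per_gpu` and `implied_latency_s` are hypothetical helpers, and the latency passed in must come from your own measurement.

```python
def throughput_tok_s_per_gpu(batch_size, output_len, latency_s, num_gpus=1):
    """Generated tokens per second per GPU for one batched run.

    Assumes every sequence generates exactly output_len tokens and that
    latency_s covers the whole batch (prefill plus generation).
    """
    if latency_s <= 0 or num_gpus <= 0:
        raise ValueError("latency_s and num_gpus must be positive")
    return batch_size * output_len / latency_s / num_gpus


def implied_latency_s(batch_size, output_len, tok_s_per_gpu, num_gpus=1):
    """Batch latency that a quoted tok/s/GPU figure implies, under the same assumptions."""
    return batch_size * output_len / (tok_s_per_gpu * num_gpus)
```

Under these assumptions, 256 x 128 = 32,768 generated tokens at 5,353 tok/s/GPU on one A100 imply about 6.1 s per batch; a measured batch latency well above that points to a gap worth investigating.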

Huge performance gap between TVM and TRT on Stable Diffusion v1.5

GPU: NVIDIA RTX 3090 Ti. 1. First, I used the log db in the repo; it takes 3.7 s to produce the result. 2. Then, I tried tuning it myself using...
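End-to-end numbers such as the 3.7 s above are only comparable across TVM and TRT if both are measured the same way. A minimal, framework-agnostic sketch, assuming `run_once` is a hypothetical zero-argument callable that performs one full generation and blocks until the GPU work has finished (for example, by synchronizing the device inside it). It discards warm-up calls, which often include one-time compilation and caching, and reports the median of the timed calls.

```python
import statistics
import time


def measure_latency(run_once, warmup=3, repeats=10):
    """Return (median, min, max) wall-clock seconds over `repeats` timed calls.

    The first `warmup` calls are discarded. `run_once` must block until the
    work is complete; otherwise asynchronous GPU launches are under-counted.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        run_once()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), min(samples), max(samples)
```

Reporting the median rather than a single run keeps one slow outlier (for example, a first-call compile) from dominating the comparison.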

How do I use the fp16 precision version of Stable Diffusion 1.5?

When I use the fp16 version, I get an error on my 3090 Ti. With `pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", revision='fp16', torch_dtype=torch.float16, local_files_only=True)`, I get `RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'`.

Can I auto-tune SD models myself?

Building with `with tvm.transform.PassContext(opt_level=3): ex = relax.build(mod_deploy, args.target)`, where `args.target` is `"cuda"`, after installing `pip install -I mlc_ai_nightly_cu121 -f https://mlc.ai/wheels`, I get the errors below: Traceback (most recent call last): File "web-stable-diffusion/build.py", line 184, in build(mod, ARGS)...

Is there a plan to support quantization for SD within TVM Unity?

Hi MLC Team! How can I quantize SD models with TVM Unity to 3-4 bit, int8, or fp16 precision?

Why does this quantized model need more than 24 GB of GPU memory, far more than the ideal 500 MB?

1. Questions: As we know, SD v1.5 has 1 billion parameters, and its peak GPU memory is about 4 GB at fp32 precision. So, the memory at int4 precision...
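The "ideal" figures in this question follow from parameter count times bytes per parameter. A minimal sketch of that arithmetic, assuming 1e9 parameters and counting weights only; activations, intermediate buffers, and any workspace the runtime allocates are not included, so the result is a lower bound rather than a prediction of peak usage. `weight_memory_gb` is a hypothetical helper.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params, precision):
    """Weight-only memory in GB (1 GB = 1e9 bytes) for a given precision."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for precision in ("fp32", "fp16", "int8", "int4"):
    print(f"{precision}: {weight_memory_gb(1e9, precision):.1f} GB")
```

This prints 4.0, 2.0, 1.0, and 0.5 GB, matching the 4 GB (fp32) and 500 MB (int4) figures above for the weights alone.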