xiaohoua issues

Results: 6 issues of xiaohoua

A question about ITK filter multithreading.

When I use a filter, how can I know whether the filter class turns on multithreading? For example, with itkSimilarityIndexImageFilter, is multithreading turned on when `using FilterType = itk::SimilarityIndexImageFilter;...

Dev chat_glm npu

This requires oneflow-npu to merge the changes from [logical_not](https://github.com/Oneflow-Inc/oneflow-npu/pull/240) and [rms_norm](https://github.com/Oneflow-Inc/oneflow-npu/pull/239). Inference output: ![image](https://github.com/user-attachments/assets/5160625e-c9ac-4916-a18b-ad3fe15bb494)

How to run your model on CLIP_benchmark?

How to run your model on CLIP_benchmark? `clip_benchmark eval --dataset=tfds/cifar10 --task=zeroshot_classification --pretrained=laion400m_e32 --model=ViT-B-32-quickgelu --output=result.json --batch_size=64` This runs successfully on my machine, but `clip_benchmark eval --dataset=cifar10 --task=zeroshot_classification --pretrained=./test_model/ViT-L-14_laion400m_kd_ViT-B-16_cc3m_12m_ep32.pt --model=ViT-L-14 --output=result.json --batch_size=64`...

Error when trying to use int8 operations with OpenCLIP.

I am trying to quantize open_clip's pre-trained model and then run a zero-shot classification test on clip_benchmark, but I get an error: AttributeError: module 'triton.language' has no attribute 'libdevice' Here's...

[support] 2bit dequantize on xpu is slow

I use

```
autoround = AutoRound(
    model,
    tokenizer,
    dataset=calib_data,
    bits=2,
    group_size=128,
    sym=False,
    batch_size=batch_size,
    seqlen=seqlen,
    n_samples=len(calib_data),
    iters=200,
)
autoround.quantize()
```

to quantize the model. Here is my infer.py: ``` import torch import...

Intel GPU

kernel
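
For readers unfamiliar with the settings in the issue above (`bits=2`, `group_size=128`, `sym=False`), here is a minimal sketch of what asymmetric, group-wise 2-bit quantization and dequantization mean in principle. It uses plain min/max rounding in pure Python and is not AutoRound's algorithm or its XPU kernel; the function names and the sample `weights` list are hypothetical and exist only for illustration.

```python
# Illustrative sketch only: plain asymmetric (min/max) group-wise quantization.
# This is NOT AutoRound's tuned rounding and NOT the XPU dequantize kernel.

def quantize_group(values, bits=2):
    """Map one group of floats to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1              # 3 for 2-bit
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo               # lo acts as the (float) zero offset


def dequantize_group(codes, scale, zero):
    """Recover approximate floats from codes using the group's scale/offset."""
    return [c * scale + zero for c in codes]


def quantize(weights, bits=2, group_size=128):
    """Quantize a flat weight list, one (codes, scale, zero) tuple per group."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


def dequantize(groups):
    """Concatenate the dequantized values of all groups."""
    out = []
    for codes, scale, zero in groups:
        out.extend(dequantize_group(codes, scale, zero))
    return out


# Hypothetical sample data: 256 values in [-1, 1], i.e. two groups of 128.
weights = [((i * 37) % 101) / 50.0 - 1.0 for i in range(256)]
groups = quantize(weights, bits=2, group_size=128)
restored = dequantize(groups)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

With only four levels per group, the reconstruction error is bounded by half of each group's scale, which is why 2-bit schemes rely on small groups and tuned rounding.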

[question] Do we have a method for 2-bit quantization?

As the title says: do we have a method for 2-bit quantization on Intel iGPU? Or can our hardware support 2-bit quantization? Are there any other open-source...