How to parallelize a model with fake quantization nodes?

Open sheegao opened this issue 1 year ago • 5 comments

If a quantized model contains fake quantization nodes, how can such a model be parallelized, and how can its accuracy be validated on a dataset?

sheegao avatar Dec 08 '23 13:12 sheegao

Hi @sheegao, which framework do you use? Does 'fake quantization nodes' mean quant-dequant pairs?

mengniwang95 avatar Dec 13 '23 08:12 mengniwang95

Yes, it means quant-dequant pairs, but I have not found any existing distributed framework that supports this type of model.

sheegao avatar Dec 14 '23 03:12 sheegao
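For readers unfamiliar with the term, a quant-dequant pair can be sketched in a few lines of plain Python. This is a minimal illustration for a single value and not neural-compressor code; the function name `fake_quantize` and the default int8 range are assumptions chosen for the example.

```python
def fake_quantize(x, scale, zero_point=0, qmin=-128, qmax=127):
    """Quantize a float to an integer grid, then dequantize it straight back.

    The result is still a float, but it can only take values that the
    integer grid can represent, which is how quantization error is simulated.
    """
    # Quantize: scale, round to the nearest integer, shift by the zero point.
    q = round(x / scale) + zero_point
    # Clamp to the representable integer range (int8 assumed here).
    q = max(qmin, min(qmax, q))
    # Dequantize: map the integer back to the float domain.
    return (q - zero_point) * scale


if __name__ == "__main__":
    for value in (0.12, -0.31, 100.0):
        print(value, "->", fake_quantize(value, scale=0.05))
```

Because the output stays in floating point, a model containing such pairs runs through ordinary float kernels.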

Hi @sheegao, you can directly deploy a fake-quantized torch model with the torch DistributedDataParallel API. I don't think this will be a problem.

xin3he avatar Dec 14 '23 07:12 xin3he
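The suggestion above can be sketched roughly as follows, including a simple way to validate accuracy across ranks. This is a sketch under stated assumptions and not neural-compressor's own workflow: `FakeQuantLinear`, `evaluate`, the fixed scale, the random toy dataset, the CPU `gloo` backend, and the file-based rendezvous are all hypothetical choices made so the example runs in a single process.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


class FakeQuantLinear(nn.Module):
    """Hypothetical layer: a quant-dequant pair on the input, then a Linear."""

    def __init__(self, in_features, out_features, scale=0.05, zero_point=0):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.scale = scale
        self.zero_point = zero_point

    def forward(self, x):
        # Quantize to the int8 grid and dequantize at once; x stays float.
        x = torch.fake_quantize_per_tensor_affine(
            x, self.scale, self.zero_point, -128, 127
        )
        return self.linear(x)


def evaluate(rank, world_size, init_file):
    """Run one rank's share of the evaluation and return the global accuracy."""
    dist.init_process_group(
        "gloo", init_method=f"file://{init_file}", rank=rank, world_size=world_size
    )
    torch.manual_seed(0)  # same weights and same toy data on every rank
    model = DDP(FakeQuantLinear(8, 3))
    model.eval()

    # Toy dataset (an assumption); each rank evaluates a strided shard of it.
    inputs = torch.randn(64, 8)
    labels = torch.randint(0, 3, (64,))
    shard_x, shard_y = inputs[rank::world_size], labels[rank::world_size]

    with torch.no_grad():
        preds = model(shard_x).argmax(dim=1)

    # Sum correct predictions and sample counts over all ranks.
    stats = torch.tensor(
        [(preds == shard_y).sum().item(), shard_y.numel()], dtype=torch.float64
    )
    dist.all_reduce(stats, op=dist.ReduceOp.SUM)
    dist.destroy_process_group()
    return stats[0].item() / stats[1].item()


if __name__ == "__main__":
    store = os.path.join(tempfile.mkdtemp(), "store")
    print("accuracy:", evaluate(rank=0, world_size=1, init_file=store))
```

For a real multi-process run, each process would call `evaluate` with its own rank, for example through a launcher such as `torchrun`, and the toy tensors would be replaced by the actual validation set.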

Yeah, but I'm interested in pipeline parallelism or tensor parallelism rather than data parallelism.

sheegao avatar Dec 14 '23 07:12 sheegao

I think this question should be raised with the packages that provide pipeline parallelism or tensor parallelism.

xin3he avatar Dec 15 '23 02:12 xin3he
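As background for anyone taking the question to such a package, the sketch below simulates column-wise tensor parallelism of a fake-quantized linear layer on a single process. It is not an API of any tensor-parallel framework: the helper `fake_quant`, the fixed scales, and the two-way split are assumptions for illustration. It relies on the quantization parameters being fixed ahead of time; if each shard derived its own per-tensor scale from its slice of the weight, the results would generally differ from the unsharded model.

```python
import torch


def fake_quant(t, scale, zero_point=0, qmin=-128, qmax=127):
    # Hypothetical helper: per-tensor quant-dequant pair with a fixed scale.
    return torch.fake_quantize_per_tensor_affine(t, scale, zero_point, qmin, qmax)


torch.manual_seed(0)
x = torch.randn(4, 8)        # activations: batch of 4, 8 features
weight = torch.randn(6, 8)   # linear weight: 6 output features
act_scale, w_scale = 0.05, 0.02  # fixed scales shared by every shard (assumed)

# Unsharded reference: fake-quantize activations and weight, then matmul.
reference = fake_quant(x, act_scale) @ fake_quant(weight, w_scale).t()

# Simulated column parallelism: each "rank" owns a slice of the output
# features and sees the full input. The quant-dequant pair is elementwise,
# so applying it per shard with the same scale gives the same values.
shards = torch.chunk(weight, 2, dim=0)
partials = [fake_quant(x, act_scale) @ fake_quant(w, w_scale).t() for w in shards]
sharded = torch.cat(partials, dim=1)  # stands in for an all-gather

matches = torch.allclose(reference, sharded, atol=1e-5)
print("sharded output matches reference:", matches)
```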

No further plans or discussion in this thread, so closing it for now.

thuang6 avatar Apr 26 '24 06:04 thuang6