neural-compressor
How to parallelize a model with fake quantization nodes?
If a quantized model contains fake quantization nodes, how can such a model be parallelized, and how can its accuracy be validated on a dataset?
Hi @sheegao, which framework do you use? Does 'fake quantization nodes' mean quant-dequant pairs?
Yes, I mean quant-dequant pairs, but I have not found any existing distributed framework that supports this type of model.
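To make the terminology concrete, here is a minimal pure-Python sketch of a quant-dequant (QDQ) pair, assuming affine int8 quantization with a given scale and zero point. The function names are illustrative only, not Intel Neural Compressor's API. The output is still a float, but it is snapped to the quantization grid and clamped to the int8 range, which is how the pair simulates quantization error; PyTorch exposes the same computation as a single op, `torch.fake_quantize_per_tensor_affine`.

```python
def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Float -> int8 grid: divide by scale, round (Python rounds halves to even),
    shift by the zero point, then clamp to [qmin, qmax]."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """int8 grid -> float: undo the shift and rescale."""
    return (q - zero_point) * scale


def fake_quant(x, scale, zero_point, qmin=-128, qmax=127):
    """A quant-dequant (QDQ) pair: the result stays a float, but it only takes
    values on the quantization grid, so it carries the int8 rounding/clamping error."""
    return dequantize(quantize(x, scale, zero_point, qmin, qmax), scale, zero_point)


if __name__ == "__main__":
    for value in (0.0, 0.12, -0.37, 20.0):
        print(value, "->", fake_quant(value, scale=0.1, zero_point=0))
```

With `scale=0.1`, 0.12 snaps to 0.1, -0.37 snaps to -0.4, and 20.0 saturates at roughly 12.7 (the int8 maximum, 127, times the scale).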
Hi @sheegao, you can directly deploy a fake-quantized torch model with the torch DistributedDataParallel API. I don't think this will be a problem.
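For the accuracy-validation part of the question, one common pattern under data parallelism is to let each rank evaluate a disjoint shard and then sum the raw (correct, seen) counts rather than averaging per-rank accuracies. The pure-Python sketch below simulates the ranks in a loop; `predict`, `make_dataset`, and the 0.1 scale are hypothetical stand-ins for a real fake-quantized model and dataset. In an actual PyTorch job the shards would typically come from `torch.utils.data.DistributedSampler(..., shuffle=False)` and the counts would be combined with `torch.distributed.all_reduce`. Note that `DistributedSampler` pads the last shard with repeated samples when the dataset size is not divisible by the world size (unless `drop_last=True`), which can shift the counts slightly.

```python
def fake_quant(x, scale=0.1, zero_point=0, qmin=-128, qmax=127):
    """Quant-dequant pair applied to the model input (toy int8 settings)."""
    q = max(qmin, min(qmax, round(x / scale) + zero_point))
    return (q - zero_point) * scale


def predict(x):
    """Hypothetical fake-quantized classifier: label 1 if the QDQ'd input is >= 0.5."""
    return 1 if fake_quant(x) >= 0.5 else 0


def make_dataset(n=40):
    """Hypothetical labelled data; labels use the *unquantized* threshold, so
    rounding near 0.5 makes the fake-quantized model miss a few samples."""
    return [(i / 37.0, int(i / 37.0 >= 0.5)) for i in range(n)]


def shard_indices(n, rank, world_size):
    """Strided split, like DistributedSampler(shuffle=False) without its padding."""
    return list(range(rank, n, world_size))


def local_counts(dataset, rank, world_size):
    """What one rank computes: (correct, seen) on its own shard."""
    indices = shard_indices(len(dataset), rank, world_size)
    correct = sum(int(predict(dataset[i][0]) == dataset[i][1]) for i in indices)
    return correct, len(indices)


def distributed_accuracy(dataset, world_size):
    """Simulate all ranks in a loop; in a real job the per-rank counts would be
    summed with an all_reduce before dividing."""
    correct = seen = 0
    for rank in range(world_size):
        c, n = local_counts(dataset, rank, world_size)
        correct += c
        seen += n
    return correct / seen


if __name__ == "__main__":
    data = make_dataset()
    for ws in (1, 3, 4):
        print(f"world_size={ws}: accuracy={distributed_accuracy(data, ws):.4f}")
```

Because every sample is counted exactly once and only integer counts are combined, the result does not depend on the world size, even when shards have unequal lengths.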
Yeah, but I'm interested in pipeline parallelism or tensor parallelism rather than data parallelism.
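In principle, QDQ pairs do not by themselves rule out tensor parallelism; what matters is whether each shard uses the same quantization parameters as the unsharded model. The toy sketch below assumes symmetric int8 weight fake quantization with one scale per output feature, splits a linear layer's output features (rows of the weight matrix) across simulated ranks, and checks that the concatenated result matches the single-device result exactly, because each row's scale depends only on that row. A per-tensor scale, by contrast, changes if a rank recomputes it from its own shard, so in this toy setting it would have to be computed once over the full tensor and shared. All names here are hypothetical; this is not how any specific parallelism library shards QDQ models.

```python
def fake_quant_sym(x, scale, qmax=127):
    """Symmetric int8 quant-dequant: round, clamp to [-qmax, qmax], rescale."""
    q = max(-qmax, min(qmax, round(x / scale)))
    return q * scale


def per_channel_scales(weight, qmax=127):
    """One scale per output feature (row); falls back to 1.0 for an all-zero row."""
    return [(max(abs(w) for w in row) / qmax) or 1.0 for row in weight]


def per_tensor_scale(weight, qmax=127):
    """A single scale for the whole weight matrix."""
    return (max(abs(w) for row in weight for w in row) / qmax) or 1.0


def qdq_linear(weight, scales, x):
    """y[i] = sum_j fq(W[i][j]) * x[j], with row i fake-quantized using scales[i]."""
    return [
        sum(fake_quant_sym(w, s) * xj for w, xj in zip(row, x))
        for row, s in zip(weight, scales)
    ]


def split_output_features(weight, x, world_size):
    """Each simulated rank owns a contiguous block of rows and computes their
    per-channel scales locally; concatenation stands in for an all-gather."""
    chunk = -(-len(weight) // world_size)  # ceiling division
    out = []
    for rank in range(world_size):
        shard = weight[rank * chunk:(rank + 1) * chunk]
        out.extend(qdq_linear(shard, per_channel_scales(shard), x))
    return out


if __name__ == "__main__":
    W = [[0.5, -1.0], [0.02, 0.03], [2.0, 0.1], [-0.04, 0.01]]
    x = [1.0, 2.0]
    full = qdq_linear(W, per_channel_scales(W), x)
    print("matches single device:", split_output_features(W, x, 2) == full)
    print("rank-0 per-tensor scale:", per_tensor_scale(W[:2]),
          "global per-tensor scale:", per_tensor_scale(W))
```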
I think this question should be raised with the packages that provide pipeline parallelism or tensor parallelism.
No further plans or discussion in this thread, so closing it for now.