Mark Kurtz
Hi @jinfagang, the optimized inference code is closed source, so it's unfortunately not available. For the quantization ops, we do support the ONNX standard specs for QLinearConv, QLinearMatMul, and QuantizeLinear,...
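If it helps, here's a minimal sketch for checking whether an exported model actually contains those ONNX-standard quantization ops, using the `onnx` Python package (the `model.onnx` filename is a placeholder):

```python
import onnx

# Load the exported model and scan its graph for the
# ONNX-standard quantization ops mentioned above.
model = onnx.load("model.onnx")  # placeholder path to your exported model
quant_ops = {"QLinearConv", "QLinearMatMul", "QuantizeLinear"}
found = sorted({node.op_type for node in model.graph.node if node.op_type in quant_ops})
print("Quantization ops in graph:", found or "none")
```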
Hi @rafafael03, we don't currently have a public list of supported layer types, but we will update the docs for this soon! For now, you can run the DeepSparse...
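In the meantime, a rough sketch of compiling and running a model through the DeepSparse Python API (the `model.onnx` path is a placeholder, and the snippet assumes the standard quickstart entry points) is one practical way to see whether your model's layers run:

```python
from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs

onnx_filepath = "model.onnx"  # placeholder path to your exported model
batch_size = 1

# Compile the ONNX model for CPU inference with DeepSparse.
engine = compile_model(onnx_filepath, batch_size)

# Sanity-check with random inputs shaped from the ONNX graph.
inputs = generate_random_inputs(onnx_filepath, batch_size)
outputs = engine.run(inputs)
print(outputs[0].shape)
```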
Hi @sriyachakravarthy, I'd like to clarify a bit more. Our LLM Compressor flows currently target vLLM and our GPU compression pathways, specifically for Transformers models....
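As a rough illustration of that vLLM-oriented flow, a minimal LLM Compressor one-shot quantization sketch might look like the following (the model ID and output directory are placeholders, and note the `oneshot` import path has moved between llmcompressor releases):

```python
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Quantize all Linear layers to FP8, keeping the output head in full precision.
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])

oneshot(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder Transformers model ID
    recipe=recipe,
    output_dir="Llama-3.1-8B-Instruct-FP8",    # saved checkpoint loadable by vLLM
)
```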