Mark Kurtz
Hi @jinfagang, the optimized inference code is closed source, so it's unfortunately not available. For the quantization ops, we do support the ONNX standard specs for QLinearConv, QLinearMatMul, and QuantizeLinear,...
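If it helps, here's a minimal sketch for checking whether an exported model actually contains those ONNX-standard quantization ops, using the `onnx` Python package (the `model.onnx` filename is a placeholder):

```python
import onnx

# Load the exported model and scan its graph for the
# ONNX-standard quantization ops mentioned above.
model = onnx.load("model.onnx")  # placeholder path to your exported model
quant_ops = {"QLinearConv", "QLinearMatMul", "QuantizeLinear"}
found = sorted({node.op_type for node in model.graph.node if node.op_type in quant_ops})
print("Quantization ops in graph:", found or "none")
```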
Hi @rafafael03, we don't currently have a public list of supported layer types, but we will update the docs for this soon! For now, you can run the DeepSparse...
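In the meantime, a rough sketch of compiling and running a model through the DeepSparse Python API (the `model.onnx` path is a placeholder, and the snippet assumes the standard quickstart entry points) is one practical way to see whether your model's layers run:

```python
from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs

onnx_filepath = "model.onnx"  # placeholder path to your exported model
batch_size = 1

# Compile the ONNX model for CPU inference with DeepSparse.
engine = compile_model(onnx_filepath, batch_size)

# Sanity-check with random inputs shaped from the ONNX graph.
inputs = generate_random_inputs(onnx_filepath, batch_size)
outputs = engine.run(inputs)
print(outputs[0].shape)
```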
Hi @sriyachakravarthy, I'd like to clarify a bit more. Our LLM Compressor flows currently target vLLM and our GPU compression pathways, specifically for Transformers models....
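As a rough illustration of that vLLM-oriented flow, a minimal LLM Compressor one-shot quantization sketch might look like the following (the model ID and output directory are placeholders, and note the `oneshot` import path has moved between llmcompressor releases):

```python
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Quantize all Linear layers to FP8, keeping the output head in full precision.
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])

oneshot(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder Transformers model ID
    recipe=recipe,
    output_dir="Llama-3.1-8B-Instruct-FP8",    # saved checkpoint loadable by vLLM
)
```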