hgaspar

Results 9 issues of hgaspar

Different SKUs have different numbers of compute units. It is highly desirable to fully utilize the compute hardware by ensuring that each compute unit is occupied during the execution of...

This is a very common ONNX operator. ORT falls back to CPU when encountering this node. The code is at: https://github.com/microsoft/onnxruntime/blob/ee603ee3265dbf6eac112baf273b6b69bf696085/onnxruntime/core/providers/migraphx/migraphx_execution_provider.cc#L1022-L1029 . This happens when executing Llama v2, from: https://github.com/microsoft/onnxruntime-inference-examples/blob/main/python/models/llama/LLaMA-2 E2E...

onnxruntime
Onnx Operators
UAI

Enable creating engines (currently MXR files; eventually, perhaps, dynamic objects) without embedding the weights in the engine. Use cases: (1) Support compilation for various batch sizes without duplicating the weights....

enhancement

Many CNN layers use padding (of 1) on their inputs, usually by adding 0s to the halo around the image. Another way to pad (better in certain situations, but...
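
To make the zero-padding case concrete, here is a minimal sketch in plain Python. The helper name `zero_pad` and the list-of-lists image layout are assumptions made for illustration; it only shows the usual zero halo, not the alternative padding scheme the issue goes on to describe.

```python
def zero_pad(image, pad=1):
    """Surround a 2-D image (list of rows) with a halo of `pad` zeros.

    `zero_pad` is a hypothetical helper for illustration only.
    """
    width = len(image[0]) + 2 * pad
    # Rows of zeros placed above and below the image.
    top = [[0] * width for _ in range(pad)]
    bottom = [[0] * width for _ in range(pad)]
    # Each original row gets `pad` zeros on the left and on the right.
    middle = [[0] * pad + list(row) + [0] * pad for row in image]
    return top + middle + bottom


padded = zero_pad([[1, 2], [3, 4]])
for row in padded:
    print(row)
```

A 2x2 input becomes a 4x4 output, so a 3x3 convolution over the padded image produces an output of the original 2x2 size.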

Modern software (e.g. PyTorch) uses symbolic shapes, where the word "symbol" is meant in the sense of sympy (i.e. supporting symbolic manipulation and symbolic propagation). For example: Product of two...

rocblas, miopen, etc. should not be static dependencies, as they currently are; rather, they should be loaded dynamically if those backends are enabled (or not disabled). If a backend is not...

enhancement
Windows
UAI
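
The dynamic-loading idea in the issue above can be sketched with Python's standard `ctypes` module. This is an illustration of the general pattern, not MIGraphX code: the helper `try_load_backend`, the `enabled` flag, and the library names passed to it are assumptions.

```python
import ctypes
import ctypes.util


def try_load_backend(name, enabled=True):
    """Return a handle to shared library `name`, or None.

    Hypothetical helper: a disabled or missing backend is not an error,
    it simply yields None so the caller can skip that backend.
    """
    if not enabled:
        return None
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)
    except OSError:
        # The library was located but could not be loaded.
        return None


# Library names here are assumptions; real names vary by platform.
backends = {name: try_load_backend(name) for name in ("rocblas", "MIOpen")}
available = [name for name, handle in backends.items() if handle is not None]
print("available backends:", available)
```

With this pattern the process starts even when a backend library is absent, and the absence only matters if that backend is actually requested.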

It should be possible to execute a MIGraphX-compiled model without needing a full MIGraphX installation, in particular without needing the machinery that parses ONNX files, compiles models, etc... This...

enhancement

It should be possible to enable run-time accuracy debugging, i.e. inspection of values of a tensor, for the purpose of detecting 0s, or NaNs, or any other user-specified condition, for...

enhancement
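
As a rough illustration of the kind of check the issue above asks for, the sketch below counts zeros, NaNs, and matches for a user-specified condition in a flat list of tensor values. The function `inspect_values`, the check labels, and the example condition are assumptions for illustration; they are not an existing MIGraphX interface.

```python
import math

# Built-in checks; callers can add their own predicates.
DEFAULT_CHECKS = {
    "zero": lambda v: v == 0.0,
    "nan": math.isnan,
}


def inspect_values(values, extra_checks=None):
    """Count how many elements satisfy each named predicate.

    Hypothetical helper: `values` is a flat iterable of floats and
    `extra_checks` maps a label to a user-specified condition.
    """
    checks = dict(DEFAULT_CHECKS)
    if extra_checks:
        checks.update(extra_checks)
    counts = {label: 0 for label in checks}
    for v in values:
        for label, check in checks.items():
            if check(v):
                counts[label] += 1
    return counts


# Example user condition: magnitude above the largest finite fp16 value.
report = inspect_values(
    [0.0, 1.5, float("nan"), 70000.0],
    {"fp16_overflow": lambda v: abs(v) > 65504.0},
)
print(report)
```

A real implementation would run such checks on device buffers after selected instructions; the point here is only that the condition is a pluggable predicate.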

Such an operator appears in LLM models quantized to int4 (also with GroupQueryAttention nodes), via the genai tool. Only N=4 needs to be supported in the near term (i.e. 4 bits)...

UAI
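
For readers unfamiliar with 4-bit weights, the sketch below shows one common way two 4-bit values are packed into a byte and mapped back to floats. The packing order (low nibble first), the per-group `scale`, and the default `zero_point` of 8 are assumptions for illustration; the operator in the issue may define its layout differently.

```python
def unpack_int4(packed):
    """Split each byte into two unsigned 4-bit values, low nibble first.

    The nibble order is an assumption, not taken from the issue.
    """
    out = []
    for byte in packed:
        out.append(byte & 0x0F)
        out.append(byte >> 4)
    return out


def dequantize(quantized, scale, zero_point=8):
    """Map unsigned 4-bit values back to floats: (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in quantized]


q = unpack_int4(bytes([0x21, 0xF0]))
w = dequantize(q, scale=0.5)
print(q, w)
```

Storing two weights per byte halves the memory footprint relative to int8, which is why such operators appear in quantized LLM exports.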