Hyeongseok Oh
This commit revisits IOTensor:
- OperandInfo (shape, type, dynamic flag) is maintained in its own `_info` field
- Remove the `orig_info()` and `orig_layout()` getters
- setTensor methods update its own...
- Revisit IOTensor and the Executor interface
- Revisit IPortableTensor
- APIs for input/output
- Run with a float input/output buffer
- Revise the tensor classes: IPortableTensor, IOTensor, UserTensor
- Revise Executor I/O setting
- Merge the execute methods: use option, input & output tensors
- Revise MultiModelExecutor to be type-aware
- Remove IOTensor...
### What?

We can consider a "hidden switching" mechanism: allocating a backend automatically based on a user requirement such as best performance or lowest memory usage. The runtime then needs a mechanism to estimate which backend to allocate...
### What?

If the user loads a quantized model, the user can set float-type input buffers and data, and a float-type output buffer. The runtime then quantizes the input when reading the input data, and dequantizes the output...
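The boundary conversion described above can be sketched as follows. This is a minimal illustration of affine uint8 quantization, not the runtime's actual API; the helper names and the scale/zero-point values are assumptions.

```python
def quantize(x, scale, zero_point):
    """Map a float value into uint8 range using affine quantization."""
    q = round(x / scale) + zero_point
    return max(0, min(255, q))  # clamp to uint8

def dequantize(q, scale, zero_point):
    """Recover an approximate float value from a uint8 value."""
    return (q - zero_point) * scale

# Illustrative quantization parameters (assumed, not from the model).
scale, zero_point = 1.0 / 255, 0

# Float data from the user's input buffer is quantized before inference...
float_input = [0.0, 0.25, 1.0]
quantized = [quantize(v, scale, zero_point) for v in float_input]

# ...and quantized results are dequantized back into the user's float buffer.
recovered = [dequantize(q, scale, zero_point) for q in quantized]
```

The round trip is lossy: each recovered value differs from the original by at most half a quantization step, which is why this is acceptable for I/O convenience but not a bit-exact conversion.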
- Block quantization for LLMs: FullyConnected, Gather
- Decide the quantization type via the circle-quantizer parameter `--block_quantize_weights` (Q4_0, Q8_0)
- Skip quantization via the circle-quantizer parameter `--skipsize_block_quantize` (default: 0)

---

Caution: It's for...
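As a rough illustration of what a block-quantized format like Q8_0 means: weights are split into fixed-size blocks, each stored as int8 values plus one shared float scale. The sketch below follows the commonly used Q8_0 convention (32 values per block, scale = max|w| / 127); it is an assumption for illustration, not the circle-quantizer implementation.

```python
BLOCK_SIZE = 32  # Q8_0 conventionally groups weights into blocks of 32

def q8_0_quantize_block(block):
    """Quantize one block of floats to int8 with a single shared scale."""
    amax = max(abs(v) for v in block)
    scale = amax / 127 if amax else 1.0
    qs = [round(v / scale) for v in block]
    return scale, qs

def q8_0_dequantize_block(scale, qs):
    """Reconstruct approximate float weights from one quantized block."""
    return [q * scale for q in qs]

# 32 example weights in [-1.6, 1.5]
weights = [i / 10 for i in range(-16, 16)]
scale, qs = q8_0_quantize_block(weights)
restored = q8_0_dequantize_block(scale, qs)
```

Because the scale is per-block rather than per-tensor, outlier weights in one block do not degrade the precision of every other block, which is why this layout works well for LLM weight matrices.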
Since #14009, some FullyConnected tests are skipped. There may be an issue supporting hybrid quantization in the FullyConnected kernel.
- Introduce and change APIs to set layout and data type
- Add a compile pass to insert a permute OP for input/output layout and data-type conversion
- Add...
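The effect of an inserted permute OP can be sketched as a plain index rearrangement. The example below shows an NHWC-to-NCHW permutation on nested lists; it is an illustrative sketch assuming those two layouts, not the runtime's permute kernel.

```python
def permute_nhwc_to_nchw(tensor):
    """Rearrange an NHWC-ordered nested list into NCHW order,
    as a permute OP inserted at a model input/output boundary would do."""
    n = len(tensor)
    h = len(tensor[0])
    w = len(tensor[0][0])
    c = len(tensor[0][0][0])
    return [[[[tensor[b][y][x][ch]
               for x in range(w)]
              for y in range(h)]
             for ch in range(c)]
            for b in range(n)]

# NHWC tensor of shape (1, 2, 2, 3) holding values 0..11
nhwc = [[[[0, 1, 2], [3, 4, 5]],
         [[6, 7, 8], [9, 10, 11]]]]
nchw = permute_nhwc_to_nchw(nhwc)  # shape (1, 3, 2, 2)
```

Inserting this as a compile-time pass keeps the user-facing buffers in the layout the user requested while the graph internals keep their preferred layout.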
- Change fields to support multi-model
- Add an I/O layout field to move layout & data-type conversion support into the compile phase

ONE-DCO-1.0-Signed-off-by: Hyeongseok Oh

---

Related issues: #13645 #13646
### What

Let's support per-model `CompileOptions` in a multi-model package, e.g. scheduler settings and backend mapping.

### Why

Currently, `CompileOptions` settings are not fully supported by `MultiModelCompiler`, and...