HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices