zyf-gh
I found that in `modeling_phonelm_npu`, `shadowLayers` is `{0, 1, 3, 4}`, while in `modeling_qwen_npu`, `shadowLayers` is `{1, 2, 26}`. How are the `shadowLayers` determined?
I found that the quantization algorithm only supports a fixed set of models. If I want to perform int8 quantization on my own custom model, how can I do it?
When I call `outputs.printData()` or `outputs.saveData()` on the outputs in the `Forward` function of `PhoneLMForCausalLM` in `src/models/phonelm/modeling_phonelm.hpp`, a segmentation fault occurs. How can I get the data of the outputs?
I would like to ask whether the KV cache generated in the prefill stage can be used as PyTorch's KV cache, so that PyTorch can perform the subsequent decoding work on another...
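For concreteness, here is a minimal sketch of what I have in mind, assuming the prefill K/V tensors could somehow be dumped per layer (the dump file names, the `(batch, num_heads, seq_len, head_dim)` layout, and the specific checkpoint are my assumptions, not anything this repo provides); it only uses the standard Hugging Face `past_key_values` interface on the PyTorch side:

```python
# Sketch only: continue decoding in a Hugging Face PyTorch model from an
# externally produced prefill KV cache.
# Assumptions (not from this repo): K/V were dumped per layer as .npy files
# with shape (batch, num_heads, seq_len, head_dim), matching the HF layout.
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

model_id = "Qwen/Qwen1.5-0.5B"  # hypothetical; use the checkpoint matching the prefill model
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
tok = AutoTokenizer.from_pretrained(model_id)

# Rebuild the legacy (key, value)-per-layer tuple from the dumped arrays.
legacy_cache = []
for i in range(model.config.num_hidden_layers):
    k = torch.from_numpy(np.load(f"prefill_k_layer{i}.npy")).to(model.dtype)  # hypothetical dump file
    v = torch.from_numpy(np.load(f"prefill_v_layer{i}.npy")).to(model.dtype)  # hypothetical dump file
    legacy_cache.append((k, v))
past = DynamicCache.from_legacy_cache(tuple(legacy_cache))

# Decode one step: feed only the last prompt token; the rest comes from the cache.
prompt = "..."  # the original prompt used during prefill
last_token_id = tok(prompt, return_tensors="pt").input_ids[:, -1:]
with torch.no_grad():
    out = model(input_ids=last_token_id, past_key_values=past, use_cache=True)
next_id = out.logits[:, -1, :].argmax(dim=-1)
print(tok.decode(next_id))
```

The main thing I am unsure about is whether the cache layout produced during prefill here matches the layout PyTorch expects, or whether a transpose/reshape would be needed in between.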