Could you please share the data, code, or settings for training the predictor?
Prerequisites
Before submitting your question, please ensure the following:
- [x] I am running the latest version of PowerInfer. Development is rapid, and as of now, there are no tagged versions.
- [x] I have carefully read and followed the instructions in the README.md.
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
Question Details
I'm trying to train the sparsity predictor by referring to DejaVu, but I have a strange finding. I generate the predictor training data on C4 myself. For ReLULLaMA-7B, I train a predictor with higher recall on C4 than the one you provide in ReLULLaMA-7B-Predictor (e.g., 0.94 vs. 0.90 in layer 0).
However, when applying this predictor to PowerInfer, the efficiency is considerably lower than with your ReLULLaMA-7B-Predictor. What could be going wrong here? (The upper image is obtained with the predictor I trained myself; the lower image is obtained with ReLULLaMA-7B-Predictor.)
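One guess I had (purely my own assumption, using synthetic data rather than my real activations): recall alone does not constrain how many neurons the predictor marks active. A predictor that over-predicts activations can achieve higher recall while being less sparse, forcing PowerInfer to compute more FFN rows. A minimal sketch of this effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons = 11008  # FFN width of LLaMA-7B

# Hypothetical ground truth: ~10% of neurons are truly active.
truth = rng.random(n_neurons) < 0.10

# Predictor A: catches every active neuron but also fires on ~30% of
# inactive ones (perfect recall, low sparsity).
pred_a = truth | (rng.random(n_neurons) < 0.30)

# Predictor B: misses ~5% of active neurons but never over-predicts
# (slightly lower recall, much higher sparsity).
pred_b = truth & (rng.random(n_neurons) < 0.95)

def recall(pred, truth):
    return (pred & truth).sum() / truth.sum()

for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(f"{name}: recall={recall(pred, truth):.2f}, "
          f"predicted_active_ratio={pred.mean():.2f}")
```

If this is the cause, comparing the predicted-active ratio (or precision) of my predictor against yours, in addition to recall, should reveal it.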
It may also be due to some mistake on my part. I am therefore attaching the code for generating the training data (get_llama_data.py and hf_llama_module.py) and for training the predictor (main_mlp.py, run_c4_mlp.sh, trainer_mlp.py).
Looking forward to your response! Of course, the best solution would be to open-source the data, code, or even just the parameter settings used to train ReLULLaMA-7B-Predictor.