Could you please share the data, code, or settings for training the predictor?
Prerequisites
Before submitting your question, please ensure the following:
- [x] I am running the latest version of PowerInfer. Development is rapid, and as of now, there are no tagged versions.
- [x] I have carefully read and followed the instructions in the README.md.
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
Question Details
I'm trying to train the sparsity predictor by referring to DejaVu, but I have a strange finding. I generate the predictor training data on C4 myself. For ReLULLaMA-7B, I train a predictor with higher recall on C4 than the one you provide in ReLULLaMA-7B-Predictor (e.g., 0.94 vs. 0.90 in layer 0).
However, when applying this predictor to PowerInfer, the efficiency is considerably lower than with your ReLULLaMA-7B-Predictor. What could be going wrong here? (The upper image is obtained with the predictor I trained myself; the lower image is obtained with ReLULLaMA-7B-Predictor.)
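One guess I had (purely my own assumption, using synthetic data rather than my real activations): recall alone does not constrain how many neurons the predictor marks active. A predictor that over-predicts activations can achieve higher recall while being less sparse, forcing PowerInfer to compute more FFN rows. A minimal sketch of this effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons = 11008  # FFN width of LLaMA-7B

# Hypothetical ground truth: ~10% of neurons are truly active.
truth = rng.random(n_neurons) < 0.10

# Predictor A: catches every active neuron but also fires on ~30% of
# inactive ones (perfect recall, low sparsity).
pred_a = truth | (rng.random(n_neurons) < 0.30)

# Predictor B: misses ~5% of active neurons but never over-predicts
# (slightly lower recall, much higher sparsity).
pred_b = truth & (rng.random(n_neurons) < 0.95)

def recall(pred, truth):
    return (pred & truth).sum() / truth.sum()

for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(f"{name}: recall={recall(pred, truth):.2f}, "
          f"predicted_active_ratio={pred.mean():.2f}")
```

If this is the cause, comparing the predicted-active ratio (or precision) of my predictor against yours, in addition to recall, should reveal it.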
It may also be due to some mistake on my part. I am therefore attaching the code for generating the training data (get_llama_data.py and hf_llama_module.py) and for training the predictor (main_mlp.py, run_c4_mlp.sh, trainer_mlp.py).
Looking forward to your response! Of course, the best solution would be to open-source the data, code, or even just the parameter settings used to train ReLULLaMA-7B-Predictor.