
Data processing script (ge_data/allocation.py) does not work out of the box

Open · avnermay opened this issue 1 year ago · 1 comment

There are a few small issues:

  1. The model is set to load from a local file instead of from the Hugging Face Hub (https://github.com/SafeAILab/EAGLE/blob/main/ge_data/ge_data_all_vicuna.py#L22). To fix this, I just set `bigname='lmsys/vicuna-13b-v1.3'` at that line in ge_data_all_vicuna.py.

  2. The ShareGPT dataset is loaded from local disk, with no instructions for how to download it. To fix this, I downloaded the ShareGPT dataset from https://huggingface.co/datasets/Aeala/ShareGPT_Vicuna_unfiltered with `wget https://huggingface.co/datasets/Aeala/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V4.3_unfiltered_cleaned_split.json` (see the sketch after this list).
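
For reference, a minimal, untested sketch that applies both fixes in one place. The `bigname` variable and the L22 location come from the linked script; fetching the JSON via `huggingface_hub` (rather than `wget`) and how the resulting path gets used afterwards are my own assumptions:

```python
# Untested sketch combining both fixes above. `bigname` and the L22 location
# are from ge_data/ge_data_all_vicuna.py; using huggingface_hub instead of
# wget is a substitution, not what the repo prescribes.
from huggingface_hub import hf_hub_download

# Fix 1: load the base model from the Hub instead of a hard-coded local path
# (replaces the assignment at ge_data_all_vicuna.py#L22).
bigname = "lmsys/vicuna-13b-v1.3"

# Fix 2: fetch the ShareGPT JSON once; hf_hub_download returns the local
# cache path, which can then be passed to the script's dataset-loading code.
sharegpt_path = hf_hub_download(
    repo_id="Aeala/ShareGPT_Vicuna_unfiltered",
    filename="ShareGPT_V4.3_unfiltered_cleaned_split.json",
    repo_type="dataset",
)
print(sharegpt_path)
```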

avnermay · Feb 22 '24

Thanks for raising this and posting the details. I ran into the same problem and was trying to find the dataset while working on compatibility with a Gemma model.

M-Chimiste · Mar 04 '24

@Liyuhui-12 @hongyanz, could you please confirm whether the dataset used to train EAGLE is available at https://huggingface.co/datasets/Aeala/ShareGPT_Vicuna_unfiltered/blob/main/ShareGPT_V4.3_unfiltered_cleaned_split.json, or could you upload your version of the dataset?

garipovroma · Oct 15 '24

> @Liyuhui-12 @hongyanz, could you please confirm whether the dataset used to train EAGLE is available at https://huggingface.co/datasets/Aeala/ShareGPT_Vicuna_unfiltered/blob/main/ShareGPT_V4.3_unfiltered_cleaned_split.json, or could you upload your version of the dataset?

It seems that they used this dataset: https://github.com/SafeAILab/EAGLE/issues/51#issuecomment-1986805786. I tried `wget` on this JSON file, but many modifications are still needed in their training scripts; a sketch of loading the file follows below.
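
For anyone in the same spot, here is a minimal, untested sketch of reading the downloaded JSON with the `datasets` library. The `conversations` field matches the public ShareGPT release, but the EAGLE scripts may expect a different path or format:

```python
# Untested sketch: load the wget-downloaded ShareGPT JSON via the `datasets`
# library. Assumes the file sits in the current working directory and has the
# standard ShareGPT layout (records with `id` and `conversations` fields).
from datasets import load_dataset

ds = load_dataset(
    "json",
    data_files="ShareGPT_V4.3_unfiltered_cleaned_split.json",
    split="train",
)
print(ds)                          # column names and row count
print(ds[0]["conversations"][:2])  # first two turns of the first dialogue
```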

HaochenZhao · Oct 23 '24