alpaca-qlora

ValueError: test_size=2000 should be either positive and smaller than the number of samples 2 or a float in the (0, 1) range

Status: Open. quantumalchemy opened this issue 1 year ago (1 comment).

    Traceback (most recent call last):
      File "/home/studio-lab-user/sagemaker-studiolab-notebooks/alpaca-qlora/finetune.py", line 419, in <module>
        fire.Fire(train)
      File "/home/studio-lab-user/.conda/envs/studiolab/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
        component_trace = _Fire(component, args, parsed_flag_args, context, name)
      File "/home/studio-lab-user/.conda/envs/studiolab/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
        component, remaining_args = _CallAndUpdateTrace(
      File "/home/studio-lab-user/.conda/envs/studiolab/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
        component = fn(*varargs, **kwargs)
      File "/home/studio-lab-user/sagemaker-studiolab-notebooks/alpaca-qlora/finetune.py", line 347, in train
        train_val = data["train"].train_test_split(
      File "/home/studio-lab-user/.conda/envs/studiolab/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 545, in wrapper
        out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
      File "/home/studio-lab-user/.conda/envs/studiolab/lib/python3.9/site-packages/datasets/fingerprint.py", line 511, in wrapper
        out = func(dataset, *args, **kwargs)
      File "/home/studio-lab-user/.conda/envs/studiolab/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 4379, in train_test_split
        raise ValueError(
    ValueError: test_size=2000 should be either positive and smaller than the number of samples 2 or a float in the (0, 1) range
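The message describes two accepted forms of `test_size`: an integer is an absolute row count and must be at least 1 and smaller than the number of rows, while a float is a fraction strictly between 0 and 1. Here the script asks for 2000 validation rows from a `train` split that holds only 2. The sketch below is a simplified, stdlib-only restatement of that rule (not the `datasets` library's own code), plus a hypothetical helper `pick_val_size` that caps the validation size for small datasets and returns 0 when no split is possible.

```python
def is_valid_test_size(test_size, n_samples):
    """Simplified restatement of the rule quoted in the ValueError."""
    if isinstance(test_size, bool):
        return False  # bool is a subclass of int; treat it as invalid
    if isinstance(test_size, int):
        # Integer = absolute row count: 1 <= test_size < n_samples.
        return 0 < test_size < n_samples
    if isinstance(test_size, float):
        # Float = fraction of the dataset, strictly inside (0, 1).
        return 0.0 < test_size < 1.0
    return False


def pick_val_size(n_samples, requested=2000, max_fraction=0.1):
    """Hypothetical helper: shrink the validation size for small datasets.

    Caps the request at max_fraction of the rows and returns 0
    (meaning "skip the split") if the result is still not a valid size.
    """
    cap = int(n_samples * max_fraction)
    size = min(requested, cap)
    return size if is_valid_test_size(size, n_samples) else 0


print(is_valid_test_size(2000, 2))  # False: the failing call in the traceback
print(pick_val_size(2))             # 0: too few rows to hold any out
print(pick_val_size(15000))         # 1500: the 10% cap is below 2000
print(pick_val_size(50000))         # 2000: the requested size fits
```

The 2000 is whatever finetune.py passes as `test_size` at line 347; check the script for the argument that sets it before relying on any particular option name.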

Maybe this has something to do with the dataset? I'm using the standard Alpaca format and running:

    python finetune.py \
        --base_model 'openlm-research/open_llama_3b_600bt_preview' \
        --data_path '../datasets/dolly.json' \
        --num_epochs=3 \
        --cutoff_len=512 \
        --group_by_length \
        --output_dir='./dolly-lora-3b' \
        --lora_r=16 \
        --lora_target_modules='[q_proj,v_proj]'
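Since the split saw only 2 rows, it may help to confirm what the data file actually contains before training. A minimal stdlib sketch, assuming the file is meant to be a single JSON array of Alpaca-style records (objects with `instruction`, `input`, and `output` keys); `inspect_alpaca_json` is a hypothetical name, and the demo writes a throwaway file with two made-up records instead of reading `../datasets/dolly.json`.

```python
import json
import os
import tempfile


def inspect_alpaca_json(path, required=("instruction", "input", "output")):
    """Hypothetical checker: count records and list those missing Alpaca keys."""
    with open(path, encoding="utf-8") as f:
        # json.load expects one JSON document; a multi-line JSON Lines
        # file raises json.JSONDecodeError here.
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"expected a JSON array of records, got {type(data).__name__}")
    missing = [i for i, rec in enumerate(data)
               if not isinstance(rec, dict) or any(k not in rec for k in required)]
    return len(data), missing


# Demo on a temporary file; pass the real data path instead when checking.
records = [
    {"instruction": "Name a primary color.", "input": "", "output": "Red."},
    {"instruction": "Translate to French.", "input": "Hello"},  # no "output"
]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False,
                                 encoding="utf-8") as tmp:
    json.dump(records, tmp)
count, missing = inspect_alpaca_json(tmp.name)
os.remove(tmp.name)
print(count, missing)  # 2 [1]
```

A record count far below what the dataset should hold (for example, 2) points at the file's layout rather than at the training arguments.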

Where can I find the ../datasets/dolly.json file? I would like to see what the data looks like. Any advice? Thanks!
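The issue does not say where dolly.json comes from. One plausible source, and this is an assumption, is Databricks' databricks-dolly-15k, which is distributed as JSON Lines with `instruction`, `context`, `response`, and `category` fields. The sketch below converts such lines into the Alpaca layout (a JSON array of `instruction`/`input`/`output` objects), mapping `context` to `input` and `response` to `output`; `dolly_to_alpaca` is a hypothetical name and the sample line is invented for illustration.

```python
import json


def dolly_to_alpaca(jsonl_lines):
    """Hypothetical converter: dolly-15k style JSON Lines -> Alpaca records."""
    out = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        out.append({
            "instruction": rec["instruction"],
            "input": rec.get("context", ""),  # dolly's optional context
            "output": rec["response"],
        })
    return out


sample = [
    '{"instruction": "What is 2 + 2?", "context": "", '
    '"response": "4", "category": "open_qa"}',
]
alpaca = dolly_to_alpaca(sample)
print(json.dumps(alpaca, indent=2))
```

Writing the resulting list with `json.dump` yields a single JSON array with one object per example, which is the shape the standard Alpaca data file uses.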

quantumalchemy, Jun 27 '23 15:06