baize-chatbot
Training on 25G of data (data/quora_chat_data) failed
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /opt/conda/envs/py38/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
Downloading and preparing dataset json/default to /root/.cache/huggingface/datasets/json/default-e59c3670f1657ac9/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e...
Downloading data files: 100%|██████████| 1/1 [00:00<00:00, 2349.75it/s]
Extracting data files: 100%|██████████| 1/1 [00:00<00:00, 483.88it/s]
Traceback (most recent call last):
File "/opt/conda/envs/py38/lib/python3.8/site-packages/datasets/builder.py", line 1860, in _prepare_split_single
for _, table in generator:
File "/opt/conda/envs/py38/lib/python3.8/site-packages/datasets/packaged_modules/json/json.py", line 113, in _generate_tables
io.BytesIO(batch), read_options=paj.ReadOptions(block_size=block_size)
File "pyarrow/_json.pyx", line 55, in pyarrow._json.ReadOptions.__init__
File "pyarrow/_json.pyx", line 80, in pyarrow._json.ReadOptions.block_size.__set__
OverflowError: value too large to convert to int32_t
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "finetune.py", line 51, in <module>
How can I fix this problem, which is caused by the training data being too large?
Not completely sure, but this may be helpful: https://stackoverflow.com/questions/68652157/how-do-i-debug-overflowerror-value-too-large-to-convert-to-int32-t
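A possible explanation, not confirmed in this thread: the `datasets` JSON builder in the traceback (`json.py`, `_generate_tables`) appears to read the file in chunks, extend each chunk to the end of the current line, and double pyarrow's `block_size` each time a record straddles a block. If data/quora_chat_data is one huge top-level JSON array written on a single line, that "chunk" becomes the whole 25G file, and `block_size` keeps doubling until it no longer fits in an `int32_t` (maximum 2,147,483,647 bytes), which would match the `OverflowError` above. Under that assumption, one workaround is to convert the file to JSON Lines (one record per line) before loading it. The sketch below does the conversion with only the Python standard library and streams the array, so the 25G file is never loaded into memory at once. `iter_json_array` and `json_array_to_jsonl` are hypothetical helpers written for this example; they are not part of Baize or `datasets`.

```python
import json

_WS = " \t\n\r"  # the four whitespace characters JSON allows between tokens


def iter_json_array(fp, chunk_size=1 << 20):
    """Hypothetical helper: yield each element of a top-level JSON array in fp.

    Reads fp in chunks of chunk_size characters, so only the current chunk
    plus one partially read element is held in memory. Malformed input may be
    read to the end of the file before the error is raised.
    """
    decoder = json.JSONDecoder()
    buf, pos, eof = "", 0, False

    def refill():
        # Drop the consumed prefix of buf and append the next chunk.
        nonlocal buf, pos, eof
        chunk = fp.read(chunk_size)
        buf, pos = buf[pos:] + chunk, 0
        eof = not chunk

    def next_char():
        # Skip whitespace and return the next character without consuming it.
        nonlocal pos
        while True:
            while pos < len(buf) and buf[pos] in _WS:
                pos += 1
            if pos < len(buf):
                return buf[pos]
            if eof:
                raise ValueError("unexpected end of JSON input")
            refill()

    if next_char() != "[":
        raise ValueError("expected a top-level JSON array")
    pos += 1
    if next_char() == "]":
        return

    while True:
        while True:
            next_char()  # pos now points at the first character of the element
            try:
                obj, end = decoder.raw_decode(buf, pos)
            except json.JSONDecodeError:
                if eof:
                    raise
                refill()  # the element is cut off at the end of the buffer
                continue
            # Trust the value only once the following ',' or ']' is buffered;
            # otherwise a number such as 4.5 could be cut off and read as 4.
            after = end
            while after < len(buf) and buf[after] in _WS:
                after += 1
            if eof or (after < len(buf) and buf[after] in ",]"):
                break
            refill()
        yield obj
        pos = end
        sep = next_char()
        pos += 1
        if sep == "]":
            return
        if sep != ",":
            raise ValueError(f"expected ',' or ']' but found {sep!r}")


def json_array_to_jsonl(src_path, dst_path, chunk_size=1 << 20):
    """Hypothetical helper: rewrite a JSON-array file as JSON Lines.

    Returns the number of records written.
    """
    count = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for record in iter_json_array(src, chunk_size):
            # json.dumps without indent puts each record on exactly one line.
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Usage, with hypothetical file names: call `json_array_to_jsonl("data/quora_chat_data.json", "data/quora_chat_data.jsonl")`, then load the `.jsonl` file with `load_dataset("json", data_files="data/quora_chat_data.jsonl")` wherever finetune.py loads its data (the traceback is cut off, so the exact call is not shown here). If the single-array explanation is right, having one record per line should keep each batch the loader reads close to its configured chunk size instead of letting it grow to the size of the whole file.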