qlora
Core dump. Not sure if caused by earlier problem with pad_token
Found cached dataset parquet (/home/developer/.cache/huggingface/datasets/tatsu-lab___parquet/tatsu-lab--alpaca-2b32f0433506ef5f/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 303.23it/s]
Loading cached processed dataset at /home/developer/.cache/huggingface/datasets/tatsu-lab___parquet/tatsu-lab--alpaca-2b32f0433506ef5f/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-7688172b91a1f6d4.arrow
Loading cached processed dataset at /home/developer/.cache/huggingface/datasets/tatsu-lab___parquet/tatsu-lab--alpaca-2b32f0433506ef5f/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-0b873bde6329608e.arrow
torch.float32 422326272 0.11537932153507864
torch.uint8 3238002688 0.8846206784649213
0%| | 0/10000 [00:00<?, ?it/s]
Error invalid device ordinal at line 359 in file /home/tim/git/bitsandbytes/csrc/pythonInterface.c
/arrow/cpp/src/arrow/filesystem/s3fs.cc:2598: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit
Segmentation fault (core dumped)
(Guanaco) developer@ai:~/qlora$
I have since noticed that this may be the same problem as issue #3.
def prefetch_tensor(A, to_cpu=False):
    assert A.is_paged, 'Only paged tensors can be prefetched!'
    if to_cpu:
        deviceid = -1
    else:
        deviceid = A.page_deviceid

    num_bytes = dtype2bytes[A.dtype]*A.numel()
    lib.cprefetch(get_ptr(A), ct.c_size_t(num_bytes), ct.c_int32(deviceid))
The function above is part of bitsandbytes.
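We need to check whether deviceid is -1 at the point of the failing call. I took a brief look at the Python functions, and to_cpu=False at every call site I looked at, so the value passed would be A.page_deviceid rather than -1.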
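To narrow this down, something like the following check could be dropped in just before the lib.cprefetch call. This is only a sketch: is_valid_prefetch_target is a hypothetical helper of mine, not part of bitsandbytes, and it assumes that -1 means "prefetch to CPU" and that any other value must be a visible GPU index.

```python
def is_valid_prefetch_target(deviceid: int, device_count: int) -> bool:
    """Hypothetical helper (not a bitsandbytes API).

    Assumption: -1 means "prefetch to host memory"; any other value
    must be the index of a GPU visible to this process.
    """
    if deviceid == -1:
        return True
    return 0 <= deviceid < device_count


# Example: a machine with a single visible GPU.
print(is_valid_prefetch_target(0, 1))   # GPU 0 exists
print(is_valid_prefetch_target(1, 1))   # GPU 1 does not exist
print(is_valid_prefetch_target(-1, 1))  # CPU target
```

In a real run the count could come from torch.cuda.device_count(), and the value to test would be A.page_deviceid. If the check fails for the paged tensor, that would point at a stale or wrong page_deviceid rather than at the cast.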
Also, is the ct.c_int32 cast correct? Should it be unsigned, or maybe a "short"?
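A quick way to see what each candidate cast would do with -1, using only the standard ctypes module. I am assuming here that the C side declares the device parameter as a plain int (which is what the CUDA runtime uses for device ordinals, with -1 as the CPU device id). I have not verified the cprefetch prototype itself.

```python
import ctypes as ct

CPU_DEVICE_ID = -1  # assumed to mirror CUDA's cudaCpuDeviceId

signed32 = ct.c_int32(CPU_DEVICE_ID)
unsigned32 = ct.c_uint32(CPU_DEVICE_ID)  # ctypes wraps silently, no error
signed16 = ct.c_int16(CPU_DEVICE_ID)

# A signed 32-bit value keeps the sign, so -1 reaches C as -1.
print(signed32.value)

# An unsigned 32-bit value turns -1 into 2**32 - 1, which is not a device.
print(unsigned32.value)

# A short keeps the sign but is only 2 bytes wide, narrower than c_int32.
print(ct.sizeof(signed16), ct.sizeof(signed32))
```

Under that assumption, ct.c_int32 looks like the right choice on platforms where int is 32 bits: an unsigned type would break the -1 case, and a short would not match the parameter width. So the cast is probably not the cause of the "invalid device ordinal" error.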