Add a clearer error message for train_dataset exceptions
PR type
- [ ] Bug Fix
- [x] New Feature
- [ ] Document Updates
- [ ] More Models or Datasets Support
PR information
```python
dataset_info = {}
logger.info(f'Using num_proc: {args.preprocess_num_proc}')
train_dataset = dataset_map(train_dataset, template.encode, args.preprocess_num_proc)
if val_dataset is not None:
    val_dataset = dataset_map(val_dataset, template.encode, args.preprocess_num_proc)
if args.test_oom_error:
    train_dataset = sort_by_max_length(train_dataset, 20000)
# Data analysis
td0, tkwargs0 = train_dataset.data[0]
print_example(td0, tokenizer, tkwargs0)
dataset_info['train_dataset'] = stat_dataset(train_dataset)
if val_dataset is not None:
    dataset_info['val_dataset'] = stat_dataset(val_dataset)
```
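In the code above, `td0, tkwargs0 = train_dataset.data[0]` can fail with the following exception: `AttributeError: 'NoneType' object has no attribute 'data'`

This PR wraps that access in try/except logic and re-raises the error with an explanation of the likely causes:
(1) `train_dataset` contains samples whose input or labels are empty; please check the data.
(2) `max_length` is too short, so samples in the input may exceed `max_length`; please increase `max_length`.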
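A minimal sketch of what that try/except could look like. It assumes only that `train_dataset` is either `None` or an object exposing a `data` sequence of `(example, kwargs)` pairs, as in the snippet above; the helper name `get_first_example` is hypothetical, and the messages are copied from the experiment output in this PR rather than from the merged code.

```python
import logging
from typing import Any, Dict, Tuple


def get_first_example(train_dataset: Any) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (td0, tkwargs0) from train_dataset.data[0] with a clearer error.

    Hypothetical helper: assumes train_dataset is None or has a `data`
    sequence whose items are (example, kwargs) pairs.
    """
    try:
        td0, tkwargs0 = train_dataset.data[0]
    except AttributeError as e:
        # Reached when train_dataset is None (or has no `.data` attribute).
        logging.error('Error accessing dataset properties. Please ensure that the dataset '
                      'is properly initialized and not empty.')
        # `from e` chains the original error, which produces the
        # "direct cause" section in the traceback.
        raise AttributeError(
            'Failed to access dataset attributes. This might be because:\n'
            '(1) The dataset contains None for input or labels;\n'
            "(2) The 'max_length' setting is too short causing data truncation."
        ) from e
    return td0, tkwargs0
```

Replacing the bare unpacking line with `td0, tkwargs0 = get_first_example(train_dataset)` keeps the happy path unchanged; only the failure path gains the logged message and the chained `AttributeError`.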
Experiment results
```
ERROR:root:Error accessing dataset properties. Please ensure that the dataset is properly initialized and not empty.
Traceback (most recent call last):
  File "demo.py", line 4, in <module>
    td0, tkwargs0 = train_dataset.data[0]
                    ^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'data'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "demo.py", line 8, in <module>
    raise AttributeError(
AttributeError: Failed to access dataset attributes. This might be because:
(1) The dataset contains None for input or labels;
(2) The 'max_length' setting is too short causing data truncation.
```