
Error hint for train_dataset exceptions

Status: Open. Opened by yanqiangmiffy.

PR type

  • [ ] Bug Fix
  • [x] New Feature
  • [ ] Document Updates
  • [ ] More Models or Datasets Support

PR information

dataset_info = {}
logger.info(f'Using num_proc: {args.preprocess_num_proc}')
train_dataset = dataset_map(train_dataset, template.encode, args.preprocess_num_proc)
if val_dataset is not None:
    val_dataset = dataset_map(val_dataset, template.encode, args.preprocess_num_proc)
if args.test_oom_error:
    train_dataset = sort_by_max_length(train_dataset, 20000)
# Data analysis
td0, tkwargs0 = train_dataset.data[0]
print_example(td0, tokenizer, tkwargs0)
dataset_info['train_dataset'] = stat_dataset(train_dataset)
if val_dataset is not None:
    dataset_info['val_dataset'] = stat_dataset(val_dataset)

In the code above, the line td0, tkwargs0 = train_dataset.data[0] can raise the following exception: AttributeError: 'NoneType' object has no attribute 'data'
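The self-contained reproduction below shows how a None can reach that line. It assumes, as the traceback implies, that dataset_map can return None when no sample survives encoding. dataset_map_sketch, encode_sketch and LLMDatasetSketch are hypothetical stand-ins, not swift's actual code, and the whitespace split only approximates real tokenization.

```python
from typing import Optional


class LLMDatasetSketch:
    """Hypothetical container mirroring the .data attribute used above:
    a list of (inputs, tokenizer_kwargs) pairs."""

    def __init__(self, data):
        self.data = data


def encode_sketch(sample, max_length=8):
    """Hypothetical encoder: rejects (returns None for) samples whose input or
    label is empty, or whose whitespace-token count exceeds max_length."""
    if not sample.get("input") or not sample.get("label"):
        return None
    ids = (sample["input"] + " " + sample["label"]).split()
    return {"input_ids": ids} if len(ids) <= max_length else None


def dataset_map_sketch(samples, encode) -> Optional[LLMDatasetSketch]:
    """Hypothetical stand-in for dataset_map: encodes every sample, drops the
    rejected ones, and returns None (not an empty dataset) if none survive."""
    data = [(enc, {}) for enc in map(encode, samples) if enc is not None]
    return LLMDatasetSketch(data) if data else None


# Every sample is rejected (empty input; too many tokens), so the result is None
# and the attribute access on the next line fails exactly as in the report.
train_dataset = dataset_map_sketch(
    [{"input": "", "label": "x"}, {"input": "too many words " * 4, "label": "y"}],
    encode_sketch,
)
try:
    td0, tkwargs0 = train_dataset.data[0]
except AttributeError as e:
    print(e)  # 'NoneType' object has no attribute 'data'
```

The point of the sketch is that the failure surfaces far from its cause: the samples are dropped inside the mapping step, and the error only appears when the first example is printed.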

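Add try/except and raise logic that reports the likely causes of the error:

(1) train_dataset contains samples whose input or label is empty; please check the data.
(2) max_length is too short, so sample lengths may exceed max_length; please increase max_length.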
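A minimal sketch of the proposed try/except and raise logic follows. It is reconstructed from the log under Experiment results rather than copied from the PR diff, so the helper name first_example is hypothetical; the two messages are taken from that log. Re-raising with raise ... from e is what makes Python print "The above exception was the direct cause of the following exception" and keeps the original AttributeError available as __cause__.

```python
import logging


def first_example(train_dataset):
    """Hypothetical helper: return the first (inputs, tokenizer_kwargs) pair,
    replacing the bare AttributeError with an actionable message."""
    try:
        td0, tkwargs0 = train_dataset.data[0]
    except AttributeError as e:
        # Logged at the root logger, hence the "ERROR:root:" prefix in the log.
        logging.error('Error accessing dataset properties. Please ensure that the '
                      'dataset is properly initialized and not empty.')
        # "from e" chains the exceptions, producing the "direct cause" traceback.
        raise AttributeError(
            'Failed to access dataset attributes. This might be because:\n'
            '(1) The dataset contains None for input or labels;\n'
            "(2) The 'max_length' setting is too short causing data truncation."
        ) from e
    return td0, tkwargs0
```

In the training script, td0, tkwargs0 = first_example(train_dataset) would then replace the direct indexing before print_example is called. Only AttributeError is caught here, matching the log; an empty but non-None dataset would still raise a plain IndexError.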

Experiment results

ERROR:root:Error accessing dataset properties. Please ensure that the dataset is properly initialized and not empty.
Traceback (most recent call last):
  File "demo.py", line 4, in <module>
    td0, tkwargs0 = train_dataset.data[0]
                    ^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'data'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "demo.py", line 8, in <module>
    raise AttributeError(
AttributeError: Failed to access dataset attributes. This might be because:
(1) The dataset contains None for input or labels;
(2) The 'max_length' setting is too short causing data truncation.
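Because the new message only names likely causes, it can help to check the raw samples before encoding. The sketch below is hypothetical: the field names input and label, the helper find_suspect_samples, and the use of str.split as a tokenizer are stand-ins, not swift's data format or tokenizer. Passing the real tokenizer's encode function would give meaningful lengths.

```python
from typing import Callable, Dict, List, Sequence


def find_suspect_samples(samples: Sequence[Dict[str, str]],
                         tokenize: Callable[[str], List[str]],
                         max_length: int) -> Dict[str, List[int]]:
    """Hypothetical pre-check for the two causes named in the error message.
    Returns indices of samples with an empty input/label, and of samples whose
    combined token count exceeds max_length."""
    report: Dict[str, List[int]] = {"empty": [], "too_long": []}
    for i, sample in enumerate(samples):
        text_in, text_label = sample.get("input"), sample.get("label")
        if not text_in or not text_label:
            report["empty"].append(i)
            continue
        if len(tokenize(text_in)) + len(tokenize(text_label)) > max_length:
            report["too_long"].append(i)
    return report


samples = [
    {"input": "hello there", "label": "hi"},
    {"input": "", "label": "orphan label"},
    {"input": "a very long prompt " * 5, "label": "ok"},
]
print(find_suspect_samples(samples, str.split, max_length=8))
# {'empty': [1], 'too_long': [2]}
```

If every index lands in one of the two lists, the dataset would be empty after encoding, which is the situation that leaves train_dataset as None.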

yanqiangmiffy, May 01 '24 04:05