BuilderConfig ParquetConfig(...) doesn't have a 'use_auth_token' key.

Open tteguayco opened this issue 9 months ago • 2 comments

Describe the bug

I'm trying to run the following fine-tuning command (based on this page):

! accelerate launch /content/instruction-tuned-sd/finetune_instruct_pix2pix.py \
    --pretrained_model_name_or_path=${MODEL_ID} \
    --dataset_name=${DATASET_NAME} \
    --use_ema \
    --enable_xformers_memory_efficient_attention \
    --resolution=512 --random_flip \
    --train_batch_size=2 --gradient_accumulation_steps=4 --gradient_checkpointing \
    --max_train_steps=500 \
    --checkpointing_steps=25 --checkpoints_total_limit=1 \
    --learning_rate=5e-05 --max_grad_norm=1 --lr_warmup_steps=20 \
    --conditioning_dropout_prob=0.1 \
    --mixed_precision=fp16 \
    --seed=42 \
    --output_dir=${OUTPUT_DIR} \
    --original_image_column=before \
    --edit_prompt=prompt \
    --edited_image=after

but I keep getting the following error:

Traceback (most recent call last):
  File "/content/instruction-tuned-sd/finetune_instruct_pix2pix.py", line 1137, in <module>
    main()
  File "/content/instruction-tuned-sd/finetune_instruct_pix2pix.py", line 652, in main
    dataset = load_dataset(
              ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/datasets/load.py", line 2129, in load_dataset
    builder_instance = load_dataset_builder(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/datasets/load.py", line 1886, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
                                       ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/datasets/builder.py", line 342, in __init__
    self.config, self.config_id = self._create_builder_config(
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/datasets/builder.py", line 590, in _create_builder_config
    raise ValueError(f"BuilderConfig {builder_config} doesn't have a '{key}' key.")
ValueError: BuilderConfig ParquetConfig(name='default', version=0.0.0, data_dir=None, data_files={'train': ['data/train-*']}, description=None, batch_size=None, columns=None, features=None, filters=None) doesn't have a 'use_auth_token' key.
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 10, in <module>
    sys.exit(main())
             ^^^^^^

Any ideas? The installed datasets version is 3.2.0.

Steps to reproduce the bug

Run the command above.

Expected behavior

No errors

Environment info

Python 3.11.11

datasets==3.2.0

tteguayco avatar Apr 08 '25 10:04 tteguayco

I encountered the same error, have you resolved it?

Alex9154 avatar Apr 11 '25 08:04 Alex9154

Hi! use_auth_token was deprecated and then removed some time ago. You should pass token to load_dataset() instead.
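
To make the difference concrete, here is a minimal sketch that does not call the real library. StubConfig and load_dataset_stub are hypothetical stand-ins, written on the assumption that token is a named parameter while any other keyword is treated as a config field, which is the behaviour the traceback suggests. The error text mirrors the ValueError shown in the issue.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class StubConfig:
    """Hypothetical stand-in for a builder config; not the real ParquetConfig."""
    name: str = "default"
    data_dir: Optional[str] = None


def load_dataset_stub(path: str, token: Optional[Union[bool, str]] = None, **config_kwargs):
    """Simplified model: `token` is a named parameter, anything else must be a config field."""
    config = StubConfig()
    for key, value in config_kwargs.items():
        if not hasattr(config, key):
            # Same message shape as the ValueError in the traceback.
            raise ValueError(f"BuilderConfig {config} doesn't have a '{key}' key.")
        setattr(config, key, value)
    return {"path": path, "token": token, "config": config}


# Old spelling: the unknown keyword falls through to the config check and is rejected.
try:
    load_dataset_stub("user/dataset", use_auth_token=True)
except ValueError as err:
    print(err)

# New spelling: `token` is consumed by the function itself and never reaches the config.
result = load_dataset_stub("user/dataset", token=True)
print(result["token"])
```

In the failing script, the equivalent change would be to replace use_auth_token=... with token=... in the load_dataset call that the traceback points to (line 652 of finetune_instruct_pix2pix.py).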

lhoestq avatar Apr 15 '25 12:04 lhoestq

Hi @lhoestq, I'd like to take this up.

As discussed in #7504, the issue arises when use_auth_token is passed to load_dataset, which forwards it to the config's __init__, where it's no longer a valid key.

To address this, I’ll intercept and strip use_auth_token inside load_dataset() (similar to how we handle trust_remote_code). A warning will be logged, and users will be encouraged to use token instead.

This avoids breaking older scripts that still use use_auth_token.
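
A rough sketch of what such a shim could look like, written as a decorator around a stand-in function rather than as a patch to the library. The names strip_use_auth_token and fake_load_dataset are hypothetical, the warning uses Python's warnings module as one possible way to notify users, and forwarding the old value to token is an assumption about the intended behaviour, not something confirmed in this thread.

```python
import functools
import warnings


def strip_use_auth_token(func):
    """Wrap a loader so a legacy `use_auth_token` kwarg is dropped with a warning."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if "use_auth_token" in kwargs:
            legacy_value = kwargs.pop("use_auth_token")
            warnings.warn(
                "'use_auth_token' was removed; pass 'token' instead.",
                FutureWarning,
                stacklevel=2,
            )
            # Assumption: reuse the legacy value unless `token` was given explicitly.
            kwargs.setdefault("token", legacy_value)
        return func(*args, **kwargs)

    return wrapper


@strip_use_auth_token
def fake_load_dataset(path, token=None, **config_kwargs):
    """Hypothetical stand-in for load_dataset that just echoes its arguments."""
    return {"path": path, "token": token, "config_kwargs": config_kwargs}


# Record warnings so the demo can show what an older script would see.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = fake_load_dataset("user/dataset", use_auth_token=True)

print(result)
print([str(w.message) for w in caught])
```

With this shape, the legacy keyword never reaches the config kwargs, so the builder config check is not triggered, and an explicitly passed token takes precedence over the legacy value.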

ArjunJagdale avatar Jun 28 '25 09:06 ArjunJagdale