
`accelerator.prepare(dataloader)` fails when batch_sampler is not given

Open ‱ xiabingquan opened this issue 2 years ago ‱ 6 comments

System Info

- `Accelerate` version: 0.12.0
- Platform: Linux-4.4.0-210-generic-x86_64-with-debian-stretch-sid
- Python version: 3.7.6
- Numpy version: 1.21.2
- PyTorch version (GPU?): 1.13.0+cu117 (True)
- `Accelerate` default config:
        - compute_environment: LOCAL_MACHINE
        - distributed_type: MULTI_GPU
        - mixed_precision: no
        - use_cpu: False
        - num_processes: 2
        - machine_rank: 0
        - num_machines: 1
        - main_process_ip: None
        - main_process_port: None
        - main_training_function: main
        - deepspeed_config: {}
        - fsdp_config: {}
        - downcast_bf16: False

Information

  • [x] The official example scripts
  • [X] My own modified scripts

Tasks

  • [x] One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • [X] My own task or dataset (give details below)

Reproduction

Paste the following code into a file script.py and run it with the command accelerate launch script.py:

import random

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
from accelerate import Accelerator


class VallinaCollator(object):
    """
    A vallina Collator for collating samples with varied length in a mini-batch
    """
    def __init__(self):
        pass

    @staticmethod
    def collate(batch):
        lens = [b["len"] for b in batch]
        feats = torch.zeros(len(batch), max(lens))
        for i, b in enumerate(batch):
            feats[i, :b["len"]] = b["feat"]
        return {
            "feats": feats.float(),
            "lens": torch.tensor(lens).long(),
        }

    def __call__(self, *args, **kwargs):
        return self.collate(*args, **kwargs)


class VallinaDataset(Dataset):
    """
    A vallina Dataset.
    """
    def __init__(self):
        lens = np.random.randint(0, 100, (1000,))
        self.idx_and_lens = [(i, t) for i, t in enumerate(lens)]
        self.total_len_in_batch = 10000     # total_len_in_batch: the maximum frames in a mini-batch.
        self.collator = VallinaCollator()
        self.minibatches = []
        self.shuffle()

    def shuffle(self):
        random.shuffle(self.idx_and_lens)
        self._init_minibatches()
        random.shuffle(self.minibatches)

    def _init_minibatches(self):
        self.minibatches = []
        max_len = -1
        sample_cnt = 0
        self.minibatches.append([])
        for sample in self.idx_and_lens:
            idx, sample_len = sample
            max_len = max(max_len, sample_len)
            sample_cnt += 1
            lens_in_batch = max_len * sample_cnt
            if lens_in_batch > self.total_len_in_batch:     # open a new mini-batch
                self.minibatches.append([])
                sample_cnt = 1
                max_len = sample_len
            self.minibatches[-1].append(sample)

    @staticmethod
    def _get_one_sample(sample):
        idx, sample_len = sample
        feat = torch.randn((1, sample_len))
        return {
            "len": sample_len,
            "feat": feat
        }

    def __getitem__(self, idx):
        # each dataset item is an already-collated mini-batch of variable size
        uncollated_samples = [self._get_one_sample(t) for t in self.minibatches[idx]]
        return self.collator(uncollated_samples)

    def __len__(self):
        return len(self.minibatches)


# batch_size=None disables automatic batching: each dataset item is already a full mini-batch
dataloader = DataLoader(
    dataset=VallinaDataset(),
    batch_size=None,
    shuffle=False,
    batch_sampler=None,
    sampler=None,
    drop_last=False,
    collate_fn=None,
    pin_memory=True,
    num_workers=2
)

for data in dataloader:
    break


accelerator = Accelerator()
dataloader = accelerator.prepare(dataloader)
for data in dataloader:
    break

Expected behavior

Here's my problem: when creating a dataloader, I want everything to be controlled inside the Dataset instance. For example, I want to shuffle the dataset manually via the shuffle method of VallinaDataset shown above, instead of using a BatchSampler or the argument shuffle=True, and to collate the samples of a mini-batch inside __getitem__ instead of via the argument collate_fn=....

In the torch.utils.data.DataLoader class, if none of batch_sampler, sampler, batch_size, or collate_fn is given, iterating the dataloader with for data in dataloader simply yields the return values of __getitem__ one by one, which is why the code works fine until dataloader = accelerator.prepare(dataloader) is called.
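As a minimal illustration of that behaviour (a sketch written for this report, not part of the original script; PreBatchedDataset is a hypothetical name), a DataLoader created with batch_size=None disables automatic batching and yields each pre-batched item unchanged:

import torch
from torch.utils.data import Dataset, DataLoader


class PreBatchedDataset(Dataset):
    """Each item is already a full mini-batch."""
    def __getitem__(self, idx):
        return {"feats": torch.randn(4, 10), "lens": torch.full((4,), 10)}

    def __len__(self):
        return 5


loader = DataLoader(PreBatchedDataset(), batch_size=None)
for batch in loader:
    # no extra batch dimension is added: the dict comes back exactly as returned
    assert batch["feats"].shape == (4, 10)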

However, Accelerate does not seem to support this, because it calls self.batch_size = batch_sampler.batch_size in the constructor of the BatchSamplerShard class.

The full error message is as follows:

Traceback (most recent call last):
  File "./script.py", line 98, in <module>
    dataloader = accelerator.prepare(dataloader)
  File "~/miniconda3/envs/lrs/lib/python3.7/site-packages/accelerate/accelerator.py", line 621, in prepare
    result = tuple(self._prepare_one(obj, first_pass=True) for obj in args)
  File "~/miniconda3/envs/lrs/lib/python3.7/site-packages/accelerate/accelerator.py", line 621, in <genexpr>
    result = tuple(self._prepare_one(obj, first_pass=True) for obj in args)
  File "~/miniconda3/envs/lrs/lib/python3.7/site-packages/accelerate/accelerator.py", line 516, in _prepare_one
    return self.prepare_data_loader(obj)
  File "~/miniconda3/envs/lrs/lib/python3.7/site-packages/accelerate/accelerator.py", line 850, in prepare_data_loader
    dispatch_batches=self.dispatch_batches,
  File "~/miniconda3/envs/lrs/lib/python3.7/site-packages/accelerate/data_loader.py", line 657, in prepare_data_loader
    split_batches=split_batches,
  File "~/miniconda3/envs/lrs/lib/python3.7/site-packages/accelerate/data_loader.py", line 138, in __init__
    self.batch_size = batch_sampler.batch_size
AttributeError: 'NoneType' object has no attribute 'batch_size'

xiabingquan ‱ Nov 23 '22

This could be solved by changing

dataloader = DataLoader(
    dataset=VallinaDataset(),
    batch_size=None,
    shuffle=False,
    batch_sampler=None,
    sampler=None,
    drop_last=False,
    collate_fn=None,
    pin_memory=True,
    num_workers=2
)

to

dset = VallinaDataset()
dataloader = DataLoader(
    dataset=dset,
    batch_size=1,
    shuffle=False,
    batch_sampler=None,
    sampler=None,
    drop_last=False,
    collate_fn=dset.collator,
    pin_memory=True,
    num_workers=2
)

and adding batch = batch[0] at the very beginning of the collate method of the VallinaCollator class.

However, it's kind of ugly.
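For reference, the same unwrapping can also be written as a standalone collate function instead of modifying VallinaCollator.collate (a sketch building on the script above; unwrap_collate is a name introduced here, not part of the original code):

def unwrap_collate(batch):
    # with batch_size=1 the DataLoader passes a list containing exactly one
    # already-collated mini-batch produced by VallinaDataset.__getitem__
    return batch[0]


dset = VallinaDataset()
dataloader = DataLoader(
    dataset=dset,
    batch_size=1,
    collate_fn=unwrap_collate,
    pin_memory=True,
    num_workers=2,
)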

xiabingquan ‱ Nov 23 '22

Hello @xiabingquan, can you try the latest accelerate version and let us know if that solves the issue? It has been handled in version 0.13 here: https://github.com/huggingface/accelerate/blob/v0.14.0/src/accelerate/data_loader.py#L144

pacman100 ‱ Nov 23 '22

Thanks for your reply @pacman100. Unfortunately, it didn't resolve the issue :( Another error occurred. The full traceback is as follows:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╼
│ /home/code/xiabingquan/./a.py:98 in <module>                                                     │
│                                                                                                  │
│    95                                                                                            │
│    96                                                                                            │
│    97 accelerator = Accelerator()                                                                │
│ ❱  98 dataloader = accelerator.prepare(dataloader)                                               │
│    99 for data in dataloader:                                                                    │
│   100 │   break                                                                                  │
│                                                                                                  │
│ /home/code/xiabingquan/miniconda3/envs/lrs/lib/python3.7/site-packages/accelerate/accelerator.py │
│ :760 in prepare                                                                                  │
│                                                                                                  │
│    757 │   │   │   result = self._prepare_megatron_lm(*args)                                     │
│    758 │   │   else:                                                                             │
│    759 │   │   │   result = tuple(                                                               │
│ ❱  760 │   │   │   │   self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d i  │
│    761 │   │   │   )                                                                             │
│    762 │   │   │   result = tuple(self._prepare_one(obj, device_placement=d) for obj, d in zip(  │
│    763                                                                                           │
│                                                                                                  │
│ /home/code/xiabingquan/miniconda3/envs/lrs/lib/python3.7/site-packages/accelerate/accelerator.py │
│ :760 in <genexpr>                                                                                │
│                                                                                                  │
│    757 │   │   │   result = self._prepare_megatron_lm(*args)                                     │
│    758 │   │   else:                                                                             │
│    759 │   │   │   result = tuple(                                                               │
│ ❱  760 │   │   │   │   self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d i  │
│    761 │   │   │   )                                                                             │
│    762 │   │   │   result = tuple(self._prepare_one(obj, device_placement=d) for obj, d in zip(  │
│    763                                                                                           │
│                                                                                                  │
│ /home/code/xiabingquan/miniconda3/envs/lrs/lib/python3.7/site-packages/accelerate/accelerator.py │
│ :622 in _prepare_one                                                                             │
│                                                                                                  │
│    619 │   │   # First pass of preparation: DataLoader, model, optimizer                         │
│    620 │   │   if first_pass:                                                                    │
│    621 │   │   │   if isinstance(obj, torch.utils.data.DataLoader):                              │
│ ❱  622 │   │   │   │   return self.prepare_data_loader(obj, device_placement=device_placement)   │
│    623 │   │   │   elif isinstance(obj, torch.nn.Module):                                        │
│    624 │   │   │   │   return self.prepare_model(obj, device_placement=device_placement)         │
│    625 │   │   │   elif isinstance(obj, torch.optim.Optimizer):                                  │
│                                                                                                  │
│ /home/code/xiabingquan/miniconda3/envs/lrs/lib/python3.7/site-packages/accelerate/accelerator.py │
│ :1129 in prepare_data_loader                                                                     │
│                                                                                                  │
│   1126 │   │   │   put_on_device=device_placement,                                               │
│   1127 │   │   │   rng_types=self.rng_types.copy(),                                              │
│   1128 │   │   │   dispatch_batches=self.dispatch_batches,                                       │
│ ❱ 1129 │   │   │   even_batches=self.even_batches,                                               │
│   1130 │   │   )                                                                                 │
│   1131 │                                                                                         │
│   1132 │   def prepare_optimizer(self, optimizer: torch.optim.Optimizer, device_placement=None)  │
│                                                                                                  │
│ /home/code/xiabingquan/miniconda3/envs/lrs/lib/python3.7/site-packages/accelerate/data_loader.py │
│ :681 in prepare_data_loader                                                                      │
│                                                                                                  │
│   678 │   │   │   if sampler_is_batch_sampler:                                                   │
│   679 │   │   │   │   sampler = dataloader.sampler.sampler                                       │
│   680 │   │   │   else:                                                                          │
│ ❱ 681 │   │   │   │   sampler = dataloader.batch_sampler.sampler                                 │
│   682 │   │   │   if hasattr(sampler, "generator"):                                              │
│   683 │   │   │   │   if sampler.generator is None:                                              │
│   684 │   │   │   │   │   sampler.generator = torch.Generator()                                  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'NoneType' object has no attribute 'sampler'
[10:17:14] ERROR    failed (exitcode: 1) local_rank: 0 (pid: 24113) of binary: /home/code/xiabingquan/miniconda3/envs/lrs/bin/python              api.py:674

It looks like Accelerate is trying to access the sampler of the dataloader's batch_sampler even when the dataloader doesn't have a batch_sampler at all.
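A quick way to confirm that (a sketch, assuming the definitions from the script above) is to inspect the plain PyTorch dataloader before prepare() is called; with batch_size=None it has no batch_sampler, only a plain sampler:

loader = DataLoader(VallinaDataset(), batch_size=None)
print(loader.batch_size)      # None
print(loader.batch_sampler)   # None
print(loader.sampler)         # a SequentialSampler over the pre-built mini-batches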

xiabingquan ‱ Nov 26 '22

I have this bug as well.

/path/to/python3.10/site-packages/accelerate/data_loader.py", line 693, in prepare_data_loader
    sampler = dataloader.batch_sampler.sampler
AttributeError: 'BatchSamplerShard' object has no attribute 'sampler'

I'm using v0.18

publicmatt ‱ Mar 31 '23

Could you open a clean issue with a reproducer? There is nothing we can do to help without that.

sgugger ‱ Apr 03 '23

accelerate 0.19.0 > I have this error too.

/path/to/python3.10/site-packages/accelerate/data_loader.py", line 693, in prepare_data_loader
    sampler = dataloader.batch_sampler.sampler
AttributeError: 'BatchSamplerShard' object has no attribute 'sampler'

I'm using v0.18

me too

masuxin ‱ Aug 24 '23