`accelerator.prepare(dataloader)` fails when batch_sampler is not given
System Info
- `Accelerate` version: 0.12.0
- Platform: Linux-4.4.0-210-generic-x86_64-with-debian-stretch-sid
- Python version: 3.7.6
- Numpy version: 1.21.2
- PyTorch version (GPU?): 1.13.0+cu117 (True)
- `Accelerate` default config:
- compute_environment: LOCAL_MACHINE
- distributed_type: MULTI_GPU
- mixed_precision: no
- use_cpu: False
- num_processes: 2
- machine_rank: 0
- num_machines: 1
- main_process_ip: None
- main_process_port: None
- main_training_function: main
- deepspeed_config: {}
- fsdp_config: {}
- downcast_bf16: False
Information
- [x] The official example scripts
- [x] My own modified scripts

Tasks
- [x] One of the scripts in the examples/ folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- [x] My own task or dataset (give details below)
Reproduction
Paste the following code into a file `script.py` and then run the command `accelerate launch script.py`:
```python
import random

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

from accelerate import Accelerator


class VallinaCollator(object):
    """
    A vallina collator for collating samples of varied lengths into a mini-batch.
    """

    def __init__(self):
        pass

    @staticmethod
    def collate(batch):
        lens = [b["len"] for b in batch]
        feats = torch.zeros(len(batch), max(lens))
        for i, b in enumerate(batch):
            feats[i, :b["len"]] = b["feat"]
        return {
            "feats": feats.float(),
            "lens": torch.tensor(lens).long(),
        }

    def __call__(self, *args, **kwargs):
        return self.collate(*args, **kwargs)


class VallinaDataset(Dataset):
    """
    A vallina dataset. Each item is an already-collated mini-batch.
    """

    def __init__(self):
        lens = np.random.randint(0, 100, (1000,))
        self.idx_and_lens = [(i, t) for i, t in enumerate(lens)]
        self.total_len_in_batch = 10000  # the maximum number of frames in a mini-batch
        self.collator = VallinaCollator()
        self.minibatches = []
        self.shuffle()

    def shuffle(self):
        random.shuffle(self.idx_and_lens)
        self._init_minibatches()
        random.shuffle(self.minibatches)

    def _init_minibatches(self):
        self.minibatches = []
        max_len = -1
        sample_cnt = 0
        self.minibatches.append([])
        for sample in self.idx_and_lens:
            idx, sample_len = sample
            max_len = max(max_len, sample_len)
            sample_cnt += 1
            lens_in_batch = max_len * sample_cnt
            if lens_in_batch > self.total_len_in_batch:  # open a new mini-batch
                self.minibatches.append([])
                sample_cnt = 1
                max_len = sample_len
            self.minibatches[-1].append(sample)

    @staticmethod
    def _get_one_sample(sample):
        idx, sample_len = sample
        feat = torch.randn((1, sample_len))
        return {
            "len": sample_len,
            "feat": feat,
        }

    def __getitem__(self, idx):
        uncollated_samples = [self._get_one_sample(t) for t in self.minibatches[idx]]
        return self.collator(uncollated_samples)

    def __len__(self):
        return len(self.minibatches)


dataloader = DataLoader(
    dataset=VallinaDataset(),
    batch_size=None,
    shuffle=False,
    batch_sampler=None,
    sampler=None,
    drop_last=False,
    collate_fn=None,
    pin_memory=True,
    num_workers=2,
)

# Iterating the plain DataLoader works fine.
for data in dataloader:
    break

accelerator = Accelerator()
dataloader = accelerator.prepare(dataloader)  # fails here
for data in dataloader:
    break
```
Expected behavior
Here's my problem: when creating a dataloader, I want everything to be controlled inside the Dataset instance. For example, I want to shuffle the dataset manually via the `shuffle` method of `VallinaDataset` shown above, instead of using a `BatchSampler` or the argument `shuffle=True`, and I want to collate the samples of a mini-batch inside `__getitem__` instead of via the `collate_fn=...` argument.
In the `torch.utils.data.DataLoader` class, if `batch_size` is set to `None` and no `batch_sampler`, `sampler`, or `collate_fn` is given, automatic batching is disabled: iterating the dataloader with `for data in dataloader` simply yields the return values of `__getitem__` one by one. That is why the code above works fine before `dataloader = accelerator.prepare(dataloader)` is called.
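For illustration, here is a minimal sketch of that PyTorch behavior, independent of my repro above (the `PrebatchedDataset` class is only a toy example):

```python
import torch
from torch.utils.data import Dataset, DataLoader


class PrebatchedDataset(Dataset):
    """Each item is already a full mini-batch (a dict of tensors)."""

    def __getitem__(self, idx):
        return {"feats": torch.randn(4, 10), "lens": torch.tensor([10] * 4)}

    def __len__(self):
        return 8


# batch_size=None disables automatic batching: the DataLoader yields whatever
# __getitem__ returns, without wrapping it in a list or calling a collate_fn.
loader = DataLoader(PrebatchedDataset(), batch_size=None)
for batch in loader:
    assert batch["feats"].shape == (4, 10)
```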
However, Accelerate does not seem to support this, because it calls `self.batch_size = batch_sampler.batch_size` in the constructor of the `BatchSamplerShard` class.
The full error message is as follows:
```
Traceback (most recent call last):
  File "./script.py", line 98, in <module>
    dataloader = accelerator.prepare(dataloader)
  File "~/miniconda3/envs/lrs/lib/python3.7/site-packages/accelerate/accelerator.py", line 621, in prepare
    result = tuple(self._prepare_one(obj, first_pass=True) for obj in args)
  File "~/miniconda3/envs/lrs/lib/python3.7/site-packages/accelerate/accelerator.py", line 621, in <genexpr>
    result = tuple(self._prepare_one(obj, first_pass=True) for obj in args)
  File "~/miniconda3/envs/lrs/lib/python3.7/site-packages/accelerate/accelerator.py", line 516, in _prepare_one
    return self.prepare_data_loader(obj)
  File "~/miniconda3/envs/lrs/lib/python3.7/site-packages/accelerate/accelerator.py", line 850, in prepare_data_loader
    dispatch_batches=self.dispatch_batches,
  File "~/miniconda3/envs/lrs/lib/python3.7/site-packages/accelerate/data_loader.py", line 657, in prepare_data_loader
    split_batches=split_batches,
  File "~/miniconda3/envs/lrs/lib/python3.7/site-packages/accelerate/data_loader.py", line 138, in __init__
    self.batch_size = batch_sampler.batch_size
AttributeError: 'NoneType' object has no attribute 'batch_size'
```
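As a quick sanity check (a minimal sketch, unrelated to my dataset), the `batch_sampler` of such a dataloader is indeed `None`, which is what `BatchSamplerShard` then trips over:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# With batch_size=None, PyTorch does not build a BatchSampler at all, so
# dataloader.batch_sampler is None (while dataloader.sampler is a plain
# SequentialSampler over the dataset indices).
loader = DataLoader(TensorDataset(torch.arange(10)), batch_size=None)
print(loader.batch_sampler)  # None
print(loader.sampler)        # SequentialSampler
```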
This could be worked around by changing
```python
dataloader = DataLoader(
    dataset=VallinaDataset(),
    batch_size=None,
    shuffle=False,
    batch_sampler=None,
    sampler=None,
    drop_last=False,
    collate_fn=None,
    pin_memory=True,
    num_workers=2
)
```
to
```python
dset = VallinaDataset()
dataloader = DataLoader(
    dataset=dset,
    batch_size=1,
    shuffle=False,
    batch_sampler=None,
    sampler=None,
    drop_last=False,
    collate_fn=dset.collator,
    pin_memory=True,
    num_workers=2
)
```
and adding `batch = batch[0]` at the very beginning of the `collate` method of the `VallinaCollator` class.
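For reference, the modified `collate` would look roughly like this. It is only a sketch: it also assumes `__getitem__` is changed to return the uncollated sample list directly instead of calling the collator itself, otherwise the collation would run twice:

```python
@staticmethod
def collate(batch):
    # With batch_size=1, the DataLoader hands collate_fn a list containing a
    # single element: the list of per-sample dicts for one mini-batch.
    batch = batch[0]
    lens = [b["len"] for b in batch]
    feats = torch.zeros(len(batch), max(lens))
    for i, b in enumerate(batch):
        feats[i, :b["len"]] = b["feat"]
    return {
        "feats": feats.float(),
        "lens": torch.tensor(lens).long(),
    }
```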
However, it's kind of ugly.
Hello @xiabingquan, can you try the latest accelerate version and let us know if that solves the issue? It has been handled in version 0.13 here: https://github.com/huggingface/accelerate/blob/v0.14.0/src/accelerate/data_loader.py#L144
Thanks for your reply @pacman100. Unfortunately, it didn't resolve the issue :( Another error occurred. The full traceback is as follows:
```
Traceback (most recent call last):
  File "/home/code/xiabingquan/./a.py", line 98, in <module>
    dataloader = accelerator.prepare(dataloader)
  File "/home/code/xiabingquan/miniconda3/envs/lrs/lib/python3.7/site-packages/accelerate/accelerator.py", line 760, in prepare
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d i
  File "/home/code/xiabingquan/miniconda3/envs/lrs/lib/python3.7/site-packages/accelerate/accelerator.py", line 760, in <genexpr>
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d i
  File "/home/code/xiabingquan/miniconda3/envs/lrs/lib/python3.7/site-packages/accelerate/accelerator.py", line 622, in _prepare_one
    return self.prepare_data_loader(obj, device_placement=device_placement)
  File "/home/code/xiabingquan/miniconda3/envs/lrs/lib/python3.7/site-packages/accelerate/accelerator.py", line 1129, in prepare_data_loader
    even_batches=self.even_batches,
  File "/home/code/xiabingquan/miniconda3/envs/lrs/lib/python3.7/site-packages/accelerate/data_loader.py", line 681, in prepare_data_loader
    sampler = dataloader.batch_sampler.sampler
AttributeError: 'NoneType' object has no attribute 'sampler'
[10:17:14] ERROR failed (exitcode: 1) local_rank: 0 (pid: 24113) of binary: /home/code/xiabingquan/miniconda3/envs/lrs/bin/python api.py:674
```
It looks like `Accelerate` is trying to access the `batch_sampler` of a dataloader even when it doesn't have one.
I have this bug as well.
```
/path/to/python3.10/site-packages/accelerate/data_loader.py", line 693, in prepare_data_loader
    sampler = dataloader.batch_sampler.sampler
AttributeError: 'BatchSamplerShard' object has no attribute 'sampler'
```
I'm using v0.18
Could you open a clean issue with a reproducer? There is nothing we can do to help without that.
accelerate 0.19.0 > I also have this error.
```
/path/to/python3.10/site-packages/accelerate/data_loader.py", line 693, in prepare_data_loader
    sampler = dataloader.batch_sampler.sampler
AttributeError: 'BatchSamplerShard' object has no attribute 'sampler'
```
I'm using v0.18.
me too