imbalanced-learn
PyTorch utilities sampler
We could add utilities for PyTorch.
Basically, it should inherit from torch.utils.data.Sampler.
The implementation could look something like this:
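import numpy as np
from imblearn.under_sampling import RandomUnderSampler
from sklearn.base import clone
from sklearn.utils import check_random_state
from torch.utils.data import Sampler


class BalancedSampler(Sampler):
    """Yield the dataset indices selected by an imbalanced-learn sampler."""

    def __init__(self, X, y, sampler=None, random_state=None):
        self.X = X
        self.y = y
        self.sampler = sampler
        self.random_state = random_state
        self._sample()

    def _sample(self):
        random_state = check_random_state(self.random_state)
        if self.sampler is None:
            self.sampler_ = RandomUnderSampler(random_state=random_state)
        else:
            self.sampler_ = clone(self.sampler)
            # propagate the seed only if the sampler accepts one
            if "random_state" in self.sampler_.get_params():
                self.sampler_.set_params(random_state=random_state)
        # recent imbalanced-learn releases no longer provide 'return_indices'
        # or 'fit_sample'; the selected indices are exposed as
        # 'sample_indices_' after 'fit_resample'
        self.sampler_.fit_resample(self.X, self.y)
        if not hasattr(self.sampler_, "sample_indices_"):
            raise ValueError("'sampler' needs to expose the indices of the "
                             "samples selected. Provide a sampler which sets "
                             "a 'sample_indices_' attribute.")
        # copy, then shuffle, since the sampler packs the indices by class
        self.indices_ = np.asarray(self.sampler_.sample_indices_).copy()
        random_state.shuffle(self.indices_)

    def __iter__(self):
        return iter(self.indices_.tolist())

    def __len__(self):
        # number of resampled indices, not the size of the original dataset
        return len(self.indices_)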
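For readers without PyTorch or imbalanced-learn at hand, here is a dependency-free sketch of what the sampler above computes, assuming plain random under-sampling of every class down to the size of the smallest one (which is what RandomUnderSampler does with its default strategy). The names balanced_indices and BalancedIndexSampler are hypothetical. Since a sampler only has to provide __iter__ and __len__, an object like this could be handed to a DataLoader through its sampler argument.

```python
import random
from collections import defaultdict


def balanced_indices(y, seed=None):
    """Hypothetical helper: randomly under-sample each class to the minority size.

    Returns a shuffled list of positions into y, i.e. what a
    torch.utils.data.Sampler is expected to yield.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    n_min = min(len(idxs) for idxs in by_class.values())
    selected = []
    for idxs in by_class.values():
        # draw n_min indices without replacement from this class
        selected.extend(rng.sample(idxs, n_min))
    # the selection above is grouped by class, so shuffle it
    rng.shuffle(selected)
    return selected


class BalancedIndexSampler:
    """Hypothetical minimal sampler: only __iter__ and __len__ are needed."""

    def __init__(self, y, seed=None):
        self.indices = balanced_indices(y, seed)

    def __iter__(self):
        return iter(self.indices)

    def __len__(self):
        return len(self.indices)
```

Like the proposal above, the indices are drawn once at construction, so every epoch iterates over the same subset; drawing them inside __iter__ instead would give a fresh subset per epoch.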
I can't help with this. I have never had the chance to play with PyTorch.
Is there any difference between resampling the data with the samplers before feeding it into a neural network and using the generators to train?
Memory usage mainly
(In reply to Kai He, Fri, 16 Aug 2019 at 11:33.)
-- Guillaume Lemaitre INRIA Saclay - Parietal team Center for Data Science Paris-Saclay https://glemaitre.github.io/
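To make the memory argument concrete, here is a minimal sketch, assuming the dataset is an indexable sequence of rows. Resampling up front materializes a full resampled copy of X (larger than X itself when over-sampling), whereas a sampler or generator only keeps the selected indices and builds each mini-batch on demand. Both helper names are hypothetical; imbalanced-learn's own generators (for instance imblearn.keras.balanced_batch_generator) follow the same index-based idea.

```python
def eager_resample(X, indices):
    """Hypothetical helper: materialize the whole resampled dataset (one full copy)."""
    return [X[i] for i in indices]


def lazy_batches(X, indices, batch_size):
    """Hypothetical helper: yield mini-batches of X built on demand.

    Only the index list and the current batch are held in memory,
    instead of a full resampled copy of X.
    """
    for start in range(0, len(indices), batch_size):
        batch_idx = indices[start:start + batch_size]
        yield [X[i] for i in batch_idx]
```

Both produce the same rows in the same order; the difference is only in how much is held in memory at once.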
@glemaitre has any progress been made on this?
@jnothman @glemaitre Can I take it up if nobody is working on it?