Multilabel TSClassification Tutorial Notebook Example is Broken
When running notebook 01a_MultiClass_MultiLabel_TSClassification.ipynb, the cell under the MultiLabel section that builds the learner and calls lr_find() fails with the error below (the original report included a cropped screenshot).
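The failing cell can be read straight off the traceback:

learn = ts_learner(dls, InceptionTimePlus, metrics=accuracy_multi)
learn.lr_find()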
Here is the full traceback:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[28], line 2
1 learn = ts_learner(dls, InceptionTimePlus, metrics=accuracy_multi)
----> 2 learn.lr_find()
File ~/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/callback/schedule.py:293, in lr_find(self, start_lr, end_lr, num_it, stop_div, show_plot, suggest_funcs)
291 n_epoch = num_it//len(self.dls.train) + 1
292 cb=LRFinder(start_lr=start_lr, end_lr=end_lr, num_it=num_it, stop_div=stop_div)
--> 293 with self.no_logging(): self.fit(n_epoch, cbs=cb)
294 if suggest_funcs is not None:
295 lrs, losses = tensor(self.recorder.lrs[num_it//10:-5]), tensor(self.recorder.losses[num_it//10:-5])
File ~/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/learner.py:264, in Learner.fit(self, n_epoch, lr, wd, cbs, reset_opt, start_epoch)
262 self.opt.set_hypers(lr=self.lr if lr is None else lr)
263 self.n_epoch = n_epoch
--> 264 self._with_events(self._do_fit, 'fit', CancelFitException, self._end_cleanup)
File ~/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/learner.py:199, in Learner._with_events(self, f, event_type, ex, final)
198 def _with_events(self, f, event_type, ex, final=noop):
--> 199 try: self(f'before_{event_type}'); f()
200 except ex: self(f'after_cancel_{event_type}')
201 self(f'after_{event_type}'); final()
File ~/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/learner.py:253, in Learner._do_fit(self)
251 for epoch in range(self.n_epoch):
252 self.epoch=epoch
--> 253 self._with_events(self._do_epoch, 'epoch', CancelEpochException)
File ~/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/learner.py:199, in Learner._with_events(self, f, event_type, ex, final)
198 def _with_events(self, f, event_type, ex, final=noop):
--> 199 try: self(f'before_{event_type}'); f()
200 except ex: self(f'after_cancel_{event_type}')
201 self(f'after_{event_type}'); final()
File ~/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/learner.py:247, in Learner._do_epoch(self)
246 def _do_epoch(self):
--> 247 self._do_epoch_train()
248 self._do_epoch_validate()
File ~/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/learner.py:239, in Learner._do_epoch_train(self)
237 def _do_epoch_train(self):
238 self.dl = self.dls.train
--> 239 self._with_events(self.all_batches, 'train', CancelTrainException)
File ~/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/learner.py:199, in Learner._with_events(self, f, event_type, ex, final)
198 def _with_events(self, f, event_type, ex, final=noop):
--> 199 try: self(f'before_{event_type}'); f()
200 except ex: self(f'after_cancel_{event_type}')
201 self(f'after_{event_type}'); final()
File ~/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/learner.py:205, in Learner.all_batches(self)
203 def all_batches(self):
204 self.n_iter = len(self.dl)
--> 205 for o in enumerate(self.dl): self.one_batch(*o)
File ~/gitwork/timeseriesAI/tsai/tsai/learner.py:40, in one_batch(self, i, b)
38 b_on_device = to_device(b, device=self.dls.device) if self.dls.device is not None else b
39 self._split(b_on_device)
---> 40 self._with_events(self._do_one_batch, 'batch', CancelBatchException)
File ~/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/learner.py:199, in Learner._with_events(self, f, event_type, ex, final)
198 def _with_events(self, f, event_type, ex, final=noop):
--> 199 try: self(f'before_{event_type}'); f()
200 except ex: self(f'after_cancel_{event_type}')
201 self(f'after_{event_type}'); final()
File ~/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/learner.py:219, in Learner._do_one_batch(self)
217 self('after_pred')
218 if len(self.yb):
--> 219 self.loss_grad = self.loss_func(self.pred, *self.yb)
220 self.loss = self.loss_grad.clone()
221 self('after_loss')
File ~/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/losses.py:54, in BaseLoss.__call__(self, inp, targ, **kwargs)
52 if targ.dtype in [torch.int8, torch.int16, torch.int32]: targ = targ.long()
53 if self.flatten: inp = inp.view(-1,inp.shape[-1]) if self.is_2d else inp.view(-1)
---> 54 return self.func.__call__(inp, targ.view(-1) if self.flatten else targ, **kwargs)
File ~/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/torch/nn/modules/loss.py:720, in BCEWithLogitsLoss.forward(self, input, target)
719 def forward(self, input: Tensor, target: Tensor) -> Tensor:
--> 720 return F.binary_cross_entropy_with_logits(input, target,
721 self.weight,
722 pos_weight=self.pos_weight,
723 reduction=self.reduction)
File ~/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/torch/nn/functional.py:3146, in binary_cross_entropy_with_logits(input, target, weight, size_average, reduce, reduction, pos_weight)
3110 r"""Function that measures Binary Cross Entropy between target and input
3111 logits.
3112
(...)
3143 >>> loss.backward()
3144 """
3145 if has_torch_function_variadic(input, target, weight, pos_weight):
-> 3146 return handle_torch_function(
3147 binary_cross_entropy_with_logits,
3148 (input, target, weight, pos_weight),
3149 input,
3150 target,
3151 weight=weight,
3152 size_average=size_average,
3153 reduce=reduce,
3154 reduction=reduction,
3155 pos_weight=pos_weight,
3156 )
3157 if size_average is not None or reduce is not None:
3158 reduction_enum = _Reduction.legacy_get_enum(size_average, reduce)
File ~/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/torch/overrides.py:1551, in handle_torch_function(public_api, relevant_args, *args, **kwargs)
1545 warnings.warn("Defining your `__torch_function__ as a plain method is deprecated and "
1546 "will be an error in future, please define it as a classmethod.",
1547 DeprecationWarning)
1549 # Use `public_api` instead of `implementation` so __torch_function__
1550 # implementations can do equality/identity comparisons.
-> 1551 result = torch_func_method(public_api, types, args, kwargs)
1553 if result is not NotImplemented:
1554 return result
File ~/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/torch_core.py:382, in TensorBase.__torch_function__(cls, func, types, args, kwargs)
380 if cls.debug and func.__name__ not in ('__str__','__repr__'): print(func, types, args, kwargs)
381 if _torch_handled(args, cls._opt, func): types = (torch.Tensor,)
--> 382 res = super().__torch_function__(func, types, args, ifnone(kwargs, {}))
383 dict_objs = _find_args(args) if args else _find_args(list(kwargs.values()))
384 if issubclass(type(res),TensorBase) and dict_objs: res.set_meta(dict_objs[0],as_copy=True)
File ~/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/torch/_tensor.py:1295, in Tensor.__torch_function__(cls, func, types, args, kwargs)
1292 return NotImplemented
1294 with _C.DisableTorchFunctionSubclass():
-> 1295 ret = func(*args, **kwargs)
1296 if func in get_default_nowrap_functions():
1297 return ret
File ~/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/torch/nn/functional.py:3163, in binary_cross_entropy_with_logits(input, target, weight, size_average, reduce, reduction, pos_weight)
3160 reduction_enum = _Reduction.get_enum(reduction)
3162 if not (target.size() == input.size()):
-> 3163 raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
3165 return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)
ValueError: Target size (torch.Size([384])) must be the same as input size (torch.Size([2304]))
I'm running this commit locally (currently the head of the main branch), and the output of my_setup() is:
os : Linux-6.2.0-34-generic-x86_64-with-glibc2.37
python : 3.11.3
tsai : 0.3.8
fastai : 2.7.12
fastcore : 1.5.29
torch : 2.0.1
device : 1 gpu (['NVIDIA GeForce RTX 3090'])
cpu cores : 24
threads per cpu : 1
RAM : 125.53 GB
GPU memory : [24.0] GB
I would be happy to spend quite a bit more effort figuring out the right way to do this example. Thanks!
@oguiza @williamsdoug
As I said, I'm fairly motivated to help fix this issue. I went back and ran this notebook with v0.3.7 and v0.3.6 and got the same ValueError: Target size (torch.Size([384])) must be the same as input size (torch.Size([2304])).
But when I checked out v0.3.5, that bit of code ran without error! I will post back soon with any other diagnostic information I discover. Thanks for creating this awesome package and example code.
So, moving on from that line in v0.3.5, I eventually run into an error at the end of the training loop. But I consider this great progress (retrogress?).
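The screenshot from the original comment is truncated; the failing cell, recovered from the traceback below, is:

learn = ts_learner(dls, InceptionTimePlus, metrics=[partial(accuracy_multi, by_sample=True), partial(accuracy_multi, by_sample=False)], cbs=ShowGraph())
learn.fit_one_cycle(10, lr_max=1e-3)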
Full stack trace:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[20], line 2
1 learn = ts_learner(dls, InceptionTimePlus, metrics=[partial(accuracy_multi, by_sample=True), partial(accuracy_multi, by_sample=False)], cbs=ShowGraph())
----> 2 learn.fit_one_cycle(10, lr_max=1e-3)
File /opt/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/callback/schedule.py:119, in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt, start_epoch)
116 lr_max = np.array([h['lr'] for h in self.opt.hypers])
117 scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
118 'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
--> 119 self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd, start_epoch=start_epoch)
File /opt/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/learner.py:264, in Learner.fit(self, n_epoch, lr, wd, cbs, reset_opt, start_epoch)
262 self.opt.set_hypers(lr=self.lr if lr is None else lr)
263 self.n_epoch = n_epoch
--> 264 self._with_events(self._do_fit, 'fit', CancelFitException, self._end_cleanup)
File /opt/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/learner.py:201, in Learner._with_events(self, f, event_type, ex, final)
199 try: self(f'before_{event_type}'); f()
200 except ex: self(f'after_cancel_{event_type}')
--> 201 self(f'after_{event_type}'); final()
File /opt/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/learner.py:172, in Learner.__call__(self, event_name)
--> 172 def __call__(self, event_name): L(event_name).map(self._call_one)
File /opt/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastcore/foundation.py:156, in L.map(self, f, *args, **kwargs)
--> 156 def map(self, f, *args, **kwargs): return self._new(map_ex(self, f, *args, gen=False, **kwargs))
File /opt/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastcore/basics.py:840, in map_ex(iterable, f, gen, *args, **kwargs)
838 res = map(g, iterable)
839 if gen: return res
--> 840 return list(res)
File /opt/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastcore/basics.py:825, in bind.__call__(self, *args, **kwargs)
823 if isinstance(v,_Arg): kwargs[k] = args.pop(v.i)
824 fargs = [args[x.i] if isinstance(x, _Arg) else x for x in self.pargs] + args[self.maxi+1:]
--> 825 return self.func(*fargs, **kwargs)
File /opt/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/learner.py:176, in Learner._call_one(self, event_name)
174 def _call_one(self, event_name):
175 if not hasattr(event, event_name): raise Exception(f'missing {event_name}')
--> 176 for cb in self.cbs.sorted('order'): cb(event_name)
File /opt/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/callback/core.py:62, in Callback.__call__(self, event_name)
60 try: res = getcallable(self, event_name)()
61 except (CancelBatchException, CancelBackwardException, CancelEpochException, CancelFitException, CancelStepException, CancelTrainException, CancelValidException): raise
---> 62 except Exception as e: raise modify_exception(e, f'Exception occured in `{self.__class__.__name__}` when calling event `{event_name}`:\n\t{e.args[0]}', replace=True)
63 if event_name=='after_fit': self.run=True #Reset self.run to True at each end of fit
64 return res
File /opt/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/callback/core.py:60, in Callback.__call__(self, event_name)
58 res = None
59 if self.run and _run:
---> 60 try: res = getcallable(self, event_name)()
61 except (CancelBatchException, CancelBackwardException, CancelEpochException, CancelFitException, CancelStepException, CancelTrainException, CancelValidException): raise
62 except Exception as e: raise modify_exception(e, f'Exception occured in `{self.__class__.__name__}` when calling event `{event_name}`:\n\t{e.args[0]}', replace=True)
File ~/gitwork/timeseriesAI/tsai/tsai/callback/core.py:101, in ShowGraph.after_fit(self)
99 plt.close(self.graph_ax.figure)
100 if self.plot_metrics:
--> 101 self.learn.plot_metrics(final_losses=self.final_losses, perc=self.perc)
File ~/gitwork/timeseriesAI/tsai/tsai/learner.py:231, in plot_metrics(self, **kwargs)
228 @patch
229 @delegates(subplots)
230 def plot_metrics(self: Learner, **kwargs):
--> 231 self.recorder.plot_metrics(**kwargs)
File ~/gitwork/timeseriesAI/tsai/tsai/learner.py:218, in plot_metrics(self, nrows, ncols, figsize, final_losses, perc, **kwargs)
216 else:
217 color = '#ff7f0e'
--> 218 label = 'valid' if (m != [None] * len(m)).all() else None
219 axs[ax_idx].grid(color='gainsboro', linewidth=.5)
220 axs[ax_idx].plot(xs, m, color=color, label=label)
AttributeError: Exception occured in `ShowGraph` when calling event `after_fit`:
'bool' object has no attribute 'all'
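For what it's worth, my reading of that last frame: when the recorded metric values m arrive as a plain Python list, m != [None] * len(m) is an ordinary list comparison that returns a single bool rather than an elementwise array, so calling .all() on it fails. A minimal sketch with hypothetical values:

m = [0.1, 0.2, 0.3]           # plain list of recorded metric values
res = m != [None] * len(m)    # list != list -> a single bool (True here)
res.all()                     # AttributeError: 'bool' object has no attribute 'all'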
I am going to guess that some of my dependencies are too new for this older tsai version. Here is the setup for the system I'm currently testing on:
os : Linux-6.2.0-34-generic-x86_64-with-glibc2.35
python : 3.11.0
tsai : 0.3.5
fastai : 2.7.12
fastcore : 1.5.29
torch : 2.0.1
cpu cores : 4
threads per cpu : 2
RAM : 62.58 GB
GPU memory : N/A
For the TSMultiLabelClassification case, comparing the output of learn.model for InceptionTimePlus between v0.3.5 and v0.3.6, everything is the same except for learn.model.head:
- v0.3.5:
Sequential(
(0): create_lin_nd_head(
(0): fastai.layers.Flatten(full=False)
(1): Linear(in_features=17920, out_features=6, bias=True)
(2): Reshape(bs, 6)
)
)
- v0.3.6:
Sequential(
(0): lin_nd_head(
(0): Reshape(bs)
(1): Linear(in_features=17920, out_features=36, bias=True)
(2): Reshape(bs, 6, 6)
)
)
Clearly something has changed, and maybe the new head is not shaping the output properly?
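That head difference lines up exactly with the numbers in the ValueError, assuming a batch size of 64 (my inference from 384 / 6 = 64; the batch size isn't shown above):

import torch
import torch.nn as nn

bs, feats = 64, 17920
x = torch.randn(bs, feats)

head_v035 = nn.Linear(feats, 6)   # v0.3.5: output (bs, 6), matching the (bs, 6) targets
head_v036 = nn.Linear(feats, 36)  # v0.3.6: output gets reshaped to (bs, 6, 6)

print(head_v035(x).shape)                    # torch.Size([64, 6])    -> 384 values when flattened
print(head_v036(x).reshape(bs, 6, 6).shape)  # torch.Size([64, 6, 6]) -> 2304 values when flattened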
For the multi-class TSClassification case, which works in both versions, the differences in learn.model.head are more subtle:
- v0.3.5:
Sequential(
(0): Sequential(
(0): GAP1d(
(gap): AdaptiveAvgPool1d(output_size=1)
(flatten): fastai.layers.Flatten(full=False)
)
(1): LinBnDrop(
(0): Linear(in_features=128, out_features=5, bias=True)
)
)
)
- v0.3.6:
Sequential(
(0): Sequential(
(0): GAP1d(
(gap): AdaptiveAvgPool1d(output_size=1)
(flatten): Reshape(bs)
)
(1): LinBnDrop(
(0): Linear(in_features=128, out_features=5, bias=True)
)
)
)
@oguiza I just noticed the similarity with previously fixed issues:
- https://github.com/timeseriesAI/tsai/issues/420
- https://github.com/timeseriesAI/tsai/issues/533
- https://github.com/timeseriesAI/tsai/issues/534
This could be a regression. I will try to study what the fixes were there until someone more qualified can take over ;)
@oguiza
I was eventually able to hunt down the problematic commit, 9caff8f, by running git bisect between tags 0.3.6 (bad) and 0.3.5 (good) and checking the notebook example at each step. There is some uncertainty about whether reverting that commit would cause regressions elsewhere in the code base, so I submitted a draft PR to fix the issue: https://github.com/timeseriesAI/tsai/pull/855
It would be awesome if the maintainers could help with implementing the fix. I have just about exhausted my ability to understand how the magic model auto-configuration system is supposed to work. Once a fix is decided upon and tested, I will happily close out the issue!
Having the same issue!
@oguiza @Munib5 Sorry, I thought I had a real fix, but I was mistaken again.
Tracing what is going wrong is a bit maddening. The problem starts with the creation of the DataLoaders (a sketch follows below). At that point the dls object has two relevant attributes:
dls.c == 6
dls.d == 6
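I can't paste the notebook cell here, but the setup is essentially the following (a minimal sketch; X, y, and splits stand in for the tutorial's multilabel data, and I'm assuming the tutorial's get_ts_dls / TSMultiLabelClassification API):

from tsai.all import *

# X: array of shape (n_samples, 1, 140); y: per-sample lists of labels drawn from 6 classes
tfms = [None, TSMultiLabelClassification()]  # one-hot encodes the label sets
dls = get_ts_dls(X, y, splits=splits, tfms=tfms)
print(dls.c, dls.d)  # both report 6, which is where the trouble starts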
When the learner.ts_learner function is invoked as in:
learn = ts_learner(dls, InceptionTimePlus, metrics=accuracy_multi, verbose=True)
it delegates to models.utils.build_ts_model with c_out=None and d=None. Internally those parameters are obtained from the dls object (see source).
Still inside build_ts_model, a 'custom_head' argument gets tacked on to a kwargs dict (see source):
kwargs['custom_head'] = partial(kwargs['custom_head'], d=d)
where d=6 here. The kwargs dict is later passed into the model constructor (see source):
model = arch(c_in, c_out, seq_len=seq_len, **arch_config, **kwargs).to(device=device)
As indicated by the printout you get if you set verbose=True in the ts_learner call:
arch: InceptionTimePlus(c_in=1 c_out=6 seq_len=140 arch_config={} kwargs={'custom_head': functools.partial(<class 'tsai.models.layers.lin_nd_head'>, d=6)})
Subsequently, when the lin_nd_head constructor is called (see source), the local shape and fd variables take on these values:
shape == [6,6]
fd == 6
Later in that constructor (see code), this problematically shaped layer is appended to the model:
else:
    if seq_len == 1:
        layers += [nn.AdaptiveAvgPool1d(1)]
    if not flatten and fd == seq_len:
        layers += [Transpose(1,2), nn.Linear(n_in, n_out)]
    else:
        # the branch taken here: nn.Linear(128 * 140, 6 * 6) == nn.Linear(17920, 36)
        layers += [Reshape(), nn.Linear(n_in * seq_len, n_out * fd)]
    layers += [Reshape(*shape)]  # Reshape(bs, 6, 6)
where the key variables take on these values:
n_in == 128
seq_len == 140
n_out == 6
fd == 6
shape == [6,6]
so the appended layers are nn.Linear(17920, 36) followed by Reshape(bs, 6, 6), exactly the v0.3.6 head shown earlier.
That shows there is potentially a conflict between the property c, which is supposed to be the "number of classes/categories", and the property d, which apparently means some sort of output dimension (the source isn't very clear on these semantics).
Again, my limited understanding of this library's architecture is a major impediment to finding a fix that will satisfy everyone :) Here's hoping the maintainers will take over!
@Munib5 With all that said, a temporary workaround might be to do something like:
learn = ts_learner(dls, InceptionTimePlus, metrics=accuracy_multi, verbose=True, d=1)
where we force the d property back to a value that yields an output shape compatible with torch.binary_cross_entropy_with_logits.
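If you try that, a quick sanity check (my own, not from the notebook) is to confirm that predictions and targets have matching shapes before training:

xb, yb = dls.one_batch()
preds = learn.model(xb)
print(preds.shape, yb.shape)  # BCEWithLogitsLoss needs these to match, e.g. (bs, 6) and (bs, 6)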