
GenericCLScenario does not support data for regression tasks. TypeError: unhashable type: 'list'

yiranma0 opened this issue 1 year ago • 1 comment

I am working on a regression task and found that none of the generators (filelist_benchmark, dataset_benchmark, tensors_benchmark, paths_benchmark, nc_benchmark, ni_benchmark) support it. The cause seems to be origin_stream.benchmark.get_classes_timeline: I guess this function tries to collect the set of unique classes in each experience, while the targets of a regression task are continuous rather than discrete.

Here is a minimal working example:

import torch
from torch.utils.data import TensorDataset
from avalanche.benchmarks.generators import filelist_benchmark, dataset_benchmark, \
                                            tensors_benchmark, paths_benchmark, \
                                            nc_benchmark, ni_benchmark

train_datasets = (
    TensorDataset(torch.randn(100, 10), torch.randn(100, 1)),
    TensorDataset(torch.randn(100, 10), torch.randn(100, 1)),
)
test_datasets = (
    TensorDataset(torch.randn(10, 10), torch.randn(10, 1)),
    TensorDataset(torch.randn(10, 10), torch.randn(10, 1)),
)

# Create the continual learning scenario
scenario = dataset_benchmark(train_datasets=train_datasets, test_datasets=test_datasets)

for experience in scenario.train_stream:  # TypeError: unhashable type: 'list'
    print("task ", experience.task_label)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[10], line 19
     17 # Create the continual learning scenario
     18 scenario = dataset_benchmark(train_datasets=train_datasets, test_datasets=test_datasets)
---> 19 for experience in scenario.train_stream:
     20     print("task ", experience.task_label)

File d:\anaconda3\envs\py39\lib\site-packages\avalanche\benchmarks\scenarios\generic_scenario.py:593, in SequenceCLStream.__iter__(self)
    591 exp: TCLExperience
    592 for i in range(len(self)):
--> 593     exp = self[i]
    594     yield exp

File d:\anaconda3\envs\py39\lib\site-packages\avalanche\benchmarks\scenarios\generic_scenario.py:616, in SequenceCLStream.__getitem__(self, item)
    612     raise IndexError("Experience index out of bounds" + str(int(item)))
    614 curr_exp = item if self.slice_ids is None else self.slice_ids[item]
--> 616 exp = self._make_experience(curr_exp)
    617 if self.set_stream_info:
    618     exp.current_experience = curr_exp

File d:\anaconda3\envs\py39\lib\site-packages\avalanche\benchmarks\scenarios\dataset_scenario.py:663, in FactoryBasedStream._make_experience(self, experience_idx)
    662 def _make_experience(self, experience_idx: int) -> TDatasetExperience:
--> 663     a = self.benchmark.experience_factory(self, experience_idx)  # type: ignore
    664     return a

File d:\anaconda3\envs\py39\lib\site-packages\avalanche\benchmarks\scenarios\classification_scenario.py:67, in _default_classification_experience_factory(stream, experience_idx)
     64 def _default_classification_experience_factory(
     65     stream: "ClassificationStream", experience_idx: int
     66 ):
---> 67     return ClassificationExperience(
     68         origin_stream=stream, current_experience=experience_idx
     69     )

File d:\anaconda3\envs\py39\lib\site-packages\avalanche\benchmarks\scenarios\classification_scenario.py:184, in ClassificationExperience.__init__(self, origin_stream, current_experience)
    173 self._benchmark: ClassificationScenario = origin_stream.benchmark
    175 dataset: TClassificationDataset = origin_stream.benchmark.stream_definitions[
    176     origin_stream.name
    177 ].exps_data[current_experience]
    179 (
    180     classes_in_this_exp,
    181     previous_classes,
    182     classes_seen_so_far,
    183     future_classes,
--> 184 ) = origin_stream.benchmark.get_classes_timeline(
    185     current_experience, stream=origin_stream.name
    186 )
    188 super().__init__(
    189     origin_stream,
    190     dataset,
   (...)
    195     future_classes,
    196 )

File d:\anaconda3\envs\py39\lib\site-packages\avalanche\benchmarks\scenarios\dataset_scenario.py:531, in ClassesTimelineCLScenario.get_classes_timeline(self, current_experience, stream)
    501 def get_classes_timeline(
    502     self, current_experience: int, stream: str = "train"
    503 ) -> Tuple[
   (...)
    507     Optional[List[int]],
    508 ]:
    509     """
    510     Returns the classes timeline given the ID of a experience.
    511 
   (...)
    529         the benchmark is initialized by using a lazy generator.
    530     """
--> 531     class_set_current_exp = self.classes_in_experience[stream][current_experience]
    533     if class_set_current_exp is not None:
    534         # May be None in lazy benchmarks
    535         classes_in_this_exp = list(class_set_current_exp)

File d:\anaconda3\envs\py39\lib\site-packages\avalanche\benchmarks\scenarios\classification_scenario.py:268, in _LazyClassesInClassificationExps.__getitem__(self, exp_id)
    266 def __getitem__(self, exp_id: Union[int, slice]) -> LazyClassesInExpsRet:
    267     indexing_collate = _LazyClassesInClassificationExps._slice_collate
--> 268     result = manage_advanced_indexing(
    269         exp_id, self._get_single_exp_classes, len(self), indexing_collate
    270     )
    271     return result

File d:\anaconda3\envs\py39\lib\site-packages\avalanche\benchmarks\utils\dataset_utils.py:335, in manage_advanced_indexing(idx, single_element_getter, max_length, collate_fn)
    333 elements: List[X] = []
    334 for single_idx in indexes_iterator:
--> 335     single_element = single_element_getter(int(single_idx))
    336     elements.append(single_element)
    338 if len(elements) == 1:

File d:\anaconda3\envs\py39\lib\site-packages\avalanche\benchmarks\scenarios\classification_scenario.py:284, in _LazyClassesInClassificationExps._get_single_exp_classes(self, exp_id)
    281 if targets is None:
    282     return None
--> 284 return set(targets)

TypeError: unhashable type: 'list'

I think a generator for regression tasks is necessary.
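The bottom of the traceback shows the root cause, which can be reproduced without Avalanche at all: the classification machinery ultimately calls set(targets), and that fails when each target is a list of continuous values rather than a hashable class label. A minimal sketch:

```python
# Regression-style targets: one list of continuous values per sample.
# Classification scenarios instead expect hashable class labels here.
targets = [[0.5], [1.2], [0.5]]

try:
    set(targets)  # roughly what _get_single_exp_classes does internally
except TypeError as err:
    print(err)  # -> unhashable type: 'list'
```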

yiranma0 avatar Aug 23 '23 17:08 yiranma0

Hi, I agree with you. This feature is on the roadmap and planned for the next release. For now, you can add a fake target attribute to your data by assigning a list of zeros with the same length as the data to dataset.targets.
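A minimal sketch of that workaround. MyRegressionDataset is a hypothetical stand-in for whatever map-style dataset you use; the only relevant part is attaching a targets attribute of dummy integer labels so that the class-timeline code finds hashable values.

```python
class MyRegressionDataset:
    """Hypothetical wrapper around regression data (any map-style dataset works)."""

    def __init__(self, inputs, outputs):
        self.inputs, self.outputs = inputs, outputs
        # Fake classification targets: one dummy class 0 per sample, so
        # set(dataset.targets) succeeds inside get_classes_timeline.
        self.targets = [0] * len(inputs)

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, i):
        return self.inputs[i], self.outputs[i]

ds = MyRegressionDataset([[0.1], [0.2]], [[1.0], [2.0]])
print(set(ds.targets))  # -> {0}: hashable, so the TypeError above disappears
```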

AntonioCarta avatar Aug 29 '23 11:08 AntonioCarta

I met the same problem while trying to use Avalanche for physics-related research.

dajuguan avatar Mar 02 '24 13:03 dajuguan

This is fixed in the latest version. The new benchmark generators don't require class or task labels (example).

The old ones are still available for backward compatibility.

AntonioCarta avatar Mar 04 '24 13:03 AntonioCarta


Thanks, this problem is now solved with 0.5.0. But when I directly use benchmark_from_datasets to generate the data streams and train them with Naive, a new error, no attribute 'targets_task_labels, is thrown. So I suggest adding a simple example like this to facilitate onboarding new researchers in the physics field, who heavily use it for regression tasks.

dajuguan avatar Mar 06 '24 04:03 dajuguan

Yes, we need to do some work on the strategy side (I'm working on that). However, keep in mind that most methods are designed for classification, which means that we can easily remove the task labels (when unused), but they would not make sense for regression tasks.

Avalanche strategies are designed to be general (apart from these minor fixes that we have to do), so if a method can work with regression tasks, you should be able to use it without any issues just by changing the loss function. However, the user needs to understand the method to know whether it supports regression, which may be difficult for non-expert users. We don't really have a good solution for this, because each method and task requires different considerations that are hard to generalize. For example, many methods are not directly applicable but are very easy to generalize by changing a few lines of code once you understand what they are doing.
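As a rough illustration of "changing the loss function" (a plain PyTorch sketch, not Avalanche's strategy API): a one-output regression head trained with nn.MSELoss in place of the usual CrossEntropyLoss is what a single experience's training step boils down to.

```python
import torch
from torch import nn

model = nn.Linear(10, 1)          # single continuous output instead of class logits
criterion = nn.MSELoss()          # regression loss replacing CrossEntropyLoss
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# One experience's worth of synthetic regression data.
x, y = torch.randn(100, 10), torch.randn(100, 1)
for _ in range(5):                # minimal training loop for this experience
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
print(loss.item())                # non-negative MSE on this experience
```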

AntonioCarta avatar Mar 06 '24 07:03 AntonioCarta


Thanks, I'll dive into the code then~

dajuguan avatar Mar 06 '24 08:03 dajuguan