Toy datasets like stream-learn

Open MaxHalford opened this issue 8 months ago • 0 comments

  • stream-learn has a nice StreamGenerator class that can generate different kinds of concept drift, see here. Also see their paper, which describes each type of drift. It would be nice to have something similar. We reached out to them, but they were not open to cooperation.
  • I think we should have multiple classes rather than a single one. One class per type of concept drift sounds good. Making them composable would be ideal for two reasons:
    • We can add drift to any dataset
    • We can mix different kinds of drifts
  • I suggest creating a new datasets.drift submodule
  • It would be cool to see if we could remove some of our existing dataset classes in favor of these new ones, to avoid repetition (FriedmanDrift, RandomRBFDrift, LEDDrift, ConceptDriftStream)
  • Ideally, we would have a similar documentation page, with descriptive plots of each dataset
  • In fact, we also need a nice page that describes each dataset, like what scikit-learn does here
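
To make the "one class per drift type, composable" idea concrete, here is a minimal sketch. It assumes a dataset is any iterable of `(x, y)` pairs where `x` is a dict of features. The names `SuddenDrift` and `GradualDrift`, their parameters, and the linear ramp used for the gradual case are all hypothetical, picked for illustration; none of this is an existing API.

```python
import random
from typing import Callable, Iterable, Iterator, Optional, Tuple

Sample = Tuple[dict, object]
Transform = Callable[[dict, object], Sample]


class SuddenDrift:
    """Hypothetical wrapper: apply `transform` to every sample from `position` onward."""

    def __init__(self, dataset: Iterable[Sample], transform: Transform, position: int):
        self.dataset = dataset
        self.transform = transform
        self.position = position

    def __iter__(self) -> Iterator[Sample]:
        for i, (x, y) in enumerate(self.dataset):
            if i >= self.position:
                x, y = self.transform(x, y)
            yield x, y


class GradualDrift:
    """Hypothetical wrapper: apply `transform` with a probability that ramps
    linearly from 0 to 1 between `position` and `position + width`.
    The linear ramp is a simplifying assumption."""

    def __init__(
        self,
        dataset: Iterable[Sample],
        transform: Transform,
        position: int,
        width: int,
        seed: Optional[int] = None,
    ):
        if width <= 0:
            raise ValueError("width must be positive")
        self.dataset = dataset
        self.transform = transform
        self.position = position
        self.width = width
        self.seed = seed

    def __iter__(self) -> Iterator[Sample]:
        rng = random.Random(self.seed)  # fresh RNG so each pass is reproducible
        for i, (x, y) in enumerate(self.dataset):
            # Probability of drawing from the new concept, clipped to [0, 1]
            p = min(1.0, max(0.0, (i - self.position) / self.width))
            if rng.random() < p:
                x, y = self.transform(x, y)
            yield x, y


# Toy usage: wrap any dataset, then stack a second kind of drift on top.
base = [({"x": i}, 0) for i in range(10)]
flip_label = lambda x, y: (x, 1 - y)
shift_feature = lambda x, y: ({**x, "x": x["x"] + 100}, y)

gradual = GradualDrift(base, flip_label, position=2, width=4, seed=1)
mixed = SuddenDrift(gradual, shift_feature, position=8)
```

Because each wrapper is itself an iterable of `(x, y)` pairs, it can wrap any dataset or any other wrapper, which covers both reasons given above: adding drift to any dataset and mixing different kinds of drift.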

MaxHalford, Nov 01 '23 16:11