gama
An automated machine learning tool aimed at facilitating AutoML research.
1) There are incompatibility issues when I install GAMA on Google Colab. 2) When I run the official examples, the data cannot be fitted into the model. ERROR: pip's...
This PR aims to add [dirty_cat](https://github.com/dirty-cat/dirty_cat)'s encoders (currently [SimilarityEncoder](https://dirty-cat.github.io/stable/generated/dirty_cat.SimilarityEncoder.html), [GapEncoder](https://dirty-cat.github.io/stable/generated/dirty_cat.GapEncoder.html) and [MinHashEncoder](https://dirty-cat.github.io/stable/generated/dirty_cat.MinHashEncoder.html)) to GAMA's search space via the [SuperVectorizer](https://dirty-cat.github.io/stable/generated/dirty_cat.SuperVectorizer.html). The point of adding the dirty_cat encoders is...
It is currently a lot of work for a user to convert a model string to GAMA's individual string format. It would be great if we could have a function for that...
Perhaps GAMA could switch from evaluating multiple pipelines in parallel on a single core each to evaluating one pipeline at a time across multiple cores. This could improve performance through, e.g., lower memory use.
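As a minimal sketch of the proposed per-pipeline parallelism, using scikit-learn's `n_jobs` rather than GAMA's actual evaluation code: one pipeline is evaluated at a time, but its cross-validation folds run on all cores, so only a single pipeline's data is resident in memory at once.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Evaluate one pipeline, multi-core: the 5 folds are distributed
# over all available cores (n_jobs=-1) instead of running several
# pipelines in parallel on one core each.
scores = cross_val_score(
    DecisionTreeClassifier(random_state=0), X, y, cv=5, n_jobs=-1
)
```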
Constructing the ensemble, i.e. picking the models and assigning their weights, can be parallelized: multiple candidate models can be evaluated for addition in parallel.
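A sketch of one parallel round of greedy ensemble selection. The helper names (`ensemble_error`, `greedy_step`) and the plain-list prediction format are illustrative assumptions, not GAMA's ensembling API; the point is only that all candidate additions can be scored concurrently before the best one is picked.

```python
from concurrent.futures import ThreadPoolExecutor

def ensemble_error(member_preds, y):
    """Mean squared error of the averaged predictions of the members."""
    avg = [sum(col) / len(member_preds) for col in zip(*member_preds)]
    return sum((a - t) ** 2 for a, t in zip(avg, y)) / len(y)

def greedy_step(ensemble, candidates, y, pool):
    """One round of greedy selection: score every candidate addition
    in parallel and return the candidate that minimizes the error."""
    def score(cand):
        return ensemble_error(ensemble + [cand], y), cand
    return min(pool.map(score, candidates))[1]
```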
A static 5-fold CV is not ideal. The resampling strategy should adapt to the dataset, ranging from e.g. repeated k-fold for small datasets to hold-out for large ones.
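A minimal sketch of such a size-dependent strategy using scikit-learn's splitters; the thresholds are illustrative assumptions, not a proposed policy.

```python
from sklearn.model_selection import KFold, RepeatedKFold, ShuffleSplit

def choose_resampling(n_samples):
    """Pick a resampling strategy based on dataset size
    (thresholds are illustrative, not GAMA's actual policy)."""
    if n_samples < 1_000:
        # Small data: repeated k-fold for a lower-variance estimate.
        return RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)
    if n_samples < 100_000:
        # Medium data: plain 5-fold CV.
        return KFold(n_splits=5, shuffle=True, random_state=0)
    # Large data: a single hold-out split is cheap and accurate enough.
    return ShuffleSplit(n_splits=1, test_size=0.1, random_state=0)
```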
GAMA is currently not reproducible due to randomness in the timing of asynchronous tasks. While this *may* be alleviated by opting for different forms of parallelism, this is out of...
Not all code is currently covered by unit and/or system tests. In some cases, this does not matter (e.g. not all ValueError scenarios need to be automatically checked, I think),...
Currently, creating a new individual or mutating one is done at random. Only after creation do we check that the individual is actually new and, if not, start over. This...
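The create-then-check pattern described above can be sketched as a rejection loop. `new_unique_individual` and the integer "individuals" below are stand-ins for GAMA's random pipeline construction, used only to show how duplicates trigger a retry.

```python
def new_unique_individual(seen, create, max_tries=100):
    """Keep sampling until `create` yields something not in `seen`
    (the create-then-check approach described above)."""
    for _ in range(max_tries):
        candidate = create()
        if candidate not in seen:
            seen.add(candidate)
            return candidate
    raise RuntimeError("no new individual found within the retry budget")

# Stand-in for random creation; the duplicate 3s force the loop to retry.
draws = iter([3, 3, 1, 3, 2])
seen = set()
first = new_unique_individual(seen, lambda: next(draws))   # returns 3
second = new_unique_individual(seen, lambda: next(draws))  # rejects the duplicate 3, returns 1
```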