gama
An automated machine learning tool aimed at facilitating AutoML research.
1) There are incompatibility issues when I install GAMA on Google Colab. 2) When I run the official examples, the data cannot be fitted into the model. ERROR: pip's...
This PR aims to add [dirty_cat](https://github.com/dirty-cat/dirty_cat)'s encoders (currently [SimilarityEncoder](https://dirty-cat.github.io/stable/generated/dirty_cat.SimilarityEncoder.html), [GapEncoder](https://dirty-cat.github.io/stable/generated/dirty_cat.GapEncoder.html) and [MinHashEncoder](https://dirty-cat.github.io/stable/generated/dirty_cat.MinHashEncoder.html)) to GAMA's search space via the [SuperVectorizer](https://dirty-cat.github.io/stable/generated/dirty_cat.SuperVectorizer.html). The point of adding the dirty_cat encoders is...
It is currently a lot of work for a user to convert a model string to GAMA's individual string format. It would be great if we could have a function for that...
Perhaps GAMA could switch from evaluating multiple pipelines in parallel on a single core each to evaluating one pipeline at a time across multiple cores. This could improve performance through, e.g., lower memory use.
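As a minimal sketch of the proposed per-pipeline parallelism, using scikit-learn's `n_jobs` rather than GAMA's actual evaluation code: one pipeline is evaluated at a time, but its cross-validation folds run on all cores, so only a single pipeline's data is resident in memory at once.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Evaluate one pipeline, multi-core: the 5 folds are distributed
# over all available cores (n_jobs=-1) instead of running several
# pipelines in parallel on one core each.
scores = cross_val_score(
    DecisionTreeClassifier(random_state=0), X, y, cv=5, n_jobs=-1
)
```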
Constructing the ensemble, i.e. picking the models and assigning their weights, can be parallelized: multiple candidate models can be evaluated for addition in parallel.
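A sketch of one parallel round of greedy ensemble selection. The helper names (`ensemble_error`, `greedy_step`) and the plain-list prediction format are illustrative assumptions, not GAMA's ensembling API; the point is only that all candidate additions can be scored concurrently before the best one is picked.

```python
from concurrent.futures import ThreadPoolExecutor

def ensemble_error(member_preds, y):
    """Mean squared error of the averaged predictions of the members."""
    avg = [sum(col) / len(member_preds) for col in zip(*member_preds)]
    return sum((a - t) ** 2 for a, t in zip(avg, y)) / len(y)

def greedy_step(ensemble, candidates, y, pool):
    """One round of greedy selection: score every candidate addition
    in parallel and return the candidate that minimizes the error."""
    def score(cand):
        return ensemble_error(ensemble + [cand], y), cand
    return min(pool.map(score, candidates))[1]
```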
A static 5-fold CV is not ideal. The resampling strategy should adapt to the dataset, ranging from e.g. repeated k-fold for small datasets to hold-out for large ones.
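A minimal sketch of such a size-dependent strategy using scikit-learn's splitters; the thresholds are illustrative assumptions, not a proposed policy.

```python
from sklearn.model_selection import KFold, RepeatedKFold, ShuffleSplit

def choose_resampling(n_samples):
    """Pick a resampling strategy based on dataset size
    (thresholds are illustrative, not GAMA's actual policy)."""
    if n_samples < 1_000:
        # Small data: repeated k-fold for a lower-variance estimate.
        return RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)
    if n_samples < 100_000:
        # Medium data: plain 5-fold CV.
        return KFold(n_splits=5, shuffle=True, random_state=0)
    # Large data: a single hold-out split is cheap and accurate enough.
    return ShuffleSplit(n_splits=1, test_size=0.1, random_state=0)
```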
GAMA is currently not reproducible due to randomness in the timing of asynchronous tasks. While this *may* be alleviated by opting for different forms of parallelism, this is out of...
Not all code is currently covered by unit and/or system tests. In some cases, this does not matter (e.g. not all ValueError scenarios need to be automatically checked, I think),...
Currently, creating a new individual or mutating one is done at random. Only after creation do we check that the individual is actually new and, if not, start over. This...
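The create-then-check pattern described above can be sketched as a rejection loop. `new_unique_individual` and the integer "individuals" below are stand-ins for GAMA's random pipeline construction, used only to show how duplicates trigger a retry.

```python
def new_unique_individual(seen, create, max_tries=100):
    """Keep sampling until `create` yields something not in `seen`
    (the create-then-check approach described above)."""
    for _ in range(max_tries):
        candidate = create()
        if candidate not in seen:
            seen.add(candidate)
            return candidate
    raise RuntimeError("no new individual found within the retry budget")

# Stand-in for random creation; the duplicate 3s force the loop to retry.
draws = iter([3, 3, 1, 3, 2])
seen = set()
first = new_unique_individual(seen, lambda: next(draws))   # returns 3
second = new_unique_individual(seen, lambda: next(draws))  # rejects the duplicate 3, returns 1
```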