Pedro Ribeiro comments

Results: 36 comments of Pedro Ribeiro

Coefficient of Logistic Regression Using TPOT

What does the rest of the pipeline look like? Sometimes transformers, feature unions, or StackingEstimators increase the number of features, so the final LogisticRegression can end up with more coefficients than the input has features.
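Here is a minimal sketch in plain scikit-learn of how a feature union placed in front of LogisticRegression changes the coefficient count. It is not a pipeline exported by TPOT, and the StandardScaler/PolynomialFeatures union is chosen only for illustration. The model is fitted on 4 input columns but ends up with 19 coefficients.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# The union concatenates a scaled copy of X (4 columns) with degree-2
# polynomial features (1 bias + 4 linear + 10 quadratic = 15 columns),
# so the classifier is trained on 19 features, not 4.
pipe = make_pipeline(
    make_union(StandardScaler(), PolynomialFeatures(degree=2)),
    LogisticRegression(max_iter=1000),
)
pipe.fit(X, y)

coef = pipe[-1].coef_
print(X.shape[1], coef.shape)  # 4 (1, 19)
```

To map coefficients back to column names, recent scikit-learn versions expose get_feature_names_out on the fitted union (pipe[0] here).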

FeatureSetSelector does not work when not set as the first item in a template.

An additional useful feature, though it may be more difficult to implement, would be to have FSS pass different data to different "branches". For example: ``` FFS -> classifier \...
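FeatureSetSelector's own interface is not shown here, so the sketch below uses scikit-learn's ColumnTransformer as a stand-in to illustrate the branching idea: each branch receives a different subset of columns, and the branch outputs are concatenated before the final classifier. The column groups and the per-branch transformers are hypothetical choices.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.normal(size=(60, 6))
y = (X[:, 0] - X[:, 4] > 0).astype(int)

# Branch "a" only ever sees columns 0-2 and branch "b" only columns 3-5;
# their outputs (3 + 2 columns) are stacked side by side.
branches = ColumnTransformer([
    ("a", StandardScaler(), [0, 1, 2]),
    ("b", PCA(n_components=2), [3, 4, 5]),
])
model = make_pipeline(branches, LogisticRegression())
model.fit(X, y)

print(model[0].transform(X).shape)  # (60, 5)
```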

n_jobs Not Stopping CPU from Running at 100% on All Threads?

When use_dask is set to True, the n_jobs argument is not actually used. This should probably be fixed. When using Dask, it just calls dask.compute without setting the number of...
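The fix amounts to passing n_jobs through to whatever executes the work. TPOT's Dask code is not shown here, so the sketch below illustrates the principle with Python's standard concurrent.futures instead of Dask. The worker pool is sized from n_jobs, and the peak number of simultaneously running tasks stays within that limit. run_bounded and fake_evaluation are hypothetical names. With Dask, the corresponding step would be configuring the scheduler or cluster with the same worker count rather than calling dask.compute with its defaults.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def run_bounded(tasks, n_jobs):
    """Run zero-argument callables with at most n_jobs in flight.

    Returns (results in submission order, peak number of tasks running at once).
    """
    lock = threading.Lock()
    active = 0
    peak = 0

    def tracked(task):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            return task()
        finally:
            with lock:
                active -= 1

    # The pool size is what caps parallelism; if n_jobs is never passed down,
    # the executor falls back to its own default worker count.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        results = list(pool.map(tracked, tasks))
    return results, peak

def fake_evaluation(i):
    time.sleep(0.05)  # stands in for fitting one pipeline
    return i * i

tasks = [lambda i=i: fake_evaluation(i) for i in range(8)]
results, peak = run_bounded(tasks, n_jobs=2)
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
print(peak)     # never more than 2
```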

FeatureUnion + StackingEstimator causes input data to be duplicated for the rest of the model, increasing computational load and complexity.

I wanted to add another data replication issue. The FunctionTransformer module can also be set to exactly copy the input into the next layer. I have generated another pipeline where...
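The duplication is easy to reproduce in plain scikit-learn. This sketch uses scikit-learn's own FunctionTransformer and FeatureUnion rather than a pipeline generated by TPOT. A FunctionTransformer built without a function acts as the identity, so a union of two of them forwards two exact copies of the input and doubles the column count.

```python
import numpy as np
from sklearn.pipeline import make_union
from sklearn.preprocessing import FunctionTransformer

X = np.arange(12, dtype=float).reshape(4, 3)

# FunctionTransformer() with no func passes X through unchanged, so each
# branch emits an exact copy and the union concatenates them side by side.
union = make_union(FunctionTransformer(), FunctionTransformer())
Xt = union.fit_transform(X)

print(Xt.shape)                              # (4, 6)
print(np.array_equal(Xt[:, :3], Xt[:, 3:]))  # True
```

Every model downstream of such a union trains on redundant columns, which increases fit time without adding information.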

[Question] TPOT can't run for a lot of time

This is most likely related to TPOT not being able to terminate some pipelines. The current timeout method doesn't always work on specific modules. If those modules can't be timed...
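The comment does not say which timeout mechanism TPOT uses. One general reason a timeout fails to stop work is that signal- or thread-based timeouts only take effect when control returns to the Python interpreter, so a long call inside compiled code can ignore them, and a thread cannot be killed from outside. A more forceful alternative, sketched below with the standard library only, runs the evaluation in a child process and terminates the whole process when time runs out. run_with_timeout and slow_evaluation are hypothetical helpers, not part of TPOT, and the "fork" start method makes this sketch POSIX-only.

```python
import multiprocessing as mp
import queue
import time

def _worker(func, args, out):
    # Runs in the child process and ships the result back through the queue.
    out.put(func(*args))

def run_with_timeout(func, args=(), timeout=1.0):
    """Run func(*args) in a child process, killing it after timeout seconds.

    Returns (finished, result). Uses the POSIX-only "fork" start method so
    that locally defined functions do not need to be importable by the child.
    """
    ctx = mp.get_context("fork")
    out = ctx.Queue()
    proc = ctx.Process(target=_worker, args=(func, args, out))
    proc.start()
    try:
        result = out.get(timeout=timeout)
    except queue.Empty:
        # Timed out (or the child died without a result): terminate the whole
        # process, which also stops work stuck inside compiled code.
        proc.terminate()
        proc.join()
        return False, None
    proc.join()
    return True, result

def slow_evaluation():
    time.sleep(10)  # stands in for a pipeline step that never finishes in time
    return "done"

print(run_with_timeout(pow, (2, 10), timeout=5.0))     # (True, 1024)
print(run_with_timeout(slow_evaluation, timeout=0.5))  # (False, None)
```

The cost of this design is one process start per evaluation, which is usually small compared with fitting a pipeline.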

XGBRegressor ValueError: feature_names mismatch:

Can you provide code to fully reproduce this issue? It looks like the column names are getting renamed between housing2 and X_train. How is X_train being created?
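Until the reproduction is shared, one quick check is to compare the column names the model was fitted on with the ones passed at prediction time. The sketch below uses a hypothetical column_mismatch helper and made-up housing2 data. It shows one common way names get "renamed": converting a DataFrame to a NumPy array and back replaces the names with integer positions. Whether that is what happened in this issue is an assumption.

```python
import pandas as pd

def column_mismatch(train_cols, pred_cols):
    """Report names present in only one of two column sets, plus whether order matches."""
    train_cols, pred_cols = list(train_cols), list(pred_cols)
    return {
        "missing_at_predict": [c for c in train_cols if c not in pred_cols],
        "unexpected_at_predict": [c for c in pred_cols if c not in train_cols],
        "same_order": train_cols == pred_cols,
    }

# Hypothetical stand-in for housing2, with named columns.
housing2 = pd.DataFrame({"rooms": [3, 4, 2], "age": [10, 25, 40], "price": [200, 310, 150]})
features = housing2.drop(columns="price")

# Round-tripping through NumPy discards the names; the new frame gets 0, 1, ...
X_train = pd.DataFrame(features.to_numpy())
print(list(X_train.columns))  # [0, 1]

report = column_mismatch(features.columns, X_train.columns)
print(report)
```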

Benchmarks available?

Thank you for the interest! We are currently working on benchmarking and some optimization. Our preliminary results so far suggest that TPOT2 performs similarly to TPOT1 with similar parameters. However,...

Benchmarks available?

TPOT2 parallelizes evaluations of pipelines with Dask under the hood. There is nothing that the user needs to do to use it. You can tell TPOT2 how many processes/cores to...

Reproducibility of the export pipeline

If you set n_jobs to 1, reproducibility is more likely. When using parallel processes, exact reproducibility gets challenging since the order of execution has some randomness that is not controllable....
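Here is a toy illustration of why execution order matters, using only Python's random module. If parallel evaluations draw from one shared random stream, a different completion order hands each evaluation a different number. Giving each evaluation its own seed, derived from a global seed and a task id, removes that dependence. This is a general technique, not a description of how TPOT seeds its evaluations, and evaluate and the seed arithmetic are hypothetical. The two orders are simulated sequentially, so no parallelism is needed to show the effect.

```python
import random

def evaluate(task_id, rng):
    """Toy stand-in for evaluating one pipeline: its 'score' is one random draw."""
    return task_id, round(rng.random(), 6)

def shared_rng_run(order, seed=42):
    # One random stream consumed in execution order: which draw a task gets
    # depends on when it runs.
    rng = random.Random(seed)
    return dict(evaluate(t, rng) for t in order)

def per_task_seed_run(order, seed=42):
    # Each task gets its own stream derived from (seed, task_id), so its draw
    # is the same whatever order the tasks run in.
    return dict(evaluate(t, random.Random(seed * 1_000_003 + t)) for t in order)

forward = list(range(5))
backward = forward[::-1]  # simulates a different completion order
print(shared_rng_run(forward) == shared_rng_run(backward))        # False
print(per_task_seed_run(forward) == per_task_seed_run(backward))  # True
```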

Feature Mismatch caused by StackingEstimator: X has _ features, but ____ is expecting _ features as input.

"there was a row that blew up the LogisticRegression" Can you clarify what is in the row that causes LogisticRegression to sometimes yield a different number of outputs? I'm trying...
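One plausible mechanism, which the thread has not confirmed, is that predict_proba returns one column per class seen during fit. If a stacked classifier's class probabilities are appended as features and one training split happens to lack a class, that classifier emits fewer columns and downstream steps see a different feature count. The sketch uses plain scikit-learn with made-up data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.normal(size=(30, 3))
y = np.repeat([0, 1, 2], 10)

full = LogisticRegression(max_iter=1000).fit(X, y)

# A training subset in which class 2 never appears, as can happen in a small
# or unlucky cross-validation fold.
mask = y != 2
partial = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])

print(full.predict_proba(X).shape)     # (30, 3)
print(partial.predict_proba(X).shape)  # (30, 2)
```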