Randy Olson comments

Results: 92 comments of Randy Olson

Getting this error when I try to plot my dataframe 'callers' on sns

Hello, what project is this from? I don’t recognize it.

On Thu, Aug 12, 2021 at 6:35 AM edithwangu ***@***.***> wrote:
> KeyError Traceback (most recent call...

Why did you not use Naive Bayes?

What project is this question for?

Why did you not use Naive Bayes?

Oh, I see. I used tree-based algorithms there because they're good default algorithms: they need little data preprocessing, perform well, and are easy to tune.

ML notebook: Add preprocessing and a sklearn pipeline

The example data science notebook: https://github.com/rhiever/Data-Analysis-and-Machine-Learning-Projects/blob/master/example-data-science-notebook/Example%20Machine%20Learning%20Notebook.ipynb

Add Callbacks for logging and/or performing custom operations after every generation

Looking forward to this in TPOT. I think it will clean up the interface quite a bit.

Add a built-in configuration dictionary for machine learning with text data

@teaearlgraycold, taking your example, it could work like this:

```python
my_text = "...".split("\n")
class_labels = [1, 0, 0, 1, 1, ..., 0, 0]
# Assuming len(my_text) == len(class_labels)
my_tpot =...
```

Add a built-in configuration dictionary for machine learning with text data

Is text classification such a fundamentally different problem type that it requires a new TPOT class? Once the text is converted to a bag-of-[words, ngrams, etc.] representation, we're working with...
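As a rough illustration of the bag-of-words step mentioned in this comment, here is a minimal standard-library sketch. The helper name `bag_of_words` and the toy documents are hypothetical, made up for this example; this is not TPOT's or scikit-learn's implementation.

```python
from collections import Counter


def bag_of_words(docs, n=1):
    """Convert raw strings into count vectors over a shared n-gram vocabulary.

    Hypothetical helper for illustration only; tokenization is a naive
    lowercase whitespace split.
    """
    counted = []
    for doc in docs:
        words = doc.lower().split()
        # Build word n-grams: n=1 gives single words, n=2 gives bigrams, etc.
        ngrams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
        counted.append(Counter(ngrams))

    # Shared, sorted vocabulary across all documents.
    vocab = sorted(set().union(*counted))

    # One row per document, one column per vocabulary term.
    matrix = [[counts[term] for term in vocab] for counts in counted]
    return vocab, matrix


docs = ["good movie", "bad movie", "good good plot"]  # toy data
vocab, X = bag_of_words(docs)
print(vocab)
print(X)
```

Each row of `X` is a plain numeric feature vector, one per document, with one column per vocabulary term.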

Add a built-in configuration dictionary for machine learning with text data

This sklearn issue seems relevant to our conversations here: https://github.com/scikit-learn/scikit-learn/pull/9012 Maybe a better way to accomplish what we want in the sklearn Pipeline architecture.

Record the progress and auto-disable obviously slow candidates (never use them in future training)

@weixuanfu2016, can you please confirm that this issue is addressed in the 0.7 release? IIRC models that failed to finish evaluating due to timeouts have their "timeout score" recorded in...

Extending TPOT to unsupervised clustering

I like this idea. I'd like to explore it after we have regression integrated into TPOT. We should explore additional metrics for scoring unsupervised results as well.