SPORF icon indicating copy to clipboard operation
SPORF copied to clipboard

use R-format for classifier specification and streamline IO for prediction

Open ebridge2 opened this issue 6 years ago • 9 comments

wrap the forest, X when rank transforming, etc. anything that predict would need for determining prediction to simplify your function calls as a single class object. Then, use the S3 predict function instead of Predict that you guys wrote. That is considered "standard practice" for ML techniques in R; big packages like caret, lmer, etc. use it. Ie:

https://github.com/neurodata/lol/blob/master/R/random_classifier.R

This classifier an be used as:

model <- lol.classify.randomGuess(X[train,], Y[train])  # model has everything needed for prediction already
prediction <- predict(model, X[-train,])  # use S3 predict method for consistency with other classifiers in R language; calls predict.randomGuess from the above file

For example, if you have rank.transform = TRUE as specified in your docs, there's no need to require users to have to know to specify their X when predicting; you can output a "RerF" class object that either contains or doesn't contain X depending on when rank.transform = TRUE, and then just wrap this in whatever is output by RerF and pass that whole thing to predict.

I'll be doing some R-RerF stuff in the future, so I've assigned this to myself for now as it seems like a decent initial goal

ebridge2 avatar Apr 23 '18 18:04 ebridge2

@ebridge2 they wrote the code to be like the randomForest package, i'm not sure if that is relevant to your comment.

jovo avatar Jun 03 '18 14:06 jovo

randomForest (I use that package and based my code for writing/training classifiers around their interface) uses predict.* S3 interface. see randomForest predict code. In randomForest, you can see there's a function called "predict.randomForest()" that takes an object as its first parameter. this is how you extend stats::predict, since predict() will know to use the predict.name associated with classifier name so long as it exists. When you predict with randomForest, you would just call predict(trained_model, X.new), NOT predict.randomForest(trained_model, X.new) because that's how S3 works in R. In R-Rerf, we write our own non-S3 Predict() function that does not extend the S3 method predict() in stats like theirs does.

As for comment about passing a variable for rank transform, that is just stylistic; that parameter is passed when training the forst so there's no reason to have to pass it twice when you can just return it as part of the trained model object (ie, in the trained output, set trained$rank=TRUE and then do a simple check in the predict() method on the object) and then pass the whole model object to predict().

ebridge2 avatar Jun 03 '18 15:06 ebridge2

the advantage is that let's say I train a bunch of models using a bunch of packages (randomForest, glm, R-RerF) I can use the same predict(trained_model, X.new) for all of them if you extend the S3 method properly.

ebridge2 avatar Jun 03 '18 15:06 ebridge2

This is a good direction to take and can be implemented in a way that does not affect backwards compatibility.

jbrowne6 avatar Jun 04 '18 14:06 jbrowne6

Agreed. I would assume this is a requirement for integration with the caret package, which is what I ultimately would like to see happen at some point.

On Mon, Jun 4, 2018 at 10:02 AM James D. Browne Jr. < [email protected]> wrote:

This is a good direction to take and can be implemented in a way that does not affect backwards compatibility.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/neurodata/R-RerF/issues/16#issuecomment-394365692, or mute the thread https://github.com/notifications/unsubscribe-auth/AJVzkekYcrqA5pwP7BODOT4aqx4CpXaDks5t5T4GgaJpZM4TgXAM .

tyler-tomita avatar Jun 05 '18 20:06 tyler-tomita

Why?

On Tue, Jun 5, 2018, 4:21 PM Tyler Tomita [email protected] wrote:

Agreed. I would assume this is a requirement for integration with the caret package, which is what I ultimately would like to see happen at some point.

On Mon, Jun 4, 2018 at 10:02 AM James D. Browne Jr. < [email protected]> wrote:

This is a good direction to take and can be implemented in a way that does not affect backwards compatibility.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/neurodata/R-RerF/issues/16#issuecomment-394365692, or mute the thread < https://github.com/notifications/unsubscribe-auth/AJVzkekYcrqA5pwP7BODOT4aqx4CpXaDks5t5T4GgaJpZM4TgXAM

.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/neurodata/R-RerF/issues/16#issuecomment-394846691, or mute the thread https://github.com/notifications/unsubscribe-auth/AACjcjGn8Hqs6czChU0o5zDsKj_gvclFks5t5ugugaJpZM4TgXAM .

jovo avatar Jun 05 '18 22:06 jovo

Why integrate with caret? It makes model tuning much easier. It provides a uniform interface for comparing many different algorithms. Most of the well-known algorithms, including XGBoost and Rborist have been integrated. Therefore, it will increase both visibility and accessibility to users.

Looking at caret's repo, it actually doesn't seem like there is a requirement to conform to standard s3 methods.

On Tue, Jun 5, 2018 at 6:54 PM joshua vogelstein [email protected] wrote:

Why?

On Tue, Jun 5, 2018, 4:21 PM Tyler Tomita [email protected] wrote:

Agreed. I would assume this is a requirement for integration with the caret package, which is what I ultimately would like to see happen at some point.

On Mon, Jun 4, 2018 at 10:02 AM James D. Browne Jr. < [email protected]> wrote:

This is a good direction to take and can be implemented in a way that does not affect backwards compatibility.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <https://github.com/neurodata/R-RerF/issues/16#issuecomment-394365692 , or mute the thread <

https://github.com/notifications/unsubscribe-auth/AJVzkekYcrqA5pwP7BODOT4aqx4CpXaDks5t5T4GgaJpZM4TgXAM

.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/neurodata/R-RerF/issues/16#issuecomment-394846691, or mute the thread < https://github.com/notifications/unsubscribe-auth/AACjcjGn8Hqs6czChU0o5zDsKj_gvclFks5t5ugugaJpZM4TgXAM

.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/neurodata/R-RerF/issues/16#issuecomment-394885863, or mute the thread https://github.com/notifications/unsubscribe-auth/AJVzkUDgkDi2WXaZHDUcbOzp5I9rw-Hyks5t5wwTgaJpZM4TgXAM .

tyler-tomita avatar Jun 06 '18 01:06 tyler-tomita

definitely no requirement, but it will make peoples' lives easier the more accessible the algorithm is :) S3 methods are super trivial to implement; it would probably be sub 4 lines to change from your current Predict() as far as I can tell. Just add @importFrom stats predict in the roxygen, rename your Predict() to predict.RandomerForest(object, X, additional_hyperparams, ...) (or whatever R-RerF training is called; here I've assumed RandomerForest), wrap the entire trained classifier (and all required hyperparameters related to training) in a single list (passed into predict() as the object argument and make it a structure(list(training_param1=#, training_param2=#, etc), class="RandomerForest"), and you are done (and prediction can be done as predict(trained_classifier, new_data, additional_args).

On Tue, Jun 5, 2018 at 9:27 PM, Tyler Tomita [email protected] wrote:

Why integrate with caret? It makes model tuning much easier. It provides a uniform interface for comparing many different algorithms. Most of the well-known algorithms, including XGBoost and Rborist have been integrated. Therefore, it will increase both visibility and accessibility to users.

Looking at caret's repo, it actually doesn't seem like there is a requirement to conform to standard s3 methods.

On Tue, Jun 5, 2018 at 6:54 PM joshua vogelstein <[email protected]

wrote:

Why?

On Tue, Jun 5, 2018, 4:21 PM Tyler Tomita [email protected] wrote:

Agreed. I would assume this is a requirement for integration with the caret package, which is what I ultimately would like to see happen at some point.

On Mon, Jun 4, 2018 at 10:02 AM James D. Browne Jr. < [email protected]> wrote:

This is a good direction to take and can be implemented in a way that does not affect backwards compatibility.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <https://github.com/neurodata/R-RerF/issues/16#issuecomment- 394365692 , or mute the thread <

https://github.com/notifications/unsubscribe-auth/ AJVzkekYcrqA5pwP7BODOT4aqx4CpXaDks5t5T4GgaJpZM4TgXAM

.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/neurodata/R-RerF/issues/16#issuecomment-394846691 , or mute the thread < https://github.com/notifications/unsubscribe-auth/ AACjcjGn8Hqs6czChU0o5zDsKj_gvclFks5t5ugugaJpZM4TgXAM

.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/neurodata/R-RerF/issues/16#issuecomment-394885863, or mute the thread <https://github.com/notifications/unsubscribe-auth/ AJVzkUDgkDi2WXaZHDUcbOzp5I9rw-Hyks5t5wwTgaJpZM4TgXAM>

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/neurodata/R-RerF/issues/16#issuecomment-394910574, or mute the thread https://github.com/notifications/unsubscribe-auth/AIeNW8UGi1qZ9b857VfF4_9TaqU0qDhVks5t5zALgaJpZM4TgXAM .

-- Eric Bridgeford ericwb.me

ebridge2 avatar Jun 06 '18 01:06 ebridge2

i see. being caret compliant seems good, as does the S3 stuff. seems like something @ttomita or @ebridge can easily do at some point.

On Tue, Jun 5, 2018 at 9:56 PM, Eric Bridgeford [email protected] wrote:

definitely no requirement, but it will make peoples' lives easier the more accessible the algorithm is :) S3 methods are super trivial to implement; it would probably be sub 4 lines to change from your current Predict() as far as I can tell. Just add @importFrom stats predict in the roxygen, rename your Predict() to predict.RandomerForest(object, X, additional_hyperparams, ...) (or whatever R-RerF training is called; here I've assumed RandomerForest), wrap the entire trained classifier (and all required hyperparameters related to training) in a single list (passed into predict() as the object argument and make it a structure(list(training_param1=#, training_param2=#, etc), class="RandomerForest"), and you are done (and prediction can be done as predict(trained_classifier, new_data, additional_args).

On Tue, Jun 5, 2018 at 9:27 PM, Tyler Tomita [email protected] wrote:

Why integrate with caret? It makes model tuning much easier. It provides a uniform interface for comparing many different algorithms. Most of the well-known algorithms, including XGBoost and Rborist have been integrated. Therefore, it will increase both visibility and accessibility to users.

Looking at caret's repo, it actually doesn't seem like there is a requirement to conform to standard s3 methods.

On Tue, Jun 5, 2018 at 6:54 PM joshua vogelstein < [email protected]

wrote:

Why?

On Tue, Jun 5, 2018, 4:21 PM Tyler Tomita [email protected] wrote:

Agreed. I would assume this is a requirement for integration with the caret package, which is what I ultimately would like to see happen at some point.

On Mon, Jun 4, 2018 at 10:02 AM James D. Browne Jr. < [email protected]> wrote:

This is a good direction to take and can be implemented in a way that does not affect backwards compatibility.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <https://github.com/neurodata/R-RerF/issues/16#issuecomment- 394365692 , or mute the thread <

https://github.com/notifications/unsubscribe-auth/ AJVzkekYcrqA5pwP7BODOT4aqx4CpXaDks5t5T4GgaJpZM4TgXAM

.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/neurodata/R-RerF/issues/16#issuecomment- 394846691 , or mute the thread < https://github.com/notifications/unsubscribe-auth/ AACjcjGn8Hqs6czChU0o5zDsKj_gvclFks5t5ugugaJpZM4TgXAM

.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/neurodata/R-RerF/issues/16#issuecomment-394885863 , or mute the thread <https://github.com/notifications/unsubscribe-auth/ AJVzkUDgkDi2WXaZHDUcbOzp5I9rw-Hyks5t5wwTgaJpZM4TgXAM>

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/neurodata/R-RerF/issues/16#issuecomment-394910574, or mute the thread <https://github.com/notifications/unsubscribe- auth/AIeNW8UGi1qZ9b857VfF4_9TaqU0qDhVks5t5zALgaJpZM4TgXAM> .

-- Eric Bridgeford ericwb.me

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/neurodata/R-RerF/issues/16#issuecomment-394914883, or mute the thread https://github.com/notifications/unsubscribe-auth/AACjcjxjVhABJgNeKpLALADSLyQk9Lyoks5t5zbPgaJpZM4TgXAM .

-- the glass is all full: half water, half air. neurodata.io ps - i am committed to responding to my emails, it often takes about a week. thank you for understanding.

jovo avatar Jun 06 '18 02:06 jovo