
Consider overlap with recipes

Open lorenzwalthert opened this issue 5 years ago • 4 comments

This package looks promising. I am a frequent user of the recipes package and I would love to see the two working together, in particular for applying encodings (e.g. label encoding) engineered from training data to test data. One could provide custom step_ functions, either in the cattonum package or by trying to get them accepted into recipes. There is a vignette on how to define custom steps. I saw you linked the recipes package in the vignette, and there are quite a few packages concerned with this topic, so I felt it would be nice to make these tools work together. recipes and cattonum also seem to share a similar philosophy. Is that a priority for you now? cc: @topepo

lorenzwalthert, Jul 04 '18 09:07

I think that it's a good idea, and I could help create the steps if you are interested. I have another recipes add-on package called embed that I'm still developing, but it has some similar techniques in it.

One thing that it would require is to have separate functions to estimate and apply the encodings. The top level functions here take train and test arguments but it looks like encode_from_lkp could be used to apply them at any stage (if I read that right).

topepo, Jul 04 '18 16:07

Thanks for the comments! I agree this is a great idea. I am aware of embed and I listed it in my announcement, but I haven't added it to the cattonum README yet. My package is still very young and needs some major improvements, but as of now encode_from_lkp does apply the encodings for some of the available encoders: see here until the end of the file, for example. First the lookup table is estimated from the training data, then the training data is encoded, and finally, if a test dataset has been passed, it is encoded with the same lookup table. I could definitely do some work on step_* or similar for whichever package we think would be the best place for such functionality, but I'd have to wait until July is over to do so, as work and my summer semester for school will keep me very busy until then.
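
To make the estimate/apply split concrete, here is a minimal, language-agnostic sketch written in Python. It is not cattonum's code or the recipes API: the function names `fit_label_lookup` and `apply_label_lookup` are hypothetical, and mapping unseen test levels to `None` is an assumption made for illustration.

```python
from typing import Dict, List, Optional, Sequence


def fit_label_lookup(train: Sequence[str]) -> Dict[str, int]:
    """Estimate step: build a lookup table from the training data only.

    Each distinct level gets an integer code, assigned in sorted order.
    """
    levels = sorted(set(train))
    return {level: code for code, level in enumerate(levels, start=1)}


def apply_label_lookup(
    values: Sequence[str], lookup: Dict[str, int]
) -> List[Optional[int]]:
    """Apply step: encode any data with an already fitted lookup table.

    Levels that were not seen during fitting map to None (an assumption
    of this sketch; a real implementation might use NA or a default code).
    """
    return [lookup.get(value) for value in values]


# Hypothetical toy data.
train = ["b", "a", "c", "a"]
test = ["c", "d", "a"]  # "d" never appears in the training data

lookup = fit_label_lookup(train)                # 1. estimate from training data
train_encoded = apply_label_lookup(train, lookup)  # 2. encode the training data
test_encoded = apply_label_lookup(test, lookup)    # 3. reuse the same table on test

print(lookup)
print(train_encoded)
print(test_encoded)
```

Keeping the two functions separate is what would let a step_* function fit the table once and then reuse it on new data at any later stage.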

bfgray3, Jul 05 '18 03:07

No hurry; I'm swamped this month.

One thing that I had planned on doing for recipes was feature hashing (which you have listed). Let me know when/if you start working on that; I'd like to help.

topepo, Jul 05 '18 14:07

Sounds good! I'll keep you updated.

bfgray3, Jul 06 '18 00:07