cattonum
Consider overlap with recipes
This package looks promising. I am a frequent user of the recipes package and I would love to see those two working together, in particular with regard to applying e.g. label encoding engineered from training data to test data.
One could provide their own step_* functions (either in the cattonum package or by trying to get them accepted into recipes). There is a vignette on how to define custom steps. I saw you linked the recipes package in the vignette, and since quite a few packages are concerned with this topic, I felt it would be nice to make these tools work together. recipes and cattonum also seem to share a similar philosophy. Is that a priority for you now?
cc: @topepo
I think that it's a good idea and could help create the steps if you are interested. I have another recipes add-on package called embed that I'm still developing and that has some similar techniques in it.
One thing that it would require is having separate functions to estimate and apply the encodings. The top-level functions here take train and test arguments, but it looks like encode_from_lkp could be used to apply them at any stage (if I read that right).
Thanks for the comments! I agree this is a great idea. I am aware of embed and listed it in my announcement, but I haven't added it to the cattonum README yet. My package is still very young and needs major improvement, but as of now encode_from_lkp does apply the encodings for some of the available ones: see here until the end of the file, for example. First the lookup table is trained/estimated/populated from the training data, then the training data is encoded, and finally, if a test dataset has been passed, it is encoded with the same lookup table. I could definitely do some work on step_* or similar for whichever package we think is the best place for such functionality, but I'd have to wait until July is over to do so, as work and my summer semester will keep me very busy until then.
No hurry; I'm swamped this month.
One thing that I had planned on doing for recipes was feature hashing (which you have listed). Let me know when/if you start working on that; I'd like to help.
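For context on why feature hashing fits the step model differently from a lookup-based encoding, here is a minimal Python sketch of the hashing trick. Each level is mapped to one of `n_buckets` columns by a stable hash, so nothing has to be estimated from the training data apart from choosing the number of buckets. The names `hash_bucket` and `feature_hash`, the use of MD5, and the plain 0/1 (unsigned) encoding are illustrative assumptions, not how recipes, embed, or cattonum implement it.

```python
import hashlib
from typing import List, Sequence


def hash_bucket(level: str, n_buckets: int) -> int:
    """Map a categorical level to a column index via a stable hash."""
    # hashlib is used instead of Python's built-in hash(), whose string
    # hashing is randomized per process unless PYTHONHASHSEED is set.
    digest = hashlib.md5(level.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % n_buckets


def feature_hash(values: Sequence[str], n_buckets: int) -> List[List[int]]:
    """Encode each value as a one-hot row of length n_buckets (the hashing trick)."""
    rows = []
    for v in values:
        row = [0] * n_buckets
        row[hash_bucket(v, n_buckets)] = 1
        rows.append(row)
    return rows


# No estimation step: levels unseen at training time still land in some bucket.
encoded = feature_hash(["red", "green", "red", "never_seen"], n_buckets=8)
for row in encoded:
    print(row)
```

The trade-off is that distinct levels can collide in the same bucket. The design choice therefore shifts from "which levels exist in the training data" to "how many buckets to allow."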
Sounds good! I'll keep you updated.