Robyn
Are spend variables that are mostly 0 (i.e. rare spends) okay to use in Robyn?
This is more of a question than an issue, so apologies if this is not the place to ask. I am wondering what the best way is to include a channel with very sparse spend (i.e., spend is zero in most weeks, but every so often a campaign runs for a few weeks).
Is it okay to use as a paid media variable or would it be better represented as a dummy variable?
Thank you for the great package.
Hi, thanks. It's of course not ideal if a variable lacks variation; you'll probably end up with small or no effects. Still worth trying, though.
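Why lack of variation hurts can be seen in a deliberately simplified setting: a plain OLS model $y_t = \beta_0 + \beta_1 x_t + \varepsilon_t$ with one spend regressor and noise variance $\sigma^2$. This ignores Robyn's adstock, saturation and ridge penalty, so treat it as intuition only. The standard result is

$$\operatorname{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum_{t=1}^{T}(x_t - \bar{x})^2}.$$

Assuming, for illustration, that spend equals $s$ in $k$ of $T$ weeks and $0$ otherwise, $\bar{x} = ks/T$ and

$$\sum_{t=1}^{T}(x_t - \bar{x})^2 = k s^2\left(1 - \frac{k}{T}\right)^2 + (T - k)\left(\frac{ks}{T}\right)^2 = \frac{k(T-k)}{T}\,s^2,
\qquad
\operatorname{Var}(\hat\beta_1) = \frac{\sigma^2\,T}{k(T-k)\,s^2}.$$

With a hypothetical $T = 104$ weeks and $k = 6$ active weeks, $k(T-k)/T = 588/104 \approx 5.65$, versus the maximum $T/4 = 26$ at $k = T/2$, so the coefficient variance is roughly 4.6 times larger than for a balanced on/off pattern at the same spend level. The estimate is noisier rather than necessarily smaller; in a ridge-penalized model, poorly identified directions also tend to be shrunk hardest toward zero.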
Thanks @gufengzhou, and I completely agree that a lack of variation is not ideal for regression modelling. To my point about such a variable potentially being better represented by a dummy variable: would you agree, or is it the same issue whether it is left as continuous or coded as a dummy? Cheers.
@SeanRichterWalsh I would be interested to know whether we can represent it with a dummy variable. I am facing a similar issue with one of the paid media variables in my model.
@swapnilpatil022 it seems like it could be reasonable to include it as a dummy variable if there is a qualitative difference between periods when the dummy = 1 compared to periods when dummy = 0. What I am unsure of is whether to include a dummy along with the spend variable.
@SeanRichterWalsh Yeah, that's the question: whether to include this dummy variable along with the spend, or to include it as a context variable.
@gufengzhou any thoughts on above?
I doubt it's better to have it as a dummy variable. A dummy gets one-hot encoded, so you end up with n-1 variables (n is the number of levels) of 0s and 1s, which carry less information than a continuous variable.
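The encoding point can be illustrated with a minimal plain-Python sketch. It is not Robyn's internal encoding (Robyn is an R package); the helper `one_hot_drop_first` and the spend figures are hypothetical. It shows that an on/off flag with 2 levels becomes a single 0/1 column, and that weeks with very different spend all map to the same value.

```python
def one_hot_drop_first(values):
    """One-hot encode a categorical series, dropping the first (reference) level.

    A variable with n distinct levels yields n - 1 indicator columns.
    """
    levels = sorted(set(values))
    kept = levels[1:]  # the first level is the baseline absorbed by the intercept
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows


# Hypothetical weekly spend for a sporadic channel (illustrative numbers only).
spend = [0, 0, 5000, 20000, 0, 0, 1000, 0]

# Collapse spend to an on/off flag, then encode it.
flag = ["on" if s > 0 else "off" for s in spend]
columns, encoded = one_hot_drop_first(flag)

print(columns)                       # ['on']: 2 levels give 1 column
print([row[0] for row in encoded])   # [0, 0, 1, 1, 0, 0, 1, 0]

# Three distinct spend magnitudes all collapse to the single value 1.
active_spend = sorted({s for s in spend if s > 0})
print(active_spend)                  # [1000, 5000, 20000]
```

The dummy keeps only *when* the channel was active; the 20x difference between the smallest and largest active week is discarded.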
Thanks @gufengzhou. I have seen conflicting advice for when an independent variable contains mostly zeroes. Some people on Cross Validated have suggested using a dummy along with the numeric variable while others tend to align with your answer or suggest using just a dummy. Perhaps there is no right answer here and experimenting with the variable in question is the only way to figure out how to represent it. Cheers.
@SeanRichterWalsh Out of curiosity, when you say "using a dummy along with the numeric variable", how do you suggest using it? Is it that, among the paid media spends, one variable is the intermittent media spend and the other is a binary flag (1 and 0)?
@swapnilpatil022 Yes, exactly. I have seen suggestions to use only a dummy, and also to use a dummy plus the numeric variable. I have tried using only a dummy, to indicate weeks of spend for a new channel with very little spend data. It gives an implausible result: the contribution looks too high for the level of spend. I am exploring further.
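The three candidate specifications discussed here (spend only, dummy only, dummy plus spend) can be sketched as design matrices. This is a plain-Python illustration, not a Robyn API; `design_matrices`, `flag_and_spend_collinear` and the spend values are hypothetical. In the dummy-plus-spend form, the flag's coefficient acts as a fixed lift in any active week and the spend coefficient as a per-unit effect. One caveat worth checking: if spend is identical in every active week, spend is exactly a constant times the flag, so the two effects cannot be separated.

```python
def design_matrices(spend):
    """Build candidate design matrices for a sporadic channel.

    Every row starts with 1 (intercept). The flag is 1 in any week with
    positive spend and 0 otherwise.
    """
    flag = [1 if s > 0 else 0 for s in spend]
    return {
        "spend_only": [[1, s] for s in spend],
        "dummy_only": [[1, d] for d in flag],
        "dummy_plus_spend": [[1, d, s] for d, s in zip(flag, spend)],
    }


def flag_and_spend_collinear(spend):
    """True if spend is the same in every active week (or there is no spend).

    In that case spend == constant * flag, so the two columns are perfectly
    collinear and their separate coefficients cannot be identified.
    """
    active = {s for s in spend if s > 0}
    return len(active) <= 1


# Hypothetical weekly spend (illustrative numbers only).
spend = [0, 0, 5000, 20000, 0, 0, 1000, 0]
X = design_matrices(spend)
print(X["dummy_plus_spend"][3])                    # [1, 1, 20000]
print(flag_and_spend_collinear(spend))             # False
print(flag_and_spend_collinear([0, 800, 800, 0]))  # True
```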
Hi @SeanRichterWalsh, do you have any further insight from exploring this? We are also finding that adding sporadic spend massively inflates the contribution.
Hi @sineadflahive, I included it as a dummy variable in the end. The particular "channel" is really a tactic that runs for a few weeks a couple of times a year, so there is not a lot of variation, which is not ideal. However, the effect turned out to be more realistic than I originally thought, so my opinion now is that including a low-variation variable can be okay if the effect is expected to be large. Also try running with and without it and see how the model fit statistics vary.
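The "run with and without" check can be sketched outside Robyn with a plain OLS fit and adjusted R². Everything here is an assumption for illustration: the helpers `ols_fit` and `adjusted_r2`, the always-on "tv" channel, and the synthetic data, in which the sporadic channel is constructed to have a large effect. Robyn's own model selection relies on its own metrics (such as NRMSE and DECOMP.RSSD), so this only demonstrates the comparison logic.

```python
def ols_fit(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows, each starting with 1 for the intercept. Solved with
    Gauss-Jordan elimination and partial pivoting; fine for a few columns,
    not a substitute for a proper linear-algebra library.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def adjusted_r2(X, y):
    """Adjusted R^2 of an OLS fit; X includes the intercept column."""
    b = ols_fit(X, y)
    fitted = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
    mean_y = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    n, p = len(y), len(X[0]) - 1  # p = regressors excluding the intercept
    return 1 - (rss / tss) * (n - 1) / (n - p - 1)


# Hypothetical weekly data (illustrative only): an always-on channel ("tv"),
# a sporadic channel that is mostly zero, and a small fixed noise pattern.
tv = [10, 12, 9, 11, 10, 13, 8, 12, 11, 10]
sporadic = [0, 0, 5, 6, 0, 0, 0, 4, 0, 0]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1]
y = [50 + 2 * t + 3 * s + e for t, s, e in zip(tv, sporadic, noise)]

without_var = adjusted_r2([[1, t] for t in tv], y)
with_var = adjusted_r2([[1, t, s] for t, s in zip(tv, sporadic)], y)
print(f"adjusted R^2 without sporadic channel: {without_var:.3f}")
print(f"adjusted R^2 with sporadic channel:    {with_var:.3f}")
```

Adjusted R² penalizes the extra regressor, so an improvement only shows up when the sporadic channel explains enough variance to pay for itself. That matches the point above: a low-variation variable is most defensible when its effect is large.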