Are spend variables that are mostly 0 (i.e. rare spends) okay to use in Robyn?

Open SeanRichterWalsh opened this issue 1 year ago • 12 comments

This is more of a question than an issue, so apologies if this is not the place to ask. I am wondering what the best way is to include a channel that has very sparse periods of spend (i.e., most weeks spend is zero, but every so often there is a period of a few weeks where a campaign runs).

Is it okay to use as a paid media variable or would it be better represented as a dummy variable?

Thank you for the great package.

SeanRichterWalsh avatar Jun 08 '23 06:06 SeanRichterWalsh

Hi, thanks. It's of course not ideal if a variable lacks variation. You'll probably end up with small to no effects. Still worth trying, though.

gufengzhou avatar Jun 14 '23 04:06 gufengzhou

Thanks @gufengzhou and I completely agree about the lack of variation not being ideal for regression modelling strategies. To my point about such a variable being potentially better represented by a dummy variable, would you agree or is it the same issue whether left as continuous or coded as a dummy? Cheers.

SeanRichterWalsh avatar Jun 15 '23 10:06 SeanRichterWalsh

@SeanRichterWalsh I would be interested to know if we can represent it by a dummy variable. I am also facing a similar issue with one of the paid media channels in my model.

swapnilpatil022 avatar Jun 15 '23 12:06 swapnilpatil022

@swapnilpatil022 it seems like it could be reasonable to include it as a dummy variable if there is a qualitative difference between periods when the dummy = 1 compared to periods when dummy = 0. What I am unsure of is whether to include a dummy along with the spend variable.

SeanRichterWalsh avatar Jun 17 '23 04:06 SeanRichterWalsh

@SeanRichterWalsh Yeah, that's the question: whether to include this dummy variable along with the spend variable, or to include it as a context variable.

swapnilpatil022 avatar Jun 19 '23 05:06 swapnilpatil022

@gufengzhou any thoughts on the above?

swapnilpatil022 avatar Jun 20 '23 04:06 swapnilpatil022

I doubt it's better to have it as a dummy variable. What happens with a dummy is that it gets one-hot-encoded, so you'll end up with n-1 variables (n being the number of levels) containing only 0 and 1, which carry less information than a continuous variable.

gufengzhou avatar Jun 23 '23 07:06 gufengzhou
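
A minimal sketch of the point about information loss, in plain Python rather than Robyn itself. The spend figures and the names `weekly_spend` and `spend_flag` are hypothetical, chosen only to illustrate how a 0/1 flag collapses every campaign week to the same value.

```python
# Hypothetical weekly spend for a sparse channel (illustrative numbers only).
weekly_spend = [0, 0, 0, 1200.0, 3400.0, 800.0, 0, 0, 0, 0, 2500.0, 0]

# Binary flag: 1 in weeks with any spend, 0 otherwise.
spend_flag = [1 if s > 0 else 0 for s in weekly_spend]

# A two-level flag becomes n - 1 = 1 column once a reference level is dropped.
distinct_spend_values = len(set(weekly_spend))
distinct_flag_values = len(set(spend_flag))

# Share of weeks with no spend at all.
share_zero = weekly_spend.count(0) / len(weekly_spend)

print("distinct spend values:", distinct_spend_values)
print("distinct flag values:", distinct_flag_values)
print("share of zero-spend weeks:", round(share_zero, 2))
```

The flag keeps the timing of the campaigns but drops how much was spent in each campaign week, which is the information a continuous spend variable would retain.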

Thanks @gufengzhou. I have seen conflicting advice for cases where an independent variable contains mostly zeroes. Some people on Cross Validated have suggested using a dummy along with the numeric variable, while others tend to align with your answer or suggest using just a dummy. Perhaps there is no right answer here and experimenting with the variable in question is the only way to figure out how to represent it. Cheers.

SeanRichterWalsh avatar Jun 23 '23 12:06 SeanRichterWalsh
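
One thing that may be worth checking before including a dummy alongside the numeric variable is how strongly the two overlap. The sketch below computes a Pearson correlation between a hypothetical sparse spend series and its 0/1 flag. The data and the helper name `pearson` are assumptions for illustration, not something from Robyn or from this thread.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical sparse spend and the flag derived from it (illustrative only).
sparse_spend = [0, 0, 0, 1200.0, 3400.0, 800.0, 0, 0, 0, 0, 2500.0, 0]
campaign_flag = [1 if s > 0 else 0 for s in sparse_spend]

overlap = pearson(sparse_spend, campaign_flag)
print("correlation between spend and flag:", round(overlap, 2))
```

A correlation close to 1 would suggest the two columns carry largely the same signal, so a model may struggle to split the effect between them.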

@SeanRichterWalsh Out of curiosity, when you say 'using a dummy along with the numeric variable', how are you suggesting this be used? Is it that, within paid media spends, one variable is the intermittent media spend and the other is a binary flag (1 and 0)?

swapnilpatil022 avatar Jun 26 '23 11:06 swapnilpatil022

@swapnilpatil022 Yes exactly. I have seen suggestions to use only a dummy but also a dummy and numeric. I have tried using a dummy only to indicate weeks of spend for a new channel with very little spend data. It gives a result that seems implausible as the contribution looks too high for the level of spend. I am exploring further.

SeanRichterWalsh avatar Sep 21 '23 20:09 SeanRichterWalsh

Hi @SeanRichterWalsh - do you have any further insight from exploring this further? We are finding that when we add sporadic spend, it is massively inflating the contribution also.

sineadflahive avatar Feb 20 '24 12:02 sineadflahive

Hi @sineadflahive - I included it as a dummy variable in the end. The particular "channel" is really a tactic which runs for a few weeks a couple of times a year. Therefore, there is not a lot of variation, which is not ideal. However, the effect is actually more realistic than I originally thought, so my opinion now is that including a low-variation variable can be okay if the effect is expected to be large. Try running with and without it and see how the model fit statistics vary too.

SeanRichterWalsh avatar Feb 20 '24 15:02 SeanRichterWalsh
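
The "with and without" comparison can be sketched outside Robyn using ordinary least squares on toy data. This is only an illustration under stated assumptions: Robyn's own pipeline (ridge regression with adstock and saturation transforms) is not reproduced here, and the data, the helper `fit_ols`, and all variable names are hypothetical.

```python
def fit_ols(columns, y):
    """OLS with an intercept; returns (coefficients, r_squared, adjusted_r_squared)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    coefs = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(b * x for b, x in zip(coefs, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R-squared penalises the extra predictor (p counts the intercept).
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    return coefs, r2, adj_r2

# Hypothetical data: an always-on channel, a sparse channel, and sales that
# were constructed as 50 + 2 * always_on + 3 * sparse + small fixed noise.
always_on = [10, 12, 11, 13, 12, 14, 11, 10, 13, 12, 14, 11]
sparse = [0, 0, 0, 5, 8, 3, 0, 0, 0, 0, 6, 0]
noise = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, 0.4, -0.1, 0.2, -0.5, 0.1]
sales = [50 + 2 * a + 3 * s + e for a, s, e in zip(always_on, sparse, noise)]

coefs_without, r2_without, adj_without = fit_ols([always_on], sales)
coefs_with, r2_with, adj_with = fit_ols([always_on, sparse], sales)

print("without sparse channel: R2 =", round(r2_without, 3), "adj R2 =", round(adj_without, 3))
print("with sparse channel:    R2 =", round(r2_with, 3), "adj R2 =", round(adj_with, 3))
print("estimated sparse coefficient:", round(coefs_with[2], 2))
```

In-sample R² cannot go down when a predictor is added, so the adjusted figure (or an out-of-sample error) is the more useful one to compare. It is also worth checking whether the estimated coefficient is plausible relative to the level of spend.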