
Unfortunate channel naming can lead to mixed up hyperparameters in budget allocator

Open m4x3 opened this issue 1 year ago • 2 comments

Issue

When we ran the budget allocator on our internal data, the gamma / inflexion parameters of some channels got mixed up because of unfortunate variable naming combined with implicit sorting logic in the code.

Internally, one of our media channels is called fb_value_opt (FB value opt campaign) and another is called fb_value_opt_ads (FB value opt campaign that also optimizes for ad revenue).

In the budget allocator the hill parameters are fetched from the model results here.

The get_hill_params() function is defined here.

This function contains a step that calculates the inflexion points. The following code assumes that the columns of chnAdstocked are in the same order as the elements of the gammas vector.

inflexions <- unlist(lapply(seq(ncol(chnAdstocked)), function(i) {
    c(range(chnAdstocked[, i]) %*% c(1 - gammas[i], gammas[i]))
  }))

However, as the example below shows, this is not the case when the variables are named as above. In our case, the inflexion points for these campaigns were calculated incorrectly, which had a drastic impact on the budget allocator. No error was raised; we only identified the issue because we independently ran robyn_response() on these channels and got different results.

Provide reproducible example

sort(c("fb_value_opt", "fb_value_opt_ads"))
[1] "fb_value_opt"     "fb_value_opt_ads"
sort(c("fb_value_opt_gammas", "fb_value_opt_ads_gammas"))
[1] "fb_value_opt_ads_gammas" "fb_value_opt_gammas"    
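
The order flips because, once the suffix is appended, the comparison is decided at the character after fb_value_opt_, where "a" (from ads) sorts before "g" (from gammas). To make the consequence concrete, here is a minimal sketch with made-up numbers. The toy data frame, the gamma values, and the names `positional` and `by_name` are illustrative assumptions, not Robyn internals. It compares the positional lookup used in the snippet above with a lookup by name.

```r
# Toy adstocked data; column order follows the channel names.
chnAdstocked <- data.frame(
  fb_value_opt     = c(0, 50, 100),
  fb_value_opt_ads = c(0, 500, 1000)
)

# Made-up gammas, put into sorted-name order as in the example above.
gammas <- c(fb_value_opt_gammas = 0.3, fb_value_opt_ads_gammas = 0.9)
gammas <- gammas[sort(names(gammas))]

# Positional lookup: column i is paired with whatever gamma sits at position i,
# so each channel ends up with the other channel's gamma.
positional <- sapply(seq_len(ncol(chnAdstocked)), function(i) {
  c(range(chnAdstocked[, i]) %*% c(1 - gammas[i], gammas[i]))
})

# Name-based lookup: each column is paired with its own gamma.
by_name <- sapply(names(chnAdstocked), function(ch) {
  g <- gammas[[paste0(ch, "_gammas")]]
  c(range(chnAdstocked[[ch]]) %*% c(1 - g, g))
})

print(positional)
print(by_name)
```

The two vectors differ for both channels, and nothing in the positional version signals that anything went wrong.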

Potential fix

In our case, we fixed the issue by reordering the gammas vector inside the get_hill_params() function. We also reordered the alphas vector to be safe:

...
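# Strip the suffix, reorder to match the chnAdstocked columns, then restore the suffix.
# The pattern is anchored with $ so only a trailing "_gammas" is removed.
names(gammas) <- stringr::str_remove(names(gammas), "_gammas$")
gammas <- gammas[names(chnAdstocked)]
names(gammas) <- paste0(names(gammas), "_gammas")

# Same reordering for the alphas
names(alphas) <- stringr::str_remove(names(alphas), "_alphas$")
alphas <- alphas[names(chnAdstocked)]
names(alphas) <- paste0(names(alphas), "_alphas")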

inflexions <- unlist(lapply(seq(ncol(chnAdstocked)), function(i) {
    c(range(chnAdstocked[, i]) %*% c(1 - gammas[i], gammas[i]))
  }))
...

There may be more elegant solutions to this.
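
One possible variation on the fix above, sketched in base R: build the expected parameter names from the column names and index by name, so the result no longer depends on how either vector was sorted. This assumes every column of chnAdstocked has a matching `<channel>_gammas` and `<channel>_alphas` entry. The helper name `align_hill_param()` and the toy inputs are hypothetical and not part of Robyn.

```r
# Hypothetical helper: reorder a named parameter vector to match the channel
# columns, failing loudly if any channel has no matching parameter.
align_hill_param <- function(params, channels, suffix) {
  wanted <- paste0(channels, suffix)
  not_found <- setdiff(wanted, names(params))
  if (length(not_found) > 0) {
    stop("No parameter found for: ", paste(not_found, collapse = ", "))
  }
  params[wanted]
}

# Toy inputs standing in for the objects inside get_hill_params().
chnAdstocked <- data.frame(fb_value_opt = c(0, 100), fb_value_opt_ads = c(0, 1000))
gammas <- c(fb_value_opt_ads_gammas = 0.9, fb_value_opt_gammas = 0.3)
alphas <- c(fb_value_opt_ads_alphas = 2.0, fb_value_opt_alphas = 1.5)

gammas <- align_hill_param(gammas, names(chnAdstocked), "_gammas")
alphas <- align_hill_param(alphas, names(chnAdstocked), "_alphas")

print(names(gammas))
print(names(alphas))
```

Unlike a silent positional lookup, this version stops with an error when a channel has no matching parameter.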

I know this is probably an edge case caused by our channel naming, but it had quite a drastic impact on our results.

m4x3 • Sep 18 '23 09:09

Thanks very much for raising this! We didn't consider this case and will include a fix soon. Have you observed similar sorting issues in the modelling itself? If you run the model with "standard naming" and the same settings, do you get the same results? I'm trying to get a sense of whether I need to check more places for sorting issues.

gufengzhou • Sep 20 '23 07:09

Hi! Thanks for looking into this. I have now run several variants to check the consistency of the results:

  1. Running the same model with "standard naming", i.e. no varname is a substring of another varname.
  2. Running the same model with "unfortunate naming", i.e. some varnames are substrings of other varnames.
  3. Running the same model with "standard naming" but with minor changes to the initial varnames.

Versions 1 and 2 produce different model results. This indicates that the sorting issue also occurs somewhere in the modeling and/or output generation process.

Versions 1 and 3 produce identical results. This was expected, but I wanted to rule out that changing names in general can produce different results.

For now, I will continue using standard names to rule out any issues for my project.
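
As a rough way to spot the "unfortunate naming" case before modeling, the sketch below lists every variable name that is a prefix of another one, which is the pattern in the fb_value_opt example. The function name `find_prefix_collisions()` and the example names are hypothetical, not part of Robyn, and the check covers prefixes only rather than arbitrary substrings.

```r
# Hypothetical helper: return all (shorter, longer) pairs of variable names
# where the shorter name is a prefix of the longer one.
find_prefix_collisions <- function(varnames) {
  pairs <- expand.grid(
    shorter = varnames,
    longer = varnames,
    stringsAsFactors = FALSE
  )
  pairs <- pairs[pairs$shorter != pairs$longer, ]
  hits <- pairs[startsWith(pairs$longer, pairs$shorter), ]
  rownames(hits) <- NULL
  hits
}

collisions <- find_prefix_collisions(
  c("fb_value_opt", "fb_value_opt_ads", "tv_spend")
)
print(collisions)
```

An empty result corresponds to the "standard naming" setup for the prefix case; any rows point at names worth renaming before running the model.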

m4x3 • Sep 21 '23 08:09