Lowercase locations in GeoLiftMultiCell

Open lukasvermeer opened this issue 3 years ago • 1 comments

Lowercase the locations passed to GeoLiftMultiCell so that test markets are excluded as controls even if they are supplied in upper case.

Jan 17 '23 20:01 lukasvermeer

Consider the following script (based on the documentation).

install.packages("remotes", repos='http://cran.us.r-project.org')
remotes::install_github("ebenmichael/augsynth")
remotes::install_github("facebookincubator/GeoLift")

library(GeoLift)
data(GeoLift_Test)

GeoTestData_Test <- GeoDataRead(data = GeoLift_Test,
                                date_id = "date",
                                location_id = "location",
                                Y_id = "Y",
                                X = c(), # empty vector, as we have no covariates
                                format = "yyyy-mm-dd",
                                summary = TRUE)

# First we specify our test locations as a list
test_locations <- list(cell_1 = list("chicago", "cincinnati"),
                       cell_2 = list("honolulu", "indianapolis"))

# Same test locations, but UPPER CASE
test_locations_upper <- list(cell_1 = list("CHICAGO", "CINCINNATI"),
                             cell_2 = list("HONOLULU", "INDIANAPOLIS"))

# Then, we run MultiCellResults
MultiCellResults <- GeoLiftMultiCell(data = GeoTestData_Test,
                                     locations = test_locations,
                                     treatment_start_time = 91,
                                     treatment_end_time = 105,
                                     alpha = 0.1,
                                     model = "best",
                                     fixed_effects = TRUE,
                                     ConfidenceIntervals = TRUE,
                                     method = "conformal",
                                     stat_test = "Positive",
                                     winner_declaration = TRUE)

# And the same for UPPER CASE locations
MultiCellResults_upper <- GeoLiftMultiCell(data = GeoTestData_Test,
                                           locations = test_locations_upper,
                                           treatment_start_time = 91,
                                           treatment_end_time = 105,
                                           alpha = 0.1,
                                           model = "best",
                                           fixed_effects = TRUE,
                                           ConfidenceIntervals = TRUE,
                                           method = "conformal",
                                           stat_test = "Positive",
                                           winner_declaration = TRUE)

# Results should be the same, but they are not

summary(MultiCellResults, table = TRUE)
summary(MultiCellResults_upper, table = TRUE)

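We would expect the results to be identical whether or not the locations are supplied in upper case, but they are not: the ATT and p-value differ for both cells, and Incremental differs for Cell 2.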

> summary(MultiCellResults, table = TRUE)
| Cell|Location               | Duration|Lift  | Incremental|      ATT| pValue|Stat_Test                    | Stat_Sig|Prognostic_Func |Winner |
|----:|:----------------------|--------:|:-----|-----------:|--------:|------:|:----------------------------|--------:|:---------------|:------|
|    1|CHICAGO, CINCINNATI    |       15|-4.5% |       -4043| -134.780|   0.86|ONE-SIDED POSITIVE LIFT TEST |        0|RIDGE           |       |
|    2|HONOLULU, INDIANAPOLIS |       15|-10%  |       -9234| -307.789|   0.56|ONE-SIDED POSITIVE LIFT TEST |        0|RIDGE           |       |
> summary(MultiCellResults_upper, table = TRUE)
| Cell|Location               | Duration|Lift  | Incremental|      ATT| pValue|Stat_Test                    | Stat_Sig|Prognostic_Func |Winner |
|----:|:----------------------|--------:|:-----|-----------:|--------:|------:|:----------------------------|--------:|:---------------|:------|
|    1|CHICAGO, CINCINNATI    |       15|-4.5% |       -4043| -134.764|   0.85|ONE-SIDED POSITIVE LIFT TEST |        0|RIDGE           |       |
|    2|HONOLULU, INDIANAPOLIS |       15|-10%  |       -9187| -306.226|   0.54|ONE-SIDED POSITIVE LIFT TEST |        0|RIDGE           |       |

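This is because, with upper-case input, treatment markets from one cell were included as controls for the other cell. The model weights in the full summary show it: honolulu appears among the Cell 1 weights, and cincinnati and chicago among the Cell 2 weights.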

> summary(MultiCellResults_upper)
##################################
#####     Cell 1 Results    #####
##################################

GeoLift Results Summary

##################################
#####     Test Statistics    #####
##################################

* Average ATT: -134.764
* Percent Lift: -4.5%
* Incremental Y: -4043
* P-value: 0.85
* 90% Confidence Interval: (-7119.606, 26601.002)

##################################
#####   Balance Statistics   #####
##################################

* L2 Imbalance: 956.783
* Scaled L2 Imbalance: 0.1755
* Percent improvement from naive model: 82.45%
* Average Estimated Bias: -0.425

##################################
#####     Model Weights      #####
##################################

* Prognostic Function: RIDGE

* Model Weights:
 * portland: 0.2121
 * austin: 0.152
 * nashville: 0.1463
 * san diego: 0.1371
 * minneapolis: 0.1364
 * new york: 0.06
 * baton rouge: 0.0557
 * reno: 0.0474
 * miami: 0.0359
 * atlanta: 0.0117
 * houston: 0.0044
 * san antonio: 0.0041
 * salt lake city: -0.0012
 * oakland: -7e-04
 * philadelphia: -7e-04
 * oklahoma city: -4e-04
 * baltimore: 2e-04
 * las vegas: 2e-04
 * dallas: 2e-04
 * new orleans: -1e-04
 * san francisco: -1e-04
 * boston: -1e-04
 * washington: -1e-04
 * columbus: 1e-04
 * kansas city: -1e-04
 * phoenix: -1e-04
 * honolulu: 1e-04
 * milwaukee: -1e-04
 * memphis: -1e-04
 * cleveland: -1e-04
 * saint paul: 1e-04



##################################
#####     Cell 2 Results    #####
##################################

GeoLift Results Summary

##################################
#####     Test Statistics    #####
##################################

* Average ATT: -306.226
* Percent Lift: -10%
* Incremental Y: -9187
* P-value: 0.54
* 90% Confidence Interval: (-6943.434, 102532.07)

##################################
#####   Balance Statistics   #####
##################################

* L2 Imbalance: 1510.942
* Scaled L2 Imbalance: 0.3445
* Percent improvement from naive model: 65.55%
* Average Estimated Bias: 12.097

##################################
#####     Model Weights      #####
##################################

* Prognostic Function: RIDGE

* Model Weights:
 * austin: 0.3544
 * tucson: 0.235
 * portland: 0.1723
 * nashville: 0.0979
 * baton rouge: 0.0808
 * detroit: 0.0448
 * orlando: 0.039
 * phoenix: -0.0133
 * columbus: -0.0126
 * salt lake city: -0.0123
 * memphis: -0.0103
 * oklahoma city: 0.0099
 * miami: -0.0091
 * houston: 0.0071
 * cincinnati: 0.007
 * las vegas: -0.0068
 * baltimore: -0.0064
 * san francisco: -0.0059
 * jacksonville: 0.0058
 * kansas city: 0.0052
 * milwaukee: 0.0042
 * cleveland: 0.0041
 * philadelphia: 0.004
 * denver: -0.0031
 * new orleans: -0.0028
 * los angeles: 0.0023
 * washington: 0.0022
 * minneapolis: 0.0019
 * san diego: 0.0016
 * dallas: 0.0016
 * san antonio: 0.0014
 * reno: 0.0013
 * atlanta: -0.0013
 * boston: 0.0012
 * new york: -9e-04
 * chicago: -9e-04
 * oakland: 6e-04
 * saint paul: -1e-04
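
The leak can also be checked mechanically. The following is a minimal sketch, not GeoLift code: the donor vectors are a hand-copied subset of the market names from the Model Weights output above, and the variable names are illustrative only.

```r
# Test markets, lowercase as they appear in the data.
test_locations <- list(cell_1 = c("chicago", "cincinnati"),
                       cell_2 = c("honolulu", "indianapolis"))

# Subset of the donor (control) markets printed under "Model Weights"
# for each cell of MultiCellResults_upper, copied by hand.
cell_1_donors <- c("portland", "austin", "nashville", "honolulu", "milwaukee")
cell_2_donors <- c("austin", "tucson", "cincinnati", "chicago", "oakland")

# Every market treated in any cell; none of these should be a donor.
all_treated <- unlist(test_locations, use.names = FALSE)

# Treated markets that nevertheless received a weight as a control.
leaked_cell_1 <- intersect(cell_1_donors, all_treated)
leaked_cell_2 <- intersect(cell_2_donors, all_treated)

print(leaked_cell_1)
print(leaked_cell_2)
```

A non-empty intersection for either cell means a treated market was used as a control.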

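This is incorrect: a market treated in one cell should not serve as a control for another cell. This patch fixes that by lowercasing the locations in GeoLiftMultiCell.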
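
The idea can be sketched in plain R as follows. This is an illustration under assumptions, not the actual patch: `normalize_locations` is a hypothetical helper, and `all_locations` is a small made-up subset of the market names, which appear in lowercase in the summaries above.

```r
# Hypothetical helper (not part of GeoLift): lowercase every location
# name in a nested list of cells, keeping the cell names intact.
normalize_locations <- function(locations) {
  lapply(locations, function(cell) lapply(cell, tolower))
}

test_locations_upper <- list(cell_1 = list("CHICAGO", "CINCINNATI"),
                             cell_2 = list("HONOLULU", "INDIANAPOLIS"))

# Illustrative subset of the lowercase location names in the data.
all_locations <- c("chicago", "cincinnati", "honolulu",
                   "indianapolis", "austin", "portland")

# Without normalization nothing matches, so no market is excluded.
controls_without_fix <- setdiff(all_locations,
                                unlist(test_locations_upper, use.names = FALSE))

# With normalization all four test markets are removed from the pool.
normalized <- normalize_locations(test_locations_upper)
controls <- setdiff(all_locations, unlist(normalized, use.names = FALSE))

print(controls_without_fix)
print(controls)
```

Until the fix is available, the same effect can be had on the caller's side by supplying lowercase location names, as in the first call in the script above.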

Jan 17 '23 20:01 lukasvermeer