
Is there a description for the "category" field in IMDB ratings of Al Gore’s movie?

Open · OmaymaS opened this issue 7 years ago · 0 comments

I opened Issue 35 in the fivethirtyeight R package repo to ask for clarification about the category variable in the ratings dataset, which was used for the article "Al Gore's New Movie Exposes The Big Flaw In Online Movie Ratings". @rudeboybert advised me to ask here.

The available categories are as follows:

> library(fivethirtyeight)
> levels(ratings$category)
 [1] "Aged 18-29"         "Aged 30-44"         "Aged 45+"           "Aged under 18"      "Females"           
 [6] "Females Aged 18-29" "Females Aged 30-44" "Females Aged 45+"   "Females under 18"   "IMDb staff"        
[11] "IMDb users"         "Males"              "Males Aged 18-29"   "Males Aged 30-44"   "Males Aged 45+"    
[16] "Males under 18"     "Non-US users"       "Top 1000 voters"    "US users"    

However, it is not clear:

  • Is 'Males under 18' a subset of 'Males'? If not, how do the two categories differ?
  • Can a respondent belong to more than one category (i.e., do the categories overlap)?
  • If the combined number of respondents in 'Females under 18', 'Females Aged 18-29', 'Females Aged 30-44', and 'Females Aged 45+' is less than the number of respondents in 'Females', is the gap due to respondents whose age is unknown?
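The check described in the last question can be sketched in base R as below. This is only an illustration under stated assumptions: the respondent counts are made-up placeholders (not values from the dataset), and the column name `respondents` is an assumption about how the counts would be stored. With the real `ratings` data you would first need to restrict to rows that are actually comparable (e.g. the same movie and the same snapshot) before summing.

```r
# Toy data: the respondent counts below are hypothetical placeholders,
# NOT values taken from the fivethirtyeight `ratings` dataset.
# The column name `respondents` is also an assumption.
toy <- data.frame(
  category = c("Females", "Females under 18", "Females Aged 18-29",
               "Females Aged 30-44", "Females Aged 45+"),
  respondents = c(1000, 10, 300, 350, 250)
)

# The four female age bands that should (ideally) partition "Females".
age_bands <- c("Females under 18", "Females Aged 18-29",
               "Females Aged 30-44", "Females Aged 45+")

# Sum the age bands and compare with the overall "Females" row.
band_total <- sum(toy$respondents[toy$category %in% age_bands])
female_total <- toy$respondents[toy$category == "Females"]

# A positive gap means some female respondents fall into no age band;
# whether these are respondents with an unknown age is the open question.
gap <- female_total - band_total
print(gap)  # prints [1] 90
```

If the bands partitioned the category exactly, `gap` would be 0; a positive value only shows that some respondents are unaccounted for, not why.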

OmaymaS, Oct 31 '18 23:10