Race/ethnicity enum doesn't match Census standards

Open • bencooper222 opened this issue 6 years ago • 6 comments

From the Census Bureau:

The U.S. Census Bureau considers race and ethnicity to be two separate and distinct concepts (source)

Race and ethnicity are messy concepts, and I'm not arguing that the Census's classification is perfect. However, deviating from its methodology hurts our ability to compare our numbers with population-level statistics.

The proper way to do this is to mimic the Census, but with less granularity (see below for what the Census does). I don't think we need to record a Native American applicant's tribe, the specific Asian country of origin, or the specific Hispanic origin, so we can use a boolean for Hispanic status and a "check all that apply" list for race.

[Image: screenshot of the Census Bureau's Hispanic-origin and race questions]

https://github.com/VandyHacks/vaken/blob/039f3dc77374432aa272559fb24977bf8920ffb5/src/common/schema.graphql.ts#L36

bencooper222 • Jul 10 '19 18:07
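A minimal TypeScript sketch of what the proposed fields could look like, assuming the five OMB minimum race categories plus the Census's "Some Other Race" option. All names here (`Race`, `Demographics`, `hispanicOrLatino`, `races`, `normalizeRaces`) are hypothetical and are not taken from vaken's `schema.graphql.ts`; the point is only that Hispanic origin becomes its own boolean while race becomes a multi-select list.

```typescript
// Hypothetical sketch: these names are illustrative, not vaken's actual schema.

// Race options, modeled on the five OMB minimum categories plus the
// Census Bureau's "Some Other Race" category (an assumption of this sketch).
enum Race {
  AmericanIndianOrAlaskaNative = "AMERICAN_INDIAN_OR_ALASKA_NATIVE",
  Asian = "ASIAN",
  BlackOrAfricanAmerican = "BLACK_OR_AFRICAN_AMERICAN",
  NativeHawaiianOrOtherPacificIslander = "NATIVE_HAWAIIAN_OR_OTHER_PACIFIC_ISLANDER",
  White = "WHITE",
  SomeOtherRace = "SOME_OTHER_RACE",
}

// Ethnicity and race are separate questions, as on the Census form.
interface Demographics {
  hispanicOrLatino: boolean | null; // null means "prefer not to answer"
  races: Race[]; // "check all that apply": zero or more selections
}

// Drops duplicate selections and sorts them so equivalent answers are stored identically.
function normalizeRaces(selected: Race[]): Race[] {
  return selected.filter((race, i) => selected.indexOf(race) === i).sort();
}

const applicant: Demographics = {
  hispanicOrLatino: true,
  races: normalizeRaces([Race.White, Race.Asian, Race.White]),
};
console.log(applicant.races); // [ 'ASIAN', 'WHITE' ]
```

Keeping `hispanicOrLatino` nullable is one way to leave room for "prefer not to answer"; whether vaken wants that option is an assumption of this sketch, not part of the proposal.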

I don't see the benefit of this. I think the existing implementation is fine, except that we should also add an "Other" field. Most hackathons only use this data to roughly estimate diversity, and the current categorization is sufficient for that, imho.

cktang88 • Jul 10 '19 18:07

Keeping the current enum saves maybe 20 lines of code (we're talking about adding a checkbox and a field), and in exchange our estimates become less accurate. More worryingly, they become invalid in the formal sense, because we can't quantify the error at all. We're not running a study where that would be disqualifying, but giving up accuracy to avoid almost no extra code seems like a bad trade.

bencooper222 • Jul 10 '19 18:07

I don't see what is inaccurate about our current categorization. For example, the demographic stats we got from VH5 seem accurate and perfectly fine.

cktang88 • Jul 10 '19 19:07

I can pull some hard numbers later on which groups we'd have trouble capturing. That said, I think we'd struggle to identify bad data if we keep the approach of past years. How do you identify someone who was forced to misclassify themselves?

bencooper222 • Jul 10 '19 22:07

Well, the "Other" option would let us identify people who were previously misclassified.

cktang88 • Jul 10 '19 23:07

Our numbers aren't valuable by themselves; they're valuable in comparison with other people's. That's why I think we should stick to the same format that everyone else uses.

bencooper222 • Jul 11 '19 01:07
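To make the comparison argument concrete, here is a hypothetical TypeScript sketch that tabulates responses stored as a Hispanic-origin boolean plus a check-all-that-apply race list (the format proposed in the issue). Each person lands in exactly one bucket, loosely mirroring how Census tables separate Hispanic or Latino respondents (of any race) from non-Hispanic respondents counted by race alone or as two or more races. The type and function names and the sample data are illustrative assumptions, not vaken code or real VandyHacks numbers.

```typescript
// Hypothetical sketch: names and sample data are illustrative only.
enum Race {
  AmericanIndianOrAlaskaNative = "AMERICAN_INDIAN_OR_ALASKA_NATIVE",
  Asian = "ASIAN",
  BlackOrAfricanAmerican = "BLACK_OR_AFRICAN_AMERICAN",
  NativeHawaiianOrOtherPacificIslander = "NATIVE_HAWAIIAN_OR_OTHER_PACIFIC_ISLANDER",
  White = "WHITE",
  SomeOtherRace = "SOME_OTHER_RACE",
}

interface SurveyResponse {
  hispanicOrLatino: boolean | null; // null means the question was skipped
  races: Race[]; // "check all that apply"
}

// Assigns one response to exactly one Census-style bucket.
function bucketFor(r: SurveyResponse): string {
  if (r.hispanicOrLatino === null) return "Ethnicity not reported";
  // Hispanic or Latino respondents are counted once, regardless of race.
  if (r.hispanicOrLatino) return "Hispanic or Latino (any race)";
  const unique = r.races.filter((race, i) => r.races.indexOf(race) === i);
  if (unique.length === 0) return "Not Hispanic: race not reported";
  if (unique.length > 1) return "Not Hispanic: Two or more races";
  return `Not Hispanic: ${unique[0]} alone`;
}

// Counts responses per bucket so the shares can be set beside published Census percentages.
function tabulate(responses: SurveyResponse[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const r of responses) {
    const key = bucketFor(r);
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

const sample: SurveyResponse[] = [
  { hispanicOrLatino: true, races: [Race.White] },
  { hispanicOrLatino: false, races: [Race.Asian] },
  { hispanicOrLatino: false, races: [Race.Asian, Race.White] },
  { hispanicOrLatino: false, races: [Race.BlackOrAfricanAmerican] },
];
console.log(tabulate(sample));
```

A single-select enum cannot produce the "Two or more races" bucket at all, which is one way of seeing the misclassification concern raised earlier in the thread.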