
Feature/no answer pipeline

Open viktors264 opened this issue 3 years ago • 6 comments

Added a new noAnswer key and updated the generic aggregation pipeline to include all responses without an answer.

viktors264 avatar Dec 29 '22 13:12 viktors264

This is a good start! But it's missing a key feature, which is that the no_answer key should be added to buckets, not just facets.

What I mean is that currently this gives us data like this (in this case, "years of experience" with the "gender" facet):

facets: [
        {
          type: 'gender',
          id: 'noAnswer',
          buckets: [
            { id: 'range_5_10', count: 66 },
            { id: 'range_10_20', count: 49 },
            { id: 'range_less_than_1', count: 20 },
          ]
        },
        {
          type: 'gender',
          id: 'not_listed',
          buckets: [
            { id: 'range_2_5', count: 35 },
            { id: 'range_5_10', count: 34 },
            { id: 'range_10_20', count: 39 },
          ]
        },
        {
          type: 'gender',
          id: 'male',
          buckets: [
            { id: 'range_2_5', count: 7970 },
            { id: 'range_10_20', count: 5470 },
            { id: 'range_5_10', count: 7362 },
          ]
        },

So you've added the "years of experience" breakdown for people who didn't answer the "gender" question.

But within each "years of experience" array of buckets, we also want to know how many people didn't answer the years of experience question. So the data we actually want would be more like this:

facets: [
        {
          type: 'gender',
          id: 'noAnswer',
          buckets: [
            { id: 'range_5_10', count: 66 },
            { id: 'range_10_20', count: 49 },
            { id: 'range_less_than_1', count: 20 },
            { id: 'no_answer', count: 123 }, // people who answered neither gender nor years of experience
          ]
        },
        {
          type: 'gender',
          id: 'not_listed',
          buckets: [
            { id: 'range_2_5', count: 35 },
            { id: 'range_5_10', count: 34 },
            { id: 'range_10_20', count: 39 },
            { id: 'no_answer', count: 123 }, // people who picked "not_listed" as gender but didn't answer "years of experience"
          ]
        },

Additionally, we want this no_answer bucket to appear even when people don't select any facet. So we also want this:

"facets": [
              {
                "id": "default", // this is what we get when no facet is selected
                "buckets": [
                  {
                    "id": "range_less_than_1",
                    "count": 1272,
                  },
                  {
                    "id": "range_1_2",
                    "count": 4177,
                  },
                  {
                    "id": "range_2_5",
                    "count": 8710,
                  },
                  {
                    "id": "no_answer",
                    "count": 123,
                  },
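To make the target shape concrete, here is a minimal sketch in plain JavaScript of grouping raw responses into facets and buckets while counting missing answers at both levels. The function name `buildFacets`, the field names, and the use of a single `no_answer` id for both facets and buckets are assumptions for illustration only, not the API's actual code.

```javascript
// Hypothetical helper, not taken from the codebase.
// Assumption: a missing answer is an undefined or null field on the response.
const NO_ANSWER = 'no_answer';

function buildFacets(responses, bucketField, facetField = null) {
  const facets = new Map(); // facet id -> Map(bucket id -> count)
  for (const response of responses) {
    // With no facet selected, every response lands in one "default" facet.
    const facetId = facetField ? (response[facetField] ?? NO_ANSWER) : 'default';
    const bucketId = response[bucketField] ?? NO_ANSWER;
    if (!facets.has(facetId)) facets.set(facetId, new Map());
    const buckets = facets.get(facetId);
    buckets.set(bucketId, (buckets.get(bucketId) ?? 0) + 1);
  }
  return [...facets].map(([id, buckets]) => ({
    type: facetField ?? 'default',
    id,
    buckets: [...buckets].map(([bucketId, count]) => ({ id: bucketId, count })),
  }));
}
```

Calling `buildFacets(responses, 'years_of_experience', 'gender')` would yield one facet per gender answer plus a `no_answer` facet, each carrying its own `no_answer` bucket; omitting the third argument yields the single `default` facet.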

SachaG avatar Jan 17 '23 23:01 SachaG

Screen Shot 2023-01-18 at 9 08 25

By the way, that no_answer bucket already appears in the survey results, but currently it's manually calculated in the chart itself (number of total respondents - sum of respondents in the other columns). I think it would be cleaner to do it at the API level.

(Also I guess it wouldn't be too hard to do it outside the aggregation pipeline in the rest of the JS code if the pipeline can't easily do it)
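As a rough sketch of that fallback, the subtraction the chart currently performs could live in a small JavaScript helper applied to each facet after the pipeline runs. The name `withNoAnswerBucket` and the toy numbers are made up for illustration, and the subtraction only holds for questions where each respondent picks at most one option.

```javascript
// Hypothetical post-processing step, assuming single-choice questions only:
// no_answer = total respondents - sum of respondents in the other buckets.
function withNoAnswerBucket(buckets, totalRespondents) {
  const answered = buckets.reduce((sum, bucket) => sum + bucket.count, 0);
  // Clamp at zero so inconsistent totals never produce a negative count.
  const noAnswerCount = Math.max(totalRespondents - answered, 0);
  return [...buckets, { id: 'no_answer', count: noAnswerCount }];
}

// Toy numbers, not survey data: 10 respondents, 8 of whom answered.
const example = withNoAnswerBucket(
  [
    { id: 'range_1_2', count: 5 },
    { id: 'range_2_5', count: 3 },
  ],
  10
);
console.log(example[2]); // { id: 'no_answer', count: 2 }
```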

SachaG avatar Jan 18 '23 00:01 SachaG

Yes, of course. It's better to do the calculations inside the API.

viktors264 avatar Jan 18 '23 04:01 viktors264

Good progress! But now I'm running into a different issue. It doesn't work when querying for a field where people can pick multiple options at the same time.

For example with the following GraphQL query:

query raceEthnicityQuery {
    survey(survey: state_of_js) {
        demographics {
            race_ethnicity: race_ethnicity(filters: {}, options: {}) {
                keys
                year(year: 2022) {
                    year
                    completion {
                        total
                        percentage_survey
                        count
                    }
                    facets {
                        id
                        type
                        completion {
                            total
                            percentage_question
                            percentage_survey
                            count
                        }
                        buckets {
                            id
                            count
                            percentage_question
                            percentage_survey
                        }
                    }
                }
            }
            
        }
    }
}

I get this:

results: [
    {
      facets: [
        {
          type: 'default',
          id: 'default',
          buckets: [
            { id: [ 'multiracial', 'white_european' ], count: 33 },
            {
              id: [
                'black_african',
                'east_asian',
                'hispanic_latin',
                'middle_eastern',
                'multiracial',
                'native_american_islander_australian',
                'south_asian',
                'south_east_asian'
              ],
              count: 1
            },
            {
              id: [ 'multiracial', 'hispanic_latin', 'white_european' ],
              count: 2
            },
            {
              id: [ 'multiracial', 'white_european', 'middle_eastern' ],
              count: 2
            },
            { id: [ 'east_asian', 'multiracial' ], count: 1 },
            {
              id: [ 'south_east_asian', 'south_asian', 'east_asian' ],
              count: 3
            },
            {
              id: [
                'black_african',
                'east_asian',
                'hispanic_latin',
                'middle_eastern',
                'native_american_islander_australian',
                'multiracial',
                'south_asian',
                'south_east_asian',
                'white_european',
                'not_listed'
              ],
              count: 1
            },
            { id: [ 'south_east_asian' ], count: 1000 },
            { id: [ 'multiracial', 'south_east_asian' ], count: 1 },
            {
              id: [
                'east_asian',
                'native_american_islander_australian',
                'south_asian',
                'white_european'
              ],
              count: 1
            },
etc.

As you can see, it's using every existing combination of answers as a unique id key instead of aggregating them. The correct output (from the main branch) would be:

  results: [
    {
      facets: [
        {
          type: 'default',
          id: 'default',
          buckets: [
            { id: 'multiracial', count: 727 },
            { id: 'east_asian', count: 1710 },
            { id: 'white_european', count: 19790 },
            { id: 'middle_eastern', count: 1158 },
            { id: 'hispanic_latin', count: 2795 },
            { id: 'south_asian', count: 1731 },
            { id: 'native_american_islander_australian', count: 142 },
            { id: 'not_listed', count: 795 },
            { id: 'south_east_asian', count: 1221 },
            { id: 'black_african', count: 1074 }
          ]
        }
      ],
      year: 2022
    }
  ]
}
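The difference between the two outputs can be illustrated in plain JavaScript: each combination's count has to be credited to every option it contains, which is what unwinding the answer array before grouping achieves. The helper name `collapseCombinationBuckets` is hypothetical, and the input reuses three of the combination buckets shown above.

```javascript
// Hypothetical helper: turn buckets keyed by answer combinations into
// buckets keyed by individual option.
function collapseCombinationBuckets(buckets) {
  const counts = new Map(); // option id -> count
  for (const { id, count } of buckets) {
    // Single-choice ids are treated as one-element combinations.
    const options = Array.isArray(id) ? id : [id];
    for (const option of options) {
      counts.set(option, (counts.get(option) ?? 0) + count);
    }
  }
  return [...counts].map(([id, count]) => ({ id, count }));
}

const collapsed = collapseCombinationBuckets([
  { id: ['multiracial', 'white_european'], count: 33 },
  { id: ['south_east_asian'], count: 1000 },
  { id: ['multiracial', 'south_east_asian'], count: 1 },
]);
console.log(collapsed);
// [
//   { id: 'multiracial', count: 34 },
//   { id: 'white_european', count: 33 },
//   { id: 'south_east_asian', count: 1001 }
// ]
```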

SachaG avatar Jan 18 '23 05:01 SachaG

Someone is attempting to deploy a commit to the Devographics Team on Vercel.

A member of the Team first needs to authorize it.

vercel[bot] avatar Jan 18 '23 16:01 vercel[bot]

Good progress! But now I'm running into a different issue. It doesn't work when querying for a field where people can pick multiple options at the same time.

I have added back the unwind operator, with an option set so that it does not skip null or empty fields. It seems that we cannot remove the unwind operator. I tested your case and it works now; I also re-tested the previous cases locally and they seem to work for me. It is difficult for me to know and test all cases, so let me know if something is wrong.
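For readers unfamiliar with the option, the stages might look roughly like the sketch below. This is an assumption about the shape of the change, not the code in the pull request: `buildBucketStages` and the field path are hypothetical, and the option is assumed to be MongoDB's `preserveNullAndEmptyArrays` on `$unwind`, which keeps documents whose field is null, missing, or an empty array.

```javascript
// Hypothetical stage builder; only the operators themselves
// ($unwind, $group, $ifNull, $project) are real MongoDB features.
function buildBucketStages(fieldPath) {
  return [
    {
      // One document per selected option; unanswered documents are kept.
      $unwind: { path: `$${fieldPath}`, preserveNullAndEmptyArrays: true },
    },
    {
      // Null or missing values are grouped under a "no_answer" bucket.
      $group: {
        _id: { $ifNull: [`$${fieldPath}`, 'no_answer'] },
        count: { $sum: 1 },
      },
    },
    // Rename _id to id to match the bucket shape used in this thread.
    { $project: { _id: 0, id: '$_id', count: 1 } },
  ];
}
```

The returned array could then be passed to a collection's `aggregate()` call, with the real field path substituted for the placeholder.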

viktors264 avatar Jan 18 '23 16:01 viktors264