
Missing results in 2023 CNCF survey

Status: Open • craigbox opened this issue 1 year ago • 0 comments

It appears that some options were added to last year's CNCF survey only after about one-third of the results had been submitted. Recalculating adoption percentages based on my best guess at the cause, I find those projects are under-reported by as much as 20%, and ranked incorrectly in the published graphs by up to 8 places.

Questions 30 and 31 of the recently published CNCF survey cover usage of the graduated and incubating projects respectively. The dataset notes describe them as follows:

Important notes for analysis of Q30 (CNCF graduated projects) and Q31 (CNCF incubating projects). The data for these two matrix questions has been restated in this dataset so that it is easier for you to analyze. For each of the projects listed, the data for each respondent is rendered in one of three ways:

  • Null (empty cell) = The respondent is not using or evaluating the project
  • 1 = The respondent is using the project in production
  • 2 = The respondent is evaluating the project
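This encoding is what makes the problem invisible: an empty cell means "not using or evaluating", so a respondent who was never shown a project as an option looks exactly like one who does not use it. A minimal Python sketch of decoding one cell, assuming the Q30/Q31 columns are read from a CSV export as strings; the helper name `decode_cell` and its string handling are illustrative assumptions, not part of the dataset:

```python
from typing import Dict, Optional

# Meanings of the three encodings described in the dataset notes for Q30/Q31.
LABELS: Dict[Optional[int], str] = {
    None: "not using or evaluating",
    1: "production",
    2: "evaluating",
}


def decode_cell(raw: Optional[str]) -> str:
    """Map one Q30/Q31 cell to its meaning (hypothetical helper).

    Assumes the cell arrives as a string from a CSV reader, where an empty
    string or None represents the Null (empty) cell.
    """
    if raw is None or raw.strip() == "":
        # Note: this is also what an option that was never offered looks like.
        return LABELS[None]
    code = int(float(raw))  # accept "1" as well as "1.0" from spreadsheet exports
    if code not in (1, 2):
        raise ValueError(f"unexpected code: {raw!r}")
    return LABELS[code]
```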

In survey version 1, there are long runs of empty results from sequence number 196 to 545. These relate to the following projects:

  • Istio
  • cert-manager
  • Cloud Custodian
  • Keycloak
  • Kubeflow
  • KubeVela
  • Kyverno
  • OpenKruise
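Runs like this can be located mechanically. A sketch, assuming each project column has been read into (sequence number, cell) pairs already sorted by sequence number; `longest_empty_run` is a hypothetical helper, and it counts rows rather than differences between sequence numbers so that gaps in the numbering do not inflate a run:

```python
from typing import Iterable, Optional, Tuple


def longest_empty_run(
    rows: Iterable[Tuple[int, Optional[str]]],
) -> Optional[Tuple[int, int, int]]:
    """Return (first_seq, last_seq, length) of the longest run of empty cells.

    A cell counts as empty when it is None or a blank string.
    Returns None if the column has no empty cells at all.
    """
    best: Optional[Tuple[int, int, int]] = None
    run_start: Optional[int] = None
    run_len = 0
    for seq, value in rows:
        if value is None or value.strip() == "":
            if run_start is None:  # a new run of empty cells begins here
                run_start = seq
                run_len = 0
            run_len += 1
            if best is None or run_len > best[2]:
                best = (run_start, seq, run_len)
        else:  # an answered cell ends the current run
            run_start = None
            run_len = 0
    return best
```

On a synthetic column (not survey data) that is blank only for sequence numbers 196 to 545, e.g. `[(s, "" if 196 <= s <= 545 else "1") for s in range(1, 989)]`, it reports `(196, 545, 350)`.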

Given that

  • (a) these rows relate to the earliest dates of the survey, between 2023-09-06 and 2023-09-30
  • (b) the projects all changed CNCF maturity level between 2022-06-30 (the start of the 2022 survey) and 2023-09-06 (the start of the 2023 survey)

I conclude that the survey launched without these projects as options for Q30 and Q31, and that they were added once someone noticed the omission (around 2023-09-30).

The dataset reports '988 records used for analysis', so these projects are missing over a third of their possible results. This skews the data in favour of projects that do not have this error. The results are not just missing from the published spreadsheet: the marketing report is based on the incorrect data.
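The fraction follows directly, assuming sequence numbers 196 to 545 are counted inclusively and all fall within the 988 analysed records:

```latex
\frac{545 - 196 + 1}{988} = \frac{350}{988} \approx 0.354 > \frac{1}{3}
```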

For example, if I simply remove results 196 to 545 for all projects and re-run the numbers, cert-manager moves from 25.61% production usage to 39.66%, or, counting production and evaluation together, from 39.17% to 60.66%.
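These figures can be reproduced with simple arithmetic. A sketch, assuming the affected rows contribute only empty cells for cert-manager, so removing them shrinks the denominator but leaves the counts unchanged; the respondent counts 253 and 387 are back-calculated from the published percentages, not read from the dataset:

```python
# Back-calculated from the published cert-manager percentages over 988 records
# (25.61% production, 39.17% production + evaluation); inferred, not raw data.
TOTAL_RECORDS = 988
AFFECTED_ROWS = 545 - 196 + 1  # sequence numbers 196..545 inclusive -> 350


def share(count: int, denominator: int) -> float:
    """Percentage rounded to two decimals, matching the published figures."""
    return round(100 * count / denominator, 2)


production = round(0.2561 * TOTAL_RECORDS)          # -> 253
production_or_eval = round(0.3917 * TOTAL_RECORDS)  # -> 387

# The affected rows hold only empty cells for this project, so dropping them
# removes no users from the counts; only the denominator shrinks.
remaining = TOTAL_RECORDS - AFFECTED_ROWS           # -> 638

print(share(production, TOTAL_RECORDS), share(production, remaining))
# prints: 25.61 39.66
print(share(production_or_eval, TOTAL_RECORDS), share(production_or_eval, remaining))
# prints: 39.17 60.66
```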

Other discrepancies I noticed:

  • A similar but smaller error appears to have been made for the first 18 results for the graduated projects listed from Rook onwards.
  • There are long runs of "no answer" that seem statistically unlikely, such as for Notary in sequence numbers 244-280.
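One way to make "statistically unlikely" concrete is to ask how often a run of at least k consecutive "no answer" cells would appear among n respondents by chance. A sketch, under the simplifying assumption that each respondent independently gives "no answer" with some probability p; `prob_run_at_least` is a hypothetical helper and p would have to be estimated from the data. It tracks the length of the current trailing run with a small dynamic program:

```python
def prob_run_at_least(n: int, k: int, p: float) -> float:
    """P(at least one run of >= k consecutive 'no answer' cells in n rows).

    Assumes rows are independent and each is 'no answer' with probability p.
    """
    # state[j]: probability that no run of length k has occurred yet and the
    # current trailing run of 'no answer' cells has length j (0 <= j < k).
    state = [0.0] * k
    state[0] = 1.0
    hit = 0.0  # probability mass that has already completed a run of length k
    for _ in range(n):
        nxt = [0.0] * k
        for j, mass in enumerate(state):
            nxt[0] += mass * (1 - p)  # an answered cell resets the run
            if j + 1 < k:
                nxt[j + 1] += mass * p  # a 'no answer' cell extends the run
            else:
                hit += mass * p  # the run reaches length k
        state = nxt
    return hit
```

For the Notary example, sequence numbers 244-280 span 37 rows, so a call would look like `prob_run_at_least(n=988, k=37, p=0.2)`, where 0.2 is only a placeholder rate, not a figure from the survey. Accumulating `hit` directly, rather than computing one minus the surviving mass, keeps very small probabilities from being lost to floating-point cancellation.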

craigbox • May 06 '24 03:05