Website HTTP 5XX /api/v1/webhooks/receive-usage-analytics

Open rfairburn opened this issue 9 months ago • 8 comments

Fleet version: N/A

Web browser and operating system: N/A


💥  Actual behavior

May 09 07:00:04 error: Sending 500 ("Server Error") response: 
May 09 07:00:04  UsageError: Invalid new record.
May 09 07:00:04 Details:
May 09 07:00:04   Could not use specified `organization`.  Cannot set "" (empty string) for a required attribute.
May 09 07:00:04     at Object.fn (/app/api/controllers/webhooks/receive-usage-analytics.js:47:35)
May 09 07:00:04     at wrapper (/app/node_modules/@sailshq/lodash/lib/index.js:3305:19)
May 09 07:00:04     at Deferred.parley.retry [as _handleExec] (/app/node_modules/machine/lib/private/help-build-machine.js:1014:29)
May 09 07:00:04     at Deferred.exec (/app/node_modules/parley/lib/private/Deferred.js:286:10)
May 09 07:00:04     at Deferred.switch (/app/node_modules/machine/lib/private/help-build-machine.js:1469:16)
May 09 07:00:04     at Object._requestHandler [as webhooks/receive-usage-analytics] (/app/node_modules/machine-as-action/lib/machine-as-action.js:1153:27)
May 09 07:00:04     at /app/node_modules/sails/lib/router/bind.js:248:46
May 09 07:00:04     at routeTargetFnWrapper (/app/node_modules/sails/lib/router/bind.js:395:9)
May 09 07:00:04     at Layer.handle [as handle_request] (/app/node_modules/express/lib/router/layer.js:95:5)
May 09 07:00:04     at next (/app/node_modules/express/lib/router/route.js:149:13)
May 09 07:00:04     at Route.dispatch (/app/node_modules/express/lib/router/route.js:119:3)
May 09 07:00:04     at Layer.handle [as handle_request] (/app/node_modules/express/lib/router/layer.js:95:5)
May 09 07:00:04     at /app/node_modules/express/lib/router/index.js:284:15
May 09 07:00:04     at Function.process_params (/app/node_modules/express/lib/router/index.js:346:12)
May 09 07:00:04     at next (/app/node_modules/express/lib/router/index.js:280:10)
May 09 07:00:04     at next (/app/node_modules/express/lib/router/route.js:141:14)
May 09 07:00:04     at alwaysAllow (/app/node_modules/sails/lib/hooks/policies/index.js:178:16)
May 09 07:00:04     at routeTargetFnWrapper (/app/node_modules/sails/lib/router/bind.js:395:9)
May 09 07:00:04     at Layer.handle [as handle_request] (/app/node_modules/express/lib/router/layer.js:95:5)
May 09 07:00:04     at next (/app/node_modules/express/lib/router/route.js:149:13)
May 09 07:00:04     at Route.dispatch (/app/node_modules/express/lib/router/route.js:119:3)
May 09 07:00:04     at Layer.handle [as handle_request] (/app/node_modules/express/lib/router/layer.js:95:5)
May 09 07:00:04 <- POST /api/v1/webhooks/receive-usage-analytics  (3ms 500)
May 09 07:00:04  |  error

🧑‍💻  Steps to reproduce

Could not use specified `organization`.  Cannot set "" (empty string) for a required attribute.

Looks like a report came in with an empty organization.

🕯️ More info (optional)

N/A

🛠️ To fix

  • [ ] On the website: Update the receive-usage-analytics webhook to return a 400: bad request response to Fleet instances that report an unreleased version of Fleet.
  • [ ] Update Fleet to not send usage statistics if the server is running an unreleased version of Fleet

rfairburn avatar May 09 '24 13:05 rfairburn

Root cause: A Fleet instance sent usage statistics that reported an empty string as the organization. Since this value was defined, it was not replaced with the default value (unknown). When the webhook attempted to create a HistoricalUsageSnapshot record in the database, it threw an error because organization is a required value.

AFAIK, organization is only reported by instances with a Fleet Premium license, and that value is set from the organization value of the license key.
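
To illustrate the root cause above: a check for "is this value defined?" lets an empty string through, so the default never applies. Here is a minimal sketch of a stricter fallback. `normalizeOrganization` is a hypothetical helper for illustration, not the code in receive-usage-analytics.js, and the `'unknown'` default is taken from the description above.

```javascript
// Hypothetical helper, not the actual Fleet website code.
// Treats undefined, null, non-strings, and blank strings as "not provided",
// so the fallback applies even when the value is defined but empty.
function normalizeOrganization(organization, fallback = 'unknown') {
  if (typeof organization !== 'string') {
    return fallback;
  }
  const trimmed = organization.trim();
  return trimmed === '' ? fallback : trimmed;
}

// The failing case from the log: a defined but empty organization.
console.log(normalizeOrganization(''));        // 'unknown'
console.log(normalizeOrganization(undefined)); // 'unknown'
```

With a fallback like this, the required `organization` attribute would never receive an empty string, though it would also hide the fact that the reporting instance sent an unexpected value.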

Impact: One Fleet instance failed to report usage statistics three times.

Resolution: I merged #18864 to prevent this 500 error, but now I think that was the wrong way to handle this. This error is likely a symptom of something wrong with the instance that tried to report statistics, and the changes in that PR don't address the root issue.

  • [ ] TODO: Make a PR to the receive-usage-analytics webhook to log a warning with the anonymous ID of the instance when this happens.

eashaw avatar May 09 '24 18:05 eashaw

@lukeheath What do you think we should do about errors like these? I looked at the usage statistics that were causing this error, and they were from a Fleet Premium instance that reported fleetd-chrome-v1.3.0-157-g80bce47c37-dirty as its version.

eashaw avatar May 09 '24 22:05 eashaw

@eashaw On the website endpoint, we should do some validation on what we'll accept as a version and if the parameter's value is invalid, we should return a 400 Bad Request.
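
A rough sketch of what that validation could look like, assuming released Fleet versions are plain MAJOR.MINOR.PATCH strings (optionally prefixed with "v"). The function name, the pattern, and the returned object shape are all hypothetical; the real endpoint would respond through its own framework.

```javascript
// Assumption: a released version looks like "4.49.0" or "v4.49.0".
// Anything else (development builds, git-describe strings) is rejected.
const RELEASED_VERSION_PATTERN = /^v?\d+\.\d+\.\d+$/;

// Hypothetical helper: returns the HTTP status the endpoint could respond with.
function validateReportedVersion(fleetVersion) {
  if (typeof fleetVersion !== 'string' || !RELEASED_VERSION_PATTERN.test(fleetVersion)) {
    return { statusCode: 400, reason: 'Invalid or unreleased Fleet version.' };
  }
  return { statusCode: 200 };
}

// The version reported by the instance in this issue would be rejected.
console.log(validateReportedVersion('fleetd-chrome-v1.3.0-157-g80bce47c37-dirty').statusCode); // 400
```

If release candidates or other suffixed versions count as valid, the pattern would need to be loosened accordingly.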

@sharon-fdm On the Fleet server side, we should also validate what we're sending to the usage analytics endpoint. Based on the version, it appears a chrome extension was trying to send usage analytics.

I am adding the #g-endpoint-ops label and keeping #g-digital-experience so that you can both action the same bug.

lukeheath avatar May 10 '24 22:05 lukeheath

Thanks @lukeheath.

@getvictor does it make sense that our Chrome ext. sent statistics to the usage analytics endpoint? (I can't recall a reason for that) (fleetd-chrome-v1.3.0-157-g80bce47c37-dirty)

sharon-fdm avatar May 13 '24 15:05 sharon-fdm

@sharon-fdm @eashaw fleetd-chrome-v1.3.0-157-g80bce47c37-dirty is the Fleet server version. The reason it has fleetd-chrome in the name is because it was built from the commit that has the fleetd-chrome-v1.3.0 tag.

It seems like a development build.

@lukeheath Another possibility is that this issue has something to do with the tf-mod-addon-monitoring-v1.4.0 tag that was also put on that same commit.
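
For reference, the reported version string has the shape that `git describe --dirty` produces: the nearest tag, the number of commits since that tag, `g` plus the abbreviated commit hash, and a `-dirty` suffix when the working tree had uncommitted changes. Assuming that is how the string was generated, a small sketch for pulling it apart (`parseGitDescribe` is a hypothetical name):

```javascript
// Assumes the input follows the `git describe --dirty` layout:
//   <tag>-<commits since tag>-g<abbreviated hash>[-dirty]
function parseGitDescribe(version) {
  const match = /^(.+)-(\d+)-g([0-9a-f]+)(-dirty)?$/.exec(version);
  if (!match) {
    return null; // Input does not match the layout, e.g. a bare tag name.
  }
  return {
    tag: match[1],
    commitsSinceTag: Number(match[2]),
    commitHash: match[3],
    dirty: match[4] !== undefined,
  };
}

console.log(parseGitDescribe('fleetd-chrome-v1.3.0-157-g80bce47c37-dirty'));
// { tag: 'fleetd-chrome-v1.3.0', commitsSinceTag: 157, commitHash: '80bce47c37', dirty: true }
```

Read this way, the build came from a modified working tree 157 commits after the fleetd-chrome-v1.3.0 tag, which fits the development-build explanation.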

getvictor avatar May 13 '24 15:05 getvictor

@sharon-fdm Should we consider validating the Fleet server version before sending usage analytics? That way, we're not skewing the data with fake usage analytics.

lukeheath avatar May 13 '24 16:05 lukeheath

@lukeheath, not a bad idea. I'll create an eng-initiated ticket.

sharon-fdm avatar May 13 '24 16:05 sharon-fdm

@sharon-fdm Could we do it as part of this bug ticket and avoid the new issue?

lukeheath avatar May 13 '24 16:05 lukeheath

I'm closing this issue. After discussing this with Sharon, we decided that we do not need to validate the usage statistics sent to the Fleet website, and that there is no harm in storing them in the website database. The script that sends aggregated metrics to Datadog already excludes statistics sent by development instances and unreleased versions of Fleet.

eashaw avatar May 23 '24 16:05 eashaw

Webhook fix brings light, Analytics flow right, Fleet in cloud city's night.

fleet-release avatar May 23 '24 16:05 fleet-release