fleet
Website HTTP 5XX /api/v1/webhooks/receive-usage-analytics
Fleet version: N/A
Web browser and operating system: N/A
💥 Actual behavior
May 09 07:00:04 error: Sending 500 ("Server Error") response:
May 09 07:00:04 UsageError: Invalid new record.
May 09 07:00:04 Details:
May 09 07:00:04 Could not use specified `organization`. Cannot set "" (empty string) for a required attribute.
May 09 07:00:04 at Object.fn (/app/api/controllers/webhooks/receive-usage-analytics.js:47:35)
May 09 07:00:04 at wrapper (/app/node_modules/@sailshq/lodash/lib/index.js:3305:19)
May 09 07:00:04 at Deferred.parley.retry [as _handleExec] (/app/node_modules/machine/lib/private/help-build-machine.js:1014:29)
May 09 07:00:04 at Deferred.exec (/app/node_modules/parley/lib/private/Deferred.js:286:10)
May 09 07:00:04 at Deferred.switch (/app/node_modules/machine/lib/private/help-build-machine.js:1469:16)
May 09 07:00:04 at Object._requestHandler [as webhooks/receive-usage-analytics] (/app/node_modules/machine-as-action/lib/machine-as-action.js:1153:27)
May 09 07:00:04 at /app/node_modules/sails/lib/router/bind.js:248:46
May 09 07:00:04 at routeTargetFnWrapper (/app/node_modules/sails/lib/router/bind.js:395:9)
May 09 07:00:04 at Layer.handle [as handle_request] (/app/node_modules/express/lib/router/layer.js:95:5)
May 09 07:00:04 at next (/app/node_modules/express/lib/router/route.js:149:13)
May 09 07:00:04 at Route.dispatch (/app/node_modules/express/lib/router/route.js:119:3)
May 09 07:00:04 at Layer.handle [as handle_request] (/app/node_modules/express/lib/router/layer.js:95:5)
May 09 07:00:04 at /app/node_modules/express/lib/router/index.js:284:15
May 09 07:00:04 at Function.process_params (/app/node_modules/express/lib/router/index.js:346:12)
May 09 07:00:04 at next (/app/node_modules/express/lib/router/index.js:280:10)
May 09 07:00:04 at next (/app/node_modules/express/lib/router/route.js:141:14)
May 09 07:00:04 at alwaysAllow (/app/node_modules/sails/lib/hooks/policies/index.js:178:16)
May 09 07:00:04 at routeTargetFnWrapper (/app/node_modules/sails/lib/router/bind.js:395:9)
May 09 07:00:04 at Layer.handle [as handle_request] (/app/node_modules/express/lib/router/layer.js:95:5)
May 09 07:00:04 at next (/app/node_modules/express/lib/router/route.js:149:13)
May 09 07:00:04 at Route.dispatch (/app/node_modules/express/lib/router/route.js:119:3)
May 09 07:00:04 at Layer.handle [as handle_request] (/app/node_modules/express/lib/router/layer.js:95:5)
May 09 07:00:04 <- POST /api/v1/webhooks/receive-usage-analytics (3ms 500)
May 09 07:00:04 | error
🧑‍💻 Steps to reproduce
Could not use specified `organization`. Cannot set "" (empty string) for a required attribute.
Looks like a report came in with an empty organization.
🕯️ More info (optional)
N/A
🛠️ To fix
- [ ] On the website: Update the `receive-usage-analytics` webhook to return a `400: bad request` response to Fleet instances that report an unreleased version of Fleet.
- [ ] Update Fleet to not send usage statistics if the server is running an unreleased version of Fleet.
Root cause:
A Fleet instance sent usage statistics that reported an empty string as the organization. Since this value was defined, it was not replaced with the default value (`unknown`). When the webhook attempted to create a HistoricalUsageSnapshot record in the database, it threw an error because `organization` is a required value.
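To illustrate the root cause, here is a minimal sketch, assuming the webhook applies its default only when the value is missing entirely. The function names are hypothetical and this is not the actual code of the action; it only shows why an empty string slips past such a fallback and what a stricter fallback could look like.

```javascript
// Hypothetical helpers; the names are illustrative, not the website's actual code.

// Fallback that only fires when the value is missing entirely.
// An empty string is "defined", so it passes through unchanged.
function organizationWithNaiveDefault(organization) {
  return organization === undefined ? 'unknown' : organization;
}

// Fallback that also treats empty or whitespace-only strings as missing.
function organizationWithSafeDefault(organization) {
  if (typeof organization !== 'string' || organization.trim() === '') {
    return 'unknown';
  }
  return organization;
}
```

With the first helper, `''` is passed on to the record as-is, which is the value the required-attribute check rejects. With the second, `''` becomes `unknown`.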
AFAIK, `organization` is only reported by instances with a Fleet Premium license, and that value is set from the organization value of the license key.
Impact: One Fleet instance failed to report usage statistics three times.
Resolution: I merged #18864 to prevent this 500 error, but now I think that was the wrong way to handle this. This error is likely a symptom of something wrong with the instance that tried to report statistics, and the changes in that PR don't address the root issue.
- [ ] TODO: Make a PR to the `receive-usage-analytics` webhook to log a warning with the anonymous ID of the instance when this happens.
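A rough sketch of what that warning could look like, assuming the report carries the organization and the anonymous ID under the hypothetical field names `organization` and `anonymousIdentifier`. The logger is passed in so the sketch runs on its own; inside the Sails action it would presumably be `sails.log.warn`.

```javascript
// Hypothetical: `organization` and `anonymousIdentifier` are assumed field names.
// Returns true when a warning was logged, false otherwise.
function warnIfOrganizationIsEmpty(report, logWarning = console.warn) {
  if (report.organization !== '') {
    return false;
  }
  logWarning(
    'Usage statistics reported an empty organization. ' +
    `Anonymous ID: ${report.anonymousIdentifier}`
  );
  return true;
}
```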
@lukeheath What do you think we should do about errors like these? I looked at the usage statistics that were causing this error, and they were from a Fleet Premium instance that reported `fleetd-chrome-v1.3.0-157-g80bce47c37-dirty` as its version.
@eashaw On the website endpoint, we should do some validation on what we'll accept as a version and if the parameter's value is invalid, we should return a 400 Bad Request.
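As a sketch of that suggestion: the check below assumes released Fleet versions are reported as plain `MAJOR.MINOR.PATCH` strings, which is an assumption and not something confirmed in this thread. The function name and the response shape are hypothetical.

```javascript
// Assumption: a released version looks like "4.50.0" (three dot-separated numbers).
const RELEASED_VERSION_PATTERN = /^\d+\.\d+\.\d+$/;

// Hypothetical helper: decides which status the webhook would respond with.
function checkReportedVersion(fleetVersion) {
  if (typeof fleetVersion !== 'string' || !RELEASED_VERSION_PATTERN.test(fleetVersion)) {
    return { statusCode: 400, message: 'Bad Request: unrecognized Fleet version.' };
  }
  return { statusCode: 200, message: 'OK' };
}
```

Under that assumption, the version from this report (`fleetd-chrome-v1.3.0-157-g80bce47c37-dirty`) would be rejected with a 400, while a value like `4.50.0` would be accepted.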
@sharon-fdm On the Fleet server side, we should also validate what we're sending to the usage analytics endpoint. Based on the version, it appears a chrome extension was trying to send usage analytics.
I am adding the `#g-endpoint-ops` label and keeping `#g-digital-experience` so that you can both action the same bug.
Thanks @lukeheath.
@getvictor does it make sense that our Chrome ext. sent statistics to the usage analytics endpoint? (I can't recall a reason for that) (`fleetd-chrome-v1.3.0-157-g80bce47c37-dirty`)
@sharon-fdm @eashaw `fleetd-chrome-v1.3.0-157-g80bce47c37-dirty` is the Fleet server version. The reason it has `fleetd-chrome` in the name is because it was built from the commit that has the `fleetd-chrome-v1.3.0` tag.
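That version string has the shape `git describe` produces: the nearest tag, the number of commits since that tag, `g` plus the abbreviated commit hash, and an optional `-dirty` suffix for uncommitted changes. A small sketch that splits it apart, assuming the string was generated that way (the function name is hypothetical):

```javascript
// Splits a `git describe`-style string: <tag>-<commits>-g<hash>[-dirty].
// Returns null when the string does not have that shape (e.g. an exact tag).
function parseGitDescribe(version) {
  const match = /^(.+)-(\d+)-g([0-9a-f]+)(-dirty)?$/.exec(version);
  if (!match) {
    return null;
  }
  return {
    tag: match[1],                         // nearest tag reachable from the commit
    commitsSinceTag: Number(match[2]),     // commits made on top of that tag
    abbreviatedHash: match[3],             // short hash of the built commit
    dirty: match[4] !== undefined          // true if the working tree had local changes
  };
}

console.log(parseGitDescribe('fleetd-chrome-v1.3.0-157-g80bce47c37-dirty'));
```

Read this way, the reported version is 157 commits past the `fleetd-chrome-v1.3.0` tag with local modifications, which is consistent with a development build of the server.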
It seems like a development build.
@lukeheath Another possibility is that this issue has something to do with the `tf-mod-addon-monitoring-v1.4.0` tag that was also put on that same commit.
@sharon-fdm Should we consider validating the Fleet server version before sending usage analytics? That way, we're not skewing the data with fake usage analytics.
@lukeheath, not a bad idea. I'll create an eng-initiated ticket.
@sharon-fdm Could we do it as part of this bug ticket and avoid the new issue?
I'm closing this issue. After discussing this with Sharon, we decided we do not need to validate the usage statistics sent to the Fleet website, and that there is no harm in storing them in the website database. The script that sends aggregated metrics to Datadog already excludes statistics sent by development instances and unreleased versions of Fleet.
Webhook fix brings light, Analytics flow right, Fleet in cloud city's night.