autopush
Confirm whether uaids are being dropped on 404/410 from bridged servers.
Based on this comment, it's my understanding that when the FCM server responds with a 404 or 410 status code, the intended behavior of the autopush server is to drop the corresponding uaid record and all of its subscriptions. The logic for doing so lives in _router_fail_err here:
https://github.com/mozilla-services/autopush/blob/a459c882ec63ba5368f9c3b0648c084177b3a2ac/autopush/web/base.py#L336-L346
It's not clear whether this logic is actually triggering.
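To make the expected behavior concrete, here is a minimal sketch of what the issue describes: on a 404 or 410 from the bridged service, drop the uaid record and emit the bridge error metric with reason:unregistered. This is not the actual _router_fail_err code; RouterTable, Metrics and handle_bridge_failure are hypothetical in-memory stand-ins, and the metric name is taken from the Grafana query used later in this issue.

```python
from dataclasses import dataclass, field

# Status codes that, per this issue, mean the bridged service no longer
# recognizes the device, so the uaid record should be dropped.
GONE_STATUSES = frozenset({404, 410})


@dataclass
class RouterTable:
    """Hypothetical in-memory stand-in for autopush's router table."""
    users: dict = field(default_factory=dict)

    def drop_user(self, uaid: str) -> bool:
        # Remove the uaid record (and with it, its subscriptions).
        return self.users.pop(uaid, None) is not None


@dataclass
class Metrics:
    """Hypothetical metrics sink that records (name, tags) pairs."""
    events: list = field(default_factory=list)

    def increment(self, name: str, tags: list) -> None:
        self.events.append((name, tuple(tags)))


def handle_bridge_failure(status_code, uaid, platform, router, metrics):
    """Hypothetical sketch: drop the user on 404/410 and emit the metric.

    Returns True only if a uaid record was actually removed.
    """
    if uaid is None or status_code not in GONE_STATUSES:
        return False
    dropped = router.drop_user(uaid)
    # The issue expects this event alongside every drop_user call.
    metrics.increment(
        "autopush.notification.bridge.error",
        tags=["platform:" + platform, "reason:unregistered"],
    )
    return dropped
```

If the real code path behaved like this sketch, every FCM 404/410 would produce both a dropped record and a reason:unregistered event tagged platform:fcm, which is exactly what the Grafana check below looks for.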
Based on FxA server logs, we're definitely seeing 404 and/or 410 responses when trying to send push messages to mobile clients, since FxA logs a specific "subscription expired" event in this case.
I also took a look in Grafana for events of type autopush.notification.bridge.error[reason:recipient_gone], which would correspond to the FCMNotFoundError error type:
https://github.com/mozilla-services/autopush/blob/2f08e883ec0b6bee3e485a2be6587fe55fc1e025/autopush/router/fcm_v1.py#L177-L183
I can see a small but steady rate of such errors, so I think it's clear that they are in fact happening.
However...
If I look in Grafana for events of type autopush.notification.bridge.error[reason:unregistered], as would be emitted alongside the drop_user call above, I do not see any events at all for platform:fcm. In fact, the only instances of such an event are for platform:gcm, which may be coming from this different codepath that emits a similarly named event.
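The reasoning here reduces to a simple invariant: if drop_user ran for every recipient_gone error, then every platform with recipient_gone events should also show reason:unregistered events. A minimal sketch of that check, assuming the metric events have been exported as hypothetical (reason, platform) pairs; the function name and input format are illustrative, not part of autopush or Grafana.

```python
from collections import Counter


def platforms_missing_drops(events):
    """Return platforms with recipient_gone errors but no unregistered events.

    events: iterable of (reason, platform) pairs (hypothetical export format).
    """
    gone = Counter()
    unregistered = Counter()
    for reason, platform in events:
        if reason == "recipient_gone":
            gone[platform] += 1
        elif reason == "unregistered":
            unregistered[platform] += 1
    # If drop_user fired on every 404/410, each platform counted in `gone`
    # would also have a nonzero count in `unregistered`.
    return sorted(p for p in gone if unregistered[p] == 0)
```

With illustrative input shaped like the observation above (recipient_gone on fcm, unregistered only on gcm), platforms_missing_drops([("recipient_gone", "fcm"), ("recipient_gone", "fcm"), ("unregistered", "gcm")]) returns ["fcm"].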
I also believe that the current appservices push component would fail if its uaid record were to be discarded by the server, since I can't find any codepaths that would recover from such a state. But we haven't observed any devices that seem to be in such a state in the wild.
So I'm wondering whether the drop_user logic linked above is working correctly, or whether it might be failing to trigger in practice. The observed behaviour of mobile push clients in the wild suggests some instances where the autopush server believes a subscription is valid but the FxA server does not, and a failure to drop subscriptions on 404/410 could explain that.
Update: regarding the concern above that the appservices push component cannot recover if the server discards its uaid record, https://github.com/mozilla-services/autopush/issues/1445 seems to show evidence of what might be devices in that state in the wild.
Looking at the autopush Python code, it appears that we do not drop these records.
It's worth noting that the newer Rust version does drop them.