Google-compatible token refresh
The Google Photorealistic 3D Tiles documentation page (https://developers.google.com/maps/documentation/tile/3d-tiles) says:
The renderer can make at least three hours of tile requests from a single root tileset request. After reaching this limit, you must make another root tileset request.
cesium-native should handle this possibility.
Currently, when we see a 403 for a tile in a tileset from Cesium ion, we will re-request the Cesium ion asset/endpoint service and adopt the new key that it returns. But this won't help with the Google tileset, because the asset/endpoint doesn't contain a key of that type. Instead, the key is embedded in the root tileset.json URL, and we have to request that in order to get a new session.
Furthermore, it's possible to load the Google tiles directly from Google rather than going through Cesium ion, and in that case no attempt will be made to refresh the session at all.
So we should investigate our options for handling the need to refresh Google sessions. Ideally, the solution would be generally applicable to tilesets from other providers, too, and wouldn't need to encode any Google-specific logic. One option, for example, is to do a full tileset refresh (unload everything and start over) whenever we see a 403 and it's either not a Cesium ion tileset, or the normal Cesium ion token refresh procedure doesn't work. But there are probably better options.
After looking into this for a bit, I think the structure of a solution without Google-specific handling would look something like this:
sequenceDiagram
    Tile Level N->>Tile Level 1: Report 403 error
    Tile Level 1->>Tile Level 0: Report 403 error
    Tile Level 0->>Tile Level 0: Reload root tile
    alt If URL differs
        Tile Level 0->>Tile Level 1: Provide new base URL
        note over Tile Level 1: Re-resolve URL with new base URL
        Tile Level 1->>Tile Level N: Provide new base URL
        note over Tile Level N: Re-resolve URL with new base URL
        Tile Level N->>Tile Level N: Retry request with new URL
    else If URL is the same
        note over Tile Level N,Tile Level 0: Report error and abort
    end
If a tile in the tree fails to load with a 403 error, we propagate that error up to the root tile, which initiates a second request for the root tile. If the root tileset is the same, we report an error and abort, as is the current behavior. But if any of the external tileset children have different URLs (or just different query parameters?) than they did in the first request, we propagate that to the loaded children as their new base URL. The children then update their own URLs by calling Uri::resolve with the new base URL, incorporating the new query parameters. Then, in turn, they pass these URLs as the new base URL for their children. Once the new URL reaches the child that failed to load initially, we re-initiate the request and keep loading from there.
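To make the propagation step a bit more concrete, here is a minimal sketch of re-resolving URLs down a tree after the root hands out a new base URL. Everything in it is an assumption for illustration: `TileNode`, `resolveWithQuery`, and `rebase` are hypothetical names, not cesium-native types or functions, and the resolution logic is deliberately simplified (it only handles relative URIs, assumes the base URL has a path, and does not de-duplicate query keys). It also treats every node as an external tileset whose URL becomes the base for its children.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a loaded tile; not a cesium-native type.
struct TileNode {
  std::string relativeUri; // content URI as written in the tileset JSON
  std::string resolvedUrl; // absolute URL actually used for requests
  std::vector<TileNode> children;
};

// Simplified resolution: replaces the last path segment of the base URL with
// the relative URI, then appends the query string of the base URL.
std::string resolveWithQuery(
    const std::string& baseUrl,
    const std::string& relativeUri) {
  assert(!baseUrl.empty());
  const std::size_t queryPos = baseUrl.find('?');
  const std::string basePath = baseUrl.substr(0, queryPos);
  const std::string baseQuery =
      queryPos == std::string::npos ? "" : baseUrl.substr(queryPos + 1);

  // Keep everything up to and including the last '/' of the base path.
  const std::size_t slashPos = basePath.rfind('/');
  std::string result = basePath.substr(0, slashPos + 1) + relativeUri;

  if (!baseQuery.empty()) {
    // Use '&' if the relative URI already brought its own query string.
    result += (result.find('?') == std::string::npos) ? '?' : '&';
    result += baseQuery;
  }
  return result;
}

// Re-resolves this node against the new base URL, then passes its own new
// URL down as the base URL for its children.
void rebase(TileNode& node, const std::string& newBaseUrl) {
  node.resolvedUrl = resolveWithQuery(newBaseUrl, node.relativeUri);
  for (TileNode& child : node.children) {
    rebase(child, node.resolvedUrl);
  }
}
```

Once `rebase` reaches the tile that originally failed, its `resolvedUrl` carries the new query parameters and the request can be retried.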
Of course, there's a lot of hand-waving going on there in that description, but this is what comes to mind when trying to think up a solution that would work for all tilesets with this sort of expiration behavior, whether on Cesium ion or not.
Or we could just reload the entire tileset. We would be unloading a bunch of assets just to load them again a moment later, but it would significantly simplify the solution here, and have less likelihood of causing issues for weirdly-behaved tilesets.
I think perhaps the way to propagate that error from the child tile up to the root tile is to add a fourth TileLoadResultState indicating that the TilesetContentManager needs to reload the root tile.
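As a rough illustration of that idea, the sketch below adds a fourth state and maps each state to what the manager would do next. The enum is named `SketchLoadState` to make clear it is not the real `TileLoadResultState`; the first three names are meant to mirror the existing states, while `RequiresRootReload`, `ManagerAction`, `stateForStatus`, and `actionFor` are hypothetical. Which status codes should count as "session expired" is also an assumption (403 here, as discussed above).

```cpp
#include <cassert>

// Hypothetical mirror of the existing three states plus a proposed fourth.
enum class SketchLoadState { Success, Failed, RetryLater, RequiresRootReload };

// Hypothetical description of what the content manager does next.
enum class ManagerAction { UseContent, MarkFailed, RequeueTile, ReloadRootTile };

// Assumption: a 403 is treated as a possibly expired session or token.
SketchLoadState stateForStatus(int httpStatus) {
  assert(httpStatus >= 100 && httpStatus < 600);
  if (httpStatus >= 200 && httpStatus < 300) {
    return SketchLoadState::Success;
  }
  if (httpStatus == 403) {
    return SketchLoadState::RequiresRootReload;
  }
  return SketchLoadState::Failed;
}

// Maps a load state to the follow-up action.
ManagerAction actionFor(SketchLoadState state) {
  switch (state) {
  case SketchLoadState::Success:
    return ManagerAction::UseContent;
  case SketchLoadState::RetryLater:
    return ManagerAction::RequeueTile;
  case SketchLoadState::RequiresRootReload:
    return ManagerAction::ReloadRootTile;
  case SketchLoadState::Failed:
    break;
  }
  return ManagerAction::MarkFailed;
}
```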
It appears Google returns a 304 Not Modified for the root tileset JSON if it's the same as the last time you requested it (no new token). Might be better to rely on this instead of comparing URLs to decide when to propagate query parameters to the children.
Thanks for the excellent analysis, @azrogers! This is tricky... Let me write this down for my own reference.
In the specific case of Google, the root tileset.json URL will be something like https://tile.googleapis.com/v1/3dtiles/root.json?key=[key], and that key will never change (if it does ever change, that should be handled by the Cesium ion refresh mechanism, because this key is provided by ion).
But inside the tileset.json returned from that request, nested several tiles deep, there's a content.uri pointing to another json file (an external tileset) with a session query parameter. That session then gets propagated to all further tile content requests, including through additional external tilesets.
As an aside, this is contrary to web norms. Resolving a relative URL normally does not propagate query parameters from the base URL. But CesiumJS does, and cesium-native does too, and Google is relying on it.
So to do a token (session) refresh for the Google tileset, we need to re-request that root.json, then propagate the session query parameter to all further requests. Currently the root.json only has a single leaf tile with a content.uri (nested a couple of levels deep), but there's no requirement that this is the case. So I think that a proper solution would require "merging" the re-downloaded root.json with the existing Tile tree. And when we hit a Tile with external tileset content, we have to propagate that query parameter to the TilesetJsonLoader's _baseUrl and on through to any tiles within that external tileset that are further external tilesets.
We'll have to handle the possibility of the Tile hierarchy changing between the two requests. It's unlikely, but we can't crash if it happens.
This is a major pain, to say the least.
And I'm not sure it's worth it. Refreshing the tileset completely once every three hours or so doesn't sound so bad, compared to the complexity of developing (and testing!) such a solution. Refreshing doesn't sound completely trivial either, though. What do you think?
I left the CesiumJS Google Photorealistic 3D Tiles Sandcastle open for a few hours, and then moved the view. I got a bunch of errors like this:
[
  {
    "error": {
      "code": 400,
      "message": "Request contains an invalid argument.",
      "status": "INVALID_ARGUMENT"
    }
  }
]
(and yes, the actual HTTP status code was 400 as well)
So... it seems Google returns a 400 error when the session expires, not 403 as I had previously believed.
Performing the refresh when we hit a certain condition shouldn't be too difficult, but I'm trying to think of how we would decide when to refresh. Refreshing the tileset the first time we hit a 403 or 400 error might not be the ideal behavior. Certainly a 400 error shouldn't be returned in the course of normal operation, but refreshing on the first one would mean that, for example, a single glb with a mistyped URL that caused a 400 on request would break the entire tileset instead of just that tile. Could we keep track of how many consecutive tile loads have resulted in an HTTP error status code, and have a threshold for refreshing the tileset? For example, if the last ten attempted tile loads all returned 400 errors, we trigger a refresh?
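A minimal sketch of that threshold idea, assuming a hypothetical `ConsecutiveFailureTracker` class (nothing like it exists in cesium-native as far as this thread says). It counts consecutive HTTP error responses, resets on any success, and signals a refresh once the threshold is reached. Treating every status of 400 or above as a failure is an assumption; a real version might only count specific codes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: tracks consecutive failed tile loads.
class ConsecutiveFailureTracker {
public:
  explicit ConsecutiveFailureTracker(std::uint32_t threshold)
      : _threshold(threshold), _consecutiveFailures(0) {
    assert(threshold > 0);
  }

  // Records one tile response. Returns true when the number of consecutive
  // failures has reached the threshold, meaning a refresh should be triggered.
  bool recordResponse(int httpStatus) {
    if (httpStatus >= 400) {
      ++_consecutiveFailures;
    } else {
      // Any successful load breaks the streak.
      _consecutiveFailures = 0;
    }

    if (_consecutiveFailures >= _threshold) {
      // Start counting again after signaling a refresh.
      _consecutiveFailures = 0;
      return true;
    }
    return false;
  }

private:
  std::uint32_t _threshold;
  std::uint32_t _consecutiveFailures;
};
```

With a threshold of ten, a single mistyped glb URL would only ever contribute one failure between successes, so it would not trigger a refresh on its own.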
Thank you for working on this. A fix will be much appreciated! When implementing, would it be possible to surface an event to cesium-unity on refresh?
As discussed offline, refreshing after X tiles fail might still not be great for users, because a major use-case where this comes up is when recording long videos. If we let a few tiles fail, it could break a few frames of video before the refresh kicks in. I don't really have a better idea, though!
@LisaBosCesium do you think we might be able to persuade Google to improve the way Google Photorealistic 3D Tiles reports an expired session? Currently it returns HTTP status code 400 and some JSON that says little more than "invalid argument". It's hard for us to know how to correctly respond to such a generic error. It would be much better to return a 401 status code (this is what Cesium ion does in a similar situation) and a descriptive message in the JSON.
@azrogers maybe one approach, even if it's a bit of a cop-out, is to raise an event when we see any tile errors at all. That would be useful in general for our users, and it would at least allow them to implement their own refresh logic.
Raising an event on a tile error is on the docket anyways if we hope to accomplish CesiumGS/cesium-unreal#542, so I think it's definitely worth it.
@kring @azrogers I'll raise this with Google.
@LisaBosCesium this is coming up again in a prominent first-party integration of cesium-native (and elsewhere). I'm guessing we would have heard if the conversations with Google (mentioned above) went anywhere, but let me know if there's any update or new information about it.
Would it be possible to use a similar solution to what we've just done for Bing Maps here? Refresh the session every two hours and fifty-five minutes instead of trying to find the right HTTP status code to listen for?
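For reference, a timer-based approach could look roughly like the sketch below. `SessionRefreshTimer` is a hypothetical name, and this is not a description of how the Bing Maps refresh is actually implemented. The default interval of 175 minutes is the two hours and fifty-five minutes suggested above, and the caller is assumed to supply the current time (for example, once per frame).

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: decides when a session is old enough to refresh.
class SessionRefreshTimer {
public:
  using Clock = std::chrono::steady_clock;

  explicit SessionRefreshTimer(
      Clock::time_point sessionStart,
      std::chrono::minutes interval = std::chrono::minutes(175))
      : _sessionStart(sessionStart), _interval(interval) {
    assert(interval > std::chrono::minutes::zero());
  }

  // True once at least `interval` has elapsed since the session started.
  bool isDue(Clock::time_point now) const {
    return now - _sessionStart >= _interval;
  }

  // Call after a successful root tileset re-request to start a new session.
  void restart(Clock::time_point now) { _sessionStart = now; }

private:
  Clock::time_point _sessionStart;
  std::chrono::minutes _interval;
};
```

This avoids guessing which status code means "expired", at the cost of refreshing sessions that might have remained valid for longer.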