Edge performance issues
When calling the `GET /ocs/v1.php/apps/files_sharing/api/v1/shares?include_tags=false&state=all&shared_with_me=true` endpoint, each received share is statted 23 times: twice via the `Stat` call and the remaining 21 times via the `ListStorageSpaces` call, since `ListStorageSpaces` is invoked every time any type of operation needs to be performed.
Enabling the storageprovider cache in the gateway reduces this to 12 stats for subsequent calls. For an actual heterogeneous, distributed system, this makes the platform unusable.
The performance is worsened further by the shares storage provider making two calls to list ALL received shares for EVERY received share.
Note that these numbers have been obtained after the change to route requests by mount IDs was merged.
cc @labkode @butonic @micbar @dragotin
One suggestion would be to not call `ListStorageSpaces` for every lookup. Just verify that the provider ID or mountpoint matches, return that provider and assume that it'll handle the actual storage request.
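A minimal sketch of that routing idea, assuming the registry only compares IDs and mountpoints; `mountRule` and `findProvider` are illustrative names, not existing reva APIs:

```go
package registry

import (
	"errors"
	"strings"
)

// mountRule is a hypothetical registry entry: which provider answers
// for which provider ID and mountpoint.
type mountRule struct {
	ProviderID string // the mount/provider ID encoded in resource IDs
	Mountpoint string // e.g. "/users"
	Address    string // gRPC address of the storage provider
}

var rules []mountRule // populated from the registry configuration

// findProvider picks a provider without calling ListStorageSpaces: it
// only matches the provider ID (or, for path requests, the mountpoint)
// and trusts the provider to handle the actual storage request.
func findProvider(providerID, path string) (string, error) {
	for _, r := range rules {
		if providerID != "" && r.ProviderID == providerID {
			return r.Address, nil
		}
		if providerID == "" && strings.HasPrefix(path, r.Mountpoint) {
			return r.Address, nil
		}
	}
	return "", errors.New("no matching provider")
}
```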
I tried to implement something along these lines, but couldn't, since generating the space path requires attributes of the space a reference belongs to. The gateway storage provider constructs a relative reference using this path and then uses that in the request. IMO this should be delegated to the storage provider itself.
I always wanted to keep all path handling in the registry: it should know the full path to every space root. That would have been a very simple design, but it only works for a spaces-and-IDs-first approach.
For path-based requests the space registry might only be responsible for the prefix that leads to the storage provider. But this is misleading, as a storage provider can provide both `personal` and `project` spaces. A more precise description would be that the storage registry is responsible for knowing the path prefix for a set of spaces that lives on the same storage provider (or even on multiple ones). That would allow defining which storage providers to query for a given type or path.
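As a sketch, such a registry rule could look like this; the field names are assumptions, not the actual reva configuration:

```go
package registry

// spaceRule describes a set of spaces that lives on one (or more)
// storage providers and the path prefix the registry owns for them.
type spaceRule struct {
	PathPrefix string   // e.g. "/users/[a-k]", matched against the request path
	SpaceTypes []string // e.g. []string{"personal", "project"}
	Providers  []string // addresses of the storage providers to query
}
```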
The storage drivers, however, would have to be able to deal with path-based requests, something we deliberately got rid of.
Maybe the storageprovider can deal with the space lookup for a path. It could make a `ListStorageSpaces` call with a path filter to get the space and then call the actual method.
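A sketch of what that could look like in the storage provider; the `Space` type and the `listStorageSpaces` stub stand in for the real CS3 types and the filtered call:

```go
package storageprovider

import (
	"context"
	"errors"
	"strings"
)

// Space is a minimal stand-in for a CS3 StorageSpace.
type Space struct {
	ID   string
	Root string // path of the space root
}

// listStorageSpaces stands in for a ListStorageSpaces call with a path
// filter on the driver; the real call and filter shape are assumptions.
func listStorageSpaces(ctx context.Context, pathFilter string) ([]Space, error) {
	// A real implementation would query the storage driver.
	return []Space{{ID: "space-id", Root: "/projects/demo"}}, nil
}

// resolvePath turns an absolute path into (space, in-space path) so the
// actual method can then be called with a relative reference.
func resolvePath(ctx context.Context, p string) (Space, string, error) {
	spaces, err := listStorageSpaces(ctx, p)
	if err != nil {
		return Space{}, "", err
	}
	for _, s := range spaces {
		if rel := strings.TrimPrefix(p, s.Root); rel != p {
			return s, strings.TrimPrefix(rel, "/"), nil
		}
	}
	return Space{}, "", errors.New("no space found for path")
}
```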
cc @kobergj @aduffeck ^
We have to differentiate two types of requests, as they follow different lookup semantics:
1. relative requests that have a ResourceID and a Path set
2. path-based requests
For 1. we added https://github.com/cs3org/reva/pull/2792, which should use the provider IDs that ocis provides since https://github.com/owncloud/ocis/pull/3619. ocdav does a `LookUpStorageSpaceByID` call which contains an ID filter. The gateway storageprovider splits the space ID into a provider ID and a node ID. The space registry's `ListProviders` then makes a `findProvidersForResource` call, which currently makes a `ListStorageSpaces` call on the storageprovider with the matching provider ID, or on a provider without an ID set.
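For illustration, that split could look like this; the `$` separator reflects the space-ID format ocis uses, but treat the exact format as an assumption:

```go
package gateway

import (
	"fmt"
	"strings"
)

// splitSpaceID splits a space ID of the (assumed) form
// "<providerid>$<nodeid>" into its two parts.
func splitSpaceID(spaceID string) (providerID, nodeID string, err error) {
	parts := strings.SplitN(spaceID, "$", 2)
	if len(parts) != 2 {
		return "", "", fmt.Errorf("invalid space id: %q", spaceID)
	}
	return parts[0], parts[1], nil
}
```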
I think we mixed responsibilities. The reva storage registry may not need to do an actual `ListStorageSpaces` call on the provider, as it is only responsible for routing. To route a PROPFIND or MKCOL request the ocdav service does not need to know the etag or mtime of the storage space. The gateway certainly does not need them for routing.
I may have failed to clarify that the graph service, which lists the spaces available to a user, really is the only service that needs to be aware of etags and mtimes...
🤔
For 2. the routing happens based on path prefixes. The reva storage registry can match requests based on a path prefix, and the gateway should IMO truncate the path segments it can consume before forwarding the request to the storage provider (`/users/e/einstein/relative/path` is matched by `/users/[a-k]`, but only the provider path `/users` is prefix-trimmed, so `e/einstein/relative/path` reaches the storage provider).
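Concretely, the matching and trimming from that example could work like this (a self-contained sketch, not reva code):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

func main() {
	pattern := regexp.MustCompile(`^/users/[a-k]`) // registry rule
	providerPath := "/users"                       // where the provider is mounted

	p := "/users/e/einstein/relative/path"
	if pattern.MatchString(p) {
		// Only the provider path is trimmed, not the whole match.
		trimmed := strings.TrimPrefix(p, providerPath+"/")
		fmt.Println(trimmed) // e/einstein/relative/path
	}
}
```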
The storage provider should then use the storage driver interface to look up the space for the given path. In the running example from above, `e/einstein` could be the space alias that can be used to build a relative reference with a storage ID and a path `relative/path`.
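A sketch of building that relative reference; `resolveSpaceAlias` is a hypothetical stand-in for the alias lookup (e.g. the yet-to-be-specified call mentioned below):

```go
package storageprovider

// reference mirrors a CS3 relative reference in shape only.
type reference struct {
	StorageID string // ID of the resolved space
	Path      string // path relative to the space root
}

type space struct{ ID string }

// resolveSpaceAlias is assumed: it could be a ListStorageSpaces call
// with an alias filter, or a dedicated ResolveSpaceAlias method.
func resolveSpaceAlias(alias string) space {
	return space{ID: "space-id-for-" + alias}
}

// referenceForAlias builds a relative reference for the running example:
// alias "e/einstein" and in-space path "relative/path".
func referenceForAlias(alias, rest string) reference {
	s := resolveSpaceAlias(alias)
	return reference{StorageID: s.ID, Path: rest}
}
```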
Now, while I would prefer making the storage driver interface always require relative references, with the storage provider always using a yet-to-be-specified `ResolveSpaceAlias` or a `ListStorageSpaces()` call with a filter on the alias, I see the tradeoff when using path-only navigation: all the path-to-space resolution may be unnecessary to reach a resource.
But paths are fragile, and the web UI as well as all clients will always use relative requests in the future.
I think both types of requests can be made efficient without an additional call to a storage space. Only the `ListStorageSpaces` call needs to make actual calls on the providers ... which it currently does anyway...
🤔
I'll check if I can kill the `ListStorageSpaces` call in the spaces registry. It should be possible because we know the `providerid`.
@butonic can you link the PR which reduces the stat requests here?