Implement GetTree()

Open allada opened this issue 2 years ago • 4 comments

To optimize for clients that support it, we should implement get_tree().

see: https://github.com/TraceMachina/nativelink/blob/2ae7cab4c7d6cc476bb5de31ffbaf6f59406ce8a/nativelink-service/src/cas_server.rs#L244

https://github.com/TraceMachina/turbo-cache/blob/5e2e81af8999482fef202b50ee880509e8811e6f/proto/build/bazel/remote/execution/v2/remote_execution.proto#L430

allada avatar Oct 30 '23 23:10 allada

I'd like to work on this issue.

blizzardc0der avatar Apr 06 '24 18:04 blizzardc0der

Given the following documentation in the proto file:

  // Fetch the entire directory tree rooted at a node.
  //
  // This request must be targeted at a
  // [Directory][build.bazel.remote.execution.v2.Directory] stored in the
  // [ContentAddressableStorage][build.bazel.remote.execution.v2.ContentAddressableStorage]
  // (CAS). The server will enumerate the `Directory` tree recursively and
  // return every node descended from the root.
  //
  // The GetTreeRequest.page_token parameter can be used to skip ahead in
  // the stream (e.g. when retrying a partially completed and aborted request),
  // by setting it to a value taken from GetTreeResponse.next_page_token of the
  // last successfully processed GetTreeResponse).
  //
  // The exact traversal order is unspecified and, unless retrieving subsequent
  // pages from an earlier request, is not guaranteed to be stable across
  // multiple invocations of `GetTree`.
  //
  // If part of the tree is missing from the CAS, the server will return the
  // portion present and omit the rest.
  //
  // Errors:
  //
  // * `NOT_FOUND`: The requested tree root is not present in the CAS.

referenced here: https://github.com/TraceMachina/nativelink/blob/5e2e81af8999482fef202b50ee880509e8811e6f/proto/build/bazel/remote/execution/v2/remote_execution.proto#L430

We want to implement this API.

In short, this API is an optimization over having the client read each CAS object itself and walk the tree recursively to collect all subdirectories of the provided root digest.

This is a streaming API, so we can send directories to the client as we collect them, letting the client start working while the server is still walking the tree.
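To make the traversal and paging behaviour concrete, here is a minimal sketch in plain Rust. It is not the NativeLink implementation. The `Directory`, `TreePage`, and `get_tree_page` names are hypothetical, the CAS is modelled as an in-memory `HashMap` keyed by digest string, and the page token is assumed to be the digest of the next directory to emit. The sketch also assumes each digest is emitted at most once, which the proto comment does not require.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical stand-in for the REAPI `Directory` message:
/// only a label and the digests of child directories are kept.
#[derive(Clone, Debug, PartialEq)]
pub struct Directory {
    pub name: String,
    pub child_digests: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum GetTreeError {
    /// The requested tree root is not present in the CAS.
    NotFound,
}

/// One page of results, loosely mirroring `GetTreeResponse`.
#[derive(Debug, PartialEq)]
pub struct TreePage {
    pub directories: Vec<Directory>,
    /// Digest of the next directory to emit; empty when the walk is done.
    pub next_page_token: String,
}

/// Walks the tree breadth-first and returns one page of directories.
/// Assumption: `page_size == 0` means "no limit".
pub fn get_tree_page(
    cas: &HashMap<String, Directory>,
    root_digest: &str,
    page_token: &str,
    page_size: usize,
) -> Result<TreePage, GetTreeError> {
    if !cas.contains_key(root_digest) {
        return Err(GetTreeError::NotFound);
    }
    let limit = if page_size == 0 { usize::MAX } else { page_size };

    let mut queue = VecDeque::from([root_digest.to_string()]);
    let mut visited = HashSet::new();
    // With an empty token we emit from the start; otherwise we replay the
    // same deterministic walk and skip until the token digest is reached.
    let mut started = page_token.is_empty();
    let mut directories = Vec::new();

    while let Some(digest) = queue.pop_front() {
        if !visited.insert(digest.clone()) {
            continue; // already handled an identical directory
        }
        // Missing parts of the tree are omitted rather than treated as errors.
        let Some(dir) = cas.get(&digest) else { continue };
        queue.extend(dir.child_digests.iter().cloned());

        if !started {
            if digest == page_token {
                started = true;
            } else {
                continue;
            }
        }
        if directories.len() == limit {
            // Page is full: the current digest becomes the resume point.
            return Ok(TreePage { directories, next_page_token: digest });
        }
        directories.push(dir.clone());
    }
    Ok(TreePage { directories, next_page_token: String::new() })
}
```

A caller would keep calling `get_tree_page` with the previous `next_page_token` until it comes back empty. In a real gRPC handler each page would be written to the response stream as soon as it is ready, and the CAS lookups would be asynchronous store reads rather than map lookups. Replaying the walk on every page is the simplest way to resume but costs a full traversal per page, so a production version would likely encode more state in the token.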

allada avatar Apr 10 '24 16:04 allada

Some existing code in NativeLink already does something similar: https://github.com/TraceMachina/nativelink/blob/e890c01c1e4654b9b2aae026614f005be06de117/nativelink-worker/src/running_actions_manager.rs#L122

and: https://github.com/TraceMachina/nativelink/blob/e890c01c1e4654b9b2aae026614f005be06de117/nativelink-store/src/completeness_checking_store.rs#L75

allada avatar Apr 10 '24 16:04 allada

I'd like to work on this issue. cc: @allada

boldpulse avatar May 06 '24 09:05 boldpulse