
[feature] lazy fetching of files

Open janfjohannes opened this issue 1 year ago • 3 comments

We split this into separate tasks for now:

  1. Creating a tree object manually and passing it to commit
  • This is needed for lazy fetching, but we can test it in isolation this way.
  2. Splitting up clone and checkout
  • Currently this is done in one step. For lazy loading we will have these separated anyway. Introducing this change earlier allows us to keep changes small and detect side effects earlier.
  3. Optimizing the ref request
  • Currently this is part of the checkout and leads to a transfer of a list of all branches/pull requests of a project. This roundtrip alone causes a huge lag. In the current setup it only happens on pulls/pushes/clones, so the performance gain there will be minor, but with lazy fetching it will happen on each first file read and become a bottleneck. Introducing this as a small change helps to reduce risk.

janfjohannes avatar Oct 13 '23 21:10 janfjohannes

@janfjohannes about "3. Optimizing ref request": The problematic request we are talking about is the request to Git's info/refs?service=git-upload-pack endpoint. It is issued in isomorphic-git's discover method (see: https://github.com/isomorphic-git/isomorphic-git/blob/90ea0e34f6bb0956858213281fafff0fd8e94309/src/managers/GitRemoteHTTP.js#L66). This request has no filter argument, see https://www.git-scm.com/docs/http-protocol#_discovering_references.

The Git documentation even states:

All HTTP clients MUST begin either a fetch or a push exchange by discovering the references available on the remote repository.

I double-checked the sources and couldn't find a reason why we MUST execute this before push/fetch. As far as I understand, this endpoint returns not only the refs but also the server's capabilities. But since we know that the features we use are supported by all vendors we currently integrate, this list is not important for us at the moment. I assume we could call the ls-refs command instead, which does support filtering.

We might be able to intercept the calls to info/refs, execute the ls-refs command, and return the result in the format we expect from the info/refs endpoint. But all this interception becomes really hard to maintain, and I think we should consider forking again instead.

Response of the info/refs endpoint:


001e# service=git-upload-pack
00000153d7e62aef79d771d1771cb44c9e01faa4b7a607fe HEAD multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed allow-tip-sha1-in-want allow-reachable-sha1-in-want no-done symref=HEAD:refs/heads/main filter object-format=sha1 agent=git/github-cbc05ce31956
0051ec5b0fb1cdd67f0d33abf827a2449566de790080 refs/heads/Add-missing-translations
004fb63b5f96b25e94d964cc3dd8e8fa6f9c13d61aa7 refs/heads/NiklasBuchfink-patch-1
004d2fde5aa827d48377553cb05111001bbbce5b49cd refs/heads/NilsJacobsen-patch-1
004d1f121b525c005b800215ee5db4abee8a5c946b15 refs/heads/NilsJacobsen-patch-2
004d0682896b6f529bb41685b06a68e86cc4d88393e0 refs/heads/NilsJacobsen-patch-3
003dd7e62aef79d771d1771cb44c9e01faa4b7a607fe refs/heads/main
0041bf5c0513193b48e0fa9a5eba49cbc7d7ddb0aa27 refs/heads/todo-app
003e13175b43bf3bff18e65231a3f6a7c2491238cec1 refs/pull/1/head
003fb2161551dc2e1aa79f85590a07abc0f98db4010f refs/pull/10/head
003f43a00fa8e11d869441bed42079e83b5ff2e3a88e refs/pull/11/head
003f68dd9ca1e1a81cfd0281ae743596f4a574cef135 refs/pull/12/head
003f5ec41b023bbbf07ab59a67cd9603e3b8c60c297e refs/pull/13/head
003ff200fdda3bc325f8b8bd816a5b7c0b9ffbdfc1ff refs/pull/14/head
003febb68ee84313f2a18d8b953138c94d57d350762f refs/pull/15/head
003f7241ae985e560f750ce8a36048acf5d15eaf7cae refs/pull/16/head
003fec5b0fb1cdd67f0d33abf827a2449566de790080 refs/pull/17/head
003f7b82921e24e5d3a1c9db30d8b2d3ad00b62a6f83 refs/pull/18/head
003f7b99fe7da844c4601e64176675513bfdbce73178 refs/pull/19/head
003e66d62d560d4ffe6f8c9f57cbc0e5c10bf713b8fa refs/pull/2/head
003f2fde5aa827d48377553cb05111001bbbce5b49cd refs/pull/21/head
003f1f121b525c005b800215ee5db4abee8a5c946b15 refs/pull/22/head
003f0682896b6f529bb41685b06a68e86cc4d88393e0 refs/pull/23/head
003fb3c263a9d26f72f8e3354c98f38f16277cd253da refs/pull/25/head
003f352013c0404bb9e9cf95a17d9e460d4fbbdfdf95 refs/pull/26/head
003fd57cca52ba77289b1ac19ad35748762397f4523c refs/pull/27/head
003fc9202aed68c11395512be6e70c10da72d137a830 refs/pull/28/head
00405a5b28d580145a189b6e36e31ddb1490776222c0 refs/pull/28/merge
003f40f25d9a6780dc1ed7ba3f02e65b6fde167c250b refs/pull/29/head
003e3ccc5d54735938ed73dc6c6784cdbe34060480ba refs/pull/3/head
003e1eeb4ae27271df4ffe3e50c10442b69595bc8c12 refs/pull/4/head
003ed2948a6865b55b44d1e607d492bd9e94ae24cb7c refs/pull/5/head
003e03dfbe595ccd0015c58687cd2649a594e0d9649a refs/pull/6/head
003e35df3530eb5f2a2b8202c9997f30bbbb45dc5836 refs/pull/7/head
003ed3aa71c7dbba6aeeb435dc8dbcf3d612606ddbee refs/pull/8/head
003e064baa9a52884540e9a180d40f8e4ef59d33d63c refs/pull/9/head
0000
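
To make the layout of that response easier to see, here is a minimal sketch of how the version-1 advertisement could be parsed. The names `RefAdvertisement`, `readPktLines` and `parseRefAdvertisement` are hypothetical and not part of isomorphic-git. The sketch assumes ASCII-only content (so string length equals byte length) and relies on the fact that the capabilities follow a NUL byte on the first ref line, which shows up as a plain space in the dump above.

```typescript
// Hypothetical result shape, for illustration only.
interface RefAdvertisement {
  refs: Record<string, string>;    // ref name -> object id
  symrefs: Record<string, string>; // e.g. HEAD -> refs/heads/main
  capabilities: string[];
}

// Splits a pkt-line stream into payloads; a "0000" flush-pkt becomes null.
// Assumption: ASCII-only input, so the hex length can be applied to the string.
function readPktLines(data: string): Array<string | null> {
  const pkts: Array<string | null> = [];
  let pos = 0;
  while (pos + 4 <= data.length) {
    const len = parseInt(data.slice(pos, pos + 4), 16);
    if (len === 0) {
      pkts.push(null);
      pos += 4;
      continue;
    }
    if (isNaN(len) || len < 4) {
      throw new Error("malformed pkt-line at offset " + pos);
    }
    pkts.push(data.slice(pos + 4, pos + len));
    pos += len;
  }
  return pkts;
}

function parseRefAdvertisement(data: string): RefAdvertisement {
  const result: RefAdvertisement = { refs: {}, symrefs: {}, capabilities: [] };
  let firstRefLine = true;
  for (const pkt of readPktLines(data)) {
    if (pkt === null) continue; // flush-pkt
    const line = pkt.replace(/\n$/, "");
    if (line.indexOf("# service=") === 0) continue; // service announcement
    let refPart = line;
    if (firstRefLine) {
      firstRefLine = false;
      const nul = line.indexOf("\0");
      if (nul !== -1) {
        // Capabilities are appended to the first ref line after a NUL byte.
        refPart = line.slice(0, nul);
        result.capabilities = line.slice(nul + 1).split(" ").filter((c) => c.length > 0);
        for (const cap of result.capabilities) {
          if (cap.indexOf("symref=") !== 0) continue;
          const colon = cap.indexOf(":");
          if (colon !== -1) result.symrefs[cap.slice(7, colon)] = cap.slice(colon + 1);
        }
      }
    }
    const space = refPart.indexOf(" ");
    if (space === -1) continue;
    result.refs[refPart.slice(space + 1)] = refPart.slice(0, space);
  }
  return result;
}

// Usage with the refs/heads/main line taken from the response above.
const sample =
  "001e# service=git-upload-pack\n" + "0000" +
  "003dd7e62aef79d771d1771cb44c9e01faa4b7a607fe refs/heads/main\n" + "0000";
const advertised = parseRefAdvertisement(sample);
console.log(advertised.refs["refs/heads/main"]);
```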

martin-lysk avatar Nov 08 '23 12:11 martin-lysk

OK - it seems to me we can follow the approach of intercepting the discovery request!

To build the response expected from the version-1 info/refs?service=git-upload-pack endpoint we need:

  1. the (partial-)list of refs
  2. the list of symrefs (hidden in the capabilities)
  3. the capabilities of the server.

1. Partial list of refs and 2. symrefs

The ls-refs command of git-upload-pack allows us to fetch refs as well as symrefs (see: https://git-scm.com/docs/gitprotocol-v2#_ls_refs). This also lets us resolve the symrefs that are hidden in the capabilities (symref=HEAD:refs/heads/main in the second line of the response shown above).
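
As a rough sketch of what such a filtered request could look like: per the protocol v2 documentation, the body below would be POSTed to `$GIT_URL/git-upload-pack` with the `Git-Protocol: version=2` header. The helper names `pktLine` and `buildLsRefsRequest` are made up for illustration, and the sketch assumes ASCII-only ref prefixes so that string length equals byte length. Optional capability lines such as `agent=` are left out.

```typescript
// Encodes one pkt-line: 4 hex digits for the total length (payload + 4), then the payload.
// Assumption: ASCII-only payloads.
function pktLine(payload: string): string {
  const len = payload.length + 4;
  return ("0000" + len.toString(16)).slice(-4) + payload;
}

const DELIM_PKT = "0001"; // separates the command section from its arguments
const FLUSH_PKT = "0000"; // ends the request

// Builds an ls-refs request that asks for symref targets and only for refs
// matching one of the given prefixes.
function buildLsRefsRequest(refPrefixes: string[]): string {
  let body = pktLine("command=ls-refs\n");
  body += DELIM_PKT;
  body += pktLine("symrefs\n");
  for (const prefix of refPrefixes) {
    body += pktLine("ref-prefix " + prefix + "\n");
  }
  body += FLUSH_PKT;
  return body;
}

// Example: only HEAD and the main branch instead of every branch and pull request.
const lsRefsBody = buildLsRefsRequest(["HEAD", "refs/heads/main"]);
console.log(JSON.stringify(lsRefsBody));
```

With `symrefs` set, the server annotates symbolic refs with a `symref-target:` attribute, which is what would let us rebuild the `symref=HEAD:...` capability.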

I double-checked the use of the refs in the isomorphic-git repo: we can use a subset of the refs; a complete list of remote refs is not a requirement.

Fetch: isomorphic-git itself filters out the refs it does not need (compare: https://github.com/isomorphic-git/isomorphic-git/blob/90ea0e34f6bb0956858213281fafff0fd8e94309/src/commands/fetch.js#L153)

Push: A list of remote refs containing only the references specified in the parameters ref and remoteRef should be sufficient (compare: https://github.com/isomorphic-git/isomorphic-git/blob/90ea0e34f6bb0956858213281fafff0fd8e94309/src/commands/push.js#L57)

3. The capabilities

For the capabilities, we can start with a hardcoded list for GitHub and clean this up when we fork isomorphic-git. Capabilities list: multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed allow-tip-sha1-in-want allow-reachable-sha1-in-want no-done
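
Putting the three pieces together, a minimal sketch of how the intercepted response could be assembled in the version-1 format. `RemoteRef` and `buildInfoRefsResponse` are hypothetical names, not isomorphic-git API. The sketch assumes ASCII-only ref names and at least one ref (the spec's special `capabilities^{}` line for empty repositories is not handled).

```typescript
// Hypothetical shape of one entry from a (filtered) ref listing.
interface RemoteRef {
  name: string;          // e.g. "HEAD" or "refs/heads/main"
  oid: string;           // 40-character hex object id
  symrefTarget?: string; // set when the ref is symbolic, e.g. "refs/heads/main"
}

// Encodes one pkt-line: 4 hex digits for the total length (payload + 4), then the payload.
// Assumption: ASCII-only payloads.
function pktLine(payload: string): string {
  const len = payload.length + 4;
  return ("0000" + len.toString(16)).slice(-4) + payload;
}

// Builds the body expected from info/refs?service=git-upload-pack (version 1).
function buildInfoRefsResponse(refs: RemoteRef[], capabilities: string[]): string {
  if (refs.length === 0) {
    throw new Error("this sketch needs at least one ref");
  }
  // Symrefs travel inside the capability list as symref=<name>:<target>.
  const symrefCaps = refs
    .filter((r) => r.symrefTarget !== undefined)
    .map((r) => "symref=" + r.name + ":" + r.symrefTarget);
  const caps = capabilities.concat(symrefCaps).join(" ");

  let out = pktLine("# service=git-upload-pack\n") + "0000";
  refs.forEach((ref, i) => {
    // Only the first ref line carries the capabilities, separated by a NUL byte.
    const suffix = i === 0 ? "\0" + caps : "";
    out += pktLine(ref.oid + " " + ref.name + suffix + "\n");
  });
  return out + "0000";
}

// Example with the hardcoded capability list reduced to two entries for brevity.
const headOid = "d7e62aef79d771d1771cb44c9e01faa4b7a607fe";
const fakeResponse = buildInfoRefsResponse(
  [
    { name: "HEAD", oid: headOid, symrefTarget: "refs/heads/main" },
    { name: "refs/heads/main", oid: headOid },
  ],
  ["multi_ack", "no-done"]
);
console.log(JSON.stringify(fakeResponse));
```

Keeping the capability list as a parameter means the hardcoded GitHub list stays in one place and can be swapped out once the fork exists.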

martin-lysk avatar Nov 08 '23 13:11 martin-lysk

@martin-lysk sorry for the late answer :) that's good news and sounds like a great plan. i saw on discourese this is already done but looking forward to trying it :)

janfjohannes avatar Nov 09 '23 19:11 janfjohannes

unsynced duplicate in linear https://linear.app/opral/issue/LIX-3/[feature]-lazy-fetching-of-files

samuelstroschein avatar Apr 06 '24 00:04 samuelstroschein