Automatically update first-party dependencies

Open Atry opened this issue 3 years ago • 11 comments

Is your feature request related to a problem? Please describe. Currently HHVM dependency versions are hard coded in the CMake files like this: https://github.com/facebook/hhvm/blob/3c1feb001d17784ba0e043b3fa59fb9c933b1693/third-party/fb-mysql/CMakeLists.txt#L13-L19

However, HHVM is co-evolved with other OSS projects maintained by Meta Platforms, including folly, fbthrift, fb-mysql, etc. When an upstream change is made in a first-party dependency, we have to wait for the next release of the dependency and then modify the call site in HHVM.

Describe the solution you'd like We need a way to automatically update first-party dependencies so that first-party changes become available to HHVM as soon as possible.

Describe alternatives you've considered Otherwise we can keep the current approach of manually updating dependencies, but it would slow down HHVM development and is not compatible with the mono repo practice in Meta Platforms.

Atry avatar Jun 30 '22 19:06 Atry

Hi @fredemmott, Previously the first-party dependencies used git submodules, which are easier to update. You changed them to CMake settings last year, which are harder and slower to update. I understand the CMake approach makes sense when the dependency is optional and could be resolved from the OS, but I don't understand the purpose of the CMake approach for first-party dependencies, given that we should always want the head revision of first-party dependencies.

@fredemmott, do you know why we download first-party dependencies from CMake?

Atry avatar Jun 30 '22 19:06 Atry

  • Git submodules are much more painful to use; when they were submodule based, a standard part of debugging build failures was rm -rf third-party; rm -rf build/third-party; git checkout third-party; git submodule update --init --recursive; make
  • similarly, shallow clones have usability problems for updating
  • non-shallow clones are slow and huge
  • unless pointed at a tag, they're unreliable and non-reproducible after a while: GitHub stops serving requests for submodules by sha after a variable amount of time/number of commits/not sure
  • it greatly improves build speed and reliability on internal build systems given they can be - and are - cached

given that we should always want the head revision of first-party dependencies.

Builds need to be reproducible; HHVM today is unlikely to be buildable with folly in 6 months' time

Additionally, head is often not in sync between folly/thrift/... - the tags are.

and is not compatible with the mono repo practice in Meta Platforms.

No reference approach is both good externally and good for a mono repo - one is a mono repo with cross-project atomic commits, one isn't, and they have different requirements. In public, you do not have atomic commits, and pretending to have them across multiple github projects will break the ability to use git bisect for issues in public builds.


It's also important to note: auto-updating is entirely independent of submodules vs externalproject_add. They're formulaic, and it would be relatively straightforward to change them to be more formulaic.
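As a rough illustration of how formulaic such an update can be, here is a minimal sketch of a helper that rewrites one pinned value in a CMake file. The `bump_pin` name, the `set(NAME "value")` layout, and the file and variable names in the usage line are all assumptions made for the example; they are not HHVM's actual CMake layout. A real updater would also have to refresh the download hash that goes with the pin.

```shell
#!/bin/sh
# bump_pin FILE NAME VALUE
# Rewrites a line of the form   set(NAME "old")   to   set(NAME "VALUE").
# Hypothetical helper: the set(NAME "...") layout is assumed for illustration only.
# VALUE must not contain '|', '&' or '\', since it is pasted into a sed replacement.
bump_pin() {
  file=$1 name=$2 value=$3
  # Write to a temporary file and move it into place, which avoids the
  # non-portable "sed -i" option.
  sed "s|^\([[:space:]]*set($name \"\)[^\"]*\(\")\)|\1$value\2|" "$file" > "$file.tmp" &&
  mv "$file.tmp" "$file"
}
```

Hypothetical usage: `bump_pin pins.cmake FOLLY_TAG v2022.07.04.00`. Lines that pin other names are left unchanged.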

fredemmott avatar Jun 30 '22 22:06 fredemmott

For the first-party stuff, a better way to auto-update would be to actually commit them to the HHVM github repo, similar to how flow includes hack - i.e. turn facebook/hhvm into a monorepo as far as fb deps are concerned

e.g. map fbcode/folly to third-party/folly/ - no submodules or CMake fetching, atomic commits

fredemmott avatar Jun 30 '22 22:06 fredemmott

fmtlib

FYI, this isn't an FB-owned or FB-source-of-truth project; if FB has an internal version that you want to use instead of the public version, directly publishing that to third-party/fmt is probably the way to go

fredemmott avatar Jun 30 '22 22:06 fredemmott

unless pointed at a tag, they're unreliable and non-reproducible after a while: GitHub stops serving requests for submodules by sha after a variable amount of time/number of commits/not sure

Thank you for the information! Do you know if there is any URL about the issue?

I never experienced the issue and it is surprising to me. If GitHub indeed stops serving source files by sha, it would affect NPM, Composer, Bundler, Nix and many other package managers because they all include sha in their lock files to reproduce a build with dependencies to git branches.

Atry avatar Jun 30 '22 22:06 Atry

It's also important to note: auto-updating is entirely independent of submodules vs externalproject_add. They're formulaic, and it would be relatively straightforward to change them to be more formulaic.

Do you mean the current externalproject_add approach also supports source tarballs from a revision sha instead of a tag?

Atry avatar Jun 30 '22 22:06 Atry

Not seeing a super obvious resource

it would affect NPM, Composer, Bundler, Nix and many other package managers because they all include sha in their lock files to reproduce a build with dependencies to git branches.

the usual approach is to clone the branch or a tag then reset back to the specific commit, not to clone by sha; this does mean fetching more data though. I don’t know the specific method submodules use nowadays
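A minimal sketch of that clone-then-reset pattern, assuming the wanted commit is reachable from the branch or tag being cloned. The `fetch_at_commit` name and its argument order are made up for the example; this is not what any particular package manager runs.

```shell
#!/bin/sh
# fetch_at_commit URL REF SHA DEST
# Clones REF (a branch or tag) from URL into DEST, then moves the checkout
# back to SHA. Hypothetical helper for illustration; it fetches the full
# history of REF, which is the extra data mentioned above.
fetch_at_commit() {
  url=$1 ref=$2 sha=$3 dest=$4
  git clone --quiet --branch "$ref" "$url" "$dest" &&
  git -C "$dest" reset --quiet --hard "$sha"
}
```

Because the clone is not shallow, the reset works for any commit in the history of REF, at the cost of download size.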

fredemmott avatar Jun 30 '22 22:06 fredemmott

Do you mean the current externalproject_add approach also supports source tarballs from a revision sha instead of a tag?

Take a look at the docs - there’s built in git support, and you can provide arbitrary commands for all the steps to do whatever you want

fredemmott avatar Jun 30 '22 22:06 fredemmott

That's a ton of helpful information about the previously made decisions! Thank you! @fredemmott

Atry avatar Jun 30 '22 23:06 Atry

For the first-party stuff, a better way to auto-update would be to actually commit them to the HHVM github repo, similar to how flow includes hack - i.e. turn facebook/hhvm into a monorepo as far as fb deps are concerned

e.g. map fbcode/folly to third-party/folly/ - no submodules or CMake fetching, atomic commits

Just want to highlight this: if you want faster/autoupdating first-party dependencies, I strongly recommend making shipit copy them - the .cpp and .h files - directly into the facebook/hhvm repo. Completely get rid of all cross-repo stuff. That gets you live updates, working internal CI, bisectability, and atomicity.

fredemmott avatar Jun 30 '22 23:06 fredemmott

Sounds reasonable! For comparison, fbthrift is using the bot to update submodules, e.g. https://github.com/facebook/fbthrift/commit/3fe8c7c50f4ad4829f0bebbe703a07ce53298a69

Atry avatar Jun 30 '22 23:06 Atry

Fixed in #9181, #9144 and #9164

Atry avatar Sep 13 '22 19:09 Atry