setup-ruby

Downloading pre-built Ruby Versions at runtime

Open MSP-Greg opened this issue 5 years ago • 61 comments

@bryanmacfarlane

With https://github.com/actions/setup-ruby/issues/42 closed, I thought I'd continue here.

If one views this repo's purpose as accessing pre-built Rubies located in 'hostedtoolcache', then it is functioning as expected.

For quite a while, issues have been posted that imply that what's contained in 'hostedtoolcache' is not meeting users' needs.

Given that, and given that GitHub employs several people active in the Ruby community, are there any plans for changing the current setup?

The Ruby community can certainly come up with solutions, but at some point, we/they will have some needs from Actions to make it function reasonably well. An example would be MSYS2 for Windows extension gem CI, and also building Ruby itself. Installing MSYS2 from scratch is time consuming.

If this isn't the place to discuss this, please suggest where...

MSP-Greg avatar Dec 30 '19 14:12 MSP-Greg

Ping @ioquatix @eregon @tenderlove

Sorry for the ping...

MSP-Greg avatar Dec 30 '19 14:12 MSP-Greg

A Ruby build zip is +/- 10 MB. One thought is to have the Ruby community build Rubies and host them as packages in some repo, then provide an action that downloads and installs them...

MSP-Greg avatar Dec 30 '19 15:12 MSP-Greg

I was thinking something along the same line. We could have a separate GitHub action that just builds various Ruby versions for all virtual environments available in GitHub Action, for instance using ruby-build or ruby-install. Then this action could just download the result from that action's repository release downloads.

eregon avatar Dec 30 '19 15:12 eregon

cc @clupprich this might interest you too (as the author of https://github.com/clupprich/ruby-build-action). I think a more convenient caching/building mechanism would be quite nice for actions, instead of a cache per repository.

eregon avatar Dec 31 '19 13:12 eregon

Yes, avoiding having consumers build it every time would be great. Hosted machines are a fresh VM every job.

Building ruby, making it globally available and then having the setup-ruby pull the binaries just in time would offer the ability to pin to any version flexibly.

Things to note:

  • The actions/virtual-environments images effectively do this, but not as flexibly. I believe they only lay down a couple of versions into the cache, which is why I believe the defaults push a major version binding. They do it at factory disk creation (not at the user's build time), which is an advantage, but it can't possibly lay down all versions.
  • If you build and publish ruby binaries to a globally available location it would be nice to have a json file with all the metadata so the action can do semver and match versions. Nodejs does this and the setup-node action uses it.
  • self-hosted runners will benefit from job to job if downloaded into the tool cache by tool, version, platform
  • The recommended pattern in the tool actions is to match the semver supplied against the cache dir first and then against the cloud. This adds resiliency for availability / load. e.g. nodejs was having download issues and it affected all workflows since the factory disk tool cache didn't have enough versions (12 was published).
  • If you have a globally available set of bits, a CDN would be best. Nodejs didn't have a CDN as of the time 12 was released and the load caused issues.

That would mean you have to create, manage a CDN. That is unless you can lean on another packaging service like GPR etc. for a universal type package of bits. One other option would be to use releases in the ruby repo and attach binaries (see actions/runner where we do that with most recent release). The binary packages in a release are backed by CDN.
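The cache-first matching pattern from the list above could be sketched roughly as follows. This is only an illustration: `pick_version`, `resolve`, and the plain arrays standing in for the tool cache and the remote metadata file are hypothetical, and a real action would read those lists from disk and from the published JSON.

```ruby
require "rubygems" # Gem::Version ships with Ruby; the explicit require is harmless

# Hypothetical helper: pick the highest available version matching a spec
# such as "2.6", "2.6.x" or an exact "2.6.5".
def pick_version(spec, available)
  prefix = spec.sub(/\.x\z/, "")
  candidates = available.select { |v| v == prefix || v.start_with?("#{prefix}.") }
  candidates.max_by { |v| Gem::Version.new(v) }
end

# Check the local tool cache first, then fall back to the remote list.
# Returns nil when nothing matches in either place.
def resolve(spec, cached:, remote:)
  if (hit = pick_version(spec, cached))
    { version: hit, source: :cache }
  elsif (hit = pick_version(spec, remote))
    { version: hit, source: :download }
  end
end
```

With this shape, a workflow asking for "2.5" keeps working from the cached copy even when the download host is unavailable, which is the resiliency point made above.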

Hopefully, all that makes sense. Let me know how I can help.

bryanmacfarlane avatar Dec 31 '19 16:12 bryanmacfarlane

@bryanmacfarlane

Thanks for the response. I've assumed that binaries/builds would be stored in a repo's releases.

I've maintained a Ruby master MinGW for a few years for CI use. I've had it running as a cron job, three times a day. One question about using GH repo releases for storage.

The current actions/upload-release-asset isn't really clear about how to update an item to an existing release.

Will there be issues with a repo's Actions job trying to download the binary when the cron job is updating the release build binary? Maybe I should ask the question there...

MSP-Greg avatar Dec 31 '19 16:12 MSP-Greg

I made a quick prototype (just a proof of concept at this point) to build Ruby versions for each GitHub virtual environment: https://github.com/eregon/ruby-install-builder/releases/tag/builds

Then I think downloading that and unpacking that for the same virtual environment should just work.

One potential issue is virtual environments/images do change over time, so that might break things if e.g., a new libssl version is used.

It's really easy for Linux and macOS, but probably quite a bit different for Windows, where I think we should likely use MSP-Greg's Windows Ruby builds.

eregon avatar Dec 31 '19 17:12 eregon

I worked a bit more on this idea, and https://github.com/eregon/use-ruby-action now works with MRI, JRuby and TruffleRuby, with exact versions, on Ubuntu and macOS. It just downloads and extracts an archive, so it's just a couple seconds to set it up.

Would an approach like that make sense for actions/setup-ruby?

eregon avatar Jan 01 '20 15:01 eregon

@eregon - thanks, I'll have a look today.

bryanmacfarlane avatar Jan 02 '20 15:01 bryanmacfarlane

I reviewed use-ruby-action and ruby-install-builder and I think we could adopt them. They seem well put together, have good separation of concerns, and should be reliable and "scalable" (i.e. support different versions of Ruby) into the future.

The only thing I'd suggest is perhaps supporting more Ruby versions (trivial) and perhaps having a suggested template for users to follow, not sure if this exists or not. Ideally, the default template should be super simple - I wish users don't need to specify Ruby versions, but instead could write something like: all supported releases which right now would be MRI 2.4 - 2.7, JRuby and TruffleRuby, rather than mucking around with environments. Because most gems should just use this by default.

ioquatix avatar Jan 02 '20 21:01 ioquatix

Sorry, my bad, I just saw https://github.com/eregon/use-ruby-action#usage which shows the usage - so my only point is perhaps just some way to make it easier for users to have a default build matrix which includes all current released versions of Ruby across all latest supported OS distributions.

ioquatix avatar Jan 02 '20 21:01 ioquatix

The matrix needs to list all combinations explicitly for the workflow to understand it AFAIK, so I think it can't really be much better than: https://github.com/eregon/ruby-install-builder/blob/f967c81fdf7097dc33fbf6ac5adeab551ae9edaa/.github/workflows/build.yml#L32-L34 Copying here for convenience:

      matrix:
        os: [ 'ubuntu-16.04', 'ubuntu-18.04', 'macos-latest' ]
        ruby: [ 'ruby-2.4.9', 'ruby-2.5.7', 'ruby-2.6.5', 'ruby-2.7.0', 'truffleruby-19.3.0', 'jruby-9.2.9.0' ]
        # This also works and is a fair bit shorter (use it instead of the line above):
        # ruby: [ '2.4.9', '2.5.7', '2.6.5', '2.7.0', 'truffleruby', 'jruby' ]

We could accept ruby-2.4 and ruby-2.4.x (setup-ruby supports this) of course.
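As a rough illustration of how the short and long forms in the matrix above could be normalized, here is a minimal sketch. The `parse_ruby_spec` name, the `ENGINES` list, and the "latest" placeholder are assumptions made for this example, not the behavior of any existing action.

```ruby
# Engines that may appear as a prefix (assumed list, for illustration only).
ENGINES = %w[ruby jruby truffleruby].freeze

# Hypothetical parser: splits "engine-version" and defaults to MRI ("ruby")
# when only a version such as "2.4.9" or "2.4.x" is given.
def parse_ruby_spec(spec)
  engine, version = spec.split("-", 2)
  if ENGINES.include?(engine)
    { engine: engine, version: version || "latest" }
  else
    { engine: "ruby", version: spec }
  end
end
```

Under these assumptions, 'ruby-2.4.9' and '2.4.9' resolve to the same engine and version, and a bare 'truffleruby' means the newest available build.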

eregon avatar Jan 02 '20 21:01 eregon

What about adding -head variants?

ioquatix avatar Jan 02 '20 22:01 ioquatix

What follows is just my quick opinion. First, I think testing against MRI master is of little value, it regularly breaks and there is no CI before pushing. Also, features and changes on master are often reverted/tweaked/fixed, so it would cause a lot of failures for CI of gems. I think JRuby doesn't really recommend testing against master, the CI is not always green either. For TruffleRuby I think it could be useful because all tests must pass before merging to master, but we would first need nightly builds of TruffleRuby (https://github.com/oracle/truffleruby/issues/1483, building TruffleRuby entirely from source is more challenging than downloading an existing release and just recompiling the openssl extension).

In practice it's probably not too hard since I think ruby-install supports building ruby-head. Not sure about jruby-head, there is https://www.jruby.org/nightly but that doesn't give a single link.

eregon avatar Jan 02 '20 22:01 eregon

@eregon

JFYI, re ruby master, ruby-loco is set up that only the most recent passing build is used when downloading. All test suites are run from the install folder, and I also check that bin files are working correctly, or at least reporting a version (rake, gem, bundler, etc).

Granted, API changes are a different matter...
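The "only the most recent passing build" rule described above can be pictured with a small sketch. The `Build` struct and its fields are hypothetical and do not reflect how ruby-loco actually stores its build records.

```ruby
# Hypothetical build record: commit SHA, completion time, and whether the
# full test run from the install folder passed.
Build = Struct.new(:sha, :finished_at, :passed)

# Return the newest build whose tests passed, or nil if none passed.
def latest_passing(builds)
  builds.select(&:passed).max_by(&:finished_at)
end
```

A download step built on this would keep serving the previous good build whenever the newest commit on master fails its tests.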

MSP-Greg avatar Jan 02 '20 22:01 MSP-Greg

@MSP-Greg Indeed, if we want to support MRI head, we'd need something like that on all platforms. I think in TravisCI they had a hook triggering only if ruby's own CI passed, we could have something like that.

eregon avatar Jan 02 '20 22:01 eregon

@eregon

ruby-loco is currently built on AppVeyor, but I expect to move it to Actions. I started with the ruby/ruby MinGW build and it's passing. Note that it's running spec tests from the install folder. Working on mswin now. Can't get test-all to pass.

Once I get ruby-loco moved, I'll try using similar code for Ubuntu & MacOS. Might have some questions though...

MSP-Greg avatar Jan 02 '20 23:01 MSP-Greg

@bryanmacfarlane

Assuming you're not often coding in Ruby, maybe you can point me in the right direction.

Ubuntu and macOS have compile tools as part of the OS. Ruby on Windows is built with gcc, and uses the MSYS2 system. Also, any Ruby extension gems (gems that have c or c++ code) require the MSYS2 system for compiling on CI.

Currently, the Windows images have three copies of MSYS2, all of which are embedded in the installed Ruby versions. And, yes, three versions is a waste.

Anyway, the question is whether GitHub will reconsider this decision and add an independent MSYS2 install in future image updates. AppVeyor has an extensive set of installed MSYS2 packages (see https://ci.appveyor.com/project/MSP-Greg/appveyor-ruby). I'm not looking for all of that, but the base system and gcc would certainly be a good start.

For other discussions, see: https://github.community/t5/GitHub-Actions/Windows-MSYS2-Ruby/m-p/30885 https://github.com/actions/virtual-environments/issues/30

Anyway, MSYS2 is also used by languages/applications other than Ruby.

So, where do we inquire about this?

MSP-Greg avatar Jan 03 '20 00:01 MSP-Greg

For questions / issues on what's on the VM images (msys2 etc.), you're best to drive issues here: https://github.com/actions/virtual-environments

There are multiple issues covering which versions of ruby are available, so I think the discussion of ruby versions should probably be centralized there: https://github.com/actions/virtual-environments/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+ruby

bryanmacfarlane avatar Jan 07 '20 13:01 bryanmacfarlane

Ok, the below is a response to something you've removed from the above post, but it may be helpful for some...

Regarding msys2, you will also have to consider what happens on self-hosted machines. setup-ruby needs to work in isolation so maybe it would fault in the single msys2??

First of all, Standard Windows Rubies (SWR's) are stand-alone, they do not need MSYS2 to run the code that they contain. All required non-Windows dlls are packaged in bin/ruby_builtin_dlls.

SWR's have OS specific code to locate the MSYS2 install, and they will find it if it's installed at C:\msys64.

I can explain more, but I've worked with a lot of repos/gems on Windows, with multiple Ruby versions, and I only have one MSYS2 install.

The one issue is that if one is compiling extension gems, one needs to be aware of library package versions. This issue is common to all OS's, an example would be OpenSSL. Older versions of Ruby use OpenSSL 1.0.2 (or earlier), newer versions use 1.1.1. If one is compiling an extension gem using OpenSSL (common with socket/web server gems), one should use the same version of OpenSSL that the host language version (Ruby) uses.
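A minimal sketch of that version check follows. The helper names are hypothetical, and comparing only the major.minor series is a simplification chosen for the 1.0.x versus 1.1.x case mentioned above.

```ruby
# Extract the "major.minor" series from a string such as
# "OpenSSL 1.1.1d  10 Sep 2019". Returns nil if no version is found.
def openssl_series(version_string)
  m = version_string.match(/(\d+)\.(\d+)\.\d+/)
  m && "#{m[1]}.#{m[2]}"
end

# True when both strings report the same OpenSSL series.
def same_openssl_series?(built_against, loaded)
  series = openssl_series(built_against)
  !series.nil? && series == openssl_series(loaded)
end
```

In a CI step, after `require "openssl"`, one could pass `OpenSSL::OPENSSL_VERSION` (the version Ruby's extension was compiled against) and `OpenSSL::OPENSSL_LIBRARY_VERSION` (the library loaded at runtime) to spot a mismatch before compiling an extension gem.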

Lastly, AppVeyor has used a stand-alone MSYS2 install for a long time, and everyone seems able to work with it...

MSP-Greg avatar Jan 07 '20 14:01 MSP-Greg

Thanks. Yeah, I removed because I realized it wasn't a runtime thing.

If the user / workflow is also compiling extension gems then they are bringing the need for msys2 so for self-hosted where they bring their own machine it would be on them to install it like any other software on a self hosted machine. Regarding the hosted images, yes, I believe it makes sense for those to have a stand alone install since it's a clean machine every time and you wouldn't want to have to pull that JIT. And for that, the virtual-environments repo is the best place to drive that.

Hope that makes sense ...

bryanmacfarlane avatar Jan 07 '20 16:01 bryanmacfarlane

Thank you. I've worked with MSYS2 for a while, and there is no way to install it from scratch without taking quite a while, at least in terms of CI time frames.

Ruby is one thing, as builds are between 10 & 30 MB compressed; MSYS2 is an order or two of magnitude larger.

EDIT: See https://github.com/actions/virtual-environments/issues/229

MSP-Greg avatar Jan 07 '20 16:01 MSP-Greg

@bryanmacfarlane wrote:

@eregon - thanks, I'll have a look today.

Did you have a chance to take a look at https://github.com/eregon/use-ruby-action? What do you think of that approach?

I think one of the major advantages is it can make new releases of Ruby available within hours. Relying on the toolcache probably means many days (or weeks?) to use new Ruby releases. That action also already supports JRuby and TruffleRuby.

How do you see the future of this action? Using only the toolcache, or prebuilt rubies like use-ruby-action or a combination of both?

I think in the short term it would be nice to merge https://github.com/actions/setup-ruby/pull/27. That would be the first sign for me that this repo isn't inactive.

And then I think we should add a section in the README about alternatives and the status of this action. I think this is urgent because so many actions are now created to work around the limitations of this action. And there is no point to duplicate efforts, rather we should join efforts in one good action. Some example actions I could find: https://github.com/MSP-Greg/actions-ruby by @MSP-Greg https://github.com/clupprich/ruby-build-action by @clupprich https://github.com/masa-iwasaki/setup-rbenv by @masa-iwasaki https://github.com/eregon/use-ruby-action by @eregon (me)

eregon avatar Jan 07 '20 16:01 eregon

I ran a quick 'info' workflow (see the Ruby Info step) from https://github.com/eregon/use-ruby-action, and speed is quick enough to not generate complaints.

See:

https://github.com/MSP-Greg/use-ruby-action-info/runs/374992625

MSP-Greg avatar Jan 07 '20 17:01 MSP-Greg

Thanks. In general, getting tools JIT (as well as caching) is a good direction for this action to move.

I think this is a product question beyond just this action since our current approach is tool cache on the virtual environments and we're working with that team as well.

Adding @chrispat and @ethomson for product feedback. FYI @alepauly from VMs.

My thoughts:

  • It shouldn't be an either / or between virtual environments and pulling JIT. The VM should have some common ones and JIT pull what's not there (today it's just what's on the VM). This is what we do for node.
  • If we do pull JIT, it should be official (signed?) binaries from ruby/ruby. Metadata and downloads shouldn't come from a personal repo. For example, with node, it's an official signed distribution: https://nodejs.org/dist/index.json
  • We have specifically chosen to not overload installers. Our pattern is one setup action per versionable tool to cache by name, version, arch. e.g. setup-node has explicitly chosen to not overload npm and yarn even though they're both node and we had those discussions. They both have independent versions just like ruby and jruby have independent versions. The actions org should maintain the ruby one and the TruffleRuby owner can maintain an action. It's OK to have actions/setup-ruby, jruby/setup-jruby and oracle/setup-truffleruby in the marketplace.
  • On a related note of overload, each release doesn't need to publish every version of ruby. If the build came from ruby/ruby (or wherever ruby is built), a single version * n platforms could be with each release
  • We would need arm versions as well.
  • We happily take contributions via PRs from forks but you have to be in github / actions to be a contrib / maintainer / admin on a repo in the actions org so that's a pretty hard stop. If an external org took over maintaining it (e.g. setup-ruby goes to ruby/ruby) it would move.
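The "cache by name, version, arch" pattern from the list above might look like this in outline. The directory layout, the root path, and both helper names are assumptions for illustration and are not taken from the actions toolkit.

```ruby
# Assumed layout: <root>/<tool>/<version>/<arch>.
def tool_cache_dir(root, tool, version, arch)
  File.join(root, tool, version, arch)
end

# True when a matching directory already exists, so the download can be skipped.
def cached?(root, tool, version, arch)
  Dir.exist?(tool_cache_dir(root, tool, version, arch))
end
```

Because the tool name is part of the key, ruby, jruby and truffleruby would each get their own independent version trees, whichever action ends up installing them.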

bryanmacfarlane avatar Jan 07 '20 17:01 bryanmacfarlane

I have no fundamental issue with having the setup tasks handle more environment configuration which can include downloading a runtime.

One of the issues we have seen in the past with Ruby and Python is making sure things work consistently across Windows, macOS and Linux with respect to the ability to download the runtime on demand. Because of this we have chosen to cache Ruby and Python only and not offer the download model. However, if there is a way to maintain the action in such a way that it can download Ruby on the fly for Windows, macOS and Linux I am happy to discuss taking that PR.

As far as caching is concerned, we do have plans to break the cache module out as part of the toolkit so it can be used by other actions to add that feature automatically.

chrispat avatar Jan 07 '20 17:01 chrispat

Just to re-cap - I think step 1 is getting builds of ruby/ruby to install JIT so folks can pick ~any build. I think we can all agree on that ...

bryanmacfarlane avatar Jan 07 '20 20:01 bryanmacfarlane

@bryanmacfarlane Thank you for exposing the vision behind GitHub setup-* actions.

  • If we do pull JIT, it should be official (signed) binaries from ruby/ruby. Metadata and downloads shouldn't come from a personal repo.

MRI never had official builds AFAIK. I think the reason is that a build of MRI is tightly tied to the system on which it's built. Only the same OS, distribution, openssl, zlib, ... versions work. A good example here is openssl, to which MRI links dynamically, and therefore relies on the specific version used by the system. Node.js in comparison vendors e.g. openssl to avoid this issue.

I'm not saying it's impossible for MRI to have "official builds for GitHub Actions", but it's unprecedented.

Maybe the two repositories above could move to the GitHub ruby organization, if that would help to alleviate the concern. That would need discussion obviously. FWIW, TravisCI made their own builds of Ruby: http://rubies.travis-ci.org/

  • The actions org should maintain the ruby one and the TruffleRuby owner can maintain an action. It's OK to have actions/setup-ruby, jruby/setup-jruby and oracle/setup-truffleruby in the marketplace.

That is very disappointing, I hope you will reconsider. It's also in direct contradiction to what @damccorm said in https://github.com/actions/setup-ruby/issues/20#issuecomment-525849573. I even made a working PR based on that positive response: https://github.com/actions/setup-ruby/pull/28

Many Ruby gems (libraries) want to test against alternative implementations like JRuby and TruffleRuby. One can see that by looking at .travis.yml files of popular gems for instance. Of course for convenience they want to do:

    strategy:
      matrix:
        ruby: [ 2.5.x, 2.6.x, jruby, truffleruby ]
    steps:
      - name: Setup ruby
        uses: actions/setup-ruby@v1
        with:
          ruby-version: ${{ matrix.ruby }}

And not have to duplicate all the logic that follows the setup-* action and use different setup-* actions.

  • On a related note of overload, each release doesn't need to publish every version of ruby. If the build came from ruby/ruby (or wherever ruby is built), a single version * n platforms could be with each release

Yes of course. The setup is like that currently in that repo so if any step is changed for building Ruby then it's applied for all Rubies built. It could be more incremental if the virtual environments are stable enough, and the build process becomes more stable too.

One interesting question there is evolution of GitHub Actions virtual environments. What if the openssl version changes in one of them? All Ruby versions would need to be recompiled for it, or break. What if there is a new virtual environment, e.g., using a newer Ubuntu release? It will need recompilation too. This means it's both very convenient to build Ruby as part of the virtual environment, but if that's not the case the ability to rebuild all supported Ruby versions for a new or modified virtual environment is important.

That sounds impractical if builds e.g. are hosted in ruby/ruby, to produce new builds for old releases for new virtual environments. I think ultimately building those tools (e.g. Ruby here) needs to be aware of the virtual environments, and so IMHO it makes sense to have a separate "builder" repository which builds for each GitHub Actions virtual environment and rebuild as needed.
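The "rebuild as needed" idea can be sketched as a set difference between the wanted combinations and the builds already published. `missing_builds` and the pair representation are hypothetical, chosen only to make the point concrete.

```ruby
# Every (environment, version) pair that should exist, minus those already
# built. Each build is represented as a two-element array.
def missing_builds(environments, versions, existing)
  environments.product(versions) - existing
end
```

Adding a new virtual environment to `environments` then yields one missing build per supported Ruby version, and clearing the `existing` entries for an image whose openssl changed forces a rebuild of that image's whole set.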

  • We would need arm versions as well.

That will probably be a new virtual environment then?

  • We happily take contributions via PRs from forks but you have to be in github / actions to be a contrib / maintainer / admin on a repo in the actions org so that's a pretty hard stop. If an external org took over maintaining it (e.g. setup-ruby goes to ruby/ruby) it would move.

That means a very strong reliance on the GitHub maintainers being active. That has been problematic since September. I hope it will be better in the future, but I would expect that, like everyone, GitHub maintainers also have other things to do or other priorities. That's contradicting https://github.com/actions/setup-ruby/pull/27#issuecomment-569696521 BTW.

So I think the better way to address maintenance would be to move this action outside the actions organization, and then the community could actively help with maintenance, update, triage, etc.

eregon avatar Jan 07 '20 20:01 eregon

  • We happily take contributions via PRs from forks but you have to be in github / actions to be a contrib / maintainer / admin on a repo in the actions org so that's a pretty hard stop.

Actually, it seems collaborators don't need to be a member of the organization of the repository. At least it is the case for regular repositories on GitHub. Does the actions organization have some special restrictions there?

eregon avatar Jan 07 '20 20:01 eregon

That is very disappointing, I hope you will reconsider. It's also in direct contradiction to what @damccorm said in #20 (comment). I even made a working PR based on that positive response: #28

I'm generally in favor of supporting truffle and jruby. They're both high quality and frequently used. I realize that this adds additional engineering burden but it seems like there's a lot of community support that can help us here.

One interesting question there is evolution of GitHub Actions virtual environments. What if the openssl version changes in one of them? All Ruby versions would need to be recompiled for it, or break. What if there is a new virtual environment, e.g., using a newer Ubuntu release? It will need recompilation too.

This is true - can we automate this? 😁

So I think the better way to address maintenance would be to move this action outside the actions organization, and then the community could actively help with maintenance, update, triage, etc.

To give you some background on why we're opposed to this: there's a chain of trust issue here - our getting started experience (https://github.com/actions/starter-workflows) should have a "build ruby" option. Because of the prevalence of workflows built from these, our current general policy is to require actions in those workflows to reference only things that live in the actions organization. That's because the actions organization has several security policies added to it (again, because of the prevalence).

ethomson avatar Jan 07 '20 21:01 ethomson