
pkgr must more aggressively error on failed download

Open dpastoor opened this issue 6 years ago • 10 comments

When downloading packages, a 404 or any other failure to download should not just print a warning and proceed: the installation will obviously fail, since that package does not exist.

1388:{"level":"warning","msg":"bad server response","package":"cmprsk","status":"404 Not Found","status_code":404,"time":"2019-09-30T08:42:15-04:00","url":"https://metrumresearchgroup.github.io/cran/2019-09-22/src/contrib/cmprsk_2.2-7.1.tar.gz"}
ERRO[0509] installation failed for packages: cmprsk
TRAC[0509] Resetting package environment
INFO[0510] duration:8m30.328218453s
FATA[0510] failed package install with err, failed installation for packages: cmprsk
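
A minimal sketch of the behavior being asked for, assuming a simple HTTP fetch: any non-2xx status is returned as an error so the caller can abort instead of logging a warning and carrying on. `fetch` and `demo404` are hypothetical names for illustration, not pkgr's actual download code, and the local test server stands in for the real repo.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetch downloads url into memory. Any non-2xx status is returned as an
// error rather than logged as a warning.
// (Hypothetical helper, not pkgr's real download function.)
func fetch(client *http.Client, url string) ([]byte, error) {
	resp, err := client.Get(url)
	if err != nil {
		return nil, fmt.Errorf("requesting %s: %w", url, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return nil, fmt.Errorf("bad server response for %s: %s", url, resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// demo404 starts a local test server that always answers 404 and returns
// the error fetch reports for it.
func demo404() error {
	srv := httptest.NewServer(http.NotFoundHandler())
	defer srv.Close()
	_, err := fetch(srv.Client(), srv.URL+"/src/contrib/cmprsk_2.2-7.1.tar.gz")
	return err
}

func main() {
	// A real caller would stop the install here instead of printing.
	fmt.Println("download error:", demo404())
}
```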

dpastoor avatar Sep 30 '19 13:09 dpastoor

Can you possibly add a Retry here:

https://github.com/metrumresearchgroup/pkgr/blob/develop/cran/download-package.go#L128

I added a log line to record the reason the download fails:

log.WithField("package", d.Package.Package).Warn(err)

And the failure I am seeing is:

INFO[0087] downloading package                           package=nycflights13
WARN[0087] http2: server sent GOAWAY and closed the connection; LastStreamID=1999, ErrCode=NO_ERROR, debug=""  package=reticulate
WARN[0087] http2: server sent GOAWAY and closed the connection; LastStreamID=1999, ErrCode=NO_ERROR, debug=""  package=caret
WARN[0087] downloading failed                            package=reticulate
WARN[0087] downloading failed                            package=caret
WARN[0087] http2: server sent GOAWAY and closed the connection; LastStreamID=1999, ErrCode=NO_ERROR, debug=""  package=gt
WARN[0087] downloading failed                            package=gt
WARN[0087] http2: server sent GOAWAY and closed the connection; LastStreamID=1999, ErrCode=NO_ERROR, debug=""  package=tsibble
WARN[0087] downloading failed                            package=tsibble

Wondering if perhaps the server thinks it's a DoS attack?

bschulth avatar Oct 13 '21 01:10 bschulth

Thanks for the report, Brian. What type of repo are you pulling from: GitLab Pages? RStudio Package Manager? MRAN? Something else? These requests run at pretty high concurrency.

This is on the queue to redo anyway. pkgr shouldn't just warn here, it should straight error, since if you're failing to download, the install is obviously going to fail.

What would be helpful is to know what we could add to make this more robust: both some retries on the HTTP end and potentially a concurrent_dl setting.
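
For the retry half of that, a rough sketch using only the standard library: retry a failing operation a fixed number of times, doubling the wait between tries. `withRetry` and `flakyCalls` are made-up names, and the attempt count and base delay are assumptions, not anything pkgr ships.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// withRetry calls fn up to attempts times, doubling the wait after each
// failure. It returns nil on the first success, or the last error wrapped.
func withRetry(attempts int, base time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	wait := base
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(wait)
			wait *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// flakyCalls simulates a download that fails `failures` times before
// succeeding. It returns how many calls were made and the final error.
func flakyCalls(failures, attempts int) (int, error) {
	calls := 0
	err := withRetry(attempts, time.Millisecond, func() error {
		calls++
		if calls <= failures {
			return errors.New("http2: server sent GOAWAY and closed the connection")
		}
		return nil
	})
	return calls, err
}

func main() {
	calls, err := flakyCalls(2, 5)
	fmt.Println("calls:", calls, "err:", err)
}
```

The important part is the final `return`: once the retries are used up, the error goes back to the caller rather than being downgraded to a warning.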

dpastoor avatar Oct 13 '21 01:10 dpastoor

INFO[0002] package installation sources
AmgenInternal=25 BioCann=1 BioCsoft=17 CRAN=994 ========> (https://cran.microsoft.com/snapshot/2021-10-08) CRAN_20190901=1 CRAN_20200118=4 CRAN_20200510=2 H2O=1 INLA=1 glmmADMB_repo=1 tarballs=0

bschulth avatar Oct 13 '21 01:10 bschulth

OK, thanks. That CRAN server is... not great, and has had a number of outages. pkgr needs improvement here, but in the meantime, some knowledge to drop: switch away from MRAN. RStudio now has CRAN snapshotted every night:

image

I've switched over and it's been much better.

dpastoor avatar Oct 13 '21 01:10 dpastoor

url: https://packagemanager.rstudio.com/client/#/repos/1/overview

dpastoor avatar Oct 13 '21 01:10 dpastoor

Ah, cool, let me give that a try. Yeah, my builds have been failing for the past few days. I was about to go with a hack in your code here and put a 1 second sleep in place so that I don't spam that server: https://github.com/metrumresearchgroup/pkgr/blob/develop/cran/download-package.go#L124 It definitely seems like it thinks I am DoS'ing it. The one-second delay on each round let it succeed.

I'll let you know how the RStudio snapshot works....

bschulth avatar Oct 13 '21 02:10 bschulth

Package manager is much slower (like it has a 1-2 second built-in throttle), but my first attempt passed, so it seems like a good workaround. Thanks!

I think MRAN would be fine, but it needs a download throttle, so maybe a workaround is to introduce a new property in the yml file for a per-repo download throttle in seconds. I think this is an issue only because I am up to 999 packages from there.
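
Something like the following is what I have in mind for the throttle, as a sketch only: one minimum-gap timer per repo, with repos that have no configured delay passing straight through. The `throttle` type, the `elapsedMs` helper, and the idea of keying the delays by repo name are all assumptions for illustration; no such yml property exists in pkgr today.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// throttle enforces a minimum gap between consecutive downloads from one repo.
type throttle struct {
	mu   sync.Mutex
	gap  time.Duration
	last time.Time
}

// Wait blocks until at least gap has passed since the previous call.
// A nil throttle (a repo with no configured delay) never waits.
func (t *throttle) Wait() {
	if t == nil {
		return
	}
	t.mu.Lock()
	defer t.mu.Unlock()
	if !t.last.IsZero() {
		if wait := t.gap - time.Since(t.last); wait > 0 {
			time.Sleep(wait)
		}
	}
	t.last = time.Now()
}

// elapsedMs runs n throttled "downloads" against a repo named CRAN with the
// given gap and reports the total elapsed time in milliseconds.
func elapsedMs(n, gapMs int) int64 {
	// Hypothetical per-repo settings, as they might be read from the yml file.
	throttles := map[string]*throttle{
		"CRAN": {gap: time.Duration(gapMs) * time.Millisecond},
	}
	start := time.Now()
	for i := 0; i < n; i++ {
		throttles["CRAN"].Wait()     // throttled repo
		throttles["tarballs"].Wait() // no entry: nil throttle, no delay
	}
	return time.Since(start).Milliseconds()
}

func main() {
	fmt.Println("elapsed ms:", elapsedMs(3, 20))
}
```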

bschulth avatar Oct 13 '21 02:10 bschulth

kk, give #389 a try. It has an exponential backoff built in (thanks hashicorp) and the concurrency control knob. Perhaps

PKGR_DL_CONCURRENCY=3 pkgr install ...

combined with the retries just in case, will do the trick. Interesting that package manager was slower for you; we've seen it speed things up for us compared to MRAN. Though to be fair, 99.99% of the time we point to mpn.metworx.com these days :-) (though I'm not sure if your entire snapshot is present in MPN).
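
For anyone curious how a knob like that can work, here is a rough sketch (not the code in #389): read the limit from the environment and use a buffered channel as a semaphore so that at most that many downloads are in flight. `dlConcurrency`, `maxInFlight`, and the fallback of 8 are assumptions made up for this example.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"sync"
	"time"
)

// dlConcurrency parses a PKGR_DL_CONCURRENCY-style value and falls back to
// the given default when the value is unset, not a number, or below 1.
func dlConcurrency(raw string, fallback int) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n < 1 {
		return fallback
	}
	return n
}

// maxInFlight runs `jobs` simulated downloads with at most `limit` running
// at once (limit must be >= 1) and returns the peak concurrency observed.
func maxInFlight(jobs, limit int) int {
	sem := make(chan struct{}, limit) // buffered channel used as a semaphore
	var (
		mu        sync.Mutex
		cur, peak int
		wg        sync.WaitGroup
	)
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it when done
			mu.Lock()
			cur++
			if cur > peak {
				peak = cur
			}
			mu.Unlock()
			time.Sleep(time.Millisecond) // stand-in for the actual download
			mu.Lock()
			cur--
			mu.Unlock()
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	limit := dlConcurrency(os.Getenv("PKGR_DL_CONCURRENCY"), 8) // 8 is an assumed default
	fmt.Println("limit:", limit, "peak:", maxInFlight(20, limit))
}
```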

dpastoor avatar Oct 13 '21 02:10 dpastoor

No errors during the download phase using #389 on the MRAN link, so that seems like a positive!

Regarding timing: RStudio Package Manager History Repo, 6m47s to download all; MRAN, 3m29s to download all.

PKGR_DL_CONCURRENCY=2 works as well, throttling the concurrent downloads enough that the last few packages don't start failing and hit the download retry. (Though at 2 threads, it's as slow as RS Package Manager: 6m27s.)

So 2 positive changes.

bschulth avatar Oct 13 '21 03:10 bschulth

Poking this topic.

  • Just an additional note: when packages silently fail to download upstream, installs seem to silently fail downstream (e.g. the process does not exit with the non-zero exit code that would make docker builds fail).
 time="2022-02-03T04:35:15-08:00" level=info msg="Successfully Installed." package=libcoin remaining=660 repo=CRAN version=1.0-9
 time="2022-02-03T04:35:16-08:00" level=error msg="cmd output" exit_code=1 output="Warning: invalid package ‘/opt/local/docker/installers/runtime/pkgr’\nError: ERROR: no packages specified\n" package=coda stderr="Warning: invalid package ‘/opt/local/docker/installers/runtime/pkgr’\nError: ERROR: no packages specified\n" stdout=
 time="2022-02-03T04:35:16-08:00" level=warning msg="error installing" err="exit status 1"
 time="2022-02-03T04:35:19-08:00" level=info msg="Successfully Installed." package=mc2d remaining=659 repo=CRAN version=0.1-21

Also, it seems that the final failure does not exit with a non-zero exit code:

 time="2022-02-03T04:38:25-08:00" level=error msg="did not install IRdisplay"
 time="2022-02-03T04:38:25-08:00" level=error msg="did not install distributional"
 time="2022-02-03T04:38:25-08:00" level=error msg="did not install refund"
 time="2022-02-03T04:38:25-08:00" level=error msg="installation failed for packages: ucminf, proto, svUnit, wavelets, entropy, coda, clue"
 time="2022-02-03T04:38:25-08:00" level=info msg="starting individual tarball install"
 time="2022-02-03T04:38:25-08:00" level=info msg="total package install time" duration=37m12.389199139s
 time="2022-02-03T04:38:26-08:00" level=info msg="duration:37m13.64722306s"
 time="2022-02-03T04:38:26-08:00" level=error msg="failed package install with err, failed installation for packages: ucminf, proto, svUnit, wavelets, entropy, coda, clue"

So mainly it would be nice to get a non-zero exit code so that automated builds fail properly.
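
Roughly what I mean, as a sketch: whatever list of failed packages the installer ends up with should decide the process exit status. `exitCode` is a hypothetical helper, and the command-line arguments here merely stand in for that list of failed packages.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// exitCode maps the list of packages that failed to install to a process
// exit status: 0 when everything installed, 1 otherwise.
func exitCode(failed []string) int {
	if len(failed) > 0 {
		return 1
	}
	return 0
}

func main() {
	failed := os.Args[1:] // stand-in for the packages that did not install
	if len(failed) > 0 {
		fmt.Fprintf(os.Stderr, "installation failed for packages: %s\n",
			strings.Join(failed, ", "))
	}
	os.Exit(exitCode(failed))
}
```

Run with package names as arguments and the exit status is 1, which is what a docker build step needs in order to stop; run with no arguments and it exits 0.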

bschulth avatar Feb 03 '22 16:02 bschulth