[INFRA-2754] Realign repo.jenkins-ci.org mission
More context can be found here
Folks at JFrog are investigating how to reduce repo.jenkins-ci.org costs.
They are still interested in sponsoring us, but they want to be sure the repository is only used for Jenkins purposes and not as a proxy cache for anything else.
Originally reported by olblak, imported from: Realign repo.jenkins-ci.org mission
- assignee: danielbeck
- status: In Progress
- priority: Minor
- resolution: Unresolved
- imported: 2022/01/10
Regarding bandwidth usage, INFRA-2772 was a pretty straightforward discovery.
Storage is more difficult. Right now my plan for the first step is to remove any artifacts that haven't been accessed in a while, exist in the upstream repo, and have the same checksum upstream, starting with the largest ones.
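A minimal sketch of how the first step of that cleanup plan could be queried, assuming an Artifactory AQL search over download stats (the repository name, cutoff date, and size threshold below are illustrative, not the actual values used):

```python
import json

def build_cleanup_query(repo: str, last_downloaded_before: str,
                        min_size_bytes: int) -> str:
    """Build an Artifactory AQL query listing cached artifacts that have not
    been downloaded since `last_downloaded_before` (ISO date) and are at
    least `min_size_bytes` large, biggest first. The returned actual_sha1
    would then be compared against the upstream repo's published .sha1
    before deleting anything."""
    criteria = {
        "repo": repo,
        "stat.downloaded": {"$before": last_downloaded_before},
        "size": {"$gt": str(min_size_bytes)},
    }
    return (
        "items.find(" + json.dumps(criteria) + ")"
        '.include("repo","path","name","size","actual_sha1")'
        '.sort({"$desc":["size"]})'
    )
```

The resulting string would be POSTed to Artifactory's `/api/search/aql` endpoint; sorting by descending size matches the "starting with the largest ones" part of the plan.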
[Originally related to: INFRA-2772]
[Originally related to: INFRA-2812]
Issues in this epic:
- https://github.com/jenkins-infra/helpdesk/issues/2340
- https://github.com/jenkins-infra/helpdesk/issues/2377
- https://github.com/jenkins-infra/helpdesk/issues/2385
- https://github.com/jenkins-infra/helpdesk/issues/2386
Updating this issue after we had recent outages on JFrog (#2864 #2949 and eventually #2904).
Problem: between 20% and 30% of requests to repo.jenkins-ci.org return HTTP 404, which causes bandwidth and performance issues, as per JFrog's message in https://groups.google.com/g/jenkins-infra/c/ZdyYIhlNJQY/m/QCdT5OZIAAAJ .
We might want to check #2385 as a first step. Ping @daniel-beck: we need your help, as we don't know how to identify these "rogue" requests (or whether we even can without JFrog's help).
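If we do get access logs, measuring that 404 share is straightforward. A minimal sketch, assuming an nginx/Apache "combined"-style log line (the real Artifactory request log format differs but carries the same method/path/status fields):

```python
import re
from collections import Counter

# Matches the '"METHOD /path PROTO" STATUS' portion of a combined-format
# access log line. Assumed log shape, not the actual JFrog log format.
LOG_RE = re.compile(r'"\S+ (?P<path>\S+) [^"]*" (?P<status>\d{3})')

def status_share(lines):
    """Return the fraction of requests per HTTP status code."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            counts[m.group("status")] += 1
    total = sum(counts.values())
    return {status: n / total for status, n in counts.items()} if total else {}
```

Running this over a day of logs would confirm (or refute) the 20-30% figure JFrog reported.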
These are unrelated.
We proxy repo1, and #2385 is about artifacts proxied from there but not used for a Jenkins purpose. Some folks probably just point their Maven at our Artifactory and do some ML bullshit, for which artifacts often exceed 1GB.
404 is when they set up Maven to query us for artifacts and they don't exist in our repos, or Maven repo1, typically internal private-source stuff. There's tons of log spam related to this, so we know the paths, but since access is anonymous, we don't know who does that, so we cannot tell them to knock it off (we could infer some by artifact path, but 🤷 ). Since we do not have a reverse proxy, we also cannot patch the responses and serve them "please go away" responses.
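Since we know the 404'd paths but not the requesters, one way to "infer some by artifact path" is to group the missing paths by their leading segments (roughly the Maven groupId) and look at the biggest offenders. A minimal sketch under that assumption:

```python
from collections import Counter

def top_missing_prefixes(paths, depth=3, n=5):
    """Group 404'd request paths by their first `depth` path segments
    (approximately repo + Maven groupId) and return the `n` most frequent
    prefixes. Helps spot which internal/private artifacts are being
    requested from us in bulk."""
    counts = Counter(
        "/".join(p.strip("/").split("/")[:depth]) for p in paths
    )
    return counts.most_common(n)
```

This doesn't identify anyone, but a prefix like a company-specific groupId at least hints at whose build configuration is misdirected.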
Knowledge sharing from #3101 :
- https://github.com/jenkins-infra/helpdesk/issues/3101#issuecomment-1220826616
- https://github.com/jenkins-infra/helpdesk/issues/3101#issuecomment-1229883900
Summary of the recent meeting with JFrog:
- We'll be able to get more metrics (access, size, bandwidth, etc.) once repo.jenkins-ci.org is migrated to their new platform. They'll contact us with a proposed timeline (expecting ~1 hour of outage).
- repo.jenkins-ci.org is consuming 40 to 50 TB of data per month. We have to stay under the 10 TB per month limit. That will be a topic for the upcoming 2022 Contributor Summit and should result in a JEP: https://github.com/jenkinsci/jep/pull/393
We've filed a public abuse report for the IP address 39.107.36.205. Attempts to stop the abuse through private channels have failed. We'll continue reporting the abuse publicly until it stops or we find a way to block the IP address.
can we just block it at artifactory / get jfrog to block it?
Nope, we cannot on our own, as it is a JFrog-managed platform. JFrog is looking into it.
The real question is whether that will be sufficient: the abuser(s) could switch IP addresses and start again.
JFrog is investigating and hopes to be able to block the IP address. We'll certainly keep people informed as we learn more from them.
As far as I can tell, this issue is resolved. The Jenkins artifact repository no longer caches Maven Central, and the artifact caching proxy now provides artifact caching for agents connected to ci.jenkins.io.