anax

agent-install: Check for a local copy before pulling contents from GitHub, to prevent the "over the account limit" error

playground opened this issue on Oct 14 '22 • 7 comments

Is your feature request related to a problem? Please describe.

Yes, I have encountered this error from time to time during installation.

```
sudo curl -sSL https://github.com/open-horizon/anax/releases/latest/download/agent-install.sh | sudo -s -E bash -s -- -i anax: -a jeff-work-mbp:some-device-token -k css: -c css:
command failed: Error: Command failed: sudo touch /etc/default/horizon && sudo curl -sSL https://github.com/open-horizon/anax/releases/latest/download/agent-install.sh | sudo -s -E bash -s -- -i anax: -a jeff-work-mbp:some-device-token -k css: -c css:
bash: line 1: syntax error near unexpected token `<'
bash: line 1: `<?xml version="1.0" encoding="utf-8"?><Error><Code>ServerBusy</Code><Message>Egress is over the account limit.'
```
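The `syntax error near unexpected token` line is a side effect of piping the download straight into bash: `curl -sSL` without `--fail` writes whatever body the server returns to stdout, so bash tried to execute Azure's XML error document as if it were the script. A minimal sketch of a safer pattern downloads to a file first and inspects it before running it. The function names `check_installer` and `download_installer` are hypothetical, and the check assumes agent-install.sh begins with a `#!` shebang line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if FILE looks like a shell script
# (assumed to start with a "#!" shebang line) rather than an error page.
check_installer() {
    local file="$1" first_line
    if [[ ! -s "$file" ]]; then
        echo "Error: $file is missing or empty" >&2
        return 1
    fi
    IFS= read -r first_line < "$file"
    if [[ "$first_line" == '#!'* ]]; then
        return 0
    fi
    # The failure in this issue returned an Azure XML error body instead.
    if grep -q '<Code>ServerBusy</Code>' "$file"; then
        echo "Error: download server is busy; please try again later" >&2
    else
        echo "Error: $file does not look like a shell script" >&2
    fi
    return 1
}

# Hypothetical wrapper: --fail makes curl exit non-zero on HTTP errors
# (such as a 503) instead of saving the error body as if it were the script.
download_installer() {
    local url="$1" dest="$2"
    curl -fsSL -o "$dest" "$url" && check_installer "$dest"
}
```

For example: `download_installer https://github.com/open-horizon/anax/releases/latest/download/agent-install.sh agent-install.sh && sudo -s -E bash agent-install.sh -i anax: -a jeff-work-mbp:some-device-token -k css: -c css:`. If the server is busy, the user gets a readable message instead of a bash syntax error.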

Describe the solution you'd like.

Ideally, the install script should pull from a locally cached copy if one already exists.
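The local-cache idea could look roughly like the sketch below. It assumes a hypothetical cache directory (`CACHE_DIR`, not an existing agent-install.sh option) and a hypothetical `fetch_cached` function: a non-empty cached file is reused as-is; otherwise the file is downloaded into a temporary file in the same directory and renamed into place only after curl succeeds, so a failed or partial download never ends up in the cache.

```bash
#!/usr/bin/env bash
# Hypothetical cache directory; not an existing agent-install.sh option.
CACHE_DIR="${CACHE_DIR:-$HOME/.cache/horizon-agent-install}"

# fetch_cached URL
# Prints the path of a local copy of the file at URL, downloading it only
# when no non-empty cached copy exists yet.
fetch_cached() {
    local url="$1" dest tmp
    dest="$CACHE_DIR/$(basename "$url")"
    mkdir -p "$CACHE_DIR" || return 1

    if [[ -s "$dest" ]]; then
        echo "Using cached copy: $dest" >&2
    else
        tmp="$(mktemp "$dest.XXXXXX")" || return 1
        # --fail makes HTTP errors such as a 503 exit non-zero instead of
        # saving the server's error body under the expected file name.
        if ! curl -fsSL -o "$tmp" "$url"; then
            rm -f "$tmp"
            echo "Download failed: $url" >&2
            return 1
        fi
        mv "$tmp" "$dest"   # publish only complete downloads to the cache
    fi
    printf '%s\n' "$dest"
}
```

One caveat: a cached copy of a `latest` release asset goes stale once a new release is published, which is the gap the checksum comparison suggested later in this thread is meant to close.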

Describe alternatives you've considered

No response

Additional context.

No response

playground, Oct 14 '22

@johnwalicki and @dabooz, this is confirmed. It appears to be an intermittent error state that occurs whenever Azure Object Storage decides that GitHub packages are being pulled from a repo faster than anticipated. This is a problem for us if someone uses our agent-install.sh script to install on, say, 100 nodes in a short period, and the more widely the script is used, the more likely someone is to encounter it. How do you recommend that we proceed? Should we anticipate the error condition and take alternative action? Just notify the user that the server is busy and to try again later? Make the script more efficient so it puts less load on GitHub packages? Or take some other approach?

joewxboy, Oct 14 '22

It would not be uncommon to see hundreds of nodes running agent-install.sh within a short period. I think we should move away from dependencies on GitHub and Docker Hub: they can throttle scalability and are outside our control. We might look at splitting how Open Horizon and the downstream IBM product deliver the binaries used by agent-install.sh.

johnwalicki, Oct 14 '22

Is the "Egress is over the account limit." error caused by the number of transactions to github/anax/releases, or by the bandwidth consumed downloading the binaries? We could not easily limit the total number of transactions, but agent-install.sh could be more careful about bandwidth consumption:

  • The build process would need to generate an md5sum (or other checksum) for each .tar.gz binary in anax/releases
  • agent-install.sh would check whether a horizon-agent-linux-<distro>-<arch>.tar.gz already exists locally
  • If so, agent-install.sh would get the checksum file from GitHub
  • Compare the checksums; if they match, use the local copy
  • If they differ, download the remote copy
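The checksum comparison in these steps could be sketched as follows. It assumes the release would publish a checksum file in `md5sum`'s own output format (`<hash>  <filename>`) next to each tarball, which is not something anax/releases provides today, and that GNU coreutils `md5sum` is available; `local_copy_is_current` is a hypothetical name. MD5 is adequate for spotting a changed file, though `sha256sum` would be the stronger choice if the check ever needs to guard against tampering.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if TARBALL matches the hash listed for it in
# CHECKSUM_FILE (md5sum format: "<hash>  <name>", or "<hash> *<name>").
local_copy_is_current() {
    local tarball="$1" checksum_file="$2" name expected actual
    [[ -s "$tarball" && -s "$checksum_file" ]] || return 1
    name="$(basename "$tarball")"
    # Take the hash from the first line whose file name matches the tarball
    expected="$(awk -v f="$name" '$2 == f || $2 == "*" f { print $1; exit }' "$checksum_file")"
    [[ -n "$expected" ]] || return 1
    actual="$(md5sum "$tarball" | awk '{ print $1 }')"
    [[ "$actual" == "$expected" ]]
}
```

agent-install.sh would still download the small checksum file on every run, but could skip the much larger tarball whenever `local_copy_is_current` succeeds.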

The checksum approach would be a stop-gap. The ideal solution would be a CDN that we control and pay for. Another option would forgo the checksum entirely: just catch the "server is busy" error, sleep, and try again.
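The catch-sleep-retry alternative is small enough to sketch. `retry_with_backoff` below is a hypothetical helper, not part of agent-install.sh: it reruns a command until it succeeds or a maximum number of attempts is reached, doubling the wait after each failure so that hundreds of nodes retrying at once back off instead of hammering the server.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command, retrying with exponential backoff.
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS command [args...]
retry_with_backoff() {
    local max_attempts="$1" delay="$2" attempt=1
    shift 2
    while true; do
        if "$@"; then
            return 0
        fi
        if (( attempt >= max_attempts )); then
            echo "Giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "Attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$(( attempt + 1 ))
        delay=$(( delay * 2 ))   # double the wait after each failure
    done
}
```

For example: `retry_with_backoff 5 10 curl -fsSL -o agent-install.sh https://github.com/open-horizon/anax/releases/latest/download/agent-install.sh`. curl can also do this on its own: `curl -fsSL --retry 5 -o agent-install.sh <url>` treats HTTP 503 as a transient error and, by default, waits one second before the first retry and doubles the wait each time.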

johnwalicki, Oct 16 '22

As an organization owner, I can view the GitHub usage for the open-horizon organization (https://github.com/organizations/open-horizon/settings/billing), and usage is essentially 0.

Azure's egress limits are documented at https://learn.microsoft.com/en-us/azure/storage/common/scalability-targets-standard-account

This makes me suspect that the egress limit is some other GitHub oddity. A search turned up these conversations:

  • https://github.com/community/community/discussions/8535
  • https://github.com/lovell/sharp-libvips/issues/121

johnwalicki, Oct 16 '22

I think we might be overthinking / over-engineering a solution. Just try the download again.

johnwalicki, Oct 16 '22

When you're using someone else's cloud, you can't really complain.

johnwalicki, Oct 17 '22

I'm clearing the bug label because I don't think it's an agent-install bug. At best, we can handle the 503 more gracefully, but as a feature request.

johnwalicki, Oct 17 '22

I will close it because it's not an agent-install issue.

linggao, Feb 27 '23