Response exceeded maximum allowed length

Open styeung opened this issue 10 years ago • 8 comments

Hi,

We tried deploying a trusty branch of CF-Release (https://github.com/cloudfoundry/cf-release/tree/trusty64-rootfs) and got the following error:

Started updating job runner_z1 > runner_z1/0. Failed: Response exceeded maximum allowed length (00:00:39)
Error 450001: Response exceeded maximum allowed length

The full error log can be found here.

We've been able to successfully deploy before, and this is the first time we've seen this error message. What's causing this?

Thanks,

Sai To

styeung avatar Feb 24 '15 21:02 styeung

Here is our output from bosh task 3 --debug

styeung avatar Feb 24 '15 21:02 styeung

That's probably the stdout/stderr of the failure being larger than the size NATS allows per message. At some point in the past we added some code to catch that on the agent side and only send the last 100 lines of the message, but it's possible that the output had some really long lines (like megabytes in size). I thought we had a guard around that, but maybe not. I'm also not sure when bosh-lite's stemcell was last updated with the latest agent, but I assume it has this feature since it was added several months ago.
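To illustrate why a last-100-lines guard alone may not be enough, here is a minimal shell sketch that bounds both the number of lines and the length of each line. The function name `truncate_output` and its default limits are hypothetical and chosen for illustration; this is not the bosh-agent code, which is written in Go.

```shell
#!/bin/bash
# Hypothetical helper (not the bosh-agent implementation).
# Reads stdin, keeps only the last $1 lines, and cuts each kept line
# down to at most $2 bytes, so the result is bounded in both dimensions.
truncate_output() {
    local max_lines="${1:-100}"
    local max_line_bytes="${2:-1024}"
    tail -n "$max_lines" | cut -b "1-${max_line_bytes}"
}
```

For example, `printf 'a\nbbbbbb\ncccccc\n' | truncate_output 2 3` keeps the last two lines and cuts each to three bytes. With the defaults, the result is at most 100 lines of 1024 bytes plus newlines, roughly 100 KiB, regardless of how long the original lines were.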

It could also be some other message response being too long...

@cppforlife got any other ideas?

On Tue, Feb 24, 2015 at 1:55 PM, Sai To Yeung wrote:

Here https://gist.github.com/styeung/4e8b4d17057e4817e8df is our output from bosh task 3 --debug


bosh-ci-push-pull avatar Feb 25 '15 02:02 bosh-ci-push-pull

With that release, on this particular box, we are able to reproduce this error.

Our next steps are to remove .blobs and .bosh/cache and try again.

jtarchie avatar Feb 25 '15 16:02 jtarchie

You said bosh cli plugin? I'm a dummy.


bosh-ci-push-pull avatar Feb 25 '15 17:02 bosh-ci-push-pull

This is definitely stdout/stderr going over the limit due to how we use tar (verbose mode) in the Agent. The real problem here is that it fails to untar. This could be due either to an invalid package cache or to the compilation stage not successfully tarring up the package for some reason. Since this is bosh-lite, the best way to go about it is to blow away that deployment and cf-release from the Director.

cppforlife avatar Feb 25 '15 18:02 cppforlife

We'll adjust bosh-agent eventually to not log everything from the tar command.
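As a rough illustration of that idea, the sketch below extracts an archive without tar's verbose flag and reports only the tail of tar's output when extraction fails. The function name `extract_quietly`, the 20-line limit, and the assumption that the package is a gzip-compressed tarball are all hypothetical; the actual agent is written in Go and may handle this differently.

```shell
#!/bin/bash
# Hypothetical helper (not the bosh-agent implementation).
# Extracts without -v, so a successful untar prints no per-file listing.
# On failure, only the last 20 lines of tar's output go to stderr.
extract_quietly() {
    local archive="$1" dest="$2" output status
    mkdir -p "$dest" || return 1
    # Assumes a gzip-compressed tarball; capture stdout and stderr together.
    output=$(tar -xzf "$archive" -C "$dest" 2>&1)
    status=$?
    if [ "$status" -ne 0 ]; then
        printf '%s\n' "$output" | tail -n 20 >&2
    fi
    return "$status"
}
```

Called as `extract_quietly package.tgz /some/target/dir`, this returns tar's exit status, so a caller can still detect the underlying untar failure without receiving the full file listing.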

cppforlife avatar Feb 25 '15 18:02 cppforlife

We are able to reproduce this error again. The strange part is that we can reproduce it on our CI machine but not on our dev machine, where the deployment of Bosh Lite and CF worked perfectly.

jtarchie avatar Feb 25 '15 19:02 jtarchie

We were able to reproduce this bug by running this errand:

#!/bin/bash
#

for (( i = 0; i < 1024 * 1024 * 2; i++ )); do
    echo "Hello!"
done

This was on a bosh-init deployed vSphere director, so not sure this is necessarily a bosh-lite problem.
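For a sense of scale, the arithmetic below works out how much output the errand above writes. The comparison to a 1 MiB limit is an assumption: that is a common default maximum payload for NATS servers, but the limit configured on any given Director may differ.

```shell
#!/bin/bash
# Size of the errand's output: 2 * 1024 * 1024 lines of "Hello!" plus a newline.
lines=$(( 1024 * 1024 * 2 ))
bytes_per_line=$(printf 'Hello!\n' | wc -c)   # 6 characters + newline
total_bytes=$(( lines * bytes_per_line ))
total_mib=$(( total_bytes / 1024 / 1024 ))
echo "lines=${lines} total_bytes=${total_bytes} total_mib=${total_mib}"
# prints: lines=2097152 total_bytes=14680064 total_mib=14
```

At 14 MiB, the untruncated stdout is well over an assumed 1 MiB per-message limit, which is consistent with the error reported in this issue.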

It looks like the nats handler does not publish any of the message if it gets an error from the PerformHandler: https://github.com/cloudfoundry/bosh-agent/blob/fcb52b4f1aeae2c0c48e76c374b6f80354cbece5/mbus/nats_handler.go#L161-L164

benmoss avatar Feb 25 '16 14:02 benmoss