noobaa-core
noobaa-core copied to clipboard
Enable support for Memory in HPA view
Environment info
- NooBaa Version: master-20210926
- Platform: OCP 4.8.9
Actual behavior
- Download of large no. of files failed with error
Expected behavior
- Download shouldn't fail
Steps to reproduce
- In order to attempt verification of defect 6707, found this new set of errors.
From app-server, got error:
download failed: s3://bucket-3/obj_112983 to ./obj_112983 An error occurred (504) when calling the GetObject operation: Gateway Timeout
From endpoint logs:
Sep-30 6:09:18.035 [Endpoint/13] [ERROR] core.endpoint.s3.s3_rest:: S3 ERROR <?xml version="1.0" encoding="UTF-8"?><Error><Code>SlowDown</Code><Message>Reduce your request rate.</Message><Resource>
/bucket-3/obj_112711</Resource><RequestId>ku6jccni-1vil04-1bef</RequestId></Error> GET /bucket-3/obj_112711 {"accept-encoding":"identity","user-agent":"aws-cli/2.1.29 Python/3.8.8 Linux/4.18.0-305.
19.1.el8_4.x86_64 exe/x86_64.rhel.8 prompt/off command/s3.sync","x-amz-date":"20210930T060718Z","x-amz-content-sha256":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855","authorizat
ion":"AWS4-HMAC-SHA256 Credential=BrVzxXwyhi470cqA3s0c/20210930/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=e49d840a75a56c9a982db4f536a8e48fdd3997c44b0e
4742a8c23e6de867abfd","host":"s3-noobaa.apps.ocp-akshat-1.cp.fyre.ibm.com","x-forwarded-host":"s3-noobaa.apps.ocp-akshat-1.cp.fyre.ibm.com","x-forwarded-port":"443","x-forwarded-proto":"https","for
warded":"for=10.17.66.193;host=s3-noobaa.apps.ocp-akshat-1.cp.fyre.ibm.com;proto=https","x-forwarded-for":"10.17.66.193"} Error: Semaphore Timeout
at Semaphore._on_timeout (/root/node_modules/noobaa-core/src/util/semaphore.js:233:25)
at Timeout.<anonymous> (/root/node_modules/noobaa-core/src/util/semaphore.js:229:53)
at listOnTimeout (internal/timers.js:557:17)
at processTimers (internal/timers.js:500:7)
Sep-30 6:09:18.060 [Endpoint/13] [L0] core.endpoint.s3.ops.s3_get_object:: request aborted: undefined
Sep-30 6:09:18.068 [Endpoint/13] [L0] core.sdk.object_sdk:: validate_non_nsfs_bucket: { name: SENSITIVE-7fa8f881394f8a6d, email: SENSITIVE-3d288e97f8a66631, access_keys: [ { access_key: SENSITI
VE-d22de9a58fe9d068, secret_key: SENSITIVE-ebf6e1ecaa99829a } ], has_login: false, has_s3_access: true, allowed_buckets: { full_permission: true }, default_resource: 'nsfs-nsr-1', can_create_bucket
s: true, nsfs_account_config: { gid: 800, uid: 5000, nsfs_only: true, new_buckets_path: '/', backend: 'GPFS', warn_threshold_ms: 100 }, systems: [ { name: 'noobaa', roles: [ 'admin' ] } ], external
_connections: { count: 0, connections: [] }, preferences: { ui_theme: 'DARK' } } { fs_root_path: '/nsfs/nsfs-nsr-1', fs_backend: 'GPFS', id: '6151b1d55fd0290029f00859', name: 'nsfs-nsr-1' }
Sep-30 6:09:18.068 [Endpoint/13] [L1] core.endpoint.s3.s3_rest:: S3 REQUEST GET /bucket-3/obj_113071 op get_object request_id ku6jex9w-2hugkd-14z7 { 'accept-encoding': 'identity', 'user-agent':
'aws-cli/2.1.29 Python/3.8.8 Linux/4.18.0-305.19.1.el8_4.x86_64 exe/x86_64.rhel.8 prompt/off command/s3.sync', 'x-amz-date': '20210930T060918Z', 'x-amz-content-sha256': 'e3b0c44298fc1c149afbf4c8996
fb92427ae41e4649b934ca495991b7852b855', authorization: 'AWS4-HMAC-SHA256 Credential=BrVzxXwyhi470cqA3s0c/20210930/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Sign
ature=e3c108699b6e59bcfa59fe7a0f987bce0fc6093e6e4bd342a9b5e6befac0a90d', host: 's3-noobaa.apps.ocp-akshat-1.cp.fyre.ibm.com', 'x-forwarded-host': 's3-noobaa.apps.ocp-akshat-1.cp.fyre.ibm.com', 'x-f
orwarded-port': '443', 'x-forwarded-proto': 'https', forwarded: 'for=10.17.66.193;host=s3-noobaa.apps.ocp-akshat-1.cp.fyre.ibm.com;proto=https', 'x-forwarded-for': '10.17.66.193' }
Sep-30 6:09:18.070 [Endpoint/13] [L1] core.endpoint.s3.ops.s3_get_object:: reading object undefined 512
Sep-30 6:09:18.070 [Endpoint/13] [L0] core.sdk.namespace_fs:: NamespaceFS: read_object_stream { file_path: '/nsfs/nsfs-nsr-1/bucket-3/obj_113071', start: 0, end: Infinity }
Sep-30 6:09:18.071 [Endpoint/13] [L1] core.util.buffer_utils:: BufferPool.get_buffer BuffersPool {
buf_size: 8388608,
buffers: [],
sem: Semaphore {
_initial: 536870912,
_value: 0,
_waiting_value: 3347054592,
_wq: WaitQueue {
_name: '_wq_',
_q: <ref *2> LinkedList {
next: '_lln_',
prev: '_llp_',
head: '_llh_',
length: 399,
_lln_: <ref *7> {
count: 8388608,
time: 1632982038135,
_wq_: Defer { isPending: true, isResolved: false, isRejected: false, _promise_resolve: [Function (anonymous)], _promise_reject: [Function (anonymous)], promise: Promise { <pending> } },
_lln_: <ref *6> {
count: 8388608,
time: 1632982057848,
_wq_: Defer { isPending: true, isResolved: false, isRejected: false, _promise_resolve: [Function (anonymous)], _promise_reject: [Function (anonymous)], promise: Promise { <pending> } },
_lln_: <ref *5> {
count: 8388608,
time: 1632982058049,
_wq_: Defer { isPending: true, isResolved: false, isRejected: false, _promise_resolve: [Function (anonymous)], _promise_reject: [Function (anonymous)], promise: Promise { <pending> } },
_lln_: <ref *4> {
count: 8388608,
time: 1632982058158,
_wq_: Defer { isPending: true, isResolved: false, isRejected: false, _promise_resolve: [Function (anonymous)], _promise_reject: [Function (anonymous)], promise: Promise { <pending> } },
_lln_: <ref *3> { count: 8388608, time: 1632982058163, _wq_: Defer { isPending: true, isResolved: false, isRejected: false, _promise_resolve: [Function (anonymous)], _promise_reject: [Function (anonymous)], promise: Promise { <pending> } }, _lln_: <ref *1> { count: 8388608, time: 1632982058180, _wq_: Defer { isPending: true, isResolved: false, isRejected: false, _promise_resolve: [Function (anonymous)], _promise_reject: [Function (anonymous)], promise: [Promise] }, _lln_: { count: 8388608, time: 1632982058528, _wq_: [Defer], _lln_: [Object], _llp_: [Circular *1], _llh_: [Circular *2] }, _llp_: [Circular *3], _llh_: [Circular *2] }, _llp_: [Circular *4], _llh_: [Circular *2] },
_llp_: [Circular *5],
_llh_: [Circular *2]
},
_llp_: [Circular *6],
_llh_: [Circular *2]
},
_llp_: [Circular *7],
_llh_: [Circular *2]
},
_llp_: [Circular *2],
_llh_: [Circular *2]
},
_llp_: <ref *8> {
count: 8388608,
time: 1632982157888,
_wq_: Defer { isPending: true, isResolved: false, isRejected: false, _promise_resolve: [Function (anonymous)], _promise_reject: [Function (anonymous)], promise: Promise { <pending> } },
_lln_: [Circular *2],
_llp_: <ref *9> {
count: 8388608,
time: 1632982157784,
_wq_: Defer { isPending: true, isResolved: false, isRejected: false, _promise_resolve: [Function (anonymous)], _promise_reject: [Function (anonymous)], promise: Promise { <pending> } },
_lln_: [Circular *8],
_llp_: <ref *10> {
count: 8388608,
time: 1632982157778,
_wq_: Defer { isPending: true, isResolved: false, isRejected: false, _promise_resolve: [Function (anonymous)], _promise_reject: [Function (anonymous)], promise: Promise { <pending> } },
_lln_: [Circular *9],
_llp_: <ref *11> {
count: 8388608,
time: 1632982157631,
_wq_: Defer { isPending: true, isResolved: false, isRejected: false, _promise_resolve: [Function (anonymous)], _promise_reject: [Function (anonymous)], promise: Promise { <pending> } },
_lln_: [Circular *10],
_llp_: <ref *12> { count: 8388608, time: 1632982157630, _wq_: Defer { isPending: true, isResolved: false, isRejected: false, _promise_resolve: [Function (anonymous)], _promise_reject: [Function (anonymous)], promise: Promise { <pending> } }, _lln_: [Circular *11], _llp_: <ref *13> { count: 8388608, time: 1632982157629, _wq_: Defer { isPending: true, isResolved: false, isRejected: false, _promise_resolve: [Function (anonymous)], _promise_reject: [Function (anonymous)], promise: [Promise] }, _lln_: [Circular *12], _llp_: { count: 8388608, time: 1632982157618, _wq_: [Defer], _lln_: [Circular *13], _llp_: [Object], _llh_: [Circular *2] }, _llh_: [Circular *2] }, _llh_: [Circular *2] },
_llh_: [Circular *2]
},
_llh_: [Circular *2]
},
_llh_: [Circular *2]
},
_llh_: [Circular *2]
},
_llh_: [Circular *2]
}
},
_verbose: false,
_timeout: 120000,
_timeout_error_code: 'IO_STREAM_ITEM_TIMEOUT',
_work_timeout_error_code: 'SEMAPHORE_WORKER_TIMEOUT',
_timer: Timeout { _idleTimeout: 101, _idlePrev: [TimersList], _idleNext: [TimersList], _idleStart: 233845656, _onTimeout: [Function (anonymous)], _timerArgs: undefined, _repeat: null, _destroyed: false, [Symbol(refed)]: true, [Symbol(kHasPrimitive)]: false, [Symbol(asyncId)]: 495467329, [Symbol(triggerId)]: 495467301 }
}
}
Sep-30 6:09:18.136 [Endpoint/13] [ERROR] core.endpoint.s3.s3_rest:: S3 ERROR <?xml version="1.0" encoding="UTF-8"?><Error><Code>SlowDown</Code><Message>Reduce your request rate.</Message><Resource>/bucket-3/obj_112712</Resource><RequestId>ku6jccqc-3k264q-4eu</RequestId></Error> GET /bucket-3/obj_112712 {"accept-encoding":"identity","user-agent":"aws-cli/2.1.29 Python/3.8.8 Linux/4.18.0-305.19.1.el8_4.x86_64 exe/x86_64.rhel.8 prompt/off command/s3.sync","x-amz-date":"20210930T060718Z","x-amz-content-sha256":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855","authorization":"AWS4-HMAC-SHA256 Credential=BrVzxXwyhi470cqA3s0c/20210930/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=42087996203a733867660ef587b9e8a5dbe321f355c3c6bcdde2c0b5d9ba9ac8","host":"s3-noobaa.apps.ocp-akshat-1.cp.fyre.ibm.com","x-forwarded-host":"s3-noobaa.apps.ocp-akshat-1.cp.fyre.ibm.com","x-forwarded-port":"443","x-forwarded-proto":"https","forwarded":"for=10.17.66.193;host=s3-noobaa.apps.ocp-akshat-1.cp.fyre.ibm.com;proto=https","x-forwarded-for":"10.17.66.193"} Error: Semaphore Timeout
at Semaphore._on_timeout (/root/node_modules/noobaa-core/src/util/semaphore.js:233:25)
at Timeout.<anonymous> (/root/node_modules/noobaa-core/src/util/semaphore.js:229:53)
at listOnTimeout (internal/timers.js:557:17)
at processTimers (internal/timers.js:500:7)
More information - Screenshots / Logs / Other output
Instead of this failure, shouldn't there be delay introduced for serving this IO request ?
Please see from logs that memory and CPU utilization wasn't high when these errors got triggered.
@akmithal Works as designed, we return the slow-down message to the client and expect it to reduce the rate of requests. This is returned due to a bottleneck on memory utilization for the NSFS flows. The next step would be to add memory utilization to the HPA and fine-tune the amount of memory that we allocate for IO in the NSFS flows.
Hi @jeniawhite , are we going to trim down the error logs for this error. If yes, you can use this bug to track this work. If it is going to be like this only / PR already open, will close this bug then.
Hi @nimrod-becker , Is there a PR/workitem created to track this work "add memory utilization to HPA"? Else, we can use this defect for it. FYI @romayalon
We can keep this issue until we have a task tracking this, basically, we wanted to go with something that eventually did not make it to openshift so we need to wait until the new way is available.
@romayalon , could you link the JIRA ticket here for reference ?
No Jira ticket, this needs to be scheduled for a certain release