noobaa-core
Small object write performance very slow
Environment info
- POK baremetal cluster (c83f1-infa)
- NooBaa version:
  INFO[0000] CLI version: 5.9.0
  INFO[0000] noobaa-image: noobaa/noobaa-core:master-20220112
  INFO[0000] operator-image: noobaa/noobaa-operator:master-20220112
  INFO[0000] noobaa-db-image: centos/postgresql-12-centos7
  INFO[0000] Namespace: noobaa
- OC version: Client Version: 4.9.0, Server Version: 4.9.0, Kubernetes Version: v1.22.0-rc.0+894a78b
Actual behavior
- Writes with small objects (64KB, 4MB) are limited to a maximum of 120 writes/sec.
Expected behavior
- Write performance should be much higher with small objects, where bandwidth is not the bottleneck.
Steps to reproduce
- Run COSBench with small-object writes and multiple workers (a sketch of such a workload follows).
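For reference, a minimal sketch of a COSBench workload along these lines, submitted from the controller host. The S3 endpoint, credentials, worker count, and runtime are placeholders rather than values from this issue; the cb1..cb6 bucket names follow the paths that appear in the logs below.

```bash
# Hypothetical COSBench workload for the small-object write test.
# ACCESS_KEY, SECRET_KEY, the endpoint URL, workers, and runtime are
# placeholders; the buckets cb1..cb6 match the paths seen in this issue
# and are assumed to already exist.
cat > small-write-64k.xml <<'EOF'
<workload name="noobaa-small-write" config="">
  <storage type="s3"
           config="accesskey=ACCESS_KEY;secretkey=SECRET_KEY;endpoint=http://s3.noobaa.example" />
  <workflow>
    <workstage name="write-64k">
      <work name="writers" workers="16" runtime="300">
        <!-- 100% 64KB writes spread across buckets cb1..cb6 -->
        <operation type="write" ratio="100"
                   config="cprefix=cb;containers=u(1,6);objects=u(1,100000);sizes=c(64)KB" />
      </work>
    </workstage>
  </workflow>
</workload>
EOF

# Submit on the COSBench controller host.
sh cli.sh submit small-write-64k.xml
```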
More information - Screenshots / Logs / Other output
@nimrod-becker please add this issue to https://ibm.ent.box.com/notes/806750308079
Hi @davidbw246,
I saw this message in the endpoint logs:
2022-01-24 13:56:57.560361 [PID-14/TID-21] [L0] FS::FSWorker::Execute: WARNING Rename _old_path=/nsfs/fs1/cb6/.noobaa-nsfs_61b11a909954d0002a5d14fd/uploads/4a0c95ec-30ad-4161-8fa6-150f800e851e _new_path=/nsfs/fs1/cb6/586_64KB took too long: 881.188 ms
When you measure the time of renaming a small file directly on your file system, do you see similar times? It seems like the file system action itself took a long time (by renaming I mean moving a small file between directories).
Besides that, I see very many Postgres error messages, which might also be slowing NooBaa down. I also saw some RPC timeouts, which were probably caused by the Postgres issues.
I did some tests on an endpoint where I timed moving 64KB files between the directories that back my buckets in /nsfs/fs1. The time to move a file was consistently 4ms, whether I did 1 or 2 moves simultaneously.
That 4ms per move is about half of the average time for a 64KB put (8ms) using COSBench.
8ms for a put is OK; the problem is that when you do 2 puts simultaneously, they average 16ms. The system should be able to do 2 simultaneous puts in 8ms.
An example of a command I ran on the endpoint: `time for i in {1..1000}; do mv cb1/tmp64KB cb2; mv cb2/tmp64KB cb1; done`
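For the two-at-once case, a rough sketch of how the comparison can be run; the .a/.b file names are invented for illustration. If the file system handles the renames concurrently, the parallel run should finish in roughly the single-stream wall time.

```bash
# Two independent rename streams in parallel. At ~4ms/move, a single stream
# of 1000 round trips (2000 moves) takes ~8s; if the parallel run also takes
# ~8s, the file system is not serializing renames.
cd /nsfs/fs1
cp cb1/tmp64KB cb1/tmp64KB.a   # two separate files so the loops don't collide
cp cb1/tmp64KB cb1/tmp64KB.b

time (
  for i in {1..1000}; do mv cb1/tmp64KB.a cb2; mv cb2/tmp64KB.a cb1; done &
  for i in {1..1000}; do mv cb1/tmp64KB.b cb2; mv cb2/tmp64KB.b cb1; done &
  wait
)
```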
Can we verify whether the 16ms average comes from the endpoint side or the FS side? Similar to what you did with the mv, can we run it and time it from within the pod, on the PVC bound to it?
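Something along these lines could do it (a sketch only; the endpoint pod naming and the /nsfs/fs1 mount path inside the pod are assumptions):

```bash
# Repeat the rename loop from inside a NooBaa endpoint pod, against the
# PVC-backed path. The pod name pattern and mount path are assumptions.
POD=$(oc -n noobaa get pods -o name | grep endpoint | head -1)
oc -n noobaa exec "$POD" -- bash -c '
  cd /nsfs/fs1 &&
  time for i in {1..1000}; do mv cb1/tmp64KB cb2; mv cb2/tmp64KB cb1; done
'
```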
It seems we would need to change the way we work with small files. We have an item on the roadmap to integrate with the Spectrum Scale library to increase performance for multipart uploads. It might be beneficial to see whether this can also be done for small objects, but we would need the info on which API to invoke.
If we are OK with 120 puts/sec for the first release, I am OK with trying to improve this in the next release.
Hi @davidbw246, since the action is on IBM development, we should still keep this open and ensure there is a PR created to fix it. Closing it directly might result in us losing track of it.
@nimrod-becker, what do you think? Can we also track it on your end somehow?
Reopening due to the request from @akmithal above.
I think this should be tracked as a roadmap item alongside the other items we have, not as a bug, @akmithal.
Was there a PR created to fix this as @akmithal recommended above?
There was no PR created; this is an item for future versions and it needs to be prioritized. @akmithal @davidbw246, please add it to the list of items for next versions and close this issue.
Has this been added to the list of items for next versions? Where would I find that list?
@nimrod-becker, what is the estimate for when this will be available?
Marc's PR is already in. It's also dependent on the availability of libgpfs on the FS.
In any case, I am aware of performance testing being run.