Small object write performance very slow

Open davidbw246 opened this issue 3 years ago • 14 comments

Environment info

POK baremetal cluster(c83f1-infa)

NooBaa Version:
INFO[0000] CLI version: 5.9.0
INFO[0000] noobaa-image: noobaa/noobaa-core:master-20220112
INFO[0000] operator-image: noobaa/noobaa-operator:master-20220112
INFO[0000] noobaa-db-image: centos/postgresql-12-centos7
INFO[0000] Namespace: noobaa

OC version:
Client Version: 4.9.0
Server Version: 4.9.0
Kubernetes Version: v1.22.0-rc.0+894a78b

Actual behavior

  1. Writes with small objects (64KB, 4MB) are limited to a maximum of 120 writes/sec.

Expected behavior

  1. Write performance should be much higher with small objects where the bandwidth is not an issue.

Steps to reproduce

  1. Run cosbench with small object writes with multiple workers.

More information - Screenshots / Logs / Other output

must-gather.tar.gz

davidbw246 avatar Jan 24 '22 14:01 davidbw246

@nimrod-becker please add this issue to https://ibm.ent.box.com/notes/806750308079

davidbw246 avatar Jan 24 '22 14:01 davidbw246

Hi @davidbw246, I saw this message in the endpoint logs:

2022-01-24 13:56:57.560361 [PID-14/TID-21] [L0] FS::FSWorker::Execute: WARNING Rename _old_path=/nsfs/fs1/cb6/.noobaa-nsfs_61b11a909954d0002a5d14fd/uploads/4a0c95ec-30ad-4161-8fa6-150f800e851e _new_path=/nsfs/fs1/cb6/586_64KB took too long: 881.188 ms

When you measure the time it takes to rename a small file directly on your file system, do you see times like this too? It seems the file system operation itself took a long time. (By renaming, I mean moving a small file between directories.)

Besides that, I see a great many Postgres error messages, which might also slow NooBaa down. I also saw some RPC timeouts, which were probably caused by the Postgres issues.

romayalon avatar Feb 01 '22 12:02 romayalon

I did some tests on an endpoint where I timed moving 64KB files between the directories that back my buckets in /nsfs/fs1. The time to move a file was consistently 4ms, whether I did 1 or 2 moves simultaneously.

The 4ms per move is about half of the average time to do a 64KB put (8ms) using cosbench.

8ms for a put is OK. The problem is that when you do 2 puts simultaneously, they average 16ms. The system should be able to do 2 puts in 8ms.
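The latency figures above can be related to the 120 writes/sec ceiling with a little arithmetic. The following is a minimal sketch, assuming (as an inference from the numbers, not something confirmed in this thread) that puts are effectively serialized somewhere in the path and that each worker issues its next put as soon as the previous one finishes. The function names are illustrative only.

```python
def serialized_latency_ms(service_ms: float, workers: int) -> float:
    """Average latency per put if only one put can be in service at a time.

    Assumes a closed loop: each worker issues its next put as soon as the
    previous one completes, so every put waits behind the other workers.
    """
    return service_ms * workers


def throughput_per_sec(latency_ms: float, workers: int) -> float:
    """Little's law: completed puts per second = in-flight puts / latency."""
    return workers * 1000.0 / latency_ms


if __name__ == "__main__":
    service_ms = 8.0  # average 64KB put time reported above
    for workers in (1, 2, 4):
        serial_ms = serialized_latency_ms(service_ms, workers)
        print(
            f"workers={workers} "
            f"serialized: {serial_ms:.0f} ms/put, "
            f"{throughput_per_sec(serial_ms, workers):.0f} puts/s | "
            f"parallel: {service_ms:.0f} ms/put, "
            f"{throughput_per_sec(service_ms, workers):.0f} puts/s"
        )
```

With these inputs, 8ms at one worker and 16ms at two workers both work out to 125 puts/s, which is close to the roughly 120 writes/sec reported. Fully parallel puts at 8ms each would give 250 puts/s with two workers.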

Example of the command I ran on the endpoint:

time for i in {1..1000}; do mv cb1/tmp64KB cb2; mv cb2/tmp64KB cb1; done
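The same measurement can be sketched in Python so that 1 or 2 (or more) move loops run at the same time and the per-rename time is reported for each. This is a rough sketch, not the test that was actually run: it calls `os.rename` directly, so unlike the `mv` loop it does not include the cost of starting a process per move, and the `BASE_DIR` variable and function names are hypothetical. The `cb1`/`cb2` directory names simply mirror the command above.

```python
import os
import tempfile
import threading
import time


def move_back_and_forth(dir_a, dir_b, name, iterations):
    """Rename dir_a/name to dir_b/name and back; return seconds per rename."""
    src = os.path.join(dir_a, name)
    dst = os.path.join(dir_b, name)
    start = time.perf_counter()
    for _ in range(iterations):
        os.rename(src, dst)
        os.rename(dst, src)
    return (time.perf_counter() - start) / (2 * iterations)


def run_benchmark(base_dir, workers, iterations=1000, size=64 * 1024):
    """Run `workers` rename loops concurrently, each on its own file.

    Returns a list with the average seconds per rename for each worker.
    """
    dir_a = os.path.join(base_dir, "cb1")
    dir_b = os.path.join(base_dir, "cb2")
    os.makedirs(dir_a, exist_ok=True)
    os.makedirs(dir_b, exist_ok=True)
    results = [0.0] * workers

    def worker(idx):
        name = f"tmp64KB-{idx}"
        path = os.path.join(dir_a, name)
        with open(path, "wb") as f:
            f.write(os.urandom(size))  # file creation is not timed
        results[idx] = move_back_and_forth(dir_a, dir_b, name, iterations)
        os.remove(path)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Hypothetical setting: point BASE_DIR at a directory on the filesystem
    # under test; None falls back to the system temporary directory.
    BASE_DIR = None
    for workers in (1, 2):
        with tempfile.TemporaryDirectory(dir=BASE_DIR) as tmp:
            per_rename = run_benchmark(tmp, workers)
            print(workers, [f"{s * 1000:.3f} ms" for s in per_rename])
```

If the per-rename time stays flat as the worker count goes from 1 to 2, as the 4ms figure above suggests, the filesystem rename itself is unlikely to be what serializes the puts.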

davidbw246 avatar Feb 01 '22 15:02 davidbw246

Can we verify whether the 16ms average comes from the endpoint side or the FS side? Similar to what you did with the mv, can we run it and time it from within the pod, on the PVC bound to it?

nimrod-becker avatar Feb 01 '22 15:02 nimrod-becker

It seems we would need to change the way we work with small files. We have an item on the roadmap to integrate with the SpectrumScale lib to increase performance for multipart uploads. It might be beneficial to see if this can also be done for small objects, but we would need the info on what API to invoke.

nimrod-becker avatar Feb 08 '22 10:02 nimrod-becker

If we are OK with 120 puts/sec for the first release, I am OK with trying to improve this in the next release.

davidbw246 avatar Feb 08 '22 14:02 davidbw246

Hi @davidbw246, since the action is on IBM development, we should still keep it open and ensure a PR is created to fix it. Closing it directly might result in us losing track of it.

@nimrod-becker, what do you think? Can we track it at your end somehow as well?

akmithal avatar Feb 10 '22 09:02 akmithal

Reopening due to the request from @akmithal above.

davidbw246 avatar Feb 10 '22 13:02 davidbw246

I think it should be tracked as a roadmap item with the other items we have, not as a bug, @akmithal.

nimrod-becker avatar Feb 10 '22 13:02 nimrod-becker

Was there a PR created to fix this as @akmithal recommended above?

davidbw246 avatar Mar 08 '22 12:03 davidbw246

No PR was created. This is an item for future versions, and it needs to be prioritized. @akmithal @davidbw246, please add it to the list of items for the next versions and close this issue.

nimrod-becker avatar Mar 09 '22 16:03 nimrod-becker

Has this been added to the list of items for next versions? Where would I find that list?

davidbw246 avatar Apr 07 '22 12:04 davidbw246

@nimrod-becker, what is the estimated availability for this?

rkomandu avatar Sep 06 '22 06:09 rkomandu

Marc's PR is already in. It's also dependent on the availability of libgpfs on the FS.

In any case, I am aware of performance testing being run.

nimrod-becker avatar Sep 06 '22 08:09 nimrod-becker