
Slow Lookup and GETATTR Performance on Windows

Open jonathanknorr opened this issue 4 years ago • 1 comment

We are seeing very slow LOOKUP and GETATTR performance. We are using the libnfs SYNC interface with 128 threads, for data integrity reasons. Writes are sub-1ms, but lookups are upwards of 500ms, which is pretty abhorrent considering the compute and storage environments (see below). Any suggestions on how to optimally configure libnfs on Windows for high bandwidth and low latency? That is our end goal; we are very latency sensitive. ASYNC seems the obvious choice, but we have integrity concerns (maybe unfounded?).

Environment: Proprietary low-latency rendering engine. The platform is Windows Server 2019, historically running on local SSD/NTFS, which does not scale well. We are using an ultra-fast NetApp flash array and have been trying to move from SMB3 to NFS via libnfs, rather than relying on the awful NFS capability in Windows Server. It is too much work to refactor to Linux at the moment; that may come at a later date.

Scale:

20TiB today with hundreds of directories and tens of millions of small files. Our application runs on a single (very powerful) Dell host with 512-768GB of RAM and fairly endless CPU. The plan is to scale to >200TiB over time.

jonathanknorr avatar Oct 14 '21 18:10 jonathanknorr

I think there is a misunderstanding. There is no difference in data integrity between the SYNC and the ASYNC models; both provide exactly the same integrity guarantees (there is no writeback caching in libnfs). I.e. SYNC and ASYNC do not refer to what the sync/async mount options refer to, but rather to the programming model of the API.

The SYNC interface refers to the API where you make a "function call" that blocks until the operation has fully completed, while the ASYNC interface refers to an event-driven, non-blocking design where you issue a function call that returns immediately and you are informed at a later stage, via a callback, when the call has completed.

So the integrity concerns are unfounded. I realize more and more that I should have picked different names to distinguish between the two models. Maybe I should have called them BLOCKING and EVENT_DRIVEN instead.
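
For illustration, the difference between the two models looks roughly like this (an untested sketch with made-up paths and most error handling omitted):

```c
#include <nfsc/libnfs.h>
#include <stdio.h>

/* SYNC/BLOCKING: the call returns only after the NFS reply has arrived. */
static void sync_example(struct nfs_context *nfs)
{
        struct nfs_stat_64 st;

        if (nfs_stat64(nfs, "/some/file", &st) < 0) {
                fprintf(stderr, "stat64 failed: %s\n", nfs_get_error(nfs));
                return;
        }
        printf("size: %llu\n", (unsigned long long)st.nfs_size);
}

/* ASYNC/EVENT-DRIVEN: the call only queues the request and returns
 * immediately; the callback fires later, once the reply is serviced. */
static void stat_cb(int status, struct nfs_context *nfs,
                    void *data, void *private_data)
{
        if (status < 0) {
                /* on failure, data is an error string */
                fprintf(stderr, "stat64 failed: %s\n", (char *)data);
                return;
        }
        struct nfs_stat_64 *st = data;
        printf("size: %llu\n", (unsigned long long)st->nfs_size);
}

static void async_example(struct nfs_context *nfs)
{
        nfs_stat64_async(nfs, "/some/file", stat_cb, NULL);
        /* ... the request completes later, from the event loop ... */
}
```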

In general, ASYNC/event-driven APIs are often more performant, and they have been my main focus until recently, which is why the SYNC interface was mostly an afterthought for people who did not want to use an event-driven design. One of the gains in an async design is that you can do concurrent requests without multithreading, which means you do not need to use synchronization mechanisms like mutexes or semaphores. This reduces overhead. I have users that have been doing well over 100,000 concurrent iops from a single thread, which is the reason why libnfs hashes all outstanding requests into 1024 different queues (it takes a lot of CPU to match a reply to its request if you have to scan a list of 100,000+ items) lol.

If you are comfortable with event-driven designs, then I would suggest using the async interface. An event-driven design is a lot harder and less convenient to use, but you often get better performance, as you have a basically unlimited amount of concurrency, whereas in a multithreaded design your concurrency is bounded by the number of threads you can have.

The async interface is NOT thread safe and is meant to be used from a single thread; alternatively, use one nfs context for each thread.
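
A rough sketch of what driving a batch of concurrent lookups from one thread looks like with the async interface (untested; on Windows you would replace poll() with WSAPoll() or select()):

```c
#include <nfsc/libnfs.h>
#include <poll.h>
#include <stdio.h>

static int in_flight;

static void stat_cb(int status, struct nfs_context *nfs,
                    void *data, void *private_data)
{
        in_flight--;
        if (status < 0) {
                fprintf(stderr, "%s: %s\n", (char *)private_data, (char *)data);
        }
}

int run_lookups(struct nfs_context *nfs, char **paths, int npaths)
{
        /* Queue all requests up front; nothing blocks here. */
        for (int i = 0; i < npaths; i++) {
                if (nfs_stat64_async(nfs, paths[i], stat_cb, paths[i]) != 0) {
                        return -1;
                }
                in_flight++;
        }

        /* Drive the socket until every callback has fired. */
        while (in_flight > 0) {
                struct pollfd pfd = {
                        .fd     = nfs_get_fd(nfs),
                        .events = nfs_which_events(nfs),
                };
                if (poll(&pfd, 1, -1) < 0) {
                        return -1;
                }
                if (nfs_service(nfs, pfd.revents) < 0) {
                        return -1;
                }
        }
        return 0;
}
```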

Optionally: until very recently, the sync interface was NOT thread safe either, and you needed a separate nfs context for each thread. A new feature in the master branch adds multithreading support via pthreads (which is also available on Windows; see README.multithreading if you can/want to add support for native Windows threading primitives, patches welcome as I focus on Linux). With this very new feature you can build libnfs with multithreading support and use a single nfs context from all your threads. This only supports the SYNC interface as of now, and concurrency is still limited to one outstanding request per thread.

I.e. either 1) use the async interface, which provides unlimited concurrency and means you can keep any network link fully saturated regardless of latency, or 2) use the sync interface with the new multithreading support.
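
For option 2, usage would be roughly like the sketch below; see README.multithreading in master for the exact build setup and function names, and note that the server/export names here are just placeholders:

```c
#include <nfsc/libnfs.h>

int setup_shared_context(struct nfs_context *nfs)
{
        /* placeholder server and export */
        if (nfs_mount(nfs, "netapp.example.com", "/export") != 0) {
                return -1;
        }
        /* Start the libnfs service thread; after this the same nfs context
         * can be used from many application threads with the sync
         * interface, at one outstanding request per thread. */
        if (nfs_mt_service_thread_start(nfs) != 0) {
                return -1;
        }
        return 0;
}

void teardown_shared_context(struct nfs_context *nfs)
{
        nfs_mt_service_thread_stop(nfs);
        nfs_destroy_context(nfs);
}
```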

sahlberg avatar Oct 15 '21 02:10 sahlberg