What is the status of LibuvSharp?

Open robertmircea opened this issue 11 years ago • 19 comments

I am interested in using this library to develop a high-performance socket server on Linux. I've already tried using async sockets under Mono, but their performance is abysmal compared to how the same IOCP socket server runs under Windows with .NET. While Mono uses epoll under Linux when available, the actual implementation of the async socket operations (SendAsync, ReceiveAsync, etc.) uses a thread pool, which makes them quite unusable in high-performance scenarios.

I checked the TCP tests' source code to see which operations are possible, but I do not have sufficient information to make an informed decision. I am interested mainly in the following topics:

  • listening on sockets,
  • accepting clients and sending/receiving data to/from them,
  • detecting interrupted client connections, whatever the cause (clients disconnecting gracefully, connections cut by a firewall, network interruptions, etc.),
  • closing connections from the server side,
  • lack of memory leaks.

What is the status of this library regarding sockets support? Was it used until now in production applications?

robertmircea avatar Jan 01 '13 22:01 robertmircea

I am currently working on the documentation and polishing of the API for a release.

  • Listening on sockets is fully supported.
  • Accepting/sending/receiving data is fully supported.
  • This one is harder to answer and test, but there is an Error event which is raised if receiving fails; the other operations have a special argument for exceptions in case they fail.
  • Closing connections from the server side is fully supported (it is called Shutdown and ensures that all the data passed to write will be sent; or just call Close if you don't want to send a TCP end packet).
  • Lack of memory leaks: I think I handled all the cases.

The library supports TCP and UDP. I am using it to manage 20 UDP sockets 24/7 and one TCP connection to a MySQL database. ki9a is using it for logging. It is not well tested compared to other libs.

As I said, I am currently polishing the API and replacing some bits of the code to make it more 'high performance'. All in all, this lib does everything that libuv (or node.js) does. It just hasn't been benchmarked much, so there might be some performance hot spots, but it looks like it performs quite well.

One more thing: I am actively developing this lib, so if you have a problem I will react immediately to all your requests and suggestions. To be blunt: there are 13 people watching it on GitHub and only me working on it. I am actively chatting with the libuv guys. I try my best, but if you want a library which you can just pick up and use, well, this library is not there yet. It is not tested too thoroughly, not benchmarked for optimizations and, most importantly, it has little documentation (which will change in a few days).

txdv avatar Jan 02 '13 01:01 txdv

Thanks for the comments. You had actually won my attention before commenting :) because this lib seems to be the most compelling libuv wrapper available for .NET. Meanwhile, I took the lib for a spin and tried running the provided examples and tests. Compiling libuv + LibuvSharp on Linux and Mac against the latest version of Mono was far easier than compiling libuv on Windows. The challenge was actually producing the DLL version of libuv from the sources. Anyway...

I've discovered two issues (running on Windows, compiled with the standard .NET C# compiler):

  • The examples did not compile the first time because of a compilation problem in Async.cs (in method ReadStringAsync, line 52 should be encoding.GetString(buffer.Value.Array, buffer.Value.Offset, buffer.Value.Count);).
  • I modified the void Stress(IPEndPoint ep) method in TcpFixture.cs by increasing the number of times it loops (to 300), and when running the test afterwards I received the following error:

System.AccessViolationException : Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at LibuvSharp.Loop.uv_run(IntPtr loop)
   at LibuvSharp.Tests.TcpFixture.Stress(IPEndPoint ep)
   at LibuvSharp.Tests.TcpFixture.Stress()

robertmircea avatar Jan 02 '13 10:01 robertmircea

> Thanks for comments. You've actually won my attention before commenting :) because this lib seems the most compelling libuv wrapper available in .NET.

Yes, as you can see, it is not a one-week project. I have been sitting around in Freenode's #libuv and talking to the main devs for quite a while now.

> • the examples did not compile first time because of a compilation problem in Async.cs (in method ReadStringAsync the line 52 should be encoding.GetString(buffer.Value.Array, buffer.Value.Offset, buffer.Value.Count);)

Well, the Async examples are just there for fun. I will try to work out a better Async API in the future, but in the first release I want to focus on the basics. Furthermore, I will put all examples into a separate book repo, because it gets quite tedious to manage all this in one repo.

The second issue is more concerning. Which Windows version are you using, and which libuv version have you compiled? How did you compile it?

txdv avatar Jan 02 '13 15:01 txdv

Regarding the crash: I am using Windows 8 and I've compiled libuv from GitHub (latest) with VS 2012. To create the DLL from the lib, I used the following command lines:

  • for 32bit version: cl /o uv.dll obj\libuv\*.obj ws2_32.lib advapi32.lib psapi.lib iphlpapi.lib /link /DLL /DEF:uv.def
  • for 64bit version: "P:\Microsoft Visual Studio 11.0\VC\bin\x86_amd64\cl" /o uv.dll D:\Dev\LibuvSharp\libuv\x64\Release\*.obj ws2_32.lib advapi32.lib psapi.lib iphlpapi.lib /link /DLL /DEF:D:\Dev\LibuvSharp\libuv\Release\uv.def

I can send you the uv.def file, which I generated from the names of the extern functions in your source code.

Sample content from it:

LIBRARY UVWRAP
EXPORTS
    uv_timer_get_repeat @101
    uv_default_loop     @102
    uv_run_once         @103
    uv_unref            @104
    uv_pipe_init        @105
    uv_udp_recv_start   @130
    uv_udp_send         @131
    uv_udp_send6        @132

robertmircea avatar Jan 02 '13 22:01 robertmircea

I've integrated LibuvSharp into my "tcp server" project and gave it a try. Unfortunately, I observed the following behavior (on Windows, for now):

When running the project in debug mode in the IDE, after processing requests for a while, the IDE pops up the following exception for the line which contains Loop.Default.Run();

A callback was made on a garbage collected delegate of type 'LibuvSharp!LibuvSharp.Handle+callback::Invoke'. This may cause application crashes, corruption and data loss. When passing delegates to unmanaged code, they must be kept alive by the managed application until it is guaranteed that they will never be called.

When running in release mode, the server stops processing requests after a while (about after the same period as in debug mode) without any exception. I assume the problem is the same but there is no visible alert due to "Release" mode optimizations.

Do you have any idea how I can keep the event loop alive?

[Attached image: Loop]

robertmircea avatar Jan 03 '13 23:01 robertmircea

Which libuv methods are you using? All methods that use Handle.callback in their invocation paths are of interest.

txdv avatar Jan 04 '13 10:01 txdv

It seems like you are using TcpListener and the Listen method? Are you making sure that the TcpListener instance doesn't get collected?

txdv avatar Jan 04 '13 11:01 txdv

Hi Robert. It seems to me you are somehow doing it wrong. You need to consider that you are using a native library (even if it is wrapped in a nice managed API). Without your code it is difficult to spot the problem. But the exception is quite clear: the object that holds your callback for some libuv event (e.g. read, write, close, etc.) has been garbage collected, because there were no live references to it. You should keep a reference to the object somewhere for as long as the callback may be invoked, or consider using static callback methods. Also, a stack trace would help more than the image you attached.
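To make the point about live references concrete, here is a minimal sketch that uses only the BCL. It is not how LibuvSharp is implemented internally; the NativeCallback delegate type and the CallbackHolder class are hypothetical names invented for this illustration. The idea is simply that a delegate handed to native code stays valid only while something on the managed side still references it, either a field on a long-lived object or an explicit GCHandle.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical delegate type standing in for a callback passed to native code.
[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
public delegate void NativeCallback(IntPtr handle);

// Hypothetical helper: keeps a delegate alive while native code may call it.
public sealed class CallbackHolder : IDisposable
{
    // The field keeps the delegate reachable as long as the holder is reachable.
    private readonly NativeCallback callback;

    // A normal GCHandle roots the delegate even if the holder itself is
    // forgotten; it has to be freed explicitly once native code is done.
    private GCHandle root;

    // The pointer that would be handed to the native library.
    public IntPtr FunctionPointer { get; private set; }

    public bool IsRooted
    {
        get { return root.IsAllocated; }
    }

    public CallbackHolder(NativeCallback cb)
    {
        if (cb == null) throw new ArgumentNullException("cb");
        callback = cb;
        root = GCHandle.Alloc(callback);
        FunctionPointer = Marshal.GetFunctionPointerForDelegate(callback);
    }

    // Call this only when the native side can no longer invoke the callback.
    public void Dispose()
    {
        if (root.IsAllocated) root.Free();
    }
}
```

In a server, the holder (or the object that owns the delegate) would typically live in a static field or in a collection owned by the listener, and be released only after the corresponding close notification has arrived.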

As a side note, I'm going to post a patch to libuv soon to support Visual Studio 2012. Meanwhile, if you install Visual Studio 2010 (the Express edition is free), you can build the libuv DLL from the project root with this command: vcbuild shared release

gigi81 avatar Jan 04 '13 11:01 gigi81

Yes, please, add a sample where you experience that problem. Try to make it as minimal as possible.

txdv avatar Jan 04 '13 14:01 txdv

I've sent a small console app which reproduces the behaviour to your email addresses. Also, in the lib folder you will find my uv.dll file compiled for 64-bit. If something is wrong with it, please replace it with one that you know works well and try to repeat the test.

@gigi81: I'll wait for the VS2012 patch in libuv for recompilation.

@txdv: As a side note - for my understanding: what is the recommended way to run the loop without blocking the main application thread?

Many thanks for looking into it.

robertmircea avatar Jan 04 '13 14:01 robertmircea

The best thing would have been to make a gist. I tested it on Mono/Ubuntu 11.04 and it works fine. I currently can't get hold of my Windows computer; I'll be able to do so in a week.

Which Windows version are you running? Win8 32-bit or 64-bit?

> @txdv: As a side note - for my understanding: what is the recommended way to run the loop without blocking the main application thread?

What thread are you talking about?

  1. Windows.Forms UI Thread?
  2. WPF UI Thread?
  3. Game engine thread?

This really depends much more on the thread you are trying to embed into. LibuvSharp has RunAsync, a function which dispatches all events but doesn't block at all. I used it in a server side game engine to dispatch all events whenever a frame is calculated.

txdv avatar Jan 04 '13 15:01 txdv

Here is the patch to support VS2012: https://github.com/joyent/libuv/pull/677

Now you can run: vcbuild release shared

gigi81 avatar Jan 05 '13 11:01 gigi81

So LibuvSharp is 3 times slower?

You should use a custom byte buffer allocator; currently it copies every piece of received data into a separate byte array.

See http://www.mono-project.com/Profiler for how to use the profiler on Mono. I will sit down and benchmark it myself now too.

  • ntohs should be one of the hotspots; I'm going to fix it today (UDP only).
  • The request could be a problem: for every write, a new request is created, released and garbage collected (this is definitely a problem, and fixing it should give a significant speed improvement).

I just wasn't in need of high-speed performance yet, because I used the library for managing sources which had 1-5 events every second, and I didn't have too many sources of events.

(darn it, I deleted your message by accident, I swear to god, I pressed NO when it asked if I really want to delete that message, damn github)

txdv avatar Jan 05 '13 11:01 txdv

Ok, the ByteBufferAllocator is a big issue in this one.

The DefaultByteBufferAllocator (I renamed it to CopyingByteBufferAllocator in master) copies all data from the buffer into a perfectly fitting byte array for every request. This builds up pressure on the GC and is slow (a buffer copy on every read). The problem is that if you don't copy, and you receive a data block in the Data event and pass it directly into another async method (like write), the data can be overwritten by the time it is actually sent. So this default method is idiot-proof, which is why I am going with it as the default.

I have tested it with another byte buffer allocator which doesn't copy the data into a perfectly fitting byte array, and the result is a speed increase (it hits the roof of this machine) in handled reads per second (http://pastebin.com/wa13LSjA). Just remember that you need to serialize or otherwise fully consume that byte array segment within the Data event handler itself, without handing it to any async methods (like UVStream::Write, UVTimer.Tick, etc.).
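The overwrite hazard can be shown without any sockets. The following is a minimal sketch using only the BCL; the BufferReuseDemo class and its method names are made up for this illustration and are not part of LibuvSharp. It simulates two consecutive reads into one shared receive buffer and compares what a non-copied segment and a private copy contain afterwards.

```csharp
using System;

// Hypothetical demo class, not part of LibuvSharp.
public static class BufferReuseDemo
{
    // Returns a private, perfectly fitting copy of the segment
    // (this is the "copying" strategy: safe, but allocates on every read).
    public static byte[] CopyOut(ArraySegment<byte> segment)
    {
        var copy = new byte[segment.Count];
        Buffer.BlockCopy(segment.Array, segment.Offset, copy, 0, segment.Count);
        return copy;
    }

    // Simulates two reads into the same buffer. Returns two bytes:
    // [0] = what the non-copied segment shows after the second read,
    // [1] = what the private copy shows after the second read.
    public static byte[] Run()
    {
        var shared = new byte[4];

        // First "read" puts the value 1 into the shared buffer.
        shared[0] = 1;
        var aliased = new ArraySegment<byte>(shared, 0, 1); // no copy, just a view
        byte[] copied = CopyOut(aliased);                    // private copy

        // Second "read" reuses the buffer before the first data was consumed.
        shared[0] = 2;

        return new byte[] { aliased.Array[aliased.Offset], copied[0] };
    }
}
```

Calling BufferReuseDemo.Run() yields 2 for the view and 1 for the copy: the view silently changed underneath its holder, which is exactly what would happen to data queued for a later write.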

This bytebufferallocator allows the user to adjust byte buffer creation to the protocol sitting on top of the tcp protocol.

I guess another source of speed increase will be to somehow manage the Request objects which are created for every write (but this is only for writing).

And please add some code. If you just say something like "it is slower, it doesn't work, etc...", maybe you mean well, but we can't do much with it.

txdv avatar Jan 05 '13 11:01 txdv

I've changed my code to use your StaticByteBufferAllocator. While the performance improved (I am now getting around 17500 msg/s instead of 14000), the CPU utilization is still at 100%. I will try to prepare a sample for performance benchmarks (IOCP on Windows via APM vs. libuv). Unfortunately I cannot share my project, and the part which does the message processing is quite intricate (it implements an SMS server using a specific binary protocol) and cannot be easily extracted from the project. I am using this project for benchmarking because I can quite easily compare the performance of the same real-life code on both Windows and Linux, using either libuv or the BCL. Also, the client which I am using for load testing is an external, verified application which runs on a different Linux box than the server.

Running the profiler on my app on Mono/Linux has given me some pretty interesting results. Most of the time is spent in unmanaged code: 72% of hits. The most expensive methods are __write_nocancel and __read_nocancel in libpthread, which together consume almost 36% of the time. The others are not very significant on their own. I am not using explicit threads in other parts of my app.

I am sorry: I know that these stats without the actual code don't mean anything for you (as I said, I will try to prepare an eloquent example), but it might give you some hints referring to libuv.

Statistical samples summary
    Sample type: cycles
    Unmanaged hits:  75545 (72.2%)
    Managed hits:    29153 (27.8%)
    Unresolved hits:  4456 ( 4.3%)

     Hits      % Method name
    19784  18.90 __write_nocancel in /lib64/libpthread.so.0
    17615  16.82 __read_nocancel in /lib64/libpthread.so.0
     2125   2.03 __memset_sse2 in /lib64/libc.so.6
     1325   1.27 (wrapper alloc) object:AllocSmall (intptr)
     1268   1.21 __pthread_mutex_lock in /lib64/libpthread.so.0
     1043   1.00 System.Threading.Timer/TimerComparer:Compare (object,object)
      975   0.93 (wrapper alloc) object:AllocVector (intptr,intptr)
      875   0.84 pthread_mutex_unlock in /lib64/libpthread.so.0
      788   0.75
      779   0.74 System.IO.MemoryStream:WriteByte (byte)
      663   0.63 mono_lock_free_alloc in /usr/bin/mono-sgen
      581   0.55 memcpy in /lib64/libc.so.6
      576   0.55 encode_uleb128 in /usr/lib/libmono-profiler-log.so
      560   0.53 mono_gc_register_for_finalization in /usr/bin/mono-sgen
      557   0.53 System.Threading.Timer/Scheduler:FindByDueTime (long)
      535   0.51 Bert.Gateway.Server.LibUV.LibuvTcpClient1:ParseReceivedBytesAndProcessDecodedPdu (byte[],int,int,bool)
      515   0.49 System.Text.StringBuilder:Append (char)
      507   0.48 __epoll_wait_nocancel in /lib64/libc.so.6
      500   0.48 uv__write in /usr/lib/libuv.so
      490   0.47 _int_free in /lib64/libc.so.6
      476   0.45 ves_icall_System_Buffer_BlockCopyInternal in /usr/bin/mono-sgen
      472   0.45 mono_object_hash in /usr/bin/mono-sgen
      469   0.45 _int_malloc in /lib64/libc.so.6
      461   0.44 sgen_hash_table_remove in /usr/bin/mono-sgen
      444   0.42 alloc_handle in /usr/bin/mono-sgen
      426   0.41 mono_gc_try_alloc_obj_nolock in /usr/bin/mono-sgen
      423   0.40 mono_gc_memmove in /usr/bin/mono-sgen
      421   0.40 uv_write2 in /usr/lib/libuv.so

robertmircea avatar Jan 06 '13 17:01 robertmircea

How many cpu cores have you got? How many does the IOCP implementation utilize, how many does the libuv implementation utilize on mono/unix?

txdv avatar Jan 07 '13 12:01 txdv

Robert, are you sure you are not using some mutex, like the .NET "lock" statement or some synchronized data structure? The good thing about libuv is that you can avoid all that synchronization stuff because you work on a single thread.

gigi81 avatar Jan 07 '13 17:01 gigi81

@gigi81

  1. Yes, I am sure. It is the very same code that runs on Windows using the BCL's SendAsync/ReceiveAsync with IOCP. I have abstracted away the socket interfaces so that the message-processing logic is independent of the TCP transport used. The message framing and packet processing are called sequentially, without locks, for a single TCP connection, because I don't start a subsequent socket read until I have finished sending a reply for the current packets. This is true for both BCL sockets and libuv. For example, the code for ParseReceivedBytesAndProcessDecodedPdu referenced in the Mono profiler stats above decodes and processes packets like this:
clientSocket.Data += socket_Data;

private void socket_Data(ArraySegment<byte> data)
{
    ParseReceivedBytesAndProcessDecodedPdu(data.Array, data.Offset, data.Count);
}

private void ParseReceivedBytesAndProcessDecodedPdu(byte[] buffer, int offset, int size)
{
    [....]
    foreach (var pdu in protocolFramer.FrameData(buffer, offset, size))
        AppServer.ExecuteCommand(AppSession, pdu);
    [....]
}

FrameData yields each packet as soon as it has been framed from the TCP stream.
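For readers who want to see what such a yielding framer can look like, here is a minimal sketch. It is not the code from my project: the wire format (a 4-byte big-endian payload length followed by the payload) and the LengthPrefixedFramer class are assumptions made up for this illustration; only the FrameData(buffer, offset, size) call shape is taken from the snippet above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical framer: assumes each packet is a 4-byte big-endian payload
// length followed by that many payload bytes.
public sealed class LengthPrefixedFramer
{
    private const int HeaderSize = 4;

    // Bytes received so far that do not yet form a complete packet.
    private readonly List<byte> pending = new List<byte>();

    // Appends the new bytes immediately, then lazily yields complete packets.
    public IEnumerable<byte[]> FrameData(byte[] buffer, int offset, int size)
    {
        pending.AddRange(new ArraySegment<byte>(buffer, offset, size));
        return DrainFrames();
    }

    private IEnumerable<byte[]> DrainFrames()
    {
        while (pending.Count >= HeaderSize)
        {
            int length = (pending[0] << 24) | (pending[1] << 16)
                       | (pending[2] << 8) | pending[3];
            if (length < 0)
                throw new InvalidOperationException("Invalid frame length.");

            // Packet is still incomplete: keep the bytes for the next read.
            if (pending.Count - HeaderSize < length)
                yield break;

            byte[] payload = pending.GetRange(HeaderSize, length).ToArray();
            pending.RemoveRange(0, HeaderSize + length);
            yield return payload;
        }
    }
}
```

Because packets are yielded one at a time, the caller's foreach processes each packet on the same thread that delivered the read, with no queue or lock in between. A List<byte> is used here for brevity; a real framer would more likely work on the receive buffer directly to avoid per-byte overhead.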

This type of processing on socket read (be it BCL's or libuv) is far more CPU friendly due to avoidance of context switches and locking than putting message framing and command processing to run using the appdomain's threadpool (I've done tests in both cases).

  2. I've managed to obtain the libuv.dll file (for 32-bit and 64-bit) using your patch for VS2012, but unfortunately LibuvSharp is using a specific commit (36b1e1a) from the libuv codebase.

When I try to run LibuvSharp.Tests using the DLL produced by your build, I receive: System.EntryPointNotFoundException : Unable to find an entry point named 'uv_run_once' in DLL 'uv'.

It seems that the function uv_run_once no longer exists in the libuv repo.

@txdv The machine has 2 cores (4 logical processors). IOCP is using all 4 logical processors evenly - at least this is what Process Explorer is reporting while running the test.

Can you simulate a load test on a more powerful machine as well some time in the future?

robertmircea avatar Jan 07 '13 21:01 robertmircea

It looks like you are doing a lot of writes. Try to combine them into bigger writes. Furthermore, the event loop on Linux runs on only one core, so making it fully load the machine is harder.
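One way to combine writes is to collect the replies produced while handling a single read and send them as one buffer. Below is a minimal sketch of that idea using only the BCL. The WriteCoalescer class is a hypothetical name, and the send delegate is a stand-in for whatever write method your stream exposes; neither is part of LibuvSharp.

```csharp
using System;
using System.IO;

// Hypothetical helper: batches many small messages into one write.
public sealed class WriteCoalescer
{
    private readonly MemoryStream pending = new MemoryStream();

    // Stand-in for the real socket write (an assumption, supplied by the caller).
    private readonly Action<byte[]> send;

    public WriteCoalescer(Action<byte[]> send)
    {
        if (send == null) throw new ArgumentNullException("send");
        this.send = send;
    }

    // Queues a message without touching the socket.
    public void Enqueue(byte[] message)
    {
        pending.Write(message, 0, message.Length);
    }

    // Sends everything queued so far as a single write, if there is anything.
    public void Flush()
    {
        if (pending.Length == 0) return;

        byte[] batch = pending.ToArray(); // private copy, safe to hand to an async write
        pending.Position = 0;
        pending.SetLength(0);
        send(batch);
    }
}
```

A natural place to call Flush is once at the end of the Data handler, after all packets from that read have been processed, so that N replies cost one write call instead of N.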

txdv avatar Jan 12 '13 12:01 txdv