added more c apis and capi ssl version with capi examples

Open cirospaciari opened this issue 4 years ago • 93 comments

This adds a complete, usable C API with SSL support, along with most of the examples.

It's a step toward solving this issue: https://github.com/uNetworking/uWebSockets/issues/1192 (Opaque C interface for Swift, Rust, etc.)

For now it's just a wrapper and can be optimized a lot, but it's a start.

Added make capi and make capi_examples targets for building.

cirospaciari avatar Jan 05 '22 00:01 cirospaciari

Instead of the uws_app_listen(SSL, ...) style I wrote the API in uws_ssl_... style, because responses and WebSockets have different types in App and SSLApp. With a constant SSL argument, -O3 optimization, and a static library, I think the code would be generated fine and inlined, but I want to use this C API from Ruby via FFI, and I don't want an if that checks pointer conversions at runtime.

In dynamically typed languages such as Ruby, Python, and Lua, UWS::App and UWS::SSLApp will be easy to implement, and switching between App and SSLApp is easy. In statically typed languages with generics it is also easy to create variants (as in Rust).

In pure C, using different structs where it matters keeps me and other developers from making mistakes such as passing an SSL response to uws_res_end instead of uws_ssl_res_end, without overhead. If someone wants to use the C API directly in C, instead of integrating it as a dynamic library in another language, and doesn't want to write different code to switch between App and SSLApp, a header with macros would solve this problem in a safe way.
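
The macro header idea could look roughly like the sketch below. Everything in it is hypothetical: the demo_* names, the two response structs, and the DEMO_USE_SSL switch are stand-ins for illustration, not the real capi. It only shows that distinct struct types let the compiler reject a mismatched call, while a macro picks one family at build time with no runtime branch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for the two response types; not the real capi types.
struct demo_res_t     { std::string out; };
struct demo_ssl_res_t { std::string out; };

// One function per variant, as in the uws_ssl_... naming style.
inline void demo_res_end(demo_res_t *res, const char *data, std::size_t len) {
    res->out.assign(data, len);                 // plain variant
}
inline void demo_ssl_res_end(demo_ssl_res_t *res, const char *data, std::size_t len) {
    res->out.assign("tls:").append(data, len);  // pretend the bytes went through TLS
}

// The "header with macros": choose one family at compile time.
#ifdef DEMO_USE_SSL
  #define DEMO_RES_T   demo_ssl_res_t
  #define DEMO_RES_END demo_ssl_res_end
#else
  #define DEMO_RES_T   demo_res_t
  #define DEMO_RES_END demo_res_end
#endif

// User code is written once against the macro names.
inline std::string demo_handler() {
    DEMO_RES_T res;
    DEMO_RES_END(&res, "Hello CAPI!", 11);
    // Calling the other family's function with &res would not compile,
    // because the pointer types differ.
    assert(!res.out.empty());
    return res.out;
}
```

Built without DEMO_USE_SSL, demo_handler() goes through the plain variant; defining the macro switches every call site at once.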

I tested with autocannon -c 100 -d 40 -p 10 http://localhost:3000/ and CAPIHelloWorld and HelloWorld produce the same or similar performance numbers. Ruby integrates very well; I'm currently testing with FFI, and I will integrate with Python and Rust through FFI too. The source code is available on my GitHub (Linux only for now; it will be a cross-platform solution later).

cirospaciari avatar Jan 06 '22 14:01 cirospaciari

I like the energy but there are a few problems -

I've fixed the existing experimental CAPI and it now works with identical performance as the C++ one so the idea is very much valid.

Problems:

  • You add copies and dynamic memory allocations everywhere via that toNullTerminated - that's a big no no
  • SSL should be an argument, we don't want the whole API duplicated for every protocol, esp. not when QUIC is yet another protocol
  • Benchmarking with autocannon is 10000000% invalid. Use uSockets/http_load_test. Performance MUST BE IDENTICAL as C++, there can be NO DIFFERENCE AT ALL, or the whole thing must be thrown in the garbage.
  • Dynamic linking with .so is a big no no, must be a static library built with LTO optimization otherwise performance goes to shit.
  • You don't need to make tons of examples, the important thing is getting the library itself done.
  • I like the name libuwebsockets.so, but it should be libuwebsockets.a

It is extremely important that perf. remains identical to the original (this is the case with existing experiment).

ghost avatar Jan 07 '22 02:01 ghost

Yeah, the toNullTerminated was a mistake for sure, but it is easily replaceable. I created the examples while developing, using the C++ examples as a reference, so I included them.

About the performance testing: I will look into http_load_test. I used autocannon and wrk for HTTP because they are the tools I knew. If you can tell me why using autocannon would be invalid I would be grateful (so I don't make that mistake again).

I will take a look at the last commit you made to the current experiment, and will change the SSL version to be a parameter.

I will remove toNullTerminated and improve the C wrapper to get the same performance.

CAPI (my version)

./http_load_test 40 localhost 3000
Running benchmark now...
Req/sec: 299293.000000
Req/sec: 285187.250000
Req/sec: 298784.750000
Req/sec: 297208.500000
Req/sec: 299072.500000
Req/sec: 291401.000000
^C

C++ original version:

./http_load_test 40 localhost 3000
Running benchmark now...
Req/sec: 328924.500000
Req/sec: 330413.000000
Req/sec: 332356.750000
Req/sec: 332048.000000
Req/sec: 331578.500000

The performance hit is clear when using http_load_test; I will stick with http_load_test from now on.

cirospaciari avatar Jan 07 '22 16:01 cirospaciari

I want to test with the SSL as argument in my experiment to check one extra time it remains the same perf.

Also another thing:

  • You don't need to wrap libusockets - that library is already C, so just leave it be. You see that libusockets uses SSL as first argument so libuwebsockets should as well.

ghost avatar Jan 07 '22 17:01 ghost

I also want to make a very basic minimal test of a Rust wrapper with a nice interface much like the one in C++ to see if that is even viable - if so then it could make sense to create a Rust crate that just works out of the box. And maybe even a Swift one - then the reach of the project can expand without being limited by scripting performance loss.

ghost avatar Jan 07 '22 17:01 ghost

I tested with the SSL parameter and the performance numbers are the same before and after. I tested in Rust too and saw a little variation, but I think it's within the margin of error because the difference is not consistent. A more complex sample might highlight something, but for now it seems good.


#include <stdio.h>
#include "libuwebsockets.h" /* C API header from capi/ (file name assumed) */

#define SSL 1

void get_handler(uws_res_t *res, uws_req_t *req) {
    uws_res_end(SSL, res, "Hello CAPI!", 11);
}

void listen_handler(void *listen_socket) {
    if (listen_socket) {
        printf("Listening on port now\n");
    }
}

int main() {
    uws_app_t *app = uws_create_app(SSL, (struct us_socket_context_options_t){
        /* There are example certificates in uWebSockets.js repo */
        .key_file_name = "../misc/key.pem",
        .cert_file_name = "../misc/cert.pem",
        .passphrase = "1234"
    });
    uws_app_get(SSL, app, "/*", get_handler);
    uws_app_listen(SSL, app, 3000, listen_handler);
    uws_app_run(SSL, app);
}

cirospaciari avatar Jan 07 '22 19:01 cirospaciari

Yeah as long as it is a static library compiled with LTO it should constant fold that if statement
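
To illustrate why a constant SSL argument can end up costing nothing, here is a rough sketch of the dispatch shape. The Response template and demo_res_end wrapper are hypothetical stand-ins, not the real types. When the wrapper body is visible to the optimizer (static library plus LTO, or plain inlining) and the caller passes a literal, the branch can be resolved at compile time; whether a given toolchain actually does so is something to measure rather than assume.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical response type templated on SSL, in the spirit of a C++ class template.
template <bool SSL>
struct Response {
    std::string out;
    void end(const char *data, std::size_t len) {
        if constexpr (SSL) out = "tls:";  // marks which variant ran
        out.append(data, len);
    }
};

// C-style entry point: `ssl` is an ordinary int in the signature...
extern "C" void demo_res_end(int ssl, void *res, const char *data, std::size_t len) {
    assert(res != nullptr);
    if (ssl) {
        static_cast<Response<true> *>(res)->end(data, len);
    } else {
        static_cast<Response<false> *>(res)->end(data, len);
    }
}

// ...but callers pass a constant, so after inlining only one branch needs to survive.
#define DEMO_SSL 0

std::string demo_call() {
    Response<false> res;
    demo_res_end(DEMO_SSL, &res, "Hello CAPI!", 11);
    return res.out;
}
```

With dynamic linking the call crosses a library boundary, so the optimizer never sees the constant and the branch stays in the generated code.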

ghost avatar Jan 07 '22 19:01 ghost

I added a length everywhere I needed to pass a string_view, and it brought the performance up to the same level as the C++ version.
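
A small sketch of that change, with hypothetical demo_* names rather than the real wrapper functions: when the wrapper only receives a char*, building a std::string_view has to scan for the terminator on every call, while an explicit length makes the view O(1) and lets the caller pass a slice that is not null-terminated.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical C++ side: something that consumes a std::string_view.
static std::size_t demo_consume(std::string_view text) {
    return text.size();
}

// Slower shape: only a char* arrives, so constructing the view runs strlen,
// and an unterminated slice would first have to be copied.
extern "C" std::size_t demo_write_cstr(const char *data) {
    assert(data != nullptr);
    return demo_consume(std::string_view(data));
}

// Faster shape: the caller supplies the length, nothing is scanned or copied.
extern "C" std::size_t demo_write(const char *data, std::size_t length) {
    assert(data != nullptr);
    return demo_consume(std::string_view(data, length));
}

// Usage: view only the first 11 bytes of a larger buffer.
std::size_t demo_slice() {
    const char buffer[] = "Hello CAPI! trailing bytes";
    return demo_write(buffer, 11);
}
```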

I also made some changes to example.rs, now examples/RustHelloWorld.rs; I'm now able to compile it on my machine without warnings or errors.

I removed all copies and dynamic memory allocations, but I'm still trying to read the headers needed to do an upgrade (I want to make it work in C and add it to Rust for testing) in my last commit, UpgradeSync/UpgradeAsync.

The header never comes back intact because it's a std::string_view. I'm trying something like this:

const char *uws_req_get_header(uws_req_t *res, const char *lower_case_header, size_t lower_case_header_length)
{
      uWS::HttpRequest *uwsReq = (uWS::HttpRequest *)res;
      std::string_view header = uwsReq->getHeader(std::string_view(lower_case_header, lower_case_header_length));
      return std::string(header).c_str();
}

and the call in UpgradeSync.c:

const char* header= uws_req_get_header(request, "sec-websocket-extensions", 24);

The string_view will go out of scope and be freed. How can I accomplish this, or which strategy should I use to get the headers without dynamic allocation or copying? I could pass a buffer and a buffer size and copy the data back, but I'd like to know whether a better alternative exists.

After this i will add the SSL params in all points.

cirospaciari avatar Jan 07 '22 23:01 cirospaciari

I'm playing with something like this:

fn main() {
    App::new().get(|res, req| {
        //res.end("Hello world!");
    }).listen(|listenSocket| {
        if listenSocket != 0.0 {
            println!("Listening now!");
        }
    }).run();
}

ghost avatar Jan 08 '22 00:01 ghost

Cool, very close to my Ruby version:

UWS::App.new()
.get("/", lambda {|response, request| response.end("Hello World uWS from Ruby!")})
.listen(8082, lambda {|socket, config| puts "Listening on port #{config.port}" })
.run()

cirospaciari avatar Jan 08 '22 00:01 cirospaciari

The earlier performance impact came from char* to string_view conversions without a length, and from the use of the dynamic library itself (about a 13% performance impact). Now it performs the same.

CAPI

cirospaciari@gl702vsk:~/Desktop/teste/uWebSockets/uSockets$  ./http_load_test 40 localhost 3000
Running benchmark now...
Req/sec: 318886.250000
Req/sec: 321785.750000
Req/sec: 320193.250000
Req/sec: 313785.750000
^C

C++

cirospaciari@gl702vsk:~/Desktop/teste/uWebSockets/uSockets$ ./http_load_test 40 localhost 3000
Running benchmark now...
Req/sec: 313398.750000
Req/sec: 312036.750000
Req/sec: 317940.750000
Req/sec: 320571.250000
^C

I will add the SSL parameters now and add a more complex sample using Upgrade and Broadcast with Rust, then collect some performance data and compare Rust, CAPI, and C++. I think if we link statically we should have zero or almost zero performance impact, even with a more OOP-like Rust wrapper. I will do a stress test and measure whether there is some place where the compiler does not fold the conditional (I believe that if we create a crate with the wrapper the compiler may not optimize it, but let's do some testing).

cirospaciari avatar Jan 08 '22 11:01 cirospaciari

Without heavy modifications I could achieve the code below without performance impact:

extern "C" fn listen_handler(
    _listen_socket: *mut ::std::os::raw::c_void,
    config: UwsAppListenConfigT,
) {
    println!("Listening on port {}", config.port);
}

extern "C" fn get_handler(res: *mut UwsResT, _: *mut UwsReqT) {
    unsafe {
        let message = ::std::ffi::CString::new("Hello Rust!").expect("");
        uws_res_end(res, message.as_ptr(), 11, false);
    }
}


fn main() {
    App::new()
        .get("/", get_handler)
        .listen(3000, listen_handler)
        .run();
}

To achieve what you expected in Rust, passing a user_data: void* is needed:

fn main() {
    App::new()
        .get("/", |res, _req| {
            res.end("Hello Rust!");
        })
        .listen(3000, |_listen_socket, config| {
            println!("Listening on port http://127.0.0.1:{}", config.port);
        })
        .run();
}

No visible performance loss:

CAPI

cirospaciari@gl702vsk:~/Desktop/teste/uWebSockets/uSockets$ ./http_load_test 40 localhost 3000
Running benchmark now...
Req/sec: 311942.000000
Req/sec: 319498.250000
Req/sec: 323747.000000
^C

Rust

cirospaciari@gl702vsk:~/Desktop/teste/uWebSockets/uSockets$ ./http_load_test 40 localhost 3000
Running benchmark now...
Req/sec: 319163.000000
Req/sec: 323562.500000
Req/sec: 323056.250000
^C

cirospaciari avatar Jan 08 '22 13:01 cirospaciari

One thing missing is that all functions that take a function must also take a void pointer so closures can be set. That's a little tricky, I need to look at Rust pointers. If that works without perf loss then everything should be set
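
A minimal sketch of that pattern, using a hypothetical route API (demo_route_t, demo_on_get, and demo_dispatch are made up for illustration): the C-style function stores an opaque void pointer next to the function pointer and hands it back on every call, so a binding can smuggle a closure object through it and call it from a captureless trampoline.

```cpp
#include <cassert>
#include <string>

// Hypothetical C-compatible signatures: the callback gets back the pointer it was registered with.
typedef void (*demo_handler_t)(const char *path, void *user_data);

struct demo_route_t {
    demo_handler_t handler;
    void *user_data;
};

void demo_on_get(demo_route_t *route, demo_handler_t handler, void *user_data) {
    route->handler = handler;
    route->user_data = user_data;
}

void demo_dispatch(const demo_route_t *route, const char *path) {
    assert(route->handler != nullptr);
    route->handler(path, route->user_data);
}

// Binding side: the trampoline has no captures; the closure travels through user_data.
template <typename F>
void demo_trampoline(const char *path, void *user_data) {
    (*static_cast<F *>(user_data))(path);
}

std::string demo_closure_call() {
    std::string seen;  // state captured by the closure
    auto closure = [&seen](const char *path) { seen = path; };
    using Closure = decltype(closure);

    demo_route_t route{};
    demo_on_get(&route, &demo_trampoline<Closure>, &closure);
    demo_dispatch(&route, "/hello");  // closure must still be alive here
    return seen;
}
```

The extra pointer is only stored and passed along, so it should not add measurable work per request; keeping the closure alive for as long as the route exists is the binding's responsibility.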

ghost avatar Jan 08 '22 13:01 ghost

I was typing just that haha. Yeah, it's tricky, but it's useful in other languages too. I used this in uws_create_timer as a helper and in uws_res_on_aborted, uws_res_on_data, and uws_res_on_writable.

I will keep my branch updated.

Edit: Added the void pointers. As I said before, I cannot measure any performance loss with http_load_test. After a little break I will start working on the SSL parameter.

The following is working in capi/examples/RustHelloWorld.rs. make default will create HelloWorld (capi) and RustHelloWorld; make rust will only create RustHelloWorld.

fn main() {
    App::new()
        .get("/", |res, _req| {
            res.end("Hello Rust!");
        })
        .listen(3000, |_listen_socket, config| {
            println!("Listening on port http://127.0.0.1:{}", config.port);
        })
        .run();
}

cirospaciari avatar Jan 08 '22 13:01 cirospaciari

Holy shit did you already get that Rust example working? Nice!

ghost avatar Jan 09 '22 18:01 ghost

Thanks haha. I will add the SSL parameter today, plus more things in Rust for testing.

cirospaciari avatar Jan 09 '22 18:01 cirospaciari

A bit more feedback -

As long as the basic proof-of-concept Rust (or Swift or whatever) example works with the same perf. you can just leave that part undone.

There are still places where memory allocations and free are called - the C wrapper should not allocate memory anywhere.

There are also still places where libusockets functionality is wrapped - don't wrap what is already in C.

ghost avatar Jan 09 '22 19:01 ghost

Returning a string view in C would probably look like so

void f(char **, size_t *)

Or possibly

size_t f(char **)

ghost avatar Jan 09 '22 19:01 ghost

Thanks for the feedback. I already removed all malloc and free calls; the only places left are the uws_timer_close and uws_create_timer helpers (which use memcpy and free). I will move that part to a helper.c in examples; it will not be part of the final libuwebsockets for sure. I need to refactor some things too, but that is easy. As a last round of testing I want to compare HTTP vs HTTPS performance in C++, Rust, and CAPI, and test the WebSocket performance.

For returning the string_view I used a buffer strategy, so the caller of the library has full control over the lifetime of the data (thinking of the async responses). Every other strategy I could think of (and get working) just had the memory freed before the upgrade/async response and left me with garbage.

Example:

int uws_req_get_header(uws_req_t *res, const char *lower_case_header, size_t lower_case_header_length, char *dest_buffer, size_t dest_buffer_length)
{
    uWS::HttpRequest *uwsReq = (uWS::HttpRequest *)res;
    std::string_view value = uwsReq->getHeader(std::string_view(lower_case_header, lower_case_header_length));
    size_t length = value.length();
    // Return only the length if the header is empty or dest is too small (one byte is kept for '\0')
    if (!length || length >= dest_buffer_length)
        return length;
    // memcpy (from <cstring>) rather than strncat: strncat appends after an existing
    // terminator, so it would read an uninitialized dest_buffer
    std::memcpy(dest_buffer, value.data(), length);
    dest_buffer[length] = '\0';
    return length; // return the length for reference and checking
}

In this sample I take a buffer and the size of the buffer. If the data doesn't fit I just return the length so the caller can realloc and call again; if the data fits, I copy it and return the length of the data. Upgrades also use string_view, so the length is necessary for performance. If you can suggest a better solution I will be grateful and will refactor ASAP.

In the end I want to leave something close to usable (even if it's only a concept); I don't want to leave things half broken. I know we will probably still modify/remove/change a lot of things, but I'd love to get all this done, and I'm a fast coding guy lol

cirospaciari avatar Jan 09 '22 20:01 cirospaciari

The wrapper shouldn't change anything about the underlying library. So all strings returned are short lived zero copies. If the user expects the strings to survive longer they have to copy to a dynamic block of memory themselves but that's not part of the wrapper

ghost avatar Jan 09 '22 20:01 ghost

Agreed, but I cannot think of another way to get something other than garbage after the function returns, because the string_view will be freed immediately after the function ends. I need to research string_view more, so I will ask you to tell me the better way, because it's a limitation for me right now. Sorry about that.

cirospaciari avatar Jan 09 '22 20:01 cirospaciari

A string_view is just a pair of (char * and size_t), so it's a non-owning view of some memory owned by the library. In this particular library all views are valid for the whole duration of the callback they are called from within.

So headers are valid inside the duration of the get handler.
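
A rough illustration of that lifetime rule, with a hypothetical DemoRequest standing in for the real request type: the view is only a pointer and a length into memory the library owns, so using it inside the handler is free, and anything that must outlive the handler has to be copied by the user into storage they own.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical request whose header storage is only guaranteed during the callback.
struct DemoRequest {
    std::string buffer;  // owned by the "library"
    std::string_view getHeader() const { return buffer; }  // non-owning view into buffer
};

static std::string saved_copy;  // storage the user owns

// Inside the handler the view is valid; to keep the value, copy it before returning.
void demo_handler(const DemoRequest &req) {
    std::string_view value = req.getHeader();       // (pointer, length), no allocation
    assert(!value.empty());
    saved_copy.assign(value.data(), value.size());  // explicit copy for a longer lifetime
}

std::string demo_run() {
    {
        DemoRequest req{"permessage-deflate"};
        demo_handler(req);
    }  // req.buffer is gone here; a view kept from the handler would now dangle
    return saved_copy;  // the user's copy is still valid
}
```

A C wrapper can expose the same thing as an out-pointer plus a length, leaving the decision to copy entirely to the caller.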

ghost avatar Jan 09 '22 20:01 ghost

The C++ library really should use a noncopyable, nonmovable variant of string_view because then it is not possible to make lifetime mistakes

ghost avatar Jan 09 '22 20:01 ghost

basically, is this the right way?

int uws_req_get_header_test(uws_req_t *res, const char *lower_case_header, size_t lower_case_header_length, const char **dest)
{
    uWS::HttpRequest *uwsReq = (uWS::HttpRequest *)res;

    std::string_view value = uwsReq->getHeader(std::string_view(lower_case_header, lower_case_header_length));
    *dest = value.data();
    return value.length();
}

const char *ws_key = NULL;
const char *ws_protocol = NULL;
const char *ws_extensions = NULL;
int ws_key_length = uws_req_get_header_test(request, "sec-websocket-key", 17, &ws_key);
int ws_protocol_length = uws_req_get_header_test(request, "sec-websocket-protocol", 22, &ws_protocol);
int ws_extensions_length = uws_req_get_header_test(request, "sec-websocket-extensions", 24, &ws_extensions);

printf("ws_key = %.*s\n", ws_key_length, ws_key);
printf("ws_protocol = %.*s\n", ws_protocol_length, ws_protocol);
printf("ws_extensions = %.*s\n", ws_extensions_length, ws_extensions);

Listening on port ws://localhost:9001
ws_key = vqXMUL/+KaIkdS2iEjO3BA==
ws_protocol = 
ws_extensions = permessage-deflate; client_max_window_bits
Something is: 15

Note: ignore the int return type; it should be size_t, I know.

Edit: my error before this was trying to use std::string(value).c_str() instead of value.data(), so I got garbage.

cirospaciari avatar Jan 09 '22 20:01 cirospaciari

Yep, now it is just a minimal wrapper.

ghost avatar Jan 09 '22 21:01 ghost

I will commit my changes soon, thanks!

Edit: just committed, I will commit an SSL version soon

cirospaciari avatar Jan 09 '22 21:01 cirospaciari

I just added and committed the SSL parameters

Rust working sample in capi/examples/RustHelloWorld.rs

pub type App = TemplateApp<0>;
pub type SSLApp = TemplateApp<1>;

fn main() {
    let config = UsSocketContextOptions {
        key_file_name: "../misc/key.pem",
        cert_file_name: "../misc/cert.pem",
        passphrase: "1234",
        ca_file_name: "",
        dh_params_file_name: "",
        ssl_prefer_low_memory_usage: 0,
    };

    SSLApp::new(config)
        .get("/", |res, _req| {
            res.end("Hello Rust!");
        })
        .listen(3000, |_listen_socket, config| {
            println!("Listening on port http://127.0.0.1:{}", config.port);
        })
        .run();
}

I used ./http_load_test 40 localhost 3000 for testing; these are the results:

Rust

Running benchmark now...
Req/sec: 313409.750000
Req/sec: 325492.250000
Req/sec: 319489.750000
Req/sec: 332671.250000
^C

C++

Running benchmark now...
Req/sec: 327909.750000
Req/sec: 317721.500000
Req/sec: 319694.000000
Req/sec: 332376.000000
^C

Maybe now we have a starting point. Please tell me when you create the crate repository for uWebSockets.rs; I would love to help you with this too.

I will wait for your approval and direction before more commits, thanks.

I will run some tests with WS and upgrade + Rust.

cirospaciari avatar Jan 09 '22 22:01 cirospaciari

I don't really think Rust is worth it - I know for a fact (well, gut feeling really) that Rust people will NEVER touch a library written in C++ so it's a dead end. That was more of a proof-of-concept. Rust people are driven by ideology, and they hate C++ with a passion.

Also, there are already very competent alternatives in Rust, so it's really pointless. You can get pretty much the same perf. as uWS in Rust already.

What I think is worth it, is PyPy. Python is extremely popular (https://www.tiobe.com/tiobe-index/) and I already wrote a CPython extension before and it was fast, but CPython is really slow so I discarded it. PyPy is fantastically compatible with CPython, and a lot faster. PyPy docs say that they much prefer CFFI over doing extensions, not even sure if PyPy even has extensions, so PyPy + CFFI with libuwebsockets.so (yeah has to be dynamic here) is probably worth it, esp. if client support is added at some point.

I did a test with dynamic linking, I get 93% perf. when linking to libuwebsockets.so dynamically. What kind of perf. do you get overall with your Ruby FFI? How do you plan on shipping that? libuwebsockets.so?

TLDR; finishing the c wrapper is step 1, and from that point it's all guessing.

ghost avatar Jan 10 '22 02:01 ghost

Ruby FFI was only a test; now I want to make a native C extension and ship it as a gem, like agoo and falcon. I got better results than agoo using the TechEmpower benchmarks, but I was using a buggy version that did not pass lengths for string_view. Using the TechEmpower setup (which uses wrk and a Lua script for pipelining) for plaintext and JSON, I got 140k in the JSON test and 100 to 470k in plaintext (it was inconsistent and depends on the pipelining level). With autocannon I got a lot more load and could push up to 200k in JSON and 1.47 million in plaintext with a pipelining of 30, while with agoo I got only 100k. But I can use a static version through an extension instead of FFI, and use http_load_test for comparison now. Performance was about 87% with my buggy version, but in some cases 50%. I think I can get close to 100%, but I expect it to land in the 80 to 90% range when statically linked, which is still very worthwhile for Ruby; it will be the fastest in the Ruby market for sure 😃

Python and PyPy or Lua will be my next project after Ruby (Ruby was my first language, so it is a passion project). I want to put something out for Lua/LuaJIT too, because I believe it will be better than Pencil. In some performance tests I wrote, LuaJIT gets closer to C with clang than PyPy3 does, and I can use statically linked libraries in Lua/LuaJIT extensions and ship with LuaRocks. I think I can get to 100% performance with LuaJIT.

So my focus will shift to using this C API as a statically linked library in a C extension for Ruby, delivered as a Ruby gem like agoo and falcon.

I will keep improving this C API on my fork and push here once things are tested. I would be grateful if you accepted my pull request. I will download directly from here for building the extension, and directly from my repo/other branch for an unstable version, until the gem is done and tested.

cirospaciari avatar Jan 10 '22 08:01 cirospaciari

Again about CFFI and PyPy3: I will create a .so optimized for it, using macros to fold the if (ssl). I have done one test here and it gets around 5% more performance than using the library directly. In my tests PyPy gets 98% of the original performance with macros, and about 92~93% without macros. I think writing a minimal example in Python will be needed as a case study. I need to study how pip works, but it will be my next project after Ruby for sure.

I started creating the extension for Ruby and I will do some basic performance tests comparing it with an updated FFI + dynamic library. FFI has some problems with the GIL and sometimes hurts performance by as much as 50%, so Ruby ext + Ruby gem + git submodules is probably the right answer.

cirospaciari avatar Jan 10 '22 14:01 cirospaciari