websocket client not specifying a (sub)protocol fails to communicate with LWS server
In minimal-examples-lowlevel/ws-server/minimal-ws-server/, if I comment out the lws_callback_http_dummy line (so that the server runs only one protocol) and connect with the attached Python script, the connection times out unless lws-minimal is passed as an argument.
I'm running this on Windows, on a commit from main that is about two weeks old.
This behavior seems to be contrary to the documentation that reads:
Websockets allows connections to negotiate without a protocol name... in that case by default it will bind to the first protocol in your vhost protocols[] array.
er, here's the script:
import websockets
import asyncio

async def hello(prot=None):
    uri = "ws://localhost:7681"
    async with websockets.connect(uri, subprotocols=prot) as ws:
        print('connected')
        await asyncio.sleep(5)

if __name__ == "__main__":
    import sys
    prot = [sys.argv[1]] if len(sys.argv) > 1 else None
    asyncio.run(hello(prot))
Using no subprotocol is not a very good way to do it... you cannot know what protocol an unknown server is going to try to speak, and whatever it uses, it cannot stage any upgrade to it, eg, continue using v1 for v1 clients while automatically using v2 for clients that know what it is (ws supports this by the client asking for "v2, v1" in the initial upgrade). Any two unknown servers will map their own magic protocol to "no protocol name" incompatibly, so you have zero idea what it will speak. And you can only offer exactly one protocol per vhost like that.
In contrast there's at least a reasonable hope that if the server negotiates a specific name like "com.warmcat.myprotocol.2025-04-26", even on an unknown server, it is going to compatibly talk to the client who has an understanding of that protocol. And if you use explicit names, you don't have to deal with selecting a default protocol as below.
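The "v2, v1" staging described above can be sketched in a few lines. This is only an illustrative model of the selection rule (the server picks one of the names the client offered, or falls back to its default when none was offered), not lws code; the protocol names and both function names are hypothetical.

```python
from typing import Optional, Sequence


def parse_offer(header_value: str) -> list:
    """Split a comma-separated Sec-WebSocket-Protocol value into names."""
    return [p.strip() for p in header_value.split(",") if p.strip()]


def select_subprotocol(offered: Sequence[str],
                       supported: Sequence[str],
                       default: Optional[str] = None) -> Optional[str]:
    """Pick the first client-offered name that the server supports.

    With an empty offer there is nothing to negotiate, so the server can
    only fall back to whatever it has configured as its default.
    """
    if not offered:
        return default
    for name in offered:
        if name in supported:
            return name
    return None  # no name in common


if __name__ == "__main__":
    offer = parse_offer("myproto.v2, myproto.v1")
    # An old server that only knows v1 still interoperates with this client.
    print(select_subprotocol(offer, ["myproto.v1"]))
    # A newer server that knows both picks the client's first choice.
    print(select_subprotocol(offer, ["myproto.v1", "myproto.v2"]))
```

With an empty offer the client has no say in the result, which is the weakness of the "no protocol name" case.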
It's still the case that the first protocol will be the "default" for serving, but over the years other protocols (eg, the built-in http default one) have appeared that might get in front of yours. So there is another way to define it: tag your chosen protocol with a pvo called "default". The documentation is not wrong, "by default" it does what it says, but in most cases the first protocol that is actually defined on the vhost is no longer the one you want.
pvos (per-vhost options) allow you to tag a protocol on a vhost with a named object map, that is, if you serve the same protocol on multiple vhosts, you can apply a different set of pvos to each vhost-protocol combination and so offer the protocol in different ways, eg, with different config.
The toplevel pvo names the protocol that the child pvos apply to; the children are the tags. So if your desired protocol is named "myprotocol" on the server side, you can tag it to be used as the "no protocol name" choice with pvos like this at your server:
static struct lws_protocol_vhost_options
    pvo1 = { NULL, NULL, "default", "" },      /* child: the "default" tag */
    pvo  = { NULL, &pvo1, "myprotocol", "" };  /* toplevel: names the protocol being tagged */
...
info.pvo = &pvo;
You should see an info log at vhost creation time "Setting default protocol to myprotocol" then.
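As a rough mental model of the rule described above (the first protocol is the default unless one is tagged with a "default" pvo), here is a small Python sketch. It is an assumption-laden simplification for illustration, not a transcription of lws internals; the function name and the dict layout standing in for the pvo list are hypothetical.

```python
def default_protocol_index(protocol_names, pvos):
    """Return the index of the protocol used when no name is negotiated.

    protocol_names: protocol names in vhost order.
    pvos: maps a protocol name to its child options,
          e.g. {"myprotocol": {"default": ""}}.
    """
    for index, name in enumerate(protocol_names):
        if "default" in pvos.get(name, {}):
            return index  # explicitly tagged protocol wins
    return 0  # otherwise the first protocol on the vhost is used


if __name__ == "__main__":
    names = ["http", "myprotocol"]
    print(default_protocol_index(names, {}))
    print(default_protocol_index(names, {"myprotocol": {"default": ""}}))
```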
Thanks for the good info.
In my code, I am now seeing the INFO line lws_protocol_init_vhost: Setting default protocol to defaultProtocolName, however, I'm still seeing the exact same behavior; when I connect using subprotocols=["defaultProtocolName"] then the connection succeeds, and when subprotocols=None I get:
[2025/04/28 17:13:59:1317] N: lws_create_context: LWS: 4.3.99-R4.3.2-6-g2e0cf94, NET CLI SRV H1 H2 WS SS-JSON-POL ConMon IPv6-absent
[2025/04/28 17:13:59:1377] N: __lws_lc_tag: ++ [wsi|0|pipe] (1)
[2025/04/28 17:13:59:1417] N: __lws_lc_tag: ++ [vh|0|default||-1] (1)
[2025/04/28 17:13:59:1427] N: __lws_lc_tag: ++ [wsicli|0|WS/h1/default/192.168.125.205] (1)
[2025/04/28 17:13:59:1427] W: lws_plat_set_socket_options_ip: priority and ip sockets options not implemented on windows platform
[2025/04/28 17:13:59:1427] N: lws_plat_set_socket_options_ip: set use exclusive addresses
[2025/04/28 17:13:59:1427] N: [wsicli|0|WS/h1/default/192.168.125.205]: lws_client_connect_3_connect: trying 192.168.125.205
[2025/04/28 17:14:02:1571] N: __lws_lc_untag: -- [wsi|0|pipe] (0) 3.019s
[2025/04/28 17:14:02:1571] N: __lws_lc_untag: -- [vh|0|default||-1] (0) 3.015s
[2025/04/28 17:14:02:1571] N: __lws_lc_untag: -- [wsicli|0|WS/h1/default/192.168.125.205] (0) 3.014s
[2025/04/28 17:14:24:9521] I: lws_create_context: Event loop: poll
[2025/04/28 17:14:24:9531] I: lws_create_context: ctx: 6704B (2608 ctx + pt(1 thr x 4096)), pt-fds: 30000
[2025/04/28 17:14:24:9531] I: lws_create_context: http: ah_data: 4096, ah: 992, max count 30000
[2025/04/28 17:14:24:9581] I: lws_plat_pipe_create: cancel UDP skt port 61372
[2025/04/28 17:14:24:9581] I: lws_server_get_canonical_hostname: canonical_hostname = SGDEVTEMPLATE
[2025/04/28 17:14:24:9591] I: [vh|0|default||7893]: lws_create_vhost: Creating Vhost 'default' port 7893, 5 protocols, IPv6 off
[2025/04/28 17:14:24:9591] I: _lws_vhost_init_server_af: af 2
[2025/04/28 17:14:24:9591] I: Listening on (null):7893
[2025/04/28 17:14:24:9591] I: lws_create_context: mem: per-conn: 1272 bytes + protocol rx buf
[2025/04/28 17:14:24:9817] I: lws_create_context: Event loop: poll
[2025/04/28 17:14:24:9827] I: lws_create_context: ctx: 6704B (2608 ctx + pt(1 thr x 4096)), pt-fds: 30000
[2025/04/28 17:14:24:9827] I: lws_create_context: http: ah_data: 4096, ah: 992, max count 30000
[2025/04/28 17:14:24:9887] I: lws_plat_pipe_create: cancel UDP skt port 61374
[2025/04/28 17:14:24:9887] I: lws_server_get_canonical_hostname: canonical_hostname = SGDEVTEMPLATE
[2025/04/28 17:14:24:9887] I: [vh|0|default||9002]: lws_create_vhost: Creating Vhost 'default' port 9002, 5 protocols, IPv6 off
[2025/04/28 17:14:24:9887] I: _lws_vhost_init_server_af: af 2
[2025/04/28 17:14:24:9896] I: Listening on (null):9002
[2025/04/28 17:14:24:9896] I: lws_create_context: mem: per-conn: 1272 bytes + protocol rx buf
[2025/04/28 17:14:25:0208] I: lws_state_notify_protocol_init: doing protocol init on POLICY_VALID
[2025/04/28 17:14:25:0208] I: lws_protocol_init:
[2025/04/28 17:14:25:0208] I: [vh|0|default||9002]: lws_protocol_init_vhost: Setting default protocol to defaultProtocolName
[2025/04/28 17:14:25:0208] I: [vh|0|default||9002]: lws_protocol_init_vhost: init default.defaultProtocolName
[2025/04/28 17:14:25:0218] I: lws_state_notify_protocol_init: doing protocol init on POLICY_VALID
[2025/04/28 17:14:25:0218] I: lws_protocol_init:
[2025/04/28 17:14:25:0218] I: [vh|0|default||7893]: lws_protocol_init_vhost: Setting default protocol to defaultProtocolName
[2025/04/28 17:14:25:0218] I: [vh|0|default||7893]: lws_protocol_init_vhost: init default.defaultProtocolName
[2025/04/28 17:14:25:0254] I: lws_state_transition_steps: CONTEXT_CREATED -> OPERATIONAL
[2025/04/28 17:14:25:0264] I: lws_state_transition_steps: CONTEXT_CREATED -> OPERATIONAL
[2025/04/28 17:14:27:3925] I: lws_header_table_attach: [wsisrv|0|default|(null)]: ah 0000000000000000 (tsi 0, count = 0) in
[2025/04/28 17:14:27:3925] I: _lws_create_ah: created ah 00000188851B88F0 (size 4096): pool length 1
[2025/04/28 17:14:27:3925] I: lws_header_table_attach: did attach wsi [wsisrv|0|default|(null)]: ah 00000188851B88F0: count 1 (on exit)
[2025/04/28 17:14:27:3925] I: [wsisrv|0|default|(null)]: lws_adopt_descriptor_vhost2: vhost [vh|0|default||9002]
[2025/04/28 17:14:27:3943] I: lws_handshake_server: parsed count 482
[2025/04/28 17:14:27:3948] I: lws_select_vhost: vhost match to default based on port 9002
[2025/04/28 17:14:27:3959] I: Upgrade to ws
[2025/04/28 17:14:27:3959] I: lws_process_ws_upgrade: defaulting to prot handler 0
[2025/04/28 17:14:27:4000] I: [wsisrv|0|default|(null)]: lws_issue_raw: ssl_capable_write (127) says 127
[2025/04/28 17:14:27:4048] I: [wsisrv|0|default|(null)]: _lws_validity_confirmed_role: setting validity timer 300s (hup 0)
[2025/04/28 17:14:27:4048] I: lws_process_ws_upgrade2: [wsisrv|0|default|(null)]: dropping ah on ws upgrade
[2025/04/28 17:14:27:4048] I: __lws_header_table_detach: [wsisrv|0|default|(null)]: ah 00000188851B88F0 (tsi=0, count = 1)
[2025/04/28 17:14:27:4048] I: __lws_header_table_detach: nobody usable waiting
[2025/04/28 17:14:27:4048] I: _lws_destroy_ah: freed ah 00000188851B88F0 : pool length 0
[2025/04/28 17:14:27:4048] I: __lws_header_table_detach: [wsisrv|0|default|(null)]: ah 00000188851B88F0 (tsi=0, count = 0)
I am talking to the other team that initiates the connection to see whether they can make the mods to indicate a protocol name, but this should work, no?
but this should work, no?
I hate this kind of question... I have no idea what you have written. AFAIK the part in lws works fine as described, but what you have is a coproduction between lws, your user code (nothing to do with me) and your client code (nothing to do with me either). Eg, "subprotocols=None" means nothing to me, that is not anything to do with lws or C. You can talk usefully about what headers you are sending to lws: in the case there's no subprotocol, there should be no subprotocol header (Sec-WebSocket-Protocol) at all.
You can see your headers (assuming it is h1) by building lws with -DLWS_TLS_LOG_PLAINTEXT_RX=1, this will dump all received data after tls decryption. If you're not using tls, it's even easier: you can use wireshark or tcpdump to see what you are actually sending.
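For comparison with what a capture shows, the sketch below builds the plain h1 upgrade request a conforming client would be expected to send, with and without a subprotocol. It uses only the Python standard library; the function name is hypothetical, and whether any particular client library emits exactly these headers is an assumption to check against the wire.

```python
import base64
import os


def build_upgrade_request(host, port, path="/", subprotocols=None):
    """Build a minimal HTTP/1.1 websocket upgrade request as bytes."""
    # The key is 16 random bytes, base64-encoded.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}:{port}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
    ]
    # Only emit the subprotocol header when names are actually offered;
    # with no subprotocol the header should be absent, not empty.
    if subprotocols:
        lines.append("Sec-WebSocket-Protocol: " + ", ".join(subprotocols))
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


if __name__ == "__main__":
    print(build_upgrade_request("localhost", 7681).decode())
    print(build_upgrade_request("localhost", 7681,
                                subprotocols=["lws-minimal"]).decode())
```

If the captured request in the failing case differs from the first form (for example an empty or unexpected Sec-WebSocket-Protocol value), that difference is the thing to report.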
That's a fair criticism that there's a tremendous number of variables, just as you have described, between my application and the LWS server. To reduce those variables, I appealed to the minimal-examples-lowlevel/ws-server/minimal-ws-server/ example, where I found identical behavior when connecting to it with what I hoped would be an extremely minimal, understandable, and reliable client implementation: the short Python snippet I included. However, I now see that I assumed you to have some working understanding of Python, which is fine.