python-socketio

multiple namespace, race condition in asyncio, no individual sid

Open simingy opened this issue 5 years ago • 9 comments

Problem

  1. sio.connect() connects to the / namespace first, before connecting to the other namespaces. emit() calls are allowed before all namespaces are connected, so messages disappear (the target namespace is not connected yet). Adding asyncio.sleep(1) before the first emit() seems to fix the problem.

  2. In the documentation at https://python-socketio.readthedocs.io/en/latest/server.html#namespaces, the 2nd paragraph states:

Each namespace is handled independently from the others, with separate session IDs (sids), ...

But in my snippet below, the same sid seems to be used for all namespaces.

Am I missing something?

Code

Client code:

  
import time
import asyncio
import socketio
import logging

logging.basicConfig(level='DEBUG')
loop = asyncio.get_event_loop()
sio = socketio.AsyncClient()

@sio.event
async def message(data):
    print(data)

@sio.event(namespace='/abc')
def message(data):
    print('/abc', data)

@sio.event
async def connect():
    print('connection established', time.time())

@sio.event(namespace='/abc')
async def connect():
    print("I'm connected to the /abc namespace!", time.time())

async def start_client():
    await sio.connect('http://localhost:8080', transports=['websocket'],
                      namespaces=['/', '/abc'])
    # await asyncio.sleep(2)
    await sio.emit('echo', '12345')
    await sio.emit('echo', '12345', namespace='/abc')
    await sio.wait()

if __name__ == '__main__':
    loop.run_until_complete(start_client())

server code:

import socketio
import time
from aiohttp import web

sio = socketio.AsyncServer(async_mode = 'aiohttp')

app = web.Application()
sio.attach(app)

redis = None

@sio.event
async def connect(sid, environ):
    print("connected", sid, time.time())

@sio.event(namespace='/abc')
async def connect(sid, environ):
    print("connected /abc", sid, time.time())

@sio.event(namespace='/abc')
async def echo(sid, msg):
    print('abc', sid, msg)
    await sio.emit('message', msg, to=sid, namespace='/abc')
    
@sio.event
async def echo(sid, msg):
    print(sid, msg)
    await sio.emit('message', msg, to=sid)

if __name__ == '__main__':
    web.run_app(app)

simingy avatar Apr 14 '20 02:04 simingy

The documentation is incorrect. All the namespaces for a client connection use the same sid. I'll fix that.

miguelgrinberg avatar Apr 14 '20 09:04 miguelgrinberg

@miguelgrinberg thanks for the quick response!

Can you please also take a look at issue 1? It seems that namespaces other than / are not connected immediately, and messages are lost.

also, instead of changing configuration, isn't the separate sid intended?

in node.js reference implementation, connecting two namespaces

    var socket = io.connect('http://localhost:8080');
    var socket_ns = io('http://localhost:8080/my-namespace');

generates two sids on the Node.js server:

Connection:
d2Lc2BBnPMwgapxuAAAA
Connection to namespace:
/my-namespace#d2Lc2BBnPMwgapxuAAAA

even though they are closely related.

simingy avatar Apr 14 '20 14:04 simingy

First of all, in the JS case you are issuing two separate connections for your two namespaces, so that is why you get separate sids. On the Python client you are taking advantage of an option to connect multiple namespaces in the same call, so both are multiplexed in the same connection.

The time it takes for the connection to be established is highly variable, as the call needs to reach the server, and then the connect callback on the server is invoked and can take time to complete and accept the connection. Normally the client would wait to receive something from the server before it starts emitting.

It would be nice if the connect call on the client would block until the connection is fully established, but I don't see that being practical, as some people put long running functions on their server-side connect handlers (terrible idea, but a lot of people do it anyway).

So basically, you should take the connect call on the client as an indication that the connection request was sent, not that it was accepted.
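One way to act on this in the older versions discussed here is to have the client wait until every namespace has reported its connect event before the first emit(). The sketch below is a minimal, stdlib-only illustration of that idea; NamespaceGate, mark_connected() and wait_all() are hypothetical names introduced for this example, not part of python-socketio, and the timers in demo() only simulate connect handlers firing.

```python
import asyncio


class NamespaceGate:
    """Tracks which namespaces have confirmed their connection (hypothetical helper)."""

    def __init__(self, namespaces):
        # One asyncio.Event per namespace; set when its connect handler fires.
        self._events = {ns: asyncio.Event() for ns in namespaces}

    def mark_connected(self, namespace):
        # Intended to be called from the connect handler of the given namespace.
        self._events[namespace].set()

    async def wait_all(self, timeout=5.0):
        # Block until every namespace has been marked; raises
        # asyncio.TimeoutError if that does not happen within `timeout` seconds.
        waiters = [event.wait() for event in self._events.values()]
        await asyncio.wait_for(asyncio.gather(*waiters), timeout)


async def demo():
    gate = NamespaceGate(['/', '/abc'])
    loop = asyncio.get_running_loop()
    # Simulate the two connect handlers firing at different times.
    loop.call_later(0.01, gate.mark_connected, '/')
    loop.call_later(0.05, gate.mark_connected, '/abc')
    await gate.wait_all(timeout=1.0)
    return True  # from here on it would be safe to emit on both namespaces
```

In a real client, the connect handlers for / and /abc would each call gate.mark_connected(...) with their namespace, and start_client() would await gate.wait_all() in place of the asyncio.sleep() workaround.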

miguelgrinberg avatar Apr 14 '20 15:04 miguelgrinberg

@miguelgrinberg your assumption is incorrect, this is not a bug in the client, but a bug in python-socketio server.

Scenario:

  • using Node.JS server
  • using Python-SocketIO client
  • In Python-SocketIO, connect to two namespaces using the same connection.

socket.io in JS will produce two separate sids for this client:

XV5Ag-cZgIgRbhegAAAB
/my-namespace#XV5Ag-cZgIgRbhegAAAB

Even though it is still the same multiplexed connection, having two separate sids allows the server to put the different sids (i.e., the different namespaces) into different rooms.

Right now in python-socketio, because the global / namespace and every other namespace reuse the same sid, you lose that ability.

simingy avatar Apr 15 '20 15:04 simingy

this is not a bug in the client

I didn't say there was any bug. In the JS client you are issuing two separate connections. In the Python client you are issuing just one. The Socket.IO protocol allows multiplexing of multiple namespaces in the same Engine.IO connection. I'm taking advantage of that feature. If you don't like to do that, then make two separate connections, like you did in JS.

Even though it is still the same connection multiplex, the fact that you have two separate sids

That makes no sense. How can my Python client take two different sids on the same connection? It's not possible because I never coded the client to work in that way, there is only one sio.sid value stored per client instance. If you have two different sids, then you are using two connections.

And if you want more proof, here are the logs from the Node server when it receives a connection on / and /foo:

  socket.io:server initializing namespace / +0ms
  socket.io-parser encoding packet {"type":0,"nsp":"/"} +0ms
  socket.io-parser encoded {"type":0,"nsp":"/"} as 0 +0ms
  socket.io:server creating engine.io instance with opts {"path":"/socket.io","initialPacket":["0"]} +1ms
  socket.io:server attaching client serving req handler +8ms
  socket.io:server initializing namespace /foo +2ms
  engine intercepting request for path "/socket.io/" +0ms
  engine handling "GET" http request "/socket.io/?transport=polling&EIO=3&t=1586971261.494227" +0ms
  engine handshaking client "Nzv9Sd-EmE0JYZCOAAAA" +3ms
  engine:socket sending packet "open" ({"sid":"Nzv9Sd-EmE0JYZCOAAAA","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":5000}) +0ms
  engine:socket sending packet "message" ([ '0' ]) +0ms
  engine:polling setting request +0ms
  engine:socket flushing buffer to transport +2ms
  engine:polling writing "96:0{"sid":"Nzv9Sd-EmE0JYZCOAAAA","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":5000}2:40" +0ms
  engine:socket executing batch send callback +2ms
  socket.io:server incoming connection with id Nzv9Sd-EmE0JYZCOAAAA +11s
  socket.io:client connecting to namespace / +0ms
  socket.io:namespace adding socket to nsp / +0ms
  socket.io:socket socket connected - writing packet +0ms
  socket.io:socket joining room Nzv9Sd-EmE0JYZCOAAAA +0ms
  socket.io:socket joined room [ 'Nzv9Sd-EmE0JYZCOAAAA' ] +0ms
  engine:ws received "40/foo" +1ms
  socket.io-parser decoded 0/foo as {"type":0,"nsp":"/foo"} +1ms
  socket.io:client connecting to namespace /foo +7ms
  socket.io:namespace adding socket to nsp /foo +9ms
  socket.io:socket joining room /foo#Nzv9Sd-EmE0JYZCOAAAA +0ms
  socket.io:socket joined room [ '/foo#Nzv9Sd-EmE0JYZCOAAAA' ] +1ms

So there you go. There is a single sid mentioned in this log (Nzv9Sd-EmE0JYZCOAAAA) even though two namespaces are connected.
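To make the packets in the log easier to read: the parser lines show that a connect packet is the digit 0, optionally followed by the namespace (0 for /, 0/foo for /foo). The sketch below is a deliberately simplified reader for that header, assuming the leading Engine.IO message type digit (the 4 in 40/foo) has already been stripped; parse_packet_header is a hypothetical name, and the function ignores binary attachment counts, ack ids and payloads.

```python
def parse_packet_header(packet):
    """Return (packet_type, namespace) for a text Socket.IO packet.

    Simplified sketch: handles only the type digit and an optional
    namespace, which is all the connect packets in the log contain.
    """
    packet_type = int(packet[0])
    rest = packet[1:]
    if rest.startswith('/'):
        # The namespace runs up to the first comma, or to the end of the packet.
        namespace = rest.split(',', 1)[0]
    else:
        # No explicit namespace means the default namespace.
        namespace = '/'
    return packet_type, namespace
```

Both connect packets in the log decode to type 0, one for / and one for /foo, and neither carries a separate sid of its own.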

miguelgrinberg avatar Apr 15 '20 17:04 miguelgrinberg

When you issue two connections in JS, even though it looks like two sids, it's the same multiplexed connection.

In your snippet, look at the SIDs Nzv9Sd-EmE0JYZCOAAAA and /foo#Nzv9Sd-EmE0JYZCOAAAA.

On the server side that's two sids, but it's the exact same socket connection.

Or maybe we could rephrase. In JS:

    var socket = io.connect('http://localhost:8080');
    var socket_ns = io.connect('http://localhost:8080/my-namespace');

Though there are two socket objects, it's one Engine.IO connection.

But again, you are still focusing on the client.

Using the same Python client to connect to two different servers:

  • on the Node server you get 2 sids, one per namespace
  • on the Python server you get 1 sid, the same for both namespaces

This is about the server-side sid, not the client.

simingy avatar Apr 15 '20 18:04 simingy

@sillygod you are looking at room names, not sids. The sid is Nzv9Sd-EmE0JYZCOAAAA for both connections.

miguelgrinberg avatar Apr 15 '20 18:04 miguelgrinberg

@miguelgrinberg are you sure...

Node Code:

var app = require('express')();
var http = require('http').createServer(app);
var io = require('socket.io')(http);

app.get('/', function(req, res){
  res.sendFile(__dirname + '/index.html');
});

http.listen(3000, function(){
  console.log('listening on *:3000');
});

io.on('connection', function(socket){
    console.log('user connected to /');
    console.log(socket.id);
});

io.of('/namespace').on('connection', function(socket){
  console.log('user connected to /namespace');
  console.log(socket.id);
});

Node Output:

node node_server.js
listening on *:3000
user connected to /
iht-er6SqXRMCJ3UAAAA
user connected to /namespace
/namespace#iht-er6SqXRMCJ3UAAAA

Client Python

import asyncio
import socketio

loop = asyncio.get_event_loop()
sio = socketio.AsyncClient()


@sio.event
async def connect():
    print('connection established')

async def start_server():
    await sio.connect('http://localhost:3000', transports=['websocket'],
                       namespaces=['/','/namespace'])
    await sio.wait()

if __name__ == '__main__':
    loop.run_until_complete(start_server())

Python output

DEBUG:asyncio:Using selector: KqueueSelector
INFO:engineio.client:Attempting WebSocket connection to ws://localhost:3000/socket.io/?transport=websocket&EIO=3
INFO:engineio.client:WebSocket connection accepted with {'sid': 'iht-er6SqXRMCJ3UAAAA', 'upgrades': [], 'pingInterval': 25000, 'pingTimeout': 5000}
INFO:socketio.client:Engine.IO connection established
INFO:engineio.client:Sending packet PING data None
INFO:engineio.client:Received packet MESSAGE data 0
INFO:socketio.client:Namespace / is connected
connection established
INFO:engineio.client:Sending packet MESSAGE data 0/namespace
INFO:engineio.client:Received packet PONG data None
INFO:engineio.client:Received packet MESSAGE data 0/namespace,
INFO:socketio.client:Namespace /namespace is connected

simingy avatar Apr 15 '20 19:04 simingy

are you sure...

Of course I'm sure.

You are looking at two server implementations of the Socket.IO protocol. They have minor variations in how they do things, but both comply with the protocol with regards to multiplexing of multiple namespaces within a single transport connection.

What the Socket.IO protocol calls sid is not the same as this socket.id variable that you are printing in the JS server. Here is how this variable is assigned:

https://github.com/socketio/socket.io/blob/47161a65d40c2587535de750ac4c7d448e5842ba/lib/socket.js#L64

You see what they do? They concatenate the name of the namespace with the value of client.id, which is the unique sid for this client, same for all namespaces.

Nothing prevents you from creating an id variable in the Python server that does the same thing if that helps you in any way. The Python server does not follow the same design as the JS one, I coded the server against the protocol specification, not to be a clone of the JS server.
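As a rough illustration of that suggestion, the helper below builds an identifier the same way the JS output earlier in the thread looks: the plain sid for the / namespace, and the namespace name, a #, and the sid for any other namespace. namespaced_id is a hypothetical name for this sketch; it is not part of python-socketio.

```python
def namespaced_id(sid, namespace='/'):
    """Build a per-namespace identifier from a shared sid (hypothetical helper)."""
    if namespace == '/':
        # The default namespace uses the sid unchanged.
        return sid
    # Other namespaces are prefixed with the namespace name and a '#'.
    return f'{namespace}#{sid}'
```

A server-side handler could compute this value once per namespace connection and use it as a room name or dictionary key.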

miguelgrinberg avatar Apr 15 '20 22:04 miguelgrinberg

Is this still correct? When I run the code from the issue opener with version 5.9.0, I see two different ids:

connected CFMu8Ja_484ey2GDAAAB 1695235567.9777846
connected /abc G2I8LQfUB4ckCf6iAAAC 1695235567.9780757
CFMu8Ja_484ey2GDAAAB 12345
abc G2I8LQfUB4ckCf6iAAAC 12345

julianhille avatar Sep 20 '23 18:09 julianhille

Ah, OK. I reread the documentation and this is no longer true: every namespace now has its own sid. Is there some way to turn that off? I would like to authenticate the user only once and have the same sid for the whole connection.

julianhille avatar Sep 20 '23 18:09 julianhille

@julianhille The Socket.IO protocol now requires a different sid per namespace. But you can map multiple sid values to a user_id from your user database, for example, and then you would have a way to associate all the connections of a user.
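A minimal sketch of such a mapping, assuming it is kept in memory in a single server process: UserSessions and its methods are hypothetical names for this example, not a python-socketio API. The idea would be to call add() from each namespace's connect handler after authentication, and remove() from the disconnect handler.

```python
from collections import defaultdict


class UserSessions:
    """Associates the per-namespace sids of a client with one user_id."""

    def __init__(self):
        self._user_by_sid = {}
        self._sids_by_user = defaultdict(set)

    def add(self, sid, user_id):
        # Record that this sid belongs to the given user.
        self._user_by_sid[sid] = user_id
        self._sids_by_user[user_id].add(sid)

    def remove(self, sid):
        # Forget a sid; returns the user it belonged to, or None if unknown.
        user_id = self._user_by_sid.pop(sid, None)
        if user_id is None:
            return None
        sids = self._sids_by_user[user_id]
        sids.discard(sid)
        if not sids:
            # Drop users that have no remaining connections.
            del self._sids_by_user[user_id]
        return user_id

    def user_for(self, sid):
        return self._user_by_sid.get(sid)

    def sids_for(self, user_id):
        # Return a copy so callers cannot mutate the internal set.
        return set(self._sids_by_user.get(user_id, ()))
```

With several server processes, the same mapping would have to live in shared storage instead of a local dictionary.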

miguelgrinberg avatar Sep 20 '23 18:09 miguelgrinberg

The reasoning behind that is I want only one user context in the client manager. So I guess I extend the client manager and add a mapping to retrieve a single user context. Right?

I guess this ticket can be closed then.

julianhille avatar Sep 20 '23 19:09 julianhille

Btw, this opens up a race condition for me: my auth works through a JWT with a limited lifetime, so namespace connections 1 to n might authenticate while connection n+1 fails because the token has expired by then.

julianhille avatar Sep 20 '23 19:09 julianhille

@julianhille My opinion is that you are misusing namespaces. Namespaces are designed as independent connections that are multiplexed over a single transport. If you need a single connection on which several topics are exchanged, then use a single namespace and pass a topic field or something similar with every event.
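A small sketch of what that could look like, assuming each event payload is a dict with a topic key and a data key (both names are assumptions for this example). TopicRouter is a hypothetical helper, not part of python-socketio; a single event handler on one namespace would pass every incoming payload to dispatch().

```python
class TopicRouter:
    """Routes payloads received on one namespace to per-topic handlers."""

    def __init__(self):
        self._handlers = {}

    def on(self, topic):
        # Decorator that registers a handler for the given topic.
        def register(handler):
            self._handlers[topic] = handler
            return handler
        return register

    def dispatch(self, sid, payload):
        # Look up the handler by the payload's topic field and call it.
        topic = payload.get('topic')
        handler = self._handlers.get(topic)
        if handler is None:
            raise KeyError(f'no handler for topic {topic!r}')
        return handler(sid, payload.get('data'))


router = TopicRouter()


@router.on('chat')
def handle_chat(sid, data):
    return ('chat', sid, data)


@router.on('status')
def handle_status(sid, data):
    return ('status', sid, data)
```

Because everything travels over one namespace, the client authenticates once and keeps a single sid, while topics stay separated at the application level.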

miguelgrinberg avatar Sep 20 '23 21:09 miguelgrinberg

For anyone reading this issue, and to avoid any confusion regarding statements made here that are out of date, let me summarize the current state of things:

  • Current versions of the Socket.IO protocol and this package use different sid values for each namespace a client is connected to. It is only older versions that used the same sid for all namespaces.
  • Namespaces are designed to each carry an independent connection from a logical point of view. In terms of implementation, all these namespaces are multiplexed over a single Engine.IO connection.
  • The client's connect() method in current versions of this package waits for all requested namespaces to be connected before returning.

Given that most of what is discussed here relates to old versions, I'm going to close this issue.

miguelgrinberg avatar Sep 21 '23 11:09 miguelgrinberg