pharo
pharo copied to clipboard
Opening a socket with fd>=1024 crashes the VM
Bug description The VM can crash with an access violation on linux when a socket is opened with a file descriptor >= 1024.
To Reproduce
aio.c
stores the handlers for async io in three arrays:
static aioHandler rdHandler[FD_SETSIZE];
static aioHandler wrHandler[FD_SETSIZE];
static aioHandler exHandler[FD_SETSIZE];
FD_SETSIZE
is typically 1024, so when a socket with an fd of 1024 or greater is opened and a read operation performed, the read handler is incorrectly written in to wrHandler
. For fd=1024 this corresponds to stdout (fd=0 in wrHandler
), so the next time stdout IO is performed and aioPoll()
is called the VM will crash as the socket dataHandler()
will be incorrectly called.
Expected behavior All IO completes successfully and the VM doesn't crash.
Screenshots The gdb session describing the crash is included below.
Version information:
- OS: Ubuntu
- Version: 22.04
- Pharo Version: 10
Expected development cost
https://access.redhat.com/solutions/488623 suggests using epoll/libevents over select().
The opensmalltalk VM already has this change incorporated in to AIO, https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/1bb96795fca099baedd8f45e59c847786aae91d6/platforms/unix/vm/aio.c#L730 which would provide a head-start to implementing the change.
Additionally, if epoll()
isn't available, an error should be raised if an attempt is made to poll an fd >= 1024.
Additional context
The test harness we set up has Pharo acting as a server and connecting 1500 clients to it, responding to requests and logging activity to the console.
Running Pharo under gdb and triggering the issue results in:
Thread 1 "GlamorousToolki" received signal SIGSEGV, Segmentation fault.
0x00007ffff240c3e8 in dataHandler (fd=0, data=0x0, flags=4) at extracted/plugins/SocketPlugin/src/common/SocketPluginImpl.c:591
591 logTrace("dataHandler(%d=%d, %p, %d)\n", fd, pss->s, data, flags);
(gdb)
(gdb) bt
#0 0x00007ffff240c3e8 in dataHandler (fd=0, data=0x0, flags=4)
at extracted/plugins/SocketPlugin/src/common/SocketPluginImpl.c:591
#1 0x00007ffff712235b in aio_handle_events (microSeconds=50000)
at extracted/vm/src/unix/aio.c:314
#2 0x00007ffff7121cde in aioPoll (microSeconds=50000)
at extracted/vm/src/unix/aio.c:216
#3 0x00007ffff711ce25 in ioRelinquishProcessorForMicroseconds (microSeconds=50000)
at extracted/vm/src/common/heartbeat.c:307
#4 0x00007ffff70effd1 in primitiveRelinquishProcessor ()
at generated/64/vm/src/cointerp.c:35534
#5 0x000000002c789c76 in ()
#6 0x00007ffff5701640 in ()
#7 0x000000002de5b190 in ()
#8 0x0000000100000000 in ()
#9 0x000000002e163a50 in ()
#10 0x00007fffffff6180 in ()
#11 0x00007ffff70720e0 in interpret ()
at generated/64/vm/src/cointerp.c:2978
(gdb) list
586 /* read or write data transfer is now possible for the socket. */
587
588 static void dataHandler(int fd, void *data, int flags)
589 {
590 privateSocketStruct *pss= (privateSocketStruct *)data;
591 logTrace("dataHandler(%d=%d, %p, %d)\n", fd, pss->s, data, flags);
592
593 if (pss == NULL)
594 {
595 logTrace("dataHandler: pss is NULL fd=%d data=%p flags=0x%x\n", fd, data, flags);
(gdb) print data
$1 = (void *) 0x0
(gdb) print pss
$2 = (privateSocketStruct *) 0x0
(gdb) up
#1 0x00007ffff712235b in aio_handle_events (microSeconds=50000)
at extracted/vm/src/unix/aio.c:314
314 handler(fd, clientData[fd], AIO_W);
(gdb) list
309 }
310 //_DO(AIO_W, wr)
311 if (FD_ISSET(fd, &wr)) {
312 handler = wrHandler[fd];
313 FD_CLR(fd, &wrMask);
314 handler(fd, clientData[fd], AIO_W);
315 wrHandler[fd]= undefinedHandler;
316 }
317 //_DO(AIO_X, ex)
318 if (FD_ISSET(fd, &ex)) {
(gdb) print fd
$3 = 0
(gdb) print clientData[fd]
$4 = (void *) 0x0
(gdb) print wrHandler[fd]
$5 = (aioHandler) 0x7ffff240c3a0 <dataHandler>
Thanks alistair. Pablo is about to recover the work he did on socket and this will probably go also in the socket improvement todo. @tesonep
Great to hear, thanks!
Fixed in https://github.com/pharo-project/pharo-vm/pull/805
This is great!!! @tesonep is that fix only for sockets or does it affect files as well? You remember when I complained about the usage of select() there? Thanks @akgrant43 for the fix
Fixed by: https://github.com/pharo-project/pharo-vm/pull/805