bee.lua icon indicating copy to clipboard operation
bee.lua copied to clipboard

Tests fail on FreeBSD with SIGBUS

Open michaeladler opened this issue 1 year ago • 8 comments

I've tried to run the tests on FreeBSD but they fail since they are killed by SIGBUS. There seems to be a problem with creating the file watcher from Lua. This is the line which triggers the SIGBUS: https://github.com/actboy168/bee.lua/blob/a73df876cd6418496fa2d7f423e403487b2d115a/binding/lua_filewatch.cpp#L132

I've also added some debug code to the constructor but it seems like the constructor is never executed.

Usually I wouldn't mind about the tests failing but lua-language-server uses bee's filewatcher and is killed by the same SIGBUS, so this issue is actually the root cause.

I don't have a solution for this problem so I'm creating this issue in the hope that someone else has an idea :) Maybe a Lua bug? I tried to update the bundled Lua to 5.4.6 but it still fails.

EDIT: Seems to be related to https://github.com/actboy168/luamake/issues/33

michaeladler avatar Aug 25 '23 11:08 michaeladler

filewatch on freebsd is incomplete. filewatch depends on libinotify-kqueue, but unfortunately libinotify-kqueue doesn't work correctly.

actboy168 avatar Aug 28 '23 02:08 actboy168

It may be incomplete, but I think that doesn't explain the crash. For example, the following C++ program using bee's C++ library works fine and prints the events:

#include <chrono>
#include <iostream>
#include <thread>

#include "filewatch.h"

int main(void)
{
    bee::filewatch::watch watch;
    watch.set_recursive(true);
    watch.set_follow_symlinks(true);
    watch.add("/tmp/foo");

    for (int i = 1; i <= 10; ++i) {
        watch.update();
        while (true) {
            auto event = watch.select();
            if (event.has_value()) {
                std::cout << "event for: " << event.value().path << std::endl;
            } else {
                std::cout << "no events" << std::endl;
                break;
            }
        }
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }

    return 0;
}

This makes me think it's actually an issue with Lua.

michaeladler avatar Aug 28 '23 09:08 michaeladler

I don't have a test environment, you can try to fix it.

actboy168 avatar Aug 29 '23 00:08 actboy168

The problem is with the newudata binding, used to allocate a file watcher object. _Alignof(bee::filewatch::watch) == 16, and the constructor relies on this property:

(gdb) disas Dump of assembler code for function _ZN3bee9filewatch5watchC2Ev: 0x0000000000452470 <+0>: push %rbp 0x0000000000452471 <+1>: mov %rsp,%rbp 0x0000000000452474 <+4>: push %rbx 0x0000000000452475 <+5>: push %rax 0x0000000000452476 <+6>: mov %rdi,%rbx 0x0000000000452479 <+9>: xorps %xmm0,%xmm0 => 0x000000000045247c <+12>: movaps %xmm0,0x20(%rdi) <-- assumes %rdi is 16-bit aligned 0x0000000000452480 <+16>: movaps %xmm0,0x10(%rdi)

In particular, it's using SIMD instructions to zero the object, and this assumes 16-byte alignment. However, the value in %rdi is only 8-byte aligned.

The object was allocated with:

217     template <typename T, typename... Args>                                                                                                                                                                                                                                                                               
218     T& newudata(lua_State* L, void (*init_metatable)(lua_State*), Args&&... args) {                                                                                                                                                                                                                                       
219         static_assert(udata_has_name<T>::value);                                                                                                                                                                                                                                                                          
220         int nupvalue = 0;                                                                                                                                                                                                                                                                                                 
221         if constexpr (udata_has_nupvalue<T>::value) {                                                                                                                                                                                                                                                                     
222             nupvalue = udata<T>::nupvalue;                                                                                                                                                                                                                                                                                
223         }                                                                                                                                                                                                                                                                                                                 
224         T* o = static_cast<T*>(lua_newuserdatauv(L, sizeof(T), nupvalue));                                                                                                                                                                                                                                                
...                                                                                                                                                                                                                                                                                        
234         return *o;                                                                                                                                                                                                                                                                                                        
235     }

i.e., we're taking some udata block from lua and stuffing an object into it. But this assumes that lua's udata allocator is returning sufficiently aligned memory, and that assumption is wrong. (Lua is using realloc() under the hood, which does return 16-byte aligned memory, but lua cuts off some of that for its own use, resulting in misalignment.)

To demonstrate, if I change the code to do this, then I get SIGABRT instead:

...
224         //T* o = static_cast<T*>(lua_newuserdatauv(L, sizeof(T), nupvalue));                                                                                                                                                                                                                                              
225         void *ret = lua_newuserdatauv(L, sizeof(T), nupvalue);                                                                                                                                                                                                                                                            
226         if (((uintptr_t)ret & (_Alignof(T) - 1)) != 0) abort();                                                                                                                                                                                                                                                           
227         T *o = static_cast<T*>(ret);
...

I'm not sure how best to fix it. I believe the bug is in bee: it's assuming too much of the underlying udata allocator, which it doesn't implement. Probably it should be storing a pointer in udata instead of the whole object.

markjdb avatar Jan 22 '24 02:01 markjdb

@markjdb I don't know why freebsd's clang assumes filewatch::watch is 16-byte aligned. Maybe using alignas will make clang aware that filewatch::watch is not 16-byte aligned.

actboy168 avatar Jan 22 '24 03:01 actboy168

@markjdb I don't know why freebsd's clang assumes filewatch::watch is 16-byte aligned. Maybe using alignas will make clang aware that filewatch::watch is not 16-byte aligned.

clang is what decides that the class is 16-byte aligned. It raises an error if I try to lower the alignment to 8, but doesn't say why. And the problem affects all classes which are used as udata with an odd number of user values.

markjdb avatar Jan 22 '24 05:01 markjdb

Well. You can change filewatch's uservalue to 2 if it makes clang work.

actboy168 avatar Jan 22 '24 06:01 actboy168

Well. You can change filewatch's uservalue to 2 if it makes clang work.

It turns out that the problem is there no matter how many uservalues there are. Since bee.lua provides its own lua runtime, I think it might be acceptable to force lua to provide required alignment. I submitted a PR to do that, please let me know if it's acceptable.

markjdb avatar Jan 27 '24 16:01 markjdb

Fixed.

actboy168 avatar Feb 13 '24 05:02 actboy168