Doubts related to dlopen and dlclose
Hey @sbc100,
I think you might be able to help me out here.
So clang-repl can do the undo operation to surpass redefinitions and stuff. For example:
```
clang-repl> int y = 50;
clang-repl> const char* y = "Hello World";
In file included from <<< inputs >>>:1:
input_line_6:1:13: error: redefinition of 'y' with a different type: 'const char *' vs 'int'
    1 | const char* y = "Hello World";
      |             ^
input_line_5:1:5: note: previous definition is here
    1 | int y = 50;
      |     ^
error: Parsing failed.
clang-repl> extern "C" int printf(const char*, ...);
clang-repl> int x = 50;
clang-repl> %undo
clang-repl> const char* x = "Hello World";
clang-repl> auto r = printf("%s\n", x);
Hello World
```
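A tiny toy model of the transcript above may help pin down what undo is expected to guarantee. It is only a sketch, assuming each cell defines exactly one name; `ToyRepl`, `define`, `undo` and `typeOf` are hypothetical names, not clang-repl's API. It captures the observable rule: an existing name cannot be redefined, and `%undo` forgets the most recent cell so that name becomes free again.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model of clang-repl's undo semantics (not its implementation).
class ToyRepl {
 public:
  // Returns false, like "error: Parsing failed.", if the name already exists.
  // A failed cell is not recorded, so it never needs to be undone.
  bool define(const std::string& name, const std::string& type) {
    assert(!name.empty());
    if (types_.count(name) != 0) return false;  // redefinition error
    types_[name] = type;
    cells_.push_back(name);
    return true;
  }

  // %undo: drop the most recent successful cell; false if there is none.
  bool undo() {
    if (cells_.empty()) return false;
    types_.erase(cells_.back());
    cells_.pop_back();
    return true;
  }

  // Declared type of `name`, or "" if it is not currently defined.
  std::string typeOf(const std::string& name) const {
    auto it = types_.find(name);
    return it == types_.end() ? "" : it->second;
  }

 private:
  std::map<std::string, std::string> types_;  // name -> declared type
  std::vector<std::string> cells_;            // name defined by each cell, in order
};
```

In this model, defining `y` twice fails, while `int x`, then undo, then `const char* x` succeeds, mirroring the two halves of the transcript.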
Now I am trying to extend the same functionality to clang-repl running in the browser.
So basically:
- For execution we use `AddModule`. I think I have explained the approach here in the past: we use `dlopen` for this. Each cell/code block gives us a side module, which is loaded on top of the main module using `dlopen`:
code -> PTU -> LLVM IR -> wasm object -> wasm binary -> loaded on top of the main module using dlopen
This is done here: https://github.com/llvm/llvm-project/blob/main/clang/lib/Interpreter/Wasm.cpp#L120-L126
- For undo we use `removeModule`. My approach here: as we load each module, we keep track of its handle and unload it on undo. I thought we could use `dlclose` for that.
As of now `removeModule` is not implemented, so I introduced this small diff as an implementation (it has a few debug logs too):
https://github.com/llvm/llvm-project/blob/main/clang/lib/Interpreter/Wasm.cpp#L120-L126
It i) keeps track of the loaded modules, ii) locates the handle of the module to be unloaded, and iii) unloads it using `dlclose`.
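The three steps can be sketched as a small bookkeeping class. This is a hypothetical illustration, not the linked diff: `ModuleTracker`, the string module ids and the injected function pointers are assumptions. The pointer types have the same signatures as `dlopen` and `dlclose`, so a browser build could pass those two functions directly; injecting them lets the bookkeeping be exercised with fakes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Same signatures as dlopen(3) and dlclose(3).
using OpenFn = void* (*)(const char*, int);
using CloseFn = int (*)(void*);

// Hypothetical sketch of removeModule bookkeeping: remember each handle,
// look it up on undo, release it with the close function.
class ModuleTracker {
 public:
  ModuleTracker(OpenFn open, CloseFn close) : open_(open), close_(close) {
    assert(open_ != nullptr && close_ != nullptr);
  }

  // addModule: load a side module and remember its handle under `id`.
  bool addModule(const std::string& id, const std::string& path, int flags) {
    void* handle = open_(path.c_str(), flags);
    if (handle == nullptr) return false;  // load failed, nothing to track
    handles_[id] = handle;
    return true;
  }

  // removeModule: find the handle recorded for `id` and unload it.
  bool removeModule(const std::string& id) {
    auto it = handles_.find(id);
    if (it == handles_.end()) return false;     // unknown or already removed
    if (close_(it->second) != 0) return false;  // keep the handle on failure
    handles_.erase(it);
    return true;
  }

  std::size_t loadedCount() const { return handles_.size(); }

 private:
  OpenFn open_;
  CloseFn close_;
  std::map<std::string, void*> handles_;  // module id -> handle from open_
};
```

Note that this bookkeeping only decides which handle to close; it says nothing about what happens to the global symbols that module contributed, which is what the rest of this thread is about.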
What I see now: although I can use undo and then do the redefinition, the value a symbol points to isn't being overwritten. Check this, for example:
https://github.com/user-attachments/assets/5bd3481e-62d6-41bd-86fb-920811b5a470
I should have seen the value of `y` as 50 after the undo (and the redefinition), not 5.
Is this not how `dlclose` is intended to work? I'm not sure it guarantees getting rid of all the symbols that were loaded/resolved, i.e. that when we "unload" the module responsible for `y = 5`, everything related to `y` goes away too. I am guessing something still ties `y` to 5?
Hence, even after the undo, when I am able to redefine it, it still points to 5 somehow.
Do you use `dlclose` this way on native platforms? If so, then this could be a bug in our implementation of `dlclose`. `dlclose` is not something our users have so far asked for good support for. Folks tend to load code and then use it; unloading code is less commonly requested.
Yeah, I wasn't sure that `dlclose` support is as solid as `dlopen`'s!
So I thought the best way to discuss this is through an issue.
In my case, while running clang-repl in the browser, you can either load a shared object at runtime or unload it (through what clang-repl calls an undo operation).
So hopefully my ask here is justified and not out of place. (If you look at the video, `dlclose` does work; I've added debug logs to show the module being unloaded. But the memory the variable points to isn't being overwritten.)
I think the problem is maybe more like "symbols from subsequent dlopen() operations don't override existing global symbols from previous dlopen() calls". Does this work for you on native platforms? Are you using dlopen() / dlclose() there too?
Arghhh, I only gave this a very quick attempt :/ I'll try it natively tomorrow. If you think something is obviously wrong, maybe you could give it a try too and I can confirm it from my side.
Edit: hmm, yeah the way you describe what's happening does make sense. I think that's exactly what I was seeing!
You say in the title "clang-repl can do the undo operation to surpass redefinitions and stuff.".. I assume you mean the native clang-repl? My question is does the native clang-repl use dlopen / dlclose for each module, or is it some other mechanism?
I could do the following:
- int x = 10;
- %undo
- int x = 20;
- std::cout << x // gave me 10 in return
- x = 30;
- std::cout << x // gave me 30 in return
The above involves 5 `dlopen` calls and 1 `dlclose`. The `dlclose` did allow me to do a redefinition (which errors out without the undo), so it's not that the module isn't being unloaded, but something is definitely wrong!
> You say in the title "clang-repl can do the undo operation to surpass redefinitions and stuff.".. I assume you mean the native clang-repl? My question is does the native clang-repl use `dlopen`/`dlclose` for each module, or is it some other mechanism?
Ahh, absolutely not. The native clang-repl doesn't use them; it's only in the wasm case that we're trying the `dlopen`/`dlclose` mechanism.
> symbols from subsequent dlopen() operations don't override existing global symbols from previous dlopen() calls

I think this might be exactly what's happening. cc @sbc100 Here's what I tried out; please let me know if this is a good enough way to test it.
- Create a file structure like this:
```
test_undo/
├── CMakeLists.txt
├── lib1.cpp
├── lib2.cpp
└── main.cpp
```
- main.cpp
```cpp
#include <iostream>
#include <dlfcn.h>

typedef int (*getXFunc)();

int main() {
    // Load the first library (lib1)
    void* handle1 = dlopen("./lib1.so", RTLD_NOW | RTLD_GLOBAL);
    if (!handle1) {
        std::cerr << "Failed to load lib1: " << dlerror() << std::endl;
        return -1;
    }

    getXFunc getX = (getXFunc)dlsym(handle1, "getX");
    if (!getX) {
        std::cerr << "Failed to get symbol from lib1: " << dlerror() << std::endl;
        return -1;
    }
    std::cout << "Initial x from lib1: " << getX() << std::endl; // Should print 10

    // Unload the first library
    if (dlclose(handle1) != 0) {
        std::cerr << "Failed to unload lib1: " << dlerror() << std::endl;
        return -1;
    }

    // Load the second library (lib2)
    void* handle2 = dlopen("./lib2.so", RTLD_NOW | RTLD_GLOBAL);
    if (!handle2) {
        std::cerr << "Failed to load lib2: " << dlerror() << std::endl;
        return -1;
    }

    getX = (getXFunc)dlsym(handle2, "getX");
    if (!getX) {
        std::cerr << "Failed to get symbol from lib2: " << dlerror() << std::endl;
        return -1;
    }
    std::cout << "x after reload from lib2: " << getX() << std::endl; // Should print 20

    dlclose(handle2);
    return 0;
}
```
- lib1.cpp
```cpp
int x = 10;

extern "C" int getX() {
    return x;
}
```
- lib2.cpp
```cpp
int x = 20;

extern "C" int getX() {
    return x;
}
```
- A simple CMakeLists.txt
```cmake
cmake_minimum_required(VERSION 3.20)
project(test_undo)

# Set Emscripten as the toolchain
set(CMAKE_CXX_COMPILER em++)

# Enable MAIN_MODULE for the main program
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -s MAIN_MODULE=1 -s WASM=1 -s EXPORTED_FUNCTIONS=['_main'] -s EXPORTED_RUNTIME_METHODS=['ccall','cwrap']")

# Specify the source file for the main module
add_executable(main main.cpp)
```
- Build the shared objects
```
anutosh491@Anutoshs-MacBook-Air test_undo % emcc -s SIDE_MODULE=1 -o lib1.so lib1.cpp
anutosh491@Anutoshs-MacBook-Air test_undo % emcc -s SIDE_MODULE=1 -o lib2.so lib2.cpp
```
- Build main.js/main.wasm
```
mkdir build
cd build
emcmake cmake ..
emmake make
```
- Execute through node
```
(xeus-cpp-wasm-build) anutosh491@Anutoshs-MacBook-Air build % node main.js
Initial x from lib1: 10
x after reload from lib2: 10
```
Here I would have expected 20, right? Instead I see 10 being printed.
EDIT: I see the same result even if I comment out the `dlclose`-related lines in main.cpp (i.e. with two subsequent `dlopen` calls that both define `x` through `int x = ...`).
Wait, now I am actually curious whether `dlclose` is even at fault here.
- Let's say I just comment out the `dlclose` code from above (so that we are left with 2 subsequent `dlopen` calls):
```cpp
// Unload the first library
// if (dlclose(handle1) != 0) {
//     std::cerr << "Failed to unload lib1: " << dlerror() << std::endl;
//     return -1;
// }
```
- And I update lib2.cpp to have something like the following instead of a direct `int x = 20;`:
```cpp
extern int x;

__attribute__((constructor))
void modifyX() {
    x = 20; // Update the value of x
}
```
- And I build the side modules the way Wasm.cpp does (just for consistency with the issue I had raised):
```
anutosh491@Anutoshs-MacBook-Air test_undo % emcc lib1.cpp -s SIDE_MODULE=1 -s IMPORTED_MEMORY=1 -s ERROR_ON_UNDEFINED_SYMBOLS=0 -o lib1.wasm
anutosh491@Anutoshs-MacBook-Air test_undo % emcc lib2.cpp -s SIDE_MODULE=1 -s IMPORTED_MEMORY=1 -s ERROR_ON_UNDEFINED_SYMBOLS=0 -o lib2.wasm
```
- I now see this:
```
anutosh491@Anutoshs-MacBook-Air build % node main.js
Initial x from lib1: 10
x after reload from lib2: 20
```
Having the `dlclose` code commented out or not makes no difference now; it works in both cases. That tells me this might not be a `dlclose` issue? Maybe the issue is two consecutive definitions of `x` being loaded? (But that should be valid if we have a `dlclose` in between, correct?)
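Both experiments are consistent with a simple "first definition wins" model of the global namespace. The toy model below is an assumption for illustration, not Emscripten's dynamic linker; `FirstWinsNamespace`, `define`, `lookup` and the two scenario functions are hypothetical. A module that defines `x` gets silently bound to an already-present `x`, whereas a module that only declares `extern int x;` binds to that same existing definition and can therefore overwrite its value from a constructor. In this model `close` removes nothing, which would explain why commenting out `dlclose` changes nothing.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Toy model (assumption): one storage slot per global name; earliest definition wins.
class FirstWinsNamespace {
 public:
  // A module that defines `name` (e.g. `int x = 20;`): if a definition
  // already exists, the new initializer is ignored and the old slot is used.
  std::shared_ptr<int> define(const std::string& name, int initial) {
    auto it = globals_.find(name);
    if (it != globals_.end()) return it->second;  // earlier definition wins
    auto slot = std::make_shared<int>(initial);
    globals_[name] = slot;
    return slot;
  }

  // A module that only declares `extern int x;`: binds to the existing slot.
  std::shared_ptr<int> lookup(const std::string& name) const {
    auto it = globals_.find(name);
    if (it == globals_.end()) return std::shared_ptr<int>();  // unresolved
    return it->second;
  }

  // dlclose: in this model the module's symbols stay in the namespace.
  void close(const std::string& /*module*/) {}

 private:
  std::map<std::string, std::shared_ptr<int>> globals_;
};

// First experiment: lib2 defines its own `int x = 20;`; returns what getX() reads.
int defineScenario(bool closeLib1) {
  FirstWinsNamespace ns;
  ns.define("x", 10);              // lib1.cpp: int x = 10;
  if (closeLib1) ns.close("lib1");
  return *ns.define("x", 20);      // lib2.cpp: int x = 20;
}

// Second experiment: lib2 declares `extern int x;` and a constructor sets 20.
int constructorScenario(bool closeLib1) {
  FirstWinsNamespace ns;
  ns.define("x", 10);              // lib1.cpp: int x = 10;
  if (closeLib1) ns.close("lib1");
  std::shared_ptr<int> x = ns.lookup("x");  // lib2.cpp: extern int x;
  assert(x != nullptr);                     // resolved to lib1's definition
  *x = 20;                                  // modifyX() constructor
  return *x;                                // getX()
}
```

Here `defineScenario` returns 10 with or without the close, and `constructorScenario` returns 20 either way, matching the two outputs above.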
Cc @sbc100
It would be great if you could let me know your thoughts on the above.
Cc @sbc100 gentle ping !
I don't think you can currently expect `dlclose` followed by `dlopen` to replace symbols in the global namespace. I think the first DLL to provide a symbol will currently win, and that symbol will remain for the life of the program.
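For contrast, here is a toy model of the behaviour the thread is asking for: closing a module withdraws the global symbols it provided, so a later module can supply a fresh definition. This is a hypothetical sketch; `WithdrawingNamespace` and `reloadScenario` are not Emscripten or libc code, and it is an assumption that native platforms behave this way.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical model: each global remembers its provider; closing a
// module removes the globals it provided.
class WithdrawingNamespace {
 private:
  struct Entry {
    std::string provider;       // module that supplied the definition
    std::shared_ptr<int> slot;  // storage for the global
  };
  std::map<std::string, Entry> globals_;

 public:
  // While the first provider is loaded, its definition still wins.
  std::shared_ptr<int> define(const std::string& module,
                              const std::string& name, int initial) {
    assert(!module.empty());
    auto it = globals_.find(name);
    if (it != globals_.end()) return it->second.slot;
    auto slot = std::make_shared<int>(initial);
    globals_[name] = Entry{module, slot};
    return slot;
  }

  // Closing a module withdraws every global it provided.
  void close(const std::string& module) {
    for (auto it = globals_.begin(); it != globals_.end();) {
      if (it->second.provider == module) {
        it = globals_.erase(it);
      } else {
        ++it;
      }
    }
  }
};

// The lib1/lib2 experiment under this model; returns what lib2's getX() reads.
int reloadScenario(bool closeLib1) {
  WithdrawingNamespace ns;
  ns.define("lib1", "x", 10);
  if (closeLib1) ns.close("lib1");
  return *ns.define("lib2", "x", 20);
}
```

Under this model `reloadScenario(true)` returns 20 because lib1's `x` was withdrawn, while `reloadScenario(false)` still returns 10 because lib1 stays loaded and its definition keeps winning.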
If the native behaviour is different from this, then I think we would consider a patch to make our behaviour match; in that case, please open a bug for it.
I opened https://github.com/emscripten-core/emscripten/issues/23939 to note the difference from the native behaviour. Let me know if that is what you expected!
The clang-repl use case in the browser that needs this is being addressed through this LLVM PR (https://github.com/llvm/llvm-project/pull/131558), opened as a draft for now to understand how `dlopen` and `dlclose` should work together.