Heap corruption when parsing SQL off the main thread using MinGW64 on Windows

Open ygqrc opened this issue 9 months ago • 4 comments

Hello,

I'm currently developing on Windows using MinGW64 as my toolchain to compile this library.

During testing, I encountered a thread-related crash when executing SQL parsing in a background thread. Here's what I observed:

✅ If I parse SQL on the main thread, everything works fine, for example:

    #include "SQLParser.h"

    int main() {
        for (int i = 0; i < 100; i++) {
            const char* sql = "INSERT INTO altable (name) values ('1') ; ";
            hsql::SQLParserResult result;
            hsql::SQLParser::parse(sql, &result);
        }
        return 0;
    }

❌ However, if I parse SQL off the main thread, I occasionally get unpredictable errors, for example:

    #include <thread>
    #include "SQLParser.h"

    int main() {
        // Same loop as above, but executed on a background thread.
        std::thread _t([]() {
            for (int i = 0; i < 100; i++) {
                const char* sql = "INSERT INTO altable (name) values ('1') ; ";
                hsql::SQLParserResult result;
                hsql::SQLParser::parse(sql, &result);
            }
        });
        _t.join();
        return 0;
    }

When I parse off the main thread like this, I encounter random crashes, most commonly:

0xc0000374: Heap corruption

The stack trace usually points to libstdc++-6.dll, particularly in malloc or free.

Sometimes, the crash is:

0xc0000005: Access violation

🔍 Further Investigation

I also noticed that the crash only happens with certain SQL statements, specifically when string values are included in the SQL. For example:

✅ This does not crash:

    INSERT INTO table1 (id) VALUES (1);

❌ This will likely crash (off the main thread):

    INSERT INTO table1 (name) VALUES ('111');

It seems the crash is triggered when the parser tries to handle string fields (e.g., '111' as a name), whereas numeric values like integers or doubles do not cause issues.

Tested with MSVC

When I switch from MinGW64 to MSVC as the compiler, the issue disappears completely, even under the same test scenarios (multi-threaded SQL parsing with string values).

Question

Do you have any insight into why this behavior occurs?

Could it be a compatibility issue between MinGW’s libstdc++ and the memory allocation used during string parsing? Or perhaps an issue with thread-safety or exception handling in MinGW’s C++ runtime?

Any help or advice would be greatly appreciated!

Thank you in advance 🙏

ygqrc, Apr 09 '25 01:04

Hello ygqrc,

Thanks for making us aware of this issue. Right now, I don't have an explanation for the failures when using threads. Unfortunately, none of the active developers has access to a Windows machine to test this issue.

Is it possible to run your code either in a Linux environment or with sanitizers?
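For reference, a minimal sketch of a driver that could be built with sanitizers on Linux. `parse_once` and `run_workers` are hypothetical names, not part of sql-parser; the body of `parse_once` is a placeholder to be replaced with the actual parse call. The flags in the comments are the standard GCC/Clang sanitizer options. AddressSanitizer and ThreadSanitizer cannot be combined in one binary, so this assumes two separate builds.

```cpp
// Build twice on Linux (GCC or Clang); ASan and TSan need separate builds:
//   g++ -std=c++17 -g -O1 -fsanitize=address,undefined driver.cpp -pthread
//   g++ -std=c++17 -g -O1 -fsanitize=thread driver.cpp -pthread
// When testing the real library, compile its sources with the same flags.
#include <atomic>
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Hypothetical placeholder: swap the body for the real parse call.
// Here it only copies the statement, so the sketch compiles on its own.
bool parse_once(const std::string& sql) {
    std::string copy = sql;
    return !copy.empty();
}

// Runs parse_once `iterations` times on each of `threads` worker threads
// and returns how many calls reported success.
int run_workers(int threads, int iterations) {
    std::atomic<int> ok{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&ok, iterations] {
            for (int i = 0; i < iterations; ++i) {
                if (parse_once("INSERT INTO altable (name) VALUES ('1');")) {
                    ++ok;
                }
            }
        });
    }
    for (auto& th : pool) th.join();
    return ok.load();
}
```

Calling `run_workers(4, 100)` from `main` exercises four threads at once; any report printed by a sanitizer during the run is the interesting output, not the return value.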

Bouncner, Apr 09 '25 08:04

Yes, I can run the code in a Linux environment, and it does not exhibit the issue there. On Linux, I'm using GCC as the compiler.

On Windows, I have tested with multiple toolchains, including:

the MinGW-w64 toolchain that comes with MSYS2,

the Clang toolchain, and

Microsoft's official MSVC toolchain.

The issue only occurs when using the MinGW-w64 toolchain on Windows — in this case, running SQL parsing outside of the main thread causes memory corruption and instability.

However, using GCC on Linux, I don't encounter any such heap corruption or crashes, even with multithreaded parsing.

It's unfortunate that none of the active developers are working on Windows, since this issue has been puzzling me for quite a while.

ygqrc, Apr 09 '25 09:04

As it runs with all "native" toolchains, it does not sound to me like a Windows issue but rather an issue with MSYS2. There have been plenty of threading issues reported in their GitHub repository. Maybe it's something to raise there, at least if none of the sanitizers finds an issue?
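If this is raised with MSYS2, a reproducer that does not depend on sql-parser may be useful. The sketch below is not code from the library. It assumes (unverified in this thread) that the lexer handles string literals with a `thread_local` buffer plus a heap copy of the token, and it exercises only that pattern on a worker thread. `copy_literal` and `stress_worker` are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <sstream>
#include <string>
#include <thread>

// Hypothetical stand-in for string-literal handling in a lexer:
// accumulate characters in a thread_local stream, then return a heap copy.
char* copy_literal(const char* text) {
    static thread_local std::stringstream buffer;
    buffer.str("");   // reset the stream contents
    buffer.clear();   // reset the stream state flags
    buffer << text;
    const std::string s = buffer.str();
    char* out = static_cast<char*>(std::malloc(s.size() + 1));
    if (out != nullptr) std::memcpy(out, s.c_str(), s.size() + 1);
    return out;
}

// Runs the pattern `iterations` times on a background thread and returns
// how many copies round-tripped intact. The thread_local buffer is
// destroyed when the worker thread exits.
int stress_worker(int iterations) {
    int intact = 0;
    std::thread worker([&intact, iterations] {
        for (int i = 0; i < iterations; ++i) {
            char* copy = copy_literal("111");
            if (copy != nullptr && std::strcmp(copy, "111") == 0) ++intact;
            std::free(copy);
        }
    });
    worker.join();
    return intact;
}
```

If calling `stress_worker(100)` alone crashes under MinGW-w64 but not under MSVC, that would point at the toolchain runtime rather than at the parser. If it runs cleanly, the assumption about the lexer may simply be wrong.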

Bouncner, Apr 09 '25 10:04

Thank you, your point makes a lot of sense.

At first, I suspected the issue was related to flex_lexer.l or bison_parser.y, since the code generated from them is C-style and uses functions such as strcat.

My first attempt at a fix was to replace those with safer alternatives, but it didn't solve the problem.

Later on, I went further and replaced all string operations in the generated code with C++ std::string, hoping that would help, but unfortunately the issue still persisted.

After reading your response, I started to consider new possibilities:

1. The versions of Flex and Bison I used are those provided by MSYS2. Could they be the cause? (Though I'm not sure, since I was actually using .cpp code generated on Linux in my Windows environment.)

2. It might be an issue with MSYS2 itself, or with how it interacts with the Windows platform.

Thank you again for pointing me in this direction!

ygqrc, Apr 09 '25 12:04