Segmentation fault
Hi
I'm using pg_squeeze REL1_6, and I have an issue on some servers.
All servers are running PostgreSQL 14.11.
All servers have the same pg_squeeze version.
The pg_squeeze configuration in postgresql.conf is the same on all servers, and the same table is added to the squeeze.tables table to be squeezed automatically.
I have two issues now:
- On some servers, squeeze never starts automatically (I checked pg_stat_statement and the squeeze worker does exist; see screenshot).
- If I run squeeze.squeeze_table manually to squeeze the table, it causes a segmentation fault (before running squeeze.squeeze_table, I stop the worker by running squeeze.stop_worker()). It's weird because I get the segmentation fault only for this table; when I run squeeze.squeeze_table for other tables, it works as expected. I tried to drop and recreate the table, but I still get the same issue. This table is used frequently in the database.
I've attached the PostgreSQL log entries for the segmentation fault:
"logical decoding found initial starting point at 4FCC/E827FD48","Waiting for transactions (approximately 10) older than 725779050 to end.",,,,,,,,"","squeeze worker"
"logical decoding found initial consistent point at 4FCD/16D3C3E0","Waiting for transactions (approximately 6) older than 725781876 to end.",,,,,,,,"","squeeze worker"
"logical decoding found consistent point at 4FCD/244960A0","There are no old transactions anymore.",,,,,,,,"","squeeze worker"
"starting logical decoding for slot ""pg_squeeze_slot_16401_71778""","Streaming transactions committing after 4FCD/244960E0, reading WAL from 4FCC/E8175020.",,,,,,,,"","squeeze worker"
"invalid memory alloc request size 14888425372",,,,,,,,,"","squeeze worker"
"background worker ""squeeze worker"" (PID 71778) was terminated by signal 11: Segmentation fault","Failed process was running: INSERT INTO squeeze.errors(tabschema, tabname, sql_state, err_msg, err_detail) VALUES ('fleet', 'terminal_status', 'XX000', 'invalid memory alloc request size 14888425372', '')"
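For context on the "invalid memory alloc request size" errors: PostgreSQL's palloc() raises exactly this error when an ordinary (non-"huge") allocation request exceeds MaxAllocSize, which is 0x3fffffff bytes (1 GB - 1). The Python sketch below only models that check (it is not pg_squeeze or PostgreSQL code; the function names are hypothetical) and applies it to the two sizes reported in this thread. Both are roughly 14 to 16 GiB, far beyond the limit and far beyond the size of a small table, which suggests a bogus length value (for instance from a broken or mismatched binary) rather than a genuine memory need. That reading is an assumption, not a confirmed diagnosis.

```python
# Hypothetical model of PostgreSQL's palloc() size check (not pg_squeeze code).
# Assumes MaxAllocSize as defined in src/include/utils/memutils.h: 0x3fffffff.
MAX_ALLOC_SIZE = 0x3FFFFFFF  # 1 GB - 1 byte
GIB = 1024 ** 3


def alloc_size_is_valid(size: int) -> bool:
    """Mirror AllocSizeIsValid(): ordinary palloc() requests must not exceed MaxAllocSize.

    Size is unsigned in C; the lower bound only keeps this Python model sane.
    """
    return 0 <= size <= MAX_ALLOC_SIZE


def describe_request(size: int) -> str:
    """Summarize a requested allocation in GiB and whether palloc() would accept it."""
    if alloc_size_is_valid(size):
        verdict = "accepted"
    else:
        verdict = "rejected: invalid memory alloc request size"
    ratio = size / MAX_ALLOC_SIZE
    return f"{size} bytes ({size / GIB:.1f} GiB, {ratio:.1f}x the limit): {verdict}"


if __name__ == "__main__":
    # Sizes copied from the error messages reported in this issue.
    for requested in (14888425372, 17209330808):
        print(describe_request(requested))
```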
Could you please try to come up with a way to reproduce this bug starting with a clean cluster? If you cannot find the reason, then please attach a stack trace.
Right, the stack trace would be useful.
What I find weird is that in PG 14, the message "starting logical decoding for slot "pg_squeeze_slot_16401_71778"" is printed by CreateDecodingContext(), but pg_squeeze 1.6 does not call this function. Are you sure you are using pg_squeeze REL1_6?
Hi.
It happened again, and I attempted to collect GDB logs. Please find the attached file.
Regarding the pg_squeeze version, I can confirm that I installed REL1_6.
Unfortunately, the version number has not been updated in the master branch, so the pg_extension catalog shows version 1.6 even for the master branch. Please check which branch you have checked out from the repository. (I think it's master.)
Regarding the log, it does not mention the "segmentation fault" (SIGSEGV) error.
Can you find a core file (e.g. postgres.core) in your data directory? If not, please tell me which operating system you're using. If you do, please try to get the stack trace from the core file using gdb, as described at https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Debugging_the_core_dump_-_example
Unfortunately, core dumps were not enabled on the server. Regarding the version, I selected REL1_6 and then downloaded it. I'll try to enable core dumps, and if it happens again, I'll share the core dump file.
Please do not share the core dump: it's huge and might contain data from your database (possibly confidential). I'm only interested in the backtrace. I can help you get it from the dump, if needed.
I'm still looking at debuglog1.txt that you provided earlier. Some backtraces in there look quite weird.
Have you built the binary from source? And if so, did you always run make clean before building a different branch? I wonder if object files of different branches got mixed up somehow ...
Yes, I built it from source, but I didn't run make clean before building a different branch. I just deleted pg_squeeze.so from the PostgreSQL lib directory and then built a different branch. I installed the "master" branch and got a segmentation fault, then removed "squeeze.so" and compiled "REL1_6", but again I got a segmentation fault. After I downgraded to REL1_5, it started working.
With "master" and "REL1_6", the segmentation fault is not the only issue (it's the biggest one, as it sends the database into recovery mode). Sometimes I also get the following errors:
1- "initial slot snapshot too large" (I received this error on almost all my servers)
2- "invalid memory alloc request size xxxxxxxx" (for example "invalid memory alloc request size 17209330808", while the bloated table is much smaller than this number; I don't know why squeeze needs this amount of memory to squeeze a tiny table)
3- "Unexpected number of TOAST indexes"
4- "all replication slots are in use" (sometimes squeeze doesn't delete the replication slot it created)
These seem like too many problems unrelated to one another. I still suspect that the binary (pg_squeeze.so) is broken. To rule this out, can you please try to install REL1_6 from the community repository (https://www.postgresql.org/download/)?
Also, if you still have the library that you built from source, I'd be interested in the output of nm pg_squeeze.so
Thanks
Please find attached the log file nm.log with the output of the nm command.
Thanks. I'm not seeing an obvious problem there. No idea what else I can do without the core dump.
https://github.com/cybertec-postgresql/pg_squeeze/issues/71#issuecomment-2331352960
@ramkly, please re-open if you are still interested.