pg_squeeze

Segmentation fault

Open • ramkly opened this issue 1 year ago • 11 comments

Hi, I'm using pg_squeeze REL1_6 and I have an issue on some servers. All servers run PostgreSQL v14.11, all servers have the same pg_squeeze version, the pg_squeeze configuration in postgresql.conf is the same on all servers, and the same table has been added to the squeeze.tables table to be squeezed automatically. I now have two issues:

  1. On some servers, squeeze never starts automatically, although I checked pg_stat_statement and the squeeze worker does exist.
  2. If I run squeeze.squeeze_table manually to squeeze the table, it causes a segmentation fault. (Before running squeeze.squeeze_table, I stop the worker with squeeze.stop_worker().) It's weird because I get the segmentation fault only for this table; when I run squeeze.squeeze_table for other tables, it works as expected. I tried dropping and recreating the table, but the issue persists. This table is used frequently in the database.

Here is the PostgreSQL log for the segmentation fault:

"logical decoding found initial starting point at 4FCC/E827FD48","Waiting for transactions (approximately 10) older than 725779050 to end.",,,,,,,,"","squeeze worker"
"logical decoding found initial consistent point at 4FCD/16D3C3E0","Waiting for transactions (approximately 6) older than 725781876 to end.",,,,,,,,"","squeeze worker"
"logical decoding found consistent point at 4FCD/244960A0","There are no old transactions anymore.",,,,,,,,"","squeeze worker"
"starting logical decoding for slot ""pg_squeeze_slot_16401_71778""","Streaming transactions committing after 4FCD/244960E0, reading WAL from 4FCC/E8175020.",,,,,,,,"","squeeze worker"
"invalid memory alloc request size 14888425372",,,,,,,,,"","squeeze worker"
"background worker ""squeeze worker"" (PID 71778) was terminated by signal 11: Segmentation fault","Failed process was running: INSERT INTO squeeze.errors(tabschema, tabname, sql_state, err_msg, err_detail) VALUES ('fleet', 'terminal_status', 'XX000', 'invalid memory alloc request size 14888425372', '')"
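For context on the "invalid memory alloc request size" line: PostgreSQL rejects an ordinary palloc() request larger than MaxAllocSize (0x3fffffff bytes, just under 1 GB) with exactly this message, so the number reflects the size of a single requested chunk, not the size of the table. A minimal Python sketch of that check, modelled on the server's AllocSizeIsValid() macro; the helper names are hypothetical, and the separate "huge" allocation path (which permits larger requests) is not modelled:

```python
# Mirrors PostgreSQL's MaxAllocSize (src/include/utils/memutils.h): 1 GB - 1.
MAX_ALLOC_SIZE = 0x3FFFFFFF


def alloc_size_is_valid(size: int) -> bool:
    """Return True if an ordinary palloc() of `size` bytes would be accepted.

    Hypothetical helper sketching AllocSizeIsValid(); the "huge" allocation
    path, which allows larger requests, is deliberately left out.
    """
    return 0 <= size <= MAX_ALLOC_SIZE


def to_gib(size: int) -> float:
    """Convert a byte count to GiB for easier reading."""
    return size / 2**30


# Request sizes reported in this thread.
for size in (14888425372, 17209330808):
    print(f"{size} bytes = {to_gib(size):.2f} GiB, valid: {alloc_size_is_valid(size)}")
# 14888425372 bytes = 13.87 GiB, valid: False
# 17209330808 bytes = 16.03 GiB, valid: False
```

Both sizes reported in this thread are more than ten times the per-allocation limit.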

ramkly, Mar 14 '24

Could you please try to come up with a way to reproduce this bug starting with a clean cluster? If you cannot find the reason, then please attach a stack trace.

kovmir, Mar 14 '24

Right, the stack trace would be useful.

What I find weird is that in PG 14, the message "starting logical decoding for slot "pg_squeeze_slot_16401_71778"" is printed by CreateDecodingContext(), but pg_squeeze 1.6 does not call this function. Are you sure you are using pg_squeeze REL1_6?

ahouska, Mar 14 '24

Hi. It happened again, and I attempted to collect GDB logs; please find the attached file. Regarding the pg_squeeze version, I can confirm that I installed REL1_6.

debuglog1.txt

ramkly, Mar 16 '24

Unfortunately, the version number has not been updated in the master branch, so the pg_extension catalog shows version 1.6 even for the master branch. Please check which branch you checked out from the repository. (I think it's master.)

Regarding the log, it does not mention the "segmentation fault" (SIGSEGV) error.

Can you find a core file (e.g. postgres.core) in your data directory? If not, please tell me which operating system you're using. If you do find one, please try to get the stack trace from it using gdb according to https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Debugging_the_core_dump_-_example
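The wiki page above walks through this interactively. As a rough sketch, the same backtrace can be captured without an interactive session by running gdb in batch mode, so that only a small text file needs to be shared. This assumes gdb is installed (ideally with PostgreSQL debug symbols); the binary path and core file name below are hypothetical placeholders, and the helper names are illustrative only:

```python
import shlex
import subprocess
from pathlib import Path


def gdb_backtrace_cmd(postgres_bin: str, core_file: str) -> list[str]:
    """Build a non-interactive gdb command that prints a full backtrace.

    -batch makes gdb exit after running the -ex commands, so the backtrace
    goes to stdout. postgres_bin must be the same binary that crashed.
    """
    return ["gdb", "-batch", "-ex", "bt full", postgres_bin, core_file]


def save_backtrace(postgres_bin: str, core_file: str, out_file: str) -> None:
    """Run gdb on the core dump and write only the text backtrace to out_file."""
    cmd = gdb_backtrace_cmd(postgres_bin, core_file)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    Path(out_file).write_text(result.stdout)


if __name__ == "__main__":
    # Hypothetical paths: adjust to your installation and core file location.
    cmd = gdb_backtrace_cmd("/usr/lib/postgresql/14/bin/postgres", "core")
    # Print the equivalent shell command instead of running it.
    print(shlex.join(cmd))
```

The printed command can be run directly in a shell; save_backtrace() does the same and keeps only gdb's standard output.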

ahouska, Mar 16 '24

Unfortunately, core dumps were not enabled on the server. Regarding the version, I selected REL1_6 and then downloaded it. I'll try to enable core dumps, and if it happens again, I'll share the core dump file.

ramkly, Mar 16 '24

Please do not share the core dump: it's huge and might contain some (possibly confidential) data from your database. I'm only interested in the backtrace. I can help you get it from the dump, if needed.

ahouska, Mar 16 '24

I'm still looking at debuglog1.txt that you provided earlier. Some backtraces in there look quite weird.

Have you built the binary from source? And if so, did you always run make clean before building a different branch? I wonder if object files of different branches got mixed up somehow ...

ahouska, Apr 30 '24

Yes, I built it from source, but I didn't run make clean before building a different branch. I just deleted pg_squeeze.so from the PostgreSQL lib directory and then built the other branch. I first installed the "master" branch and got a segmentation fault; then I removed "squeeze.so", compiled "REL1_6", and again got a segmentation fault. After I downgraded to REL1_5, it started working.

With "master" and "REL1_6", the segmentation fault is not the only issue (it's the biggest one, as it sends the database into recovery mode). Sometimes I also got the following errors:

  1. "initial slot snapshot too large" (I received this error on almost all my servers)
  2. "invalid memory alloc request size xxxxxxxx" (for example "invalid memory alloc request size 17209330808", while the bloated table is much smaller than this number; I don't know why squeeze needs this much memory to squeeze a tiny table)
  3. "Unexpected number of TOAST indexes"
  4. "all replication slots are in use" (sometimes squeeze doesn't delete the replication slot it created)
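On the "all replication slots are in use" error: a slot that pg_squeeze fails to drop keeps occupying one of the max_replication_slots entries. A minimal Python sketch for spotting such leftovers among the rows of the pg_replication_slots view, assuming the name pattern seen in the log earlier in this thread (pg_squeeze_slot_<number>_<PID>; there the second number matched the worker's PID 71778). The pattern, the Slot type, and the helper names are assumptions for illustration, not a documented pg_squeeze interface. Review the generated statements before running any of them, since an inactive slot is not necessarily abandoned.

```python
import re
from typing import NamedTuple

# Assumed naming pattern, inferred from the log in this thread:
# pg_squeeze_slot_<number>_<worker PID>, e.g. pg_squeeze_slot_16401_71778.
SQUEEZE_SLOT_RE = re.compile(r"^pg_squeeze_slot_(\d+)_(\d+)$")


class Slot(NamedTuple):
    """One row of: SELECT slot_name, active FROM pg_replication_slots;"""
    slot_name: str
    active: bool


def stale_squeeze_slots(slots: list[Slot]) -> list[str]:
    """Return names of inactive slots that match the assumed pg_squeeze pattern."""
    return [s.slot_name for s in slots
            if not s.active and SQUEEZE_SLOT_RE.match(s.slot_name)]


def drop_statements(names: list[str]) -> list[str]:
    """Build pg_drop_replication_slot() calls, doubling single quotes for safety."""
    return ["SELECT pg_drop_replication_slot('{}');".format(n.replace("'", "''"))
            for n in names]


if __name__ == "__main__":
    rows = [
        Slot("pg_squeeze_slot_16401_71778", False),  # not in use: candidate
        Slot("pg_squeeze_slot_16401_80000", True),   # currently in use: skipped
        Slot("my_logical_slot", False),              # unrelated slot: skipped
    ]
    for stmt in drop_statements(stale_squeeze_slots(rows)):
        print(stmt)
```

Only the first sample row is reported; the active slot and the unrelated slot are left alone.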

ramkly, May 01 '24

These seem like too many problems that are unrelated to one another. I still suspect that the binary (pg_squeeze.so) is broken. To rule this out, can you please try installing REL1_6 from the community repository (https://www.postgresql.org/download/)?

Also, if you still have the library that you built from source, I'd be interested in the output of nm pg_squeeze.so.

Thanks

ahouska, May 01 '24

nm.log: Please find attached the log file with the output of the nm command.

ramkly, May 01 '24

Thanks. I'm not seeing an obvious problem there. I have no idea what else I can do without the core dump.

ahouska, May 02 '24

https://github.com/cybertec-postgresql/pg_squeeze/issues/71#issuecomment-2331352960

kovmir, Sep 05 '24

@ramkly, re-open if you are still interested.

kovmir, Sep 20 '24