neon
neon copied to clipboard
`plv8` extension creation crashes postgres sometimes
I've managed to do this two times. The rough algorithm is following:
- Create a new staging project
- Run
create extension postgis; - Run
create extension plv8;
With some probability step 3 will result in a crash.
My try 1, late-surf-318322:
https://observer.zenith.tech/d/IJSLHBOnk/neon-logs-compute-nodes?orgId=1&var-environment=staging&var-region=virginia&var-project=compute-node-late-surf-318322&var-search=&var-exclude=substing%20to%20exclude&from=1661776289601&to=1661777410709
My try 2, silent-union-873343:
https://observer.zenith.tech/d/IJSLHBOnk/neon-logs-compute-nodes?orgId=1&var-environment=staging&var-region=virginia&var-project=compute-node-silent-union-873343&var-search=&var-exclude=substing%20to%20exclude&from=1661806532323&to=1661806628863
In both cases it was with the following log messages:
2022-08-29 23:56:05
compute-node-silent-union-873343 2022-08-29 20:56:05.502 GMT [14] LOG: server process (PID 33) was terminated by signal 7: Bus error
2022-08-29 23:56:05
compute-node-silent-union-873343 2022-08-29 20:56:05.502 GMT [14] DETAIL: Failed process was running: create extension plv8;
My tests indicate we crash persistently when I execute the following (composite) statement:
CREATE EXTENSION postgis; CREATE EXTENSION plv8;
I've reproduced it with
$ CREATE EXTENSION plv8;
$ CREATE EXTENSION postgis; -- no error yet
$ CREATE OR REPLACE FUNCTION test ()
> RETURNS int
> IMMUTABLE
> LANGUAGE plv8
>AS $$ return "test".length + 1|0; $$;
<server closed the connection unexpectedly
< This probably means the server terminated abnormally
< before or while processing the request.
So it seems like it goes wrong while creating a new PLv8 function after installing PostGIS (which is something that PLv8 also does during it's installation procedure).
I've reproduced it with
Locally? Is there a backtrace?
I haven't yet been able to get this image running locally, so this was in staging
And that was with PLv8/3.1.4 too, so it's not something that's fixed with that release
Did this really get fixed by PR #2366? I think Github intepreted "this may or may not fix #2361" as just "fix #2361", and closed this issue automatically :-).
well, we won't see PLV8 crash postgres anymore considering that we don't include it in the final image. But, indeed, I haven't yet found the origin of the issue, nor fixed the true underlying issue.
are you working on this, @SergeyMelnikov ? were you trying to reproduce this?
Just set default k8s limits on staging to 1 to 4 ratio (m6i.2xlarge instance) -- 1 cpu and 4 gigs of ram. So we can try to enable plv8 again and check whether it was an OOM or not
we'll increase the amount of memory, deploy on staging and check if this still fails
are we still working on this?
Yes, we found bug in plv8 itself https://github.com/plv8/plv8/pull/504