pljava
pljava copied to clipboard
ClassNotFoundException using PL/Java 1.6.8 on Postgres 17.2
We have been using PL/Java for a couple of years now and created our extension initially with 1.5.6 on Postgres 13. Over the years we have migrated to Postgres 15 then 16 and now 17.2. Until recently, we haven't had any serious issues. Our extension is fairly simple and provides XML and JSON validation services to SQL for generated data from business processes. We use install_jar to create entries in SQLJ for our extension JAR and GSON 2.10.1. We run set_classpath for each of our client schemas.
This has all worked well until 17.2 (using PL/Java 1.6.8). Now we are getting ClassNotFoundException on a static class deep within the extension inner workings.
Our SQL calls runPublishValidation(3 parameters) which uses PL/Java to call Publish.validatte(...). Within the extension Publish.validate creates an instance of the non-static class Validation and calls it's run() method. The run() method performs 10 distinct validation checks, think of them as steps. There are several static helper classes for XML and JSON handling, think of them as utility classes. These static utility classes are used extensively in each of the 10 steps.
What we are seeing is that on the 10th step, PL/Java is reporting ClassNotFoundException on Helpers.processSubstitution(). Again, this class method will have been called dozens of times prior to the 10th step invocation.
This is the call stack I get:
25 Mar 25 14:28:16 org.postgresql.pljava.sqlj.Loader Failed to load class
java.sql.SQLException: An attempt was made to call a PostgreSQL backend function after an elog(ERROR) had been issued
at [email protected]/org.postgresql.pljava.internal.Oid._forSqlType(Native Method)
at [email protected]/org.postgresql.pljava.internal.Oid.lambda$forSqlType$1(Oid.java:68)
at [email protected]/org.postgresql.pljava.internal.Backend.doInPG(Backend.java:89)
at [email protected]/org.postgresql.pljava.internal.Oid.forSqlType(Oid.java:68)
at [email protected]/org.postgresql.pljava.jdbc.SPIPreparedStatement.setObject(SPIPreparedStatement.java:238)
at [email protected]/org.postgresql.pljava.jdbc.SPIPreparedStatement.setObject(SPIPreparedStatement.java:223)
at [email protected]/org.postgresql.pljava.jdbc.SPIPreparedStatement.setInt(SPIPreparedStatement.java:124)
at [email protected]/org.postgresql.pljava.sqlj.Loader.lambda$findClass$5(Loader.java:462)
at [email protected]/org.postgresql.pljava.sqlj.Loader.findClass(Loader.java:464)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:592)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
at schema:dimbuild_sprvl//com.iri.udb.pljava.publish.Validation.checkAttrSpecMeta(Validation.java:211)
at schema:dimbuild_sprvl//com.iri.udb.pljava.publish.Validation.run(Validation.java:159)
at schema:dimbuild_sprvl//com.iri.udb.pljava.Publish.validate(Publish.java:75)
at schema:dimbuild_sprvl//com.iri.udb.pljava.Publish.validate(Publish.java:48)
at [email protected]/org.postgresql.pljava.internal.EntryPoints.lambda$invocable$0(EntryPoints.java:130)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:399)
at [email protected]/org.postgresql.pljava.internal.EntryPoints.doPrivilegedAndUnwrap(EntryPoints.java:312)
at [email protected]/org.postgresql.pljava.internal.EntryPoints.invoke(EntryPoints.java:158)
Exception mv_spec_extract_data ERROR: java.lang.NoClassDefFoundError: com/iri/udb/pljava/common/Helpers
We have tried many different approaches to narrow down what is happening from pljava setting changes in the postgresql.conf file to flatten away the static class into the Validation class implementation to use Class.forName to dynamically load the class. Nothing helped. Then we decided to add -Djava.class.path={our jars} to pljava.vmoptions and this seems to have corrected the problem, although introduces another problem for us. We run in many different environments, on many database servers and this will be a maintenance and setup nightmare for us. We like using the SQLJ tables to manage the distribution of the extension.
As this point, I will stop typing and let you digest (and/or figure out) what I've said and ask questions.
Goodness, that's a lot to digest, and it sounds frustrating.
The best thing about it, from your description, is that it seems to be deterministic and repeatable, you can count on seeing it on the tenth step.
On first impression, it looks not so much that findClass is failing, as that something already failed and didn't get properly recovered: the error message An attempt was made to call a PostgreSQL backend function after an elog(ERROR) had been issued is the giveaway.
The sqlj storage for classes is convenient, but it is kind of at the mercy of PostgreSQL when there has been an error during a transaction. PostgreSQL is simply not going to allow any more queries via SPI, even innocent read-only ones like PL/Java trying to load a class, from the moment of that error to the moment the transaction is fully rolled back (unless there was a savepoint; a PL/Java savepoint can be rolled back and counts as recovering from an error, so queries can be made again).
It can get ugly if the code involved in the error recovery and savepoint rollback happens to need a class loaded.
So it seems that a first mystery to investigate would be: what was the PostgreSQL error that had already been ereported, causing the SPI-locked-out state that was then the cause of the exception you later got.
If that earlier error was caught in a Java catch block and has not yet had a chance to bubble out and be logged, connecting a debugger might be the best way to catch it. Are you compiling the Java code with -g?
Another thing worth remembering is that NoClassDefFoundError is not the normal thing thrown when a class you're looking for isn't found by the loader. If you're looking for Foo and the loader can't find it, you get ClassNotFoundException for Foo.
NoClassDefFoundErroris thrown during linkage, if a class got successfully loaded but some other class mentioned in its constant pool isn't getting resolved. (In this case, that might only be because of some other error that locked out queries.)
I do realize that this is a complex problem and I potentially overloaded you on my initial statement, but we do also get ClassNotFoundException for the same static class Helpers. It seems to vary which one we get.
I tried "flattening" the Helpers class directly into the Validation class as member functions and then the error moves to reporting it cannot find GSON classes.
I have also tried to call the Helpers method dynamically using Class.forName() and then clazz.processSubstitutions(...), but did not change the outcome.
I have pulled the SQL being executed to a vary small CTE to feed parameter 1 and 2 to the Publish.validate method, such that I wholly control the execution. I run it from DBeaver and get ClassNotFoundException, but as soon as I add my jars to the vmoptions through -Djava.class.path= then it all works.
I feel like I am missing something obvious or simple that i've overlooked or blinded myself to, but what?
PL/Java's classloader is failing at that point because an earlier PostgreSQL error has not been fully handled, leaving PostgreSQL internals in an unusable state until a rollback of the transaction or of a savepoint, which has not yet occurred. There is nothing PL/Java's classloader can or could do to succeed in that situation.
I have had to solve exactly similar issues in the past. The key is always to find out what earlier PostgreSQL error got raised and has not been handled yet. (I'm not saying that preceding error can't also turn out to be PL/Java's fault. Sometimes it does. But the key is always to first find out what that error was. Once that is known, the picture often comes together quickly.)
If that earlier error was caught in a Java catch block and has not yet had a chance to bubble out and be logged, connecting a debugger might be the best way to catch it. Are you compiling the Java code with -g?
If you are running 1.6.8, good places to set a debugger breakpoint will be:
With a C debugger such as gdb, use a breakpoint on Exception.c line 174, and use where or bt to get a stack trace when the breakpoint is reached.
With a Java debugger such as jdb, use a breakpoint on org.postgresql.pljava.internal.ServerException line 30. When the breakpoint is reached, you can use where to get a stack trace, and dump errorData to see what is known about the error.
More information on how to attach a Java debugger to a running PostgreSQL backend is here.
Once we have that information on whatever earlier error is getting in the way, it will be a lot clearer where to go next.
Hi,
Do you think you will have time to get any stack traces for me to look at?
Hi, not ignoring you, been really busy with fires at work and I am trying to get permissions for a replica postgres server with my client schema and rights. Corporate red tape is making this much harder for me. We are not allowed many rights on the DB hosts and I'm not entirely sure I will be able to even attach a debugger after all this. I am trying though, but it's a slog.
In the meantime, do you have any further suggestions that I might do aside from attaching? I was looking at potentially using logback to see what i can get out of it.
I understand.
Do you think the problem could be replicable in a PostgreSQL instance with your software but different data, or fake data, or not all your data?
I routinely use a test harness (like this one that PL/Java includes, which is based on this one that PostgreSQL includes) to just spin up PostgreSQL as me on some random box without making a permanent installation or using privileged ports or any of that.
Could that be bureaucratically simpler?
I do see here that if you set the logging level to DEBUG2 you should at least get a message with the name of the function where the earlier error happens. Not as good as a stack trace (and maybe even completely unenlightening without a stack trace) but at least it's something to do.
Any news?
Really sorry for my long delay. We have been dealing with overlapped releases and it was an excessive amount of work. I am tired lol. I did manage to find the problem that was going on and hoping that you can do something in the future to help customers better diagnose problems like this.
So, the query being executed is defined by a string in one of my constants.java classes. This string has a substitution that is made at runtime, e.g. select key, value from properties. It was working for sometime until a developer decided to change the properties table to have key_name, value instead, so the SQL was actually excepting that it couldn't find the column "key", but the error we got was not related, nor helpful enough to identify the problem.
Is there any enhancements you can do to report (on SQL exception) what SQL statement was run and what error was returned? Perhaps you already are doing this and given my inability to use admin mode I just cannot see this detail. Perhaps you could log statements you execute (in debug mode) to a SQLJ table? Idk, just spitballing.
Anyway, lemme know if you have further questions.
I'm not sure I've seen enough detail yet to complete the picture, but this explanation changes my understanding somewhat. It had sounded like something that went wrong only on the tenth step after working for nine earlier steps, but clearly a query with the wrong column name is going to raise an exception the first and every time. So the tenth validation check in your application must be the first one that uses this particular query.
I'm sure the exception that is raised by the query is as detailed and informative as you would want, but what is happening is that during the handling of that exception a class needs to be loaded, and the class loader makes its own queries, which is not allowed when a PostgreSQL error has already been raised and not yet recovered from. So you were getting the exception from the class loader, in place of the original exception.
There are a lot of ways the current exception handling and logging can be improved (see Thoughts on logging), and there's certainly a need to handle this kind of situation better. I would prefer to see the earlier, incompletely-handled exception continue to propagate (perhaps with the later, consequent exception added to its suppressed list). There is the complication, of course, that the earlier exception might not be of an expected type where the later exception is raised.
It might be interesting to see some of the code around these locations:
at schema:dimbuild_sprvl//com.iri.udb.pljava.publish.Validation.checkAttrSpecMeta(Validation.java:211)
at schema:dimbuild_sprvl//com.iri.udb.pljava.publish.Validation.run(Validation.java:159)
at schema:dimbuild_sprvl//com.iri.udb.pljava.Publish.validate(Publish.java:75)
at schema:dimbuild_sprvl//com.iri.udb.pljava.Publish.validate(Publish.java:48)
just to have a better idea what is happening just before the later exception is raised.
It would also be useful to see the code that executes the query with the wrong column name. Where is the catch block that catches that exception, and what is done with it? You have to either rethrow the same exception, or throw a different one, or roll back a previously-established savepoint to put PostgreSQL in a usable condition again. You can't just catch a PostgreSQL exception and, say, log it and exit the catch block normally; that leaves PostgreSQL unusable until a rollback.
When I described the steps I was unaware that the step that was failing called some SQL. The intent of the extension code was to simply do validation on an input XML string. Another developed added code that needed more complex validation. I realize now I did not mention that, my apologies.
Not sure I'm clear on what you mean by "expected type"? I think it would be helpful to have the origin exception be part of the Postgres exception stack. Atleast there would be an indicator as to what the origin of the process failure was.
I cannot look at the code at the moment, VPN is down, but my exception handling code looks for several different source exceptions, handles them and "translates" them into a custom exception class that is transformed into JSON and returned. The entry point call Publish.validate does that transform into JSON (if memory serves) then the calling SQL function will log the results and whatever else it does with it.
I know that does not entirely answer your questions, heck might even raise more questions than answers, If the VPN comes back up anytime soon I will look at the code and snippet it up and get permission to share.
If you do not hear from me again today and you are in the US then have a wonderful holiday weekend and I get back to you on Tuesday.
Not sure I'm clear on what you mean by "expected type"?
Java has a notion of checked exceptions, which are not supposed to be raised except by code that declares them or catches them.
What may have happened here is that an exception for a nonexistent column name was caught in Java but not properly handled, leaving the PostgreSQL transaction in an unusable state. A later attempt at classloading then incurred its own exception. I am tempted to just re-raise the earlier exception at that point, but one complication is that the earlier exception may not be one that is declared or caught at the site where the later one is raised,
It's not insoluble—there are a lot of places PL/Java just wraps a checked exception in an unchecked one—but that solution brings its own problems.
my exception handling code looks for several different source exceptions, handles them and "translates" them into a custom exception class that is transformed into JSON and returned.
Does it also roll back to a previously-established savepoint? If not, it leaves PostgreSQL in an unusable state until transaction rollback.
I will have to check, when I have VPN access, if the code does a rollback, I suspect it does not.
My understanding of checked exceptions is that you must handle the exception or declare throws of the exception on the function or else you will get a compile error. I have a catch all Exception handler at the entry point Publish.validate call so I shouldn't be leaking exceptions, but perhaps it is. All checked exceptions are handled at the function of origin and then converted into a custom exception for exit processing (at the entry point function). It's possible there is something I've missed ofc.
I don't remember if you have your own PL/Java custom exception type or not, but if not perhaps it would be a good option and then throw that up to the caller, if nothing else it will allow the caller to decide how to proceed?
PL/Java generally reports database errors as subclasses of java.sql.SQLException, as it is required to by the JDBC API. The trouble is, that is a checked exception type, so you are never supposed to be able to see one being thrown from code that does not declare it,
If you have caught one and done something with it where you then return normally, but you didn't roll back the transaction state, then the database remains unusable and the next attempt to use it from anywhere (the class loader in this case) will incur another exception. As you saw, the second exception is not very helpful and it would be better to have the original, incompletely-handled, one rethrown.
It may, however, have to be wrapped in an unchecked exception, because the place in the code where it might have to be rethrown could be a place that never declared SQLException as a possible exception to be thrown there. It could be wrapped as the cause of the new exception (ClassNotFoundException in this case), but that's not quite semantically right either, because the reason the exception was originally thrown had nothing to do with loading a class.
I get what your saying. Since you need to return SQLException then perhaps a custom error code that can be checked by the caller's exception handler to know if it originated while attempting to perform a PL/Java call or an extension call VS Postgres returning one of it's error codes? IDK if that is good to do or not, but we've used custom error codes like this before, albeit our code is not a generalized extension to a major DB either.
Does adding a auto rollback setting help? Allowing the caller to set it via an API. Kind of like a fallback. This might be contentious and situational though.
In general, I would like to ask you to add API level calls to override the postgres.conf file settings. Perhaps, not all of them, as some are needed at JVM creation time, but the ones that aren't. The reason is that there have been times (more often than not lately) where the DBAs take forever to "get to our request" and we just wanna set the damn test environment up and run or need to tweak a setting in QA or User Acceptance. Moreover, there have been times where the DBAs did not follow directions (in email) and use the settings we provided. I would like to have a double check on setup of sorts.
All of the configuration settings that can be overridden outside of the conf file can already be set by existing PostgreSQL API: see the SET and SHOW commands and the set_config and current_setting functions. Also, the ones that are declared to be configurable in a session can be passed as options with a client connection, and the ones declared to be configurable at server startup can be overridden on the postgres or pg_ctl command line when starting the server.
There are certain settings that are declared as only configurable in the file and with a restart, and PL/Java has no way around that.
Of the settings that pertain to PL/Java itself, several only take effect at JVM startup and would be pointless to change afterward (and will tell you so if you try to change them afterward). The ones that can be changed afterward do not need anything beyond the existing PostgreSQL API to change them.
I don't know how elaborate your test environment is to set up, but for some kinds of testing you might take a look at PL/Java's test harness, which lets you write jshell scripts that start, stop, and connect to a server, and supports supplying your desired settings as arguments to the start method and as arguments to connect.
I have added a pull request #535 that should appear when PL/Java 1.6.10 comes out and simplify tracking down this kind of mishandled-exception bug. I don't know if you have time or resources to test the pull-request branch with your situation, but that would be welcome if you do.
It will also add the following documentation.
Catching PostgreSQL exceptions in Java
When your Java code calls into PostgreSQL to do database operations, a PostgreSQL error may result. It gets converted into a special subclass of SQLException that (internally to PL/Java) retains all the elements of the PostgreSQL error report. If your Java code does not catch this exception and it propagates all the way out of your function, it gets turned back into the original error report and is handled by PostgreSQL in the usual way.
Your Java code can also catch this exception in any catch block that covers SQLException. After catching one, there are two legitimate things your Java code can do with it:
-
Perform some cleanup as needed and rethrow it, or construct some other, more-descriptive or higher-level exception and throw that, so that the exception continues to propagate and your code returns exceptionally to PostgreSQL.
-
Roll back to a previously-established
Savepoint, perform any other recovery actions needed, and continue processing, without throwing or rethrowing anything.
If your code catches a PostgreSQL exception, and continues without rethrowing it or throwing a new one, and also without rolling back to a prior Savepoint, that is a bug. Without rolling back, the current PostgreSQL transaction is spoiled and any later calls your Java function tries to make into PostgreSQL will throw their own exceptions because of that. Historically, such bugs have been challenging to track down, as you may end up only seeing a later exception having nothing at all to do with the one that was originally mishandled, which you never see.
Tips for debugging mishandled exceptions
Some features arriving in PL/Java 1.6.10 simplify debugging code that catches but mishandles exceptions.
More-informative in-failed-transaction exception
First, the exception that results when a call into PostgreSQL fails because of an earlier mishandled exception has been made more informative. It has an SQLState of 25P02 (PostgreSQL's "in failed SQL transaction" code), and its getCause method actually returns the unrelated earlier exception that was mishandled (and so, in that sense, really is the original 'cause'). Java code that catches this exception can use getStackTrace to examine its stack trace, or call getCause and examine the stack trace of the earlier exception. The stack trace of the failed-transaction exception shows the context of the later call that failed because of the earlier mishandling, and the stack trace of the 'cause' shows the context of the original mishandled problem.
Note, however, that while your code may mishandle an exception, the next call into PostgreSQL that is going to fail as a result might not be made from your code at all. It could, for example, happen in PL/Java's class loader and appear to your code as an unexplained ClassNotFoundException. The failed-transaction SQLException and its cause should often be retrievable from the cause chain of whatever exception you get, but could require following multiple cause links.
Additional logging
Additionally, there is logging that can assist with debugging when it isn't practical to add to your Java code or run with a debugger to catch and examine exceptions.
When your Java function returns to PostgreSQL, normally or exceptionally, PL/Java checks whether there was any PostgreSQL error raised during your function's execution but not resolved by rolling back to a savepoint.
If there was, the logging depends on whether your function is returning normally or exceptionally.
If your function has returned normally
If a PostgreSQL error was raised, and was not resolved by rolling back to a savepoint, and your function is making a normal non-exception return, then, technically, your function has mishandled that exception. The mishandling may be more benign (your function made no later attempts to call into PostgreSQL that failed because of it) or less benign (if one or more later calls did get made and failed). In either case, an exception stack trace will be logged, but the log level will differ.
Note: "More benign" still does not mean "benign". Such code may be the cause of puzzling PostgreSQL warnings about active snapshots or unclosed resources, or it may produce no visible symptoms, but it is buggy and should be found and fixed.
In the more-benign case, it is possible that your code has long been mishandling that exception without a problem being noticed, and it might not be desirable for new logging added in PL/Java 1.6.10 to create a lot of new log traffic about it. Therefore, the stack trace will be logged at DEBUG1 level. You can use SET log_min_messages TO DEBUG1 to see any such stack traces.
In the less-benign case, the mishandling is likely to be causing some problem, and the stack trace will be logged at WARNING level, and so will appear in the log unless you have configured warnings not to be logged. The first in-failed-transaction exception is the one whose stack trace will be logged, and that stack trace will include Caused by: and the original mishandled exception with its own stack trace.
If your function has returned exceptionally
If a PostgreSQL error was raised and your function is returning exceptionally, then there may have been no mishandling at all. The exception emerging from your function may be the original PostgreSQL exception, or a higher-level one your code constructed around it. That would be normal, non-buggy behavior.
It is also possible, though, that your code could have caught a PostgreSQL exception, mishandled it, and later returned exceptionally on account of some other, even unrelated, exception. PL/Java has no way to tell the difference, so it will log the PostgreSQL exception in this case too, but only at DEBUG2 level.
PL/Java's already existing pre-1.6.10 practice is to log an exception stack trace at DEBUG1 level any time your function returns exceptionally. Simply by setting log_level to DEBUG1, then, you can see the stack trace of whatever exception caused the exceptional return of your function. If that exception was a direct result of the original PostgreSQL exception or of a later in-failed-transaction exception, then the cause chain in its stack trace should have all the information you need.
If, on the other hand, the exception causing your function's exceptional return is unrelated and its cause chain does not include that information, then by bumping the log level to DEBUG2 you can ensure the mishandled exception's stack trace also is logged.
Example
PL/Java's supplied examples include a MishandledExceptions class creating a mishandle function that can be used to demonstrate the effects of mishandling and what is visble at different logging levels.
The new exception-handling and logging features documented in the comment above have landed in 1.6.10.