pljava
pljava copied to clipboard
Work-in-Progress on PL/Java refactoring, API modernization
As a work-in-progress pull request, this is not expected to be imminently merged, but is here to document the objectives and progress of the ongoing work.
Why needed
A great advantage promised by a PL based on the JVM is the large ecosystem of languages other than Java that can be supported on the same infrastructure, whether through the Java Scripting (JSR 223) API, or through the polyglot facilities of GraalVM, or simply via separate compilation to the class file format and loading as jars.
However, PL/Java, with its origins in 2004 predating most of those developments, has architectural limitations that stand in the way.
JDBC
One of the limitations is the centrality of the JDBC API. To be sure, it is a standard in the Java world for access to a database, and for PL/Java to conform to ISO SQL/JRT, the JDBC API must be available. But it is not necessarily a preferred or natural database API for other JVM or GraalVM languages, and its design goal is to abstract away from the specifics of an underlying database, which ends up complicating or even preventing access to advanced PostgreSQL capabilities that could be prime drivers for running server-side code in the first place.
The problem is not that JDBC is an available API in PL/Java, but that it is the fundamental API in PL/Java, with its tentacles reaching right into the native C language portion of PL/Java's implementation. That has made alternative interface options impractical, and multiplied the maintenance burden of even simple tasks like adding support for new datatype mappings or fixing simple bugs. There are significant portions of JDBC 4 that remain unimplemented in PL/Java.
Experience building an implementation of ISO SQL/XML XMLQUERY
showed that certain requirements of the spec were simply unsatisfiable atop JDBC, either because of inherent JDBC limitations or limits in PL/Java's implementation of it. An example of each kind:
- The
INTERVAL
data type cannot be mapped as SQL/XML requires, because the onlyResultSetMetadata
methods JDBC defines for access to a type modifier areprecision
andscale
, which apply to numeric values; the API defines no standard way to learn what the modifier of anINTERVAL
says about whether months or days are present. - The
DECIMAL
type cannot be mapped as SQL/XML requires; for that case, the fault is not with JDBC (which defines theprecision
andscale
methods), but with their incomplete implementation in PL/Java.
Those cases also illustrate that mapping some PostgreSQL data types to those of another language can be complex. An arbitrary PostgreSQL INTERVAL
is representable as neither a java.time.Period
nor a java.time.Duration
alone (though a pair of the two can be used, a type that PGJDBC-NG offers). One or the other can suffice if the type modifier is known and limits the fields present. A PostgreSQL NUMERIC
value has not-a-number and signed infinity values that some candidate language-library type might not, and an internal precision that its text representation does not reveal, which might need to be preserved for a mathematically demanding task. The details of converting it to another language's similar type need to be knowable or controllable by an application.
It is a goal of this work to give PL/Java an API that does not obscure or abstract from PostgreSQL details, but makes them accessible in a natural Java idiom, and that such a "natural PostgreSQL" API should be adequate to allow building a JDBC layer in pure Java above it. (The work of building such a JDBC layer is not in the scope of this pull request.)
Parameter and return-value mapping
PL/Java uses a simple, Java-centric approach where a Java method is declared naturally, giving ordinary Java types for its parameters and return, and the mappings from these to the PostgreSQL parameter and return types are chosen by PL/Java and applied transparently (and much of that happens deep in PL/Java's C code).
While convenient, that approach isn't easily adapted to other JVM languages that may offer other selections of types. Even for Java, it stands in the way of doing certain things possible in PostgreSQL, like declaring VARIADIC "any"
functions.
In a modernized API, it needs to be possible to declare a function whose parameter represents the PostgreSQL FunctionCallInfo
, so that the parameters and their types can be examined and converted in Java. That will make it possible to write language handlers in Java, whether for other JVM languages or for the existing PL/Java calling conventions that at present are tangled in C.
Elements of new API
Identification of data types
A PostgreSQL-specific API must be able to refer unambiguously to any type known to the database, so it cannot rely on any fixed set of generic types such as JDBCType
. To interoperate with a JDBC layer, though, the identifier for types should implement JDBC's SQLType
interface.
The API should support retrieving enough metadata about the type for a JDBC layer implemented above it to be able to report complete ResultSetMetaData
information.
The new class serving this purpose is RegType
.
As RegType
implements the java.sql.SQLType
interface, an aliasing issue arises for a JDBC layer. Such a layer should accept JDBCType.VARCHAR
as an alias for RegType.VARCHAR
, for example. JDBC itself has no methods that return an SQLType
instance, so the question of whether it should return the generic JDBC type or the true RegType
does not arise. A PL/Java-specific API is needed for retrieving the type identifier in any case.
The details of which JDBC types are considered aliases of which RegType
s will naturally belong in a JDBC API layer. At the level of this underlying API, a RegType
is what identifies a PostgreSQL type.
While RegType
includes convenience final fields for a number of common types, those by no means limit the RegType
s available. There is a RegType
that can be obtained for every type known to the database, whether built in, extension-supplied, or user-defined.
Other PostgreSQL catalog objects and key abstractions
RegType
is one among the types of PostgreSQL catalog objects modeled in the org.postgresql.pljava.model
package.
Along with a number of catalog object types, the package also contains:
-
TupleDescriptor
andTupleTableSlot
, the key abstractions for fetching and storing database values.TupleTableSlot
in PostgreSQL is already a useful abstraction over a few different representations; in PL/Java it is further abstracted, and can present with the same API other collections of typed, possibly named, items, such as arrays, the arguments in a function call, etc. -
MemoryContext
andResourceOwner
, both subtypes ofLifespan
, usable to guard Java objects that have native state whose validity is bounded in time -
CharsetEncoding
Mapping PostgreSQL data types to what a PL supports
The Adapter
class
A mapping between a PostgreSQL data type and a suitable PL data type is an instance of the Adapter class, and more specifically of the reference-returning Adapter.As<T,U>
or one of the primitive-returning Adapter.AsInt<U>
, Adapter.AsFloat<U>
, and so on (one for each Java primitive type). The Java type produced is T
for the As
case, and implicit in the class name for the AsFoo
cases.
The basic method for fetching a value from a TupleTableSlot
is get(Attribute att, Adapter adp)
, and naturally is overloaded and generic so that get
with an As<T,?>
adapter returns a T
, get
with an AsInt<?>
adapter returns an int
, and so on. (But see this later comment below for a better API than this item-at-a-time stuff.) (The U
type parameter of an adapter plays a role when adapters are combined by composition, as discussed below, and is otherwise usually uninteresting to client code, which may wildcard it, as seen above.)
A manager class for adapters
Natural use of this idiom presumes there will be some adapter-manager API that allows client code to request an adapter for some PostgreSQL type by specifying a Java witness class Class<T>
or some form of super type token, and returns the adapter with the expected compile-time parameterized type.
That manager hasn't been built yet, but the requirements are straightforward and no thorny bits are foreseen. (Within the org.postgresql.pljava.internal
module itself, things are simpler; no manager is needed, and code refers directly to static final INSTANCE
fields of existing adapters.)
Extensibility
PL/Java has historically supported user-defined types implemented in Java, a special class of data types whose Java representations must implement a certain JDBC interface and import and export values through a matching JDBC API. In contrast, PL/Java's first-class PostgreSQL data type support—the mappings it supplies between PostgreSQL and ordinary Java types that don't involve the specialized JDBC user-defined type APIs—has been hardcoded in C using Java Native Interface (JNI) calls, and not straightforward to extend. That's a pain point for several situations:
- A mapping for another PostgreSQL data type (either a type newly added to PostgreSQL, or simply one that PL/Java does not yet have a mapping for) is not easily added for an application that needs it, but generally must be added in PL/Java's C/JNI internals and made available in a new PL/Java build.
- A mapping of an existing PostgreSQL data type to a new or different Java type—same story. When Java 8 introduced the
java.time
package, developers wishing to have PL/Java map PostgreSQL's date and time types to the improved Java types instead of the olderjava.sql
ones had to open issues requesting that ability and wait for a PL/Java release to include it. - Not every PostgreSQL data type has a single best PL type to be mapped to. One application using the geometric types might want them mapped to the Java types in the PGJDBC library, while another might prefer the 2D classes supplied by some Java geometry library. One application might want a PostgreSQL array mapped to a flat Java
List
, another to a multi-dimensioned Java array, another to a matrix class from a scientific computation library. The choices multiply when considering the data types not only of Java but of other JVM languages. C coding and rebuilding of PL/Java should not be needed to tailor these mappings.
Adapters implementable in pure Java
With this PR, code external to PL/Java's implementation can supply adapters, built against the service-provider API exposed in org.postgresql.pljava.adt.spi
.
Leaf adapters
A "leaf" adapter is one that directly knows the PostgreSQL datum format of its data type, and maps that to a suitable PL type. Only a leaf adapter gets access to PostgreSQL datums, which it should not leak to other code. Code that defines leaf adapters must be granted a permission in pljava.policy
.
Composing adapters
A composing, or non-leaf, adapter is one meant to be composed over another adapter. An example would be an adapter that composes over an adapter returning type T
(possibly null) to form an adapter returning Optional<T>
. With a selection of common composing adapters (there aren't any in this pull request, yet), it isn't necessary to provide leaf adapters covering all the ways application code might want data to be presented. No special permission is needed to create a composing adapter.
Java's generic types are erased to raw types for runtime, but the Java compiler saves the parameter information for runtime access through Java reflection. As adapters are composed, the Adapter
class tracks the type relationships so that, for example, an Adapter<Optional<T>,T>
composed over an Adapter<String,Void>
is known to produce Optional<String>
.
It is that information that will allow an adapter manager to satisfy a request to map a given PostgreSQL type to some PL type, by finding and composing available adapters.
Contract-based adapters
For a PostgreSQL data type that doesn't have one obvious best mapping to a PL type (perhaps because there are multiple choices with different advantages, or because there is no suitable type in the PL's base library, and any application will want the type mapped to something in a chosen third-party library), a contract-based adapter may be best. An Adapter.Contract
is a functional interface with parameters that define the semantically-important components of the PostgreSQL type, and a generic return type, so an implementation can return any desired representation for the type.
A contract-based adapter is a leaf adapter class with a constructor that accepts a Contract
, producing an adapter between the PostgreSQL type and whatever PL type the contract maps it to. The adapter encapsulates the internal details of how a PostgreSQL datum encodes the value, and the contract exposes the semantic details needed to faithfully map the type. Contracts for many existing PostgreSQL types are provided in the org.postgresql.pljava.adt
package.
ArrayAdapter
The one supplied ArrayAdapter
is contract-based. While a Contract.Array
has a single abstract method, and therefore could serve as a functional interface, in practice it is not directly implementable by a lambda; there must be a subclass or subinterface (possibly anonymous) whose type parameterization the Java compiler can record. (A lambda may then be used to instantiate that.) An instance of ArrayAdapter
is constructed by supplying an adapter for the array's element type along with an array contract targeting some kind of collection of the mapped type. As with a composing adapter, the Adapter
class substitutes the element adapter's target Java type through the type parameters of the array contract, to arrive at the actual parameterized type of the resulting array or collection.
PostgreSQL arrays can be multidimensional, and are regular (not "jagged"; all sub-arrays at a given dimension match in size). They can have null elements, which are tracked in a bitmap, offering a simple way to save some space for arrays that are sparse; there are no other, more specialized sparse-array provisions.
Array indices need not be 0- or 1-based; the base index as well as the index range can be given independently for each dimension. PostgreSQL creates 1-based arrays by default. This information is stored with the array value, not with the array type, so a column declared with an array type could conceivably have values of different cardinalities or even dimensionalities.
The adapter is contract-based because there are many ways application code could want a PostgreSQL array to be presented: as a List
or single Java array (flattening multiple dimensions, if present, to one, and disregarding the base index), as a Java array-of-arrays, as a JDBC Array
object (which does not officially contemplate more than one array dimension, but PostgreSQL's JDBC drivers have used it to represent multidimensioned arrays), as the matrix type offered by some scientific computation library, and so on.
For now, one predefined contract is supplied, AsFlatList
, and a static method, nullsIncludedCopy
, that can be used (via method reference) as one implementation of that contract.
Java array-of-arrays
While perhaps not an extremely efficient way to represent multidimensional arrays, the Java array-of-arrays approach is familiar, and benefits from a bit of dedicated support for it in Adapter
. Therefore, if you have an Adapter
a that renders a PostgreSQL type Foo
as Java type Bar
, you can use, for example, a.a2().build()
to obtain an Adapter
from the PostgreSQL array type Foo[]
to the Java type Bar[][]
, requiring the PostgreSQL array to have two dimensions, allowing each value to have different sizes along those dimensions, but disregarding the PostgreSQL array's start indices (all Java arrays start at 0).
Because PostgreSQL stores the dimension information with each value and does not enforce it for a column as a whole, it could be possible for a column of array values to include values with other numbers of dimensions, which an adapter constructed this way will reject. On the other hand, the sizes along each dimension are also allowed by PostgreSQL to vary from one value to the next, and this adapter accommodates that, as long as the number of dimensions doesn't change.
The existing contract-based ArrayAdapter
is used behind the scenes, but build()
takes care of generating the contract. Examples are provided.
Adapter maintainability
Providing pure-Java adapters that know the internal layouts of PostgreSQL data types, without relying on JNI calls and the PostgreSQL native support routines, entails a parallel-implementation maintenance responsibility roughly comparable to that of PostgreSQL client drivers that support binary send and receive. (The risk is slightly higher because the backend internal layouts are less committed than the send/receive representations. Because they are used for data on disk, though, historically they have not changed often or capriciously.)
The engineering judgment is that the resulting burden will be manageable, and the benefits in clarity and maintainability of the pure-Java implementations, compared to the brittle legacy Java+C+JNI approach, will predominate. The process of developing clear contracts for PostgreSQL types already has led to discovery of one bug (#390) that could be fixed in the legacy conversions.
For the adapters supplied in the org.postgresql.pljava.internal
module, it is possible to use ModelConstants.java
/ModelConstants.c
to ensure that key constants (offsets, flags, etc.) stay synchronized with their counterparts in the PostgreSQL C code.
Adapter
is a class in the API module, with the express intent that other adapters can be developed, and found by the adapter manager through a ServiceLoader
API, without being internal to PL/Java. Those might not have the same opportunity for build-time checking against PostgreSQL header files, and will have to rely more heavily on regression tests for key data values, much as binary-supporting client drivers must. The same can be true even for PL/Java internal adapters for a few PostgreSQL data types whose C implementations are so strongly encapsulated (numeric
comes to mind) that necessary layouts and constants do not appear in .h
files.
Known open items
In no well-defined order ....
- [ ] The to-PostgreSQL direction for
Adapter
,TupleTableSlot
, andDatum.Accessor
. These all have API and implementation for getting PostgreSQL values and presenting them in Java. Now the other direction is needed. - [ ] Provide API and implementation for a unified list-of-slots representation for a variety of list-of-tuple representations used in PostgreSQL, by:
- [x] factoring out the list-of-
TupleTableSlot
classes currently found as preliminary scaffolding inTupleTableSlot.java
- [x] providing such a representation for
SPITupleTable
... - [ ]
CatCList
... - [ ]
Tuplestore
? ... - [ ] ...?
- [x] factoring out the list-of-
- [ ] Implement some form of offset memoization so fetching attributes from a heap
TupleTableSlot
stays subquadratic - [ ] Finish the unimplemented
grants
methods ofRegRole
and the unmplemented unary one ofCatalogObject.AccessControlled
. (Needs theCatCList
support, forpg_auth_members
searches.) - [x] A
NullableDatum
flavor ofTupleTableSlot
. One of the last prerequisites to enable pure-Java language-handler implementations, to which the function arguments will appear as aTupleTableSlot
. - [ ] Complete the implementation of
isSubtype
with the rules from Java Language Specification 4.10. (At present it is a stub that only checks erased subtyping, enough to get things initially going.) - [ ] The adapter manager described above. (Requires
isSubtype
.) - [ ] Adapters for PostgreSQL types that don't have them yet (starting, perhaps, with the ones that already have contracts defined in
org.postgresql.pljava.adt
). - [ ]
TextAdapter
does not yet support the type modifiers forCHAR
andVARCHAR
. It needs a contract-based flavor that does. - [ ]
ArrayAdapter
(orContract.Array
) should supply at least one convenience method, taking adimsAndBounds
array parameter and generating an indexing function (aMethodHandle
?) that has nDims integer parameters and returns an integer flat index. Other related operations? An index enumerator, etc.? - [ ] A useful initial set of composing adapters, such as:
- [ ] one of the form
As<Optional<T>,T>
- [x] implement in an example class
- [ ] integrate into PL/Java proper
- [ ] one extending
As<T,T>
that returns null for null and values unchanged- why? because with adapter autoboxing, it can be composed over any primitive-returning adapter to enable it to handle null, by returning its boxed form
- [x] implement in an example class
- [ ] integrate into PL/Java proper
- [ ] a set composing over primitive adapters to use a specified value in the primitive's value space to represent null.
- [x] implement in an example class
- [ ] complete the set and integrate into PL/Java proper
- [ ] one of the form
- [ ] More work on
CatalogObject
invalidation.RegClass
andRegType
are already invalidated selectively; probablyRegProcedure
should be also. PostgreSQL has a limited number of callback slots, so it would be antisocial to grab them for all the supported classes: less critical ones just depend on the global switchpoint; come up with a good story for invalidating those. Also for howTupleDescriptor
should behave upon invalidation of itsRegClass
. See commit comments for 5adf2c8. - [ ] Better define and implement the
DualState
behavior ofTupleTableSlot
. - [ ] Reduce the C-centricity of
VarlenaWrapper
. Goal:DatumUtils.mapVarlena
doing more in Java, less in C.- [x] more of
VarlenaWrapper
's functionality moved toDatumImpl
- [x] client code no longer casting
Datum.Input
toVarlenaWrapper
to use it.
- [x] more of
- [ ] Adapter should have control over the park/fetch/decompress/lifespan decisions for
VarlenaWrapper
; currently the behavior is hardcoded for top-transaction lifespan, lazy detoasting, appropriate forSQLXML
, which was the firstVarlenaWrapper
client. - [ ] Add MBeans with statistics for the new caches
And then
- [ ] Choose some interesting JVM language foo and implement a simple PL/foo in pure Java, using these facilities.
- [ ] Reimplement PL/Java's own language handler the same way.