Line/Column numbers
Does postgres return line/column numbers in its parse tree? If it does, is it possible to put them in the JSON output?
I would like to use node-pg-query-native to split a complex schema creation file into its individual commands, but can't just use query.split(/;/g) because it contains plpgsql functions. Since pg-query-parser can't deparse some of the types that I'm using (e.g., DROP statements). If the libpg_query output included line number/column number, I could use this parser to determine when commands start and end, then use that information to split out the original queries from the original sql file.
Hi @bryanburgers - thanks for reaching out!
The answer is, it depends a bit on which part of the tree you are looking at. In most node types you have a location field, which corresponds to the character index in the query string. Note that newlines don't have any special meaning, so there is no concept of lines in the parsetree.
Unfortunately this is not always present on the top level node, e.g. CreateStmt will only have the location in the RangeVar and other nodes inside it, but there is no location field on the top-level node.
I've been thinking of patching Postgres inside libpg_query to resolve this, but it'll be a bit of effort to keep supporting this with every new major release.
This doesn't really help your case I think, but I'd be open to pull requests at the very least, if you want to invest the time to write a patch. There is one patch that adds a location field to DefElem (something that actually also got done upstream in Postgres 10), which you could use as a starting point: https://github.com/lfittl/libpg_query/blob/9.5-latest/patches/02_normalize_alter_role_password.patch
Let me know if you want to go down the route of working on a PR for this, I'd be happy to support you & help getting you started.
There has been effort in upstream PostgreSQL to add query location and length on all top-level parse nodes. So far it hasn't been merged into Postgres 10, but it might provide a way to resolve this issue.
The latest patch on the mailinglist: https://www.postgresql.org/message-id/alpine.DEB.2.20.1612301453280.32017%40lancre
(this won't apply cleanly to libpg_query, since we're still on Postgres 9.5, but can be a good starting point - or we just wait for Postgres 10 to be released, assuming the patch gets merged)
@bryanburgers I've just completed rebasing this library to the Postgres 10 parser (beta1, still subject to change)
The branch is available here: https://github.com/lfittl/libpg_query/tree/10-latest
This introduces RawStmt nodes that are on top of the tree for each statement, which contain stmt_location and stmt_len variables that indicate the position and length of a statement in the original query text. I believe you should be able to use this to split a schema file successfully.
Does that sound like it'd work for you?