libpqxx
libpqxx copied to clipboard
as_array() on varchar-based array_agg() column - and semicolon separator
We've experienced some issue when using as_array()
on a varchar-based array_agg()
column:
auto values = psql_array_to_vector(row["tag_v"].as_array());
(psql_array_to_vector is a simple wrapper method calling the libpqxx provided array parser)
std::vector<std::string> psql_array_to_vector(pqxx::array_parser&& parser) {
std::vector<std::string> result;
auto obj = parser.get_next();
while (obj.first != pqxx::array_parser::done)
{
if (obj.first == pqxx::array_parser::string_value) {
result.push_back(obj.second);
}
obj = parser.get_next();
}
return result;
}
Source:
- https://github.com/zerebubuth/openstreetmap-cgimap/blob/89e84c1e89479a64f7736f7d5cef28a03278d273/src/backend/apidb/common_pgsql_selection.cpp#L78-L79
- https://github.com/zerebubuth/openstreetmap-cgimap/blob/89e84c1e89479a64f7736f7d5cef28a03278d273/src/data_selection.cpp#L9-L21
For our payload {use_sidepath,secondary,3,1,yes,50,"Rijksweg Noord",asphalt,left|through;right}
, we expected the libpqxx array parser to extract 9 values, rather than 10.
10 values:
use_sidepath
secondary
3
1
yes
50
Rijksweg Noord
asphalt
left|through
right
As shown in the payload above, the last element in the array ( asphalt,left|through;right
) is both unquoted, and contains a semicolon character.
libpqxx seems to treat this semicolon as a separator character, although the underlying datatype is a varchar. We didn't find any way to influence which characters should be treated as a separator character. As Postgresql wouldn't escape a string containing semicolons, it would be good not to consider semicolon as a separator here.
https://github.com/zerebubuth/openstreetmap-cgimap/issues/276#issuecomment-1196380394 has a bit more discussion
Relevant query
SELECT w.id, w.visible,
to_char(w.timestamp,'YYYY-MM-DD\T\HH24:MI:SS\Z\') AS timestamp,
w.changeset_id, w.version, t.keys as tag_k, t.values as tag_v,
wn.node_ids as node_ids
FROM current_ways w
LEFT JOIN LATERAL
(SELECT array_agg(k) as keys, array_agg(v) as values
FROM current_way_tags WHERE w.id=way_id) t ON true
LEFT JOIN LATERAL
(SELECT array_agg(node_id) as node_ids
FROM
(SELECT node_id FROM current_way_nodes WHERE w.id=way_id
ORDER BY sequence_id) x) wn ON true
WHERE w.id = 4000392205
ORDER BY w.id
(based on https://github.com/zerebubuth/openstreetmap-cgimap/blob/master/test/structure.sql)
Downstream issue: https://github.com/zerebubuth/openstreetmap-cgimap/issues/276
Thanks for looking in this!
@tomhughes, @pnorman: please chime in, in case I forgot something.
I think that basically covers it - as the array parser has no idea of the type of the value it's operating on and can't access the type info to get the separator I think the separator needs to be an argument to the parser so it can be configured.
If it defaulted to comma then existing code would still work in almost all cases and the odd cases where semicolon was needed would need to change.
That would also allow custom types that use other separators to use it by configuring the separator.
You're right, that's always been a problem waiting to happen... I think at the time I thought that the backend would quote such strings. I also just discovered another thing I thought about array quoting was wrong: I thought there was such a thing as single-quoted array elements. See #587.
So we do need to pass a separator, which can default to a comma. But it gets more difficult when an array combines different types, with different separators. I think the only way that can happen though is in a multi-dimensional array where the ultimate elements use semicolon as the separator. Do you agree?
If so, we may want a separate provision for multi-dimensional arrays. TBH array handling in libpqxx is still in its infancy. For the existing as_array()
I think it would make sense to pass the separator as a template argument. That fits best with the direction I'm taking with text encodings.
There has been no activity on this ticket. Consider closing it.
I don't think this should be closed, as it's still a bug that prevents using text[]
arrays where there is no constraint excluding semicolons.
I think the only way that can happen though is in a multi-dimensional array where the ultimate elements use semicolon as the separator.
Checking pg_type, the only type that use something other than ,
as a delimiter is the box
type. Can you give an example of another situation where a ;
would be the delimiter?
postgres=# SELECT typname, typdelim FROM pg_type WHERE typdelim <> ',';
typname | typdelim
---------+----------
box | ;
_box | ;
Of course, other types could added with different delimiters. Notably, PostGIS has types with :
as a separator.
It's always possible for a user to create custom types with other delimiters, but this is unlikely.
CREATE TYPE silly;
create function sillyin(cstring) returns silly language internal IMMUTABLE PARALLEL SAFE STRICT as $function$textin$function$;
create function sillyout(silly) returns cstring language internal IMMUTABLE PARALLEL SAFE STRICT AS $function$textout$function$;
CREATE TYPE silly (INPUT = sillyin, OUTPUT = sillyout, LIKE = text, DELIMITER = 'a');
select ARRAY['b','c']::silly[];
gives {bac}
as the text representation for an array with two elements.
Absolutely, this ticket should stay open until fixed. Github will take the ongoing conversation as a hint.
As I mentioned I'm thinking to make the separator character a template argument, which means that you'll need to know this character at compile time. Does anyone have a problem with that? (I'll have to move some code around, but an important part of the encodings puzzle for this just came together.)
Also, would I be correct in assuming that an array can have at most two different separator characters: the element type's separator, and in the multidimensional case, the comma between sub-arrays?
On Fri, Sep 30, 2022, 07:34 Paul Norman @.***> wrote:
I don't think this should be closed, as it's still a bug that prevents using text[] arrays where there is no constraint excluding semicolons.
I think the only way that can happen though is in a multi-dimensional array where the ultimate elements use semicolon as the separator.
Checking pg_type, the only type that use something other than , as a delimiter is the box type. Can you give an example of another situation where a ; would be the delimiter?
postgres=# SELECT typname, typdelim FROM pg_type WHERE typdelim <> ','; typname | typdelim ---------+---------- box | ; _box | ;
Of course, other types could added with different delimiters. Notably, PostGIS has types with : as a separator.
It's always possible for a user to create custom types with other delimiters, but this is unlikely.
CREATE TYPE silly;create function sillyin(cstring) returns silly language internal IMMUTABLE PARALLEL SAFE STRICT as $function$textin$function$;create function sillyout(silly) returns cstring language internal IMMUTABLE PARALLEL SAFE STRICT AS $function$textout$function$;CREATE TYPE silly (INPUT = sillyin, OUTPUT = sillyout, LIKE = text, DELIMITER = 'a');select ARRAY['b','c']::silly[];
gives {bac} as the text representation for an array with two elements.
— Reply to this email directly, view it on GitHub https://github.com/jtv/libpqxx/issues/590#issuecomment-1263122157, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQYDEY3MYZCNKD4WWOCJFDWAZ3UPANCNFSM55EXEJUA . You are receiving this because you commented.Message ID: @.***>
The delimiter character is always the one defined before the type, including multidimensional arrays. e.g. SELECT ARRAY[['meeting', 'lunch'], ['training', 'presentation']]::silly[][]
gives {{meetingalunch}a{"training"a"presentation"}}
. As far as I can tell, an array can only ever have exactly one delimiter.
Making it a template argument prevents one from querying for the separator character from the DB, but this is probably fine.
Github bot, what are you doing!? I'm still working on this one!
Maybe try removing the no-issue-activity label?
Ah, I hadn't noticed the label, thanks. The button to remove it showed up below the conversation (where there's a second "Labels" section. Not great UI work.)
I've come to the conclusion that I can't fix this properly in array_parser
without (1) breaking compatibility and (2) complicating an already awkward API that little bit more.
So I'm changing array_parser
so that it will only work with comma separators, and designing a new API for parsing arrays.
There has been no activity on this ticket. Consider closing it.
Please remove the no activity label so it doesn't autoclose
Thanks for the reminder. For the record, there has been activity on the ticket but it's been moving slowly - #609 is a new, very different array parser that converts an SQL array into something like a C++ container.
Can this be re-opened as its not yet fixed?
Yes, and I'll merge the fix right away. Which I think will automatically close it again. :-)
The fix, by the way, is a new pqxx::array
class. You construct it with a string representing an SQL array, and a connection (just so it knows which encoding to use), and it holds the values, converted to a type of your choice.
I still need to sort out iteration though. It's got indexing, but iteration gets a little complicated for multi-dimensional arrays.