Vertica-Extension-Packages
Vertica-Extension-Packages copied to clipboard
compatlib_function - ConnectBy supports VARCHAR inputs
Currently, ConnectBy supports INTEGER inputs as parent ID and child ID. Now, it supports VARCHAR inputs as well. Close #69.
[Test Data]
=> CREATE TABLE company (id INT, supervisor_id INT, name VARCHAR(20));
=> INSERT INTO company VALUES (1, null, 'Patrick');
=> INSERT INTO company VALUES (2, 1, 'Jim');
=> INSERT INTO company VALUES (3, 1, 'Sandy');
=> INSERT INTO company VALUES (4, 3, 'Brian');
=> INSERT INTO company VALUES (4, 3, 'Otto');
=> COMMIT;
[Test Result with INTEGER inputs]
=> SELECT connect_by_path(supervisor_id, id, name, ' >> ') OVER () FROM company;
identifier | depth | path
------------+-------+--------------------------
1 | 0 | Patrick
2 | 1 | Patrick >> Jim
3 | 1 | Patrick >> Sandy
4 | 2 | Patrick >> Sandy >> Otto
(4 rows)
[Test Result with VARCHAR inputs]
=> SELECT connect_by_path(supervisor_id::VARCHAR, id::VARCHAR, name, ' >> ') OVER () FROM company;
identifier | depth | path
------------+-------+--------------------------
1 | 0 | Patrick
2 | 1 | Patrick >> Jim
3 | 1 | Patrick >> Sandy
4 | 2 | Patrick >> Sandy >> Otto
(4 rows)
Thank you!
I have one reservation.
You give two factories: ConnectByIntFactory and ConnectByVarcharFactory to provide two interfaces to the same function (by reading ints as strings). This means that someone using ints will pay a conversion price to strings. In addition, all the map operations will be done on string keys and not int keys, which I suspect is less efficient.
If one's tables are small, this is probably not such a big concern, but Vertica is used with tables that have billions of rows.
Given that many (possibly most) uses of this function will be using integer keys from a dimension table, I think it might be better to have separate ConnectByInt and ConnectByVarchar implementations which only do conversions when needed.
I wonder if you could implement a template that does the algorithmic work in processPartition to avoid duplication?
Does that seem reasonable to you?
@dmankins Thank you for your valuable feedback! Yes, the data conversion from int to string will be a performance concern. I removed the conversion using the template. Can you review this request again?