manticoresearch
manticoresearch copied to clipboard
INNER/LEFT JOIN
The task is to implement INNER/LEFT JOIN functionality with the following limitations:
- 2 tables only
- special column instead of full NULL support, e.g.:
select * from purchases AS p left join articles AS a ON a.id = p.article_id: +------+------------+-------------+------+-------+-------------+ | id | article_id | customer_id | id | title | @right_null | +------+------------+-------------+------+-------+-------------+ | 1 | 1 | 10 | 1 | book | 0 | | 2 | 1 | 11 | 1 | book | 0 | | 3 | 3 | 10 | 0 | | 1 | +------+------------+-------------+------+-------+-------------+
Little issue with NULLs in 6.2.13 255ad5232@24030516
No NULL for the last column here:
MySQL [(none)]> DROP TABLE IF EXISTS a; DROP TABLE IF EXISTS b; CREATE TABLE a (f text); CREATE TABLE b ( a_id int, i int); INSERT INTO a VALUES (1, 'a'); select * from a left join b on b.a_id=a.id;
--------------
DROP TABLE IF EXISTS a
--------------
Query OK, 0 rows affected (0.026 sec)
--------------
DROP TABLE IF EXISTS b
--------------
Query OK, 0 rows affected (0.004 sec)
--------------
CREATE TABLE a (f text)
--------------
Query OK, 0 rows affected (0.001 sec)
--------------
CREATE TABLE b ( a_id int, i int)
--------------
Query OK, 0 rows affected (0.001 sec)
--------------
INSERT INTO a VALUES (1, 'a')
--------------
Query OK, 1 row affected (0.002 sec)
--------------
select * from a left join b on b.a_id=a.id
--------------
+------+------+------+--------+------+
| id | f | b.id | b.a_id | b.i |
+------+------+------+--------+------+
| 1 | NULL | NULL | NULL | 0 |
+------+------+------+--------+------+
1 row in set (0.000 sec)
If I replace the text field with another int it becomes ok:
MySQL [(none)]> DROP TABLE IF EXISTS a; DROP TABLE IF EXISTS b; CREATE TABLE a (i int); CREATE TABLE b ( a_id int, i int); INSERT INTO a VALUES (1, 123); select * from a left join b on b.a_id=a.id;
--------------
DROP TABLE IF EXISTS a
--------------
Query OK, 0 rows affected (0.026 sec)
--------------
DROP TABLE IF EXISTS b
--------------
Query OK, 0 rows affected (0.000 sec)
--------------
CREATE TABLE a (i int)
--------------
Query OK, 0 rows affected (0.001 sec)
--------------
CREATE TABLE b ( a_id int, i int)
--------------
Query OK, 0 rows affected (0.001 sec)
--------------
INSERT INTO a VALUES (1, 123)
--------------
Query OK, 1 row affected (0.000 sec)
--------------
select * from a left join b on b.a_id=a.id
--------------
+------+------+------+--------+------+
| id | i | b.id | b.a_id | b.i |
+------+------+------+--------+------+
| 1 | 123 | NULL | NULL | NULL |
+------+------+------+--------+------+
1 row in set (0.000 sec)
The task is to implement INNER/LEFT JOIN functionality with the following limitations: 2 tables only special column instead of full NULL support, e.g.:
Done in https://github.com/manticoresoftware/manticoresearch/pull/1855 even with proper NULL support for LEFT JOIN
.
What's left is to document it along with the following nuances:
- [ ] You cannot perform
SELECT left_table.field_name, right_table.field_name FROM ...
. Instead, you must useSELECT field_name, right_table.field_name FROM ...
. - [ ] You must use
JOIN ON table_name.some_field = another_table_name.some_field
; you cannot omit the table names. - [ ] You cannot do
select *, (nums2.n + 3) * n from nums left join nums2 on nums2.id = nums.num2_id
(i.e., when the operands are from both tables) without aliasing. Useselect *, (nums2.n + 3) x, x * n from nums left join nums2 on nums2.id = nums.num2_id
instead. Rel. issue https://github.com/manticoresoftware/manticoresearch/issues/1918 - [x]
SELECT right_table.stored_field
is not supported. Same withselect * ... join ...
, i.e. you won't get a stored field from the joined table in the result set. Rel. issue https://github.com/manticoresoftware/manticoresearch/issues/1915 - [ ] When you do
expr(field_from_left_table, field_from_right_table) AS some_alias
, you can get the value of the expression, but you can't filter by it even if you alias it. Rel. issue https://github.com/manticoresoftware/manticoresearch/issues/1919 - [ ] ANY(right.multi) requires an expression with a query example (rel. slack thread https://manticore-community.slack.com/archives/C7NSLK1NE/p1710674370442859?thread_ts=1710630191.329119&cid=C7NSLK1NE)
- [ ] Grouping by joined JSON attributes that generate multiple groupby values (JSON arrays) is not supported
Little issue with NULLs in
6.2.13 255ad5232@24030516
No NULL for the last column here:
Fixed in 25d99ef
A new error has been detected. Sorting in order by id asc
does not work if engine='columnar' is used
mysql> drop table if exists tbl1; CREATE TABLE tbl1 engine='columnar'; INSERT INTO tbl1 VALUES (1); drop table if exists tbl2; CREATE TABLE tbl2 (tbl1_id bigint); INSERT INTO tbl2 VALUES (1, 1); select id from tbl1 join tbl2 on tbl1.id=tbl2.tbl1_id order by id asc;
--------------
drop table if exists tbl1
--------------
--------------
CREATE TABLE tbl1 engine='columnar'
--------------
--------------
INSERT INTO tbl1 VALUES (1)
--------------
--------------
drop table if exists tbl2
--------------
--------------
CREATE TABLE tbl2 (tbl1_id bigint)
--------------
--------------
INSERT INTO tbl2 VALUES (1, 1)
--------------
--------------
select id from tbl1 join tbl2 on tbl1.id=tbl2.tbl1_id order by id asc
Sorting in order by id
asc works if engine='columnar' is not used
drop table if exists tbl1; CREATE TABLE tbl1; INSERT INTO tbl1 VALUES (1); drop table if exists tbl2; CREATE TABLE tbl2 (tbl1_id bigint); INSERT INTO tbl2 VALUES (1, 1); select id from tbl1 join tbl2 on tbl1.id=tbl2.tbl1_id order by id asc;
--------------
drop table if exists tbl1
--------------
--------------
CREATE TABLE tbl1
--------------
--------------
INSERT INTO tbl1 VALUES (1)
--------------
--------------
drop table if exists tbl2
--------------
--------------
CREATE TABLE tbl2 (tbl1_id bigint)
--------------
--------------
INSERT INTO tbl2 VALUES (1, 1)
--------------
--------------
select id from tbl1 join tbl2 on tbl1.id=tbl2.tbl1_id order by id asc
--------------
+------+
| id |
+------+
| 1 |
+------+
A new error has been detected. Sorting in order by id asc does not work if engine='columnar' is used
Fixed in de6dac0
I confirm that sorting by id asc works, regardless of whether engine='columnar' is used.
But I noticed a new bug, on the minimal table group by t2.attr
and both attr
rows are identical, and probably the hash should be identical.
mysql> DROP TABLE IF EXISTS t1; DROP TABLE IF EXISTS t2; CREATE TABLE t1 (id bigint, name text, surname text); INSERT INTO t1 (id, name, surname) VALUES (1, 'name1', 'surname1'), (2, 'name2', 'surname2'); select * from t1; CREATE TABLE t2 (id bigint, t2_id bigint, name text, description text, attr json); INSERT INTO t2 (id, t2_id, name, description, attr) VALUES (0, 1, 'name1', 'description1', '{"color":"black","size":14}'), (0, 1, 'name2', 'description2', '{"color":"black","size":14}'); select * from t2; select id, count(*), groupby() from t1 inner join t2 on t1.id=t2.t2_id group by t2.attr order by id asc;
--------------
DROP TABLE IF EXISTS t1
--------------
Query OK, 0 rows affected (0.00 sec)
--------------
DROP TABLE IF EXISTS t2
--------------
Query OK, 0 rows affected (0.00 sec)
--------------
CREATE TABLE t1 (id bigint, name text, surname text)
--------------
Query OK, 0 rows affected (0.00 sec)
--------------
INSERT INTO t1 (id, name, surname) VALUES (1, 'name1', 'surname1'), (2, 'name2', 'surname2')
--------------
Query OK, 2 rows affected (0.00 sec)
--------------
select * from t1
--------------
+------+-------+----------+
| id | name | surname |
+------+-------+----------+
| 1 | name1 | surname1 |
| 2 | name2 | surname2 |
+------+-------+----------+
2 rows in set (0.00 sec)
--- 2 out of 2 results in 0ms ---
--------------
CREATE TABLE t2 (id bigint, t2_id bigint, name text, description text, attr json)
--------------
Query OK, 0 rows affected (0.00 sec)
--------------
INSERT INTO t2 (id, t2_id, name, description, attr) VALUES (0, 1, 'name1', 'description1', '{"color":"black","size":14}'), (0, 1, 'name2', 'description2', '{"color":"black","size":14}')
--------------
Query OK, 2 rows affected (0.00 sec)
--------------
select * from t2
--------------
+---------------------+-------+--------------+-------+-----------------------------+
| id | name | description | t2_id | attr |
+---------------------+-------+--------------+-------+-----------------------------+
| 5839239912211561623 | name1 | description1 | 1 | {"color":"black","size":14} |
| 5839239912211561624 | name2 | description2 | 1 | {"color":"black","size":14} |
+---------------------+-------+--------------+-------+-----------------------------+
2 rows in set (0.00 sec)
--- 2 out of 2 results in 0ms ---
--------------
select id, count(*), groupby() from t1 inner join t2 on t1.id=t2.t2_id group by t2.attr order by id asc
--------------
+------+----------+-----------------+
| id | count(*) | groupby() |
+------+----------+-----------------+
| 1 | 1 | 140586561985152 |
| 1 | 1 | 140586560571744 |
+------+----------+-----------------+
2 rows in set (0.00 sec)
--- 2 out of 2 results in 0ms ---
@PavelShilin89
it gives random data, which is not in the table
What exactly random data do you mean? The groupby()
values? What result do you expect? Do you mean the problem is that groupby()
is not the same for the records?
@glookka Crash in:
show version
--------------
+-----------+--------------------------------+
| Component | Version |
+-----------+--------------------------------+
| Daemon | 6.2.13 08a009d44@24031205 dev |
| Columnar | columnar 2.2.5 aa3504b@240304 |
| Secondary | secondary 2.2.5 aa3504b@240304 |
| KNN | knn 2.2.5 aa3504b@240304 |
| Buddy | buddy v2.3.1 |
+-----------+--------------------------------+
5 rows in set (0.00 sec)
based on the Pavel's MRE with a little bit different query:
mysql> DROP TABLE IF EXISTS t1; DROP TABLE IF EXISTS t2; CREATE TABLE t1 (id bigint, name text, surname text); INSERT INTO t1 (id, name, surname) VALUES (1, 'name1', 'surname1'), (2, 'name2', 'surname2');
select * from t1; CREATE TABLE t2 (id bigint, t2_id bigint, name text, description text, attr json); INSERT INTO t2 (id, t2_id, name, description, attr) VALUES (0, 1, 'name1', 'description1', '{"color":"black","size":14}'), (0, 1, 'name2', 'description2', '{"color":"black","size":14}'); select * from t2; select id, *, count(*), groupby() from t1 inner join t2 on t1.id=t2.t2_id group by t2.attr.color, t2.attr.size order by id asc;
--------------
DROP TABLE IF EXISTS t1
--------------
Query OK, 0 rows affected (0.01 sec)
--------------
DROP TABLE IF EXISTS t2
--------------
Query OK, 0 rows affected (0.00 sec)
--------------
CREATE TABLE t1 (id bigint, name text, surname text)
--------------
Query OK, 0 rows affected (0.01 sec)
--------------
INSERT INTO t1 (id, name, surname) VALUES (1, 'name1', 'surname1'), (2, 'name2', 'surname2')
--------------
Query OK, 2 rows affected (0.00 sec)
--------------
select * from t1
--------------
+------+-------+----------+
| id | name | surname |
+------+-------+----------+
| 1 | name1 | surname1 |
| 2 | name2 | surname2 |
+------+-------+----------+
2 rows in set (0.00 sec)
--- 2 out of 2 results in 0ms ---
--------------
CREATE TABLE t2 (id bigint, t2_id bigint, name text, description text, attr json)
--------------
Query OK, 0 rows affected (0.00 sec)
--------------
INSERT INTO t2 (id, t2_id, name, description, attr) VALUES (0, 1, 'name1', 'description1', '{"color":"black","size":14}'), (0, 1, 'name2', 'description2', '{"color":"black","size":14}')
--------------
Query OK, 2 rows affected (0.00 sec)
--------------
select * from t2
--------------
+---------------------+-------+--------------+-------+-----------------------------+
| id | name | description | t2_id | attr |
+---------------------+-------+--------------+-------+-----------------------------+
| 5839241405892919297 | name1 | description1 | 1 | {"color":"black","size":14} |
| 5839241405892919298 | name2 | description2 | 1 | {"color":"black","size":14} |
+---------------------+-------+--------------+-------+-----------------------------+
2 rows in set (0.00 sec)
--- 2 out of 2 results in 0ms ---
--------------
select id, *, count(*), groupby() from t1 inner join t2 on t1.id=t2.t2_id group by t2.attr.color, t2.attr.size order by id asc
--------------
ERROR 2013 (HY000): Lost connection to MySQL server during query
No connection. Trying to reconnect...
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 104
ERROR:
Can't connect to the server
@PavelShilin89
он дает случайные данные, которых нет в таблице
Какие именно случайные данные вы имеете в виду? Ценности
groupby()
? Какого результата вы ожидаете? Вы имеете в виду, что проблема в том, чтоgroupby()
это не то же самое для записей?
I filled the table with data. However, as a result of the query select id, count(*), groupby() from t1 inner join t2 on t1.id=t2.t2_id group by t2.attr order by id asc
field groupby()
gives data that are not in the table. I can't understand where they come from.
Pls read in the docs about groupby(). It's a hash, it can't exist in the table. But the point that the hash values are different for seemingly identical rows looks wrong.
This task is done to some extent. The functionality has been released in beta stage in 6.3.0. The docs can be found here https://manual.manticoresearch.com/Searching/Joining
Hello @sanikolaev, is it possible to do a join between distributed indexes as well ? When I'm trying to do a join between tables on the local indexes (that are inside the distributed ones), it returns result as it should, but when trying both left/inner join on a distributed level, I always end up with an error saying "Unknown local table(s) {distName} in search request.
is it possible to do a join between distributed indexes as well ?
No.
Hello @sanikolaev, is it possible to do a join between distributed indexes as well ?
Similar question here https://t.me/manticore_chat/1/14326