go-mysql-server icon indicating copy to clipboard operation
go-mysql-server copied to clipboard

Wrong state of SquashedTable in SHOW PROCESSLIST

Open smacker opened this issue 5 years ago • 11 comments

6639/113 doesn't look correct:

Command: query
   Time: 2373
  State: SquashedTable(ref_commits, commits, commit_files)(6639/113), blobs(0/113)
   Info: SELECT DISTINCT LANGUAGE(cf.file_path, b.blob_content) AS lang
FROM blobs b
NATURAL JOIN commit_files cf
NATURAL JOIN commits c
NATURAL JOIN ref_commits r
WHERE r.ref_name = 'HEAD'
AND r.history_index = 0
AND is_binary(b.blob_content) = false
AND cf.file_path NOT REGEXP '^vendor.*'
AND lang != ""

smacker avatar Dec 04 '18 12:12 smacker

Could you provide more info? How did you reproduce it? how many repositories did you have? did you modify the repositories folder after launch gitbase?

ajnavarro avatar Dec 04 '18 12:12 ajnavarro

I had 113 repositories. All public repos of src-d organization.

curl https://api.github.com/orgs/src-d/repos?per_page=100 > repos_list.json
mkdir -p repos
for repo in `cat repos_list.json | jq -r '.[] | .clone_url'`
do
    name=`echo ${repo%.git} | cut -d '/' -f 5`
    git clone "$repo" "repos/$name"
done

(and then repeat it with page=2)

Gitbase was restarted after cloning finished and I didn't modify repos anymore. The query to reproduce is visible in the output of SHOW PROCESSLIST above.

I hope it would help!

smacker avatar Dec 04 '18 12:12 smacker

which version did you use? I cannot reproduce the error with the latest 0.18.0-beta.3

ajnavarro avatar Dec 04 '18 16:12 ajnavarro

$ ./gitbase version
gitbase (v0.18.0-beta.3) - build 11-27-2018_11_31_37

Easily reproducible for me. Tried with empty everything.

Use the script above to get repositories.

Then:

mysql> select count(*) from repositories;
+----------+
| COUNT(*) |
+----------+
|      100 |
+----------+
1 row in set (0.01 sec)

mysql> SELECT DISTINCT LANGUAGE(cf.file_path, b.blob_content) AS lang
    -> FROM blobs b
    -> NATURAL JOIN commit_files cf
    -> NATURAL JOIN commits c
    -> NATURAL JOIN ref_commits r
    -> WHERE r.ref_name = 'HEAD'
    -> AND r.history_index = 0
    -> AND is_binary(b.blob_content) = false
    -> AND cf.file_path NOT REGEXP '^vendor.*'
    -> AND lang != ""
    -> ;

PROCESSLIST shows 299/100.

mysql> SHOW PROCESSLIST \G;
*************************** 1. row ***************************
     Id: 3
   User: root
   Host: 127.0.0.1:52412
     db: gitbase
Command: query
   Time: 25
  State: SquashedTable(ref_commits, commits, commit_files)(299/100), blobs(0/100)
   Info: SELECT DISTINCT LANGUAGE(cf.file_path, b.blob_content) AS lang
FROM blobs b
NATURAL JOIN commit_files cf
NATURAL JOIN commits c
NATURAL JOIN ref_commits r
WHERE r.ref_name = 'HEAD'
AND r.history_index = 0
AND is_binary(b.blob_content) = false
AND cf.file_path NOT REGEXP '^vendor.*'
AND lang != ""
*************************** 2. row ***************************
     Id: 6
   User: root
   Host: 127.0.0.1:52412
     db: gitbase
Command: query
   Time: 0
  State: running
   Info: SHOW PROCESSLIST
2 rows in set (0.00 sec)

ERROR:
No query specified

After few minutes 1399/100:

*************************** 2. row ***************************
     Id: 3
   User: root
   Host: 127.0.0.1:52412
     db: gitbase
Command: query
   Time: 125
  State: SquashedTable(ref_commits, commits, commit_files)(1399/100), blobs(0/100)
   Info: SELECT DISTINCT LANGUAGE(cf.file_path, b.blob_content) AS lang
FROM blobs b
NATURAL JOIN commit_files cf
NATURAL JOIN commits c
NATURAL JOIN ref_commits r
WHERE r.ref_name = 'HEAD'
AND r.history_index = 0
AND is_binary(b.blob_content) = false
AND cf.file_path NOT REGEXP '^vendor.*'
AND lang != ""

smacker avatar Dec 04 '18 18:12 smacker

The PR that enables auto in memory joins on small tables will mitigate this to a certain extent, but I don't think this can be solved in any way as long as we do multi pass on joins. There's just no way to know ahead of time the maximum number. It's a very confusing thing, but not one we can solve to the best of my knowledge.

Perhaps we should think of a way to display progress that will not incur in this. What if instead of showing X/Y we show just the number of partitions iterated? You will not know when it will end, but it serves the purpose of knowing if it's stuck and where it's stuck more or less. If we keep it this way, 90% of the time (when it's not in-memory) it's not gonna work.

erizocosmico avatar Mar 19 '19 11:03 erizocosmico

IMO, any changing counter which can show some progress would be good enough (and maybe less confusing than proportion)

kuba-- avatar Mar 21 '19 12:03 kuba--

Any ideas?

kuba-- avatar May 30 '19 08:05 kuba--

It just can't be done if join is performed in multipass. Only solution is to not display (x/y), but only x

erizocosmico avatar May 30 '19 08:05 erizocosmico

Could be possible to hide y only if join cannot be done on memory?

ajnavarro avatar Jun 03 '19 09:06 ajnavarro

It starts in in-memory mode and then switches to multipass during execution, so you don't know until then. And the switch is done per join on the join, so the table has no knowledge about that.

erizocosmico avatar Jun 03 '19 09:06 erizocosmico

I think in most of the cases having the total number of partitions is really useful, so I would rather keep it, even if sometimes can be misleading. We can add some documentation explaining what those numbers mean.

ajnavarro avatar Jun 03 '19 09:06 ajnavarro