Useful code for checking progress during indexing
Hi all, just wanted to share this very helpful query that will check the status on the create index process. -I have one thats been running for almost 24 hours and seeing the progress creep up has been a stress relief.
This is the results you get if you run it in pgadmin.
SELECT
now()::TIME(0),
a.query,
p.phase,
round(p.blocks_done / p.blocks_total::numeric * 100, 2) AS "% done",
p.blocks_total,
p.blocks_done,
p.tuples_total,
p.tuples_done,
ai.schemaname,
ai.relname,
ai.indexrelname
FROM pg_stat_progress_create_index p
JOIN pg_stat_activity a ON p.pid = a.pid
LEFT JOIN pg_stat_all_indexes ai on ai.relid = p.relid AND ai.indexrelid = p.index_relid;
source: https://dba.stackexchange.com/questions/11329/monitoring-progress-of-index-construction-in-postgresql/249784#249784?newreg=7a3d794dcb154b6782ca390ead375050
Thank you! That's really useful. Those indexes on the json column I know are very slow to build. And probably not very efficient!
The whole thing took about 2 days for me last time I ran it. The indexes were a long time, and then the final full vacuum command, as that rebuilds the data. That also may be unnecessary. Because there's a lot of processing of the data, I thought it would be a useful way of reducing the final database size. I'll need to double check again how effective it is though.
One next stage to the project I will start having a look at is more tailored indexing. Some of the data would be useful in a full-text search, and I think that would mean pulling out the relevant data fields (e.g. book description, title) into a tsvector column. And then indexing only specific fields in the json.
Also, thanks for the PR! Apologies for the delay, it's been a busy week so I've been a bit delayed getting a chance to look at it. But I'm looking forward to seeing the tests run!