etherpad_mypads_ynh icon indicating copy to clipboard operation
etherpad_mypads_ynh copied to clipboard

How to reduce the size of a big etherpad db

Open zamentur opened this issue 10 months ago • 0 comments

On sans-nuage.fr we have a big big database 30GB for 68 milions of lines.

If you face the same situation, here what we have done (we don't want to delete pad after a delay but if this solution is ok for you, some plugins exist to do it).

IMPORTANT

Allmost all following mysql command should be done with etherpad shut down:

systemctl stop etherpad_mypads

IBD file is not shrink

Check the size of your ibd file

ls -l /var/lib/mysql/etherpad_mypads/

Firstly, you need to know that ibd file doesn't shrink automatically, the data inside could take less place than the file itself.

Here you can know the size of the data:

mysql -e "SELECT round((data_length+index_length)/1024/1024,2) FROM information_schema.tables WHERE  table_schema='etherpad_mypads'  AND table_name='store';"

You can optimize the ibd file by running this, but we are not sure of the effect if you have not enough disk spaces (so you may reduce the number of lines before to do it, to reduce the size of data).

mysql -e "ALTER TABLE etherpad_mypads ENGINE=InnoDB;"

Sessionstorage nightmare

We discovered that on 68 milions lines there was 65 milions line of sessionstorage: not so useful (in the context of yunohost). If you delete it you may lost user preferencies. To delete it, we discovered that delete it per 100 000 was quicker than doing a delete of more:

for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39; do echo $i; mysql etherpad_mypads -e 'DELETE FROM etherpad_mypads.store WHERE `key` LIKE "sessionstorage:%" LIMIT 100000;' ; done

Yes it's a bit crappy, but you can improve this hawful loop :/

History of revision

Another things that takes milions of line was pad history (3 milions in our case). You need to know that each 100 revisions, etherpad store all the text (and not just the diff). Here we delete in the db all revisions not finishing by 00, the timelines will be broken, but if needed we will be able to retrieve intermediate versions:

mysql -e 'delete from etherpad_mypads.store where store.key like "pad:%:revs:%" AND store.key not like "pad:%:revs:%00";'

Or if you want to delete the complete history

mysql -e 'delete from etherpad_mypads.store where store.key like "pad:%:revs:%";'

Empty pad

We have discovered 1000+ empty pads,

mysql etherpad_mypads -e 'SELECT substring(store.key,5) as "pads" FROM etherpad_mypads.store WHERE `key` LIKE "pad:%" AND `key` NOT LIKE "pad:%:%" AND value LIKE "%Welcome to Etherpad%" AND `key` NOT LIKE "pad:libreto+%";' > /root/padsempty

cat /root/padsempty | wc -l

We deleted those pad by running the etherpad api Note: We don't want to delete those beginngs with libreto cause they will be recreated by a libreto display.

systemctl start etherpad_mypads
for pad in $(cat /root/padsempty) ; do
curl --data-urlencode "apikey=$(cat /var/www/etherpad_mypads/APIKEY.txt)" https://pad.sans-nuage.fr/api/1/deletePad?padID=$pad
sleep 1
done
systemctl stop etherpad_mypads

Don't know if the sleep 1 was necessary, but their is a rate milit in the settings file, that's why we put that to be sure.

Search for the bigger pad

A way to reduce things is to observe how the pad are big ro not and to decide of actions for some of them, you can generate a csv file by running this command:

mysql -e 'select distinct substring(store.key,5,locate(":",store.key,5)-5) as "pads" from etherpad_mypads.store where store.key like "pad:%";' > /root/padsname

for pad in $(cat /root/padsname) ; do mysql etherpad_mypads -s -r -e 'SELECT "'$pad'", COUNT(`key`) FROM etherpad_mypads.store WHERE `key` LIKE "pad:'$pad':revs:%";' | tail -n1 >> /root/pads.csv ; done

With Libreoffice you will be able to sort by order of number of revisions, it gives an idea.

Retry to optimize

At this step recount the number of line and recompute data space

mysql -e 'selectcount(*) from etherpad_mypads.store ;'
mysql -e "SELECT round((data_length+index_length)/1024/1024,2) FROM information_schema.tables WHERE  table_schema='etherpad_mypads'  AND table_name='store';"

And try to optimize the table if you have enough disk space...

mysql -e "ALTER TABLE etherpad_mypads ENGINE=InnoDB;"

Don't forget to restart etherpad_mypads

And test everything is ok !

Ensuite on pourra faire un OPTIMIZE TABLE ou un ALTER TABLE pour réduire la taille MAIS il y a un gros MAIS: pas sûr qu’on ai assez d’espace disque pour que mysql s’en sorte :confused: https://ma.ttias.be/mysql-calculate-free-space-ibd-files/

zamentur avatar Aug 26 '23 14:08 zamentur