etherpad-lite
Limit number of versions of a pad or delete them in Pad Settings
I don't see any way to limit the number of versions of a pad or to clean up old versions. The database grows without limit over time, and exporting pads with numerous revisions becomes slow and heavy. Would it be possible to add this revision deletion option to Pad Settings?
Fred
To add to this, it should be possible to disable the revisions feature entirely (at least for read-only pads), either via a plugin or through the Etherpad settings. For documents that have been published using a read-only link, anyone can go back and view the author's writing process, which exposes a few embarrassing spelling mistakes in the best case and personally identifying information or sensitive content in the worst case.
I absolutely need this feature too ... or at least "delete revisions older than 30 days" or something similar.
I have been using Etherpad (via Docker) for two months now, with about a hundred users.
The PostgreSQL database is now 400 MB zipped (!). This is insane.
- We require an option to clean up revisions older than 30 days.
- Optionally, an option to keep the last x revisions.
- Definitely an option to disable revisioning.
Point 3 is actually the most helpful to save storage.
Usage of my Etherpad: Statistics: Pads Text Size: 611.475 - Total Pads: 971
611 MB / 971 pads = 0.63 MB per pad
At least half of them are empty, so it's probably 1 - 2 MB per pad on average.
And those pads contain only 100 lines of text.
BTW, "Size: 611.475" ... I guess it means KB.
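As a quick check of the arithmetic above, here is a short calculation. It assumes that "611.475" means 611,475 KB (period as thousands separator), that 1 MB is 1000 KB, and that about half of the 971 pads are empty, as estimated in the post. None of these assumptions is confirmed by Etherpad itself.

```python
# Rough per-pad storage estimate from the statistics quoted above.
# Assumptions: "611.475" means 611,475 KB, 1 MB = 1000 KB,
# and about half of the pads are empty (the poster's own estimate).
total_kb = 611_475
total_pads = 971

total_mb = total_kb / 1000
mb_per_pad = total_mb / total_pads
mb_per_nonempty_pad = total_mb / (total_pads / 2)

print(f"{mb_per_pad:.2f} MB per pad")                     # 0.63 MB per pad
print(f"{mb_per_nonempty_pad:.2f} MB per non-empty pad")  # 1.26 MB per non-empty pad
```

Under those assumptions the non-empty pads average roughly 1.26 MB each, which is consistent with the 1 - 2 MB estimate.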
In each revision there is an option:
Does this disable the revisions for this pad?
If so, we would need this option globally: disable all revisions for all pads.
The next version of Etherpad will feature a built-in way to manage pads and also delete them. You can also sort by revision number and then delete pads with a lot of revisions.
That is nice to see. However, admins do not want to spend 20 - 30 minutes a day manually going through all pads. Also, considering data privacy/protection, this would not even be allowed.
Again, we need:
- option so revisions older than 30 days are removed
- option to keep the last x revisions
- option to disable revisioning
Disabling revisioning is the most important one.
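To make the first two requested options concrete, here is a minimal sketch of how such a retention rule could select revisions. It is purely illustrative: `Revision`, `revisions_to_prune`, `max_age_days` and `keep_last` are hypothetical names, not existing Etherpad settings, and the way the two rules are combined (the newest `keep_last` revisions are always protected) is one possible choice among several. The sketch only picks candidates; actually removing revisions from the database is a separate problem it does not address.

```python
from dataclasses import dataclass
from typing import List, Optional

DAY_MS = 24 * 60 * 60 * 1000


@dataclass
class Revision:
    number: int        # revision number within the pad
    timestamp_ms: int  # creation time in milliseconds since the epoch


def revisions_to_prune(
    revisions: List[Revision],
    now_ms: int,
    max_age_days: Optional[int] = None,  # hypothetical "older than N days" option
    keep_last: Optional[int] = None,     # hypothetical "keep the last x" option
) -> List[int]:
    """Return the revision numbers a cleanup job could consider removing."""
    if max_age_days is None and keep_last is None:
        return []  # no rule configured, keep everything

    ordered = sorted(revisions, key=lambda r: r.number)
    protected_count = max(1, keep_last or 0)  # the newest revision always stays
    unprotected = ordered[:-protected_count]

    if max_age_days is None:
        return [r.number for r in unprotected]

    cutoff_ms = now_ms - max_age_days * DAY_MS
    return [r.number for r in unprotected if r.timestamp_ms < cutoff_ms]
```

With `max_age_days=30` alone, every revision older than 30 days except the newest one is selected. Adding `keep_last=x` shields the x most recent revisions even if they are older than the cutoff.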
Highly agree with this issue.
With privacy laws being quite strict in Germany, our university's data privacy officer called out the fact that it's actually quite problematic that we don't regularly delete our pads. Currently, author names of pads are stored indefinitely since they are all over the revisions, and without a way to delete those, we now have to set all pads to self-destruct after 2 years of not being changed (which is really annoying for the kinds of pads people keep revisiting for information but don't edit).
Even if not automated, a simple solution to let us mass-delete revisions manually would help immensely. It might also speed up the process of getting the system up and running again after a shut down for maintenance.
I played around a bit with how this could be achieved:
We can reuse the method/API copyPadWithoutHistory, which generates a new pad with only 1 revision and the latest pad content (without any chat messages), but this will keep the author information.
Would it be helpful if we built a plugin that runs this function on pads that have not been touched in x days?
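As a rough illustration of that idea, the sketch below runs outside Etherpad and drives the HTTP API instead of being a plugin. The API function names `listAllPads`, `getLastEdited` and `copyPadWithoutHistory` and their parameters are used as I understand the HTTP API, so check them against your Etherpad version. The `api_call` transport, the `-compact` suffix and the function name are hypothetical and exist only for this example.

```python
from typing import Any, Callable, Dict, List, Tuple

DAY_MS = 24 * 60 * 60 * 1000

# Hypothetical transport: api_call("functionName", {...params}) returns the
# "data" object of an Etherpad HTTP API response. It is injected so the
# selection logic can be exercised without a running server.
ApiCall = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def copy_stale_pads_without_history(
    api_call: ApiCall,
    now_ms: int,
    max_idle_days: int,
    suffix: str = "-compact",  # illustrative naming scheme for the copies
) -> List[Tuple[str, str]]:
    """Copy every pad idle for more than max_idle_days to a history-free pad."""
    cutoff_ms = now_ms - max_idle_days * DAY_MS
    copied: List[Tuple[str, str]] = []
    for pad_id in api_call("listAllPads", {})["padIDs"]:
        if pad_id.endswith(suffix):
            continue  # skip copies made by an earlier run
        last_edited = api_call("getLastEdited", {"padID": pad_id})["lastEdited"]
        if last_edited < cutoff_ms:
            destination = pad_id + suffix
            api_call(
                "copyPadWithoutHistory",
                {"sourceID": pad_id, "destinationID": destination, "force": "true"},
            )
            copied.append((pad_id, destination))
    return copied
```

The copy lands under a new pad ID, so existing links would still point at the original pad with its full history. Swapping the copy into place and deleting the original is deliberately left out here.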
That sounds great. I think that's a good idea. I don't know whether it would be a plugin or part of core. Essentially everybody has the problem that revisions aren't deleted.
but this will keep the author information
Does that mean ALL author information or just the last active author? If it is just the last one, I think that might be fine enough (though does this method also get rid of all author colors?), so ultimately a really helpful solution.
I think I need to clarify the storage of author information.
Every author has a global ID, stored together with their color, name and related pads:
+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|key |value |
+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|globalAuthor:a.laKubcKE8c8MXbgR|{"colorId":"#c7d5ff","name":"John Doe","timestamp":1717928115518,"padIDs":{"abcdefaertdfdf0.3967821042768165":1,"abcdefaertdfdf0.2496401404544648":1,"abcdef":1,"test":1,"gblub":1,"asdfasfsdaf":1}}|
+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
This global ID (a.laKubcKE8c8MXbgR) is referenced in the pad metadata:
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|key |value |
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|pad:test|{"atext":{"text":"...\n","attribs":"..."},"pool":{"numToAttrib":{"0":["author","a.laKubcKE8c8MXbgR"],"1":["bold","true"],"2":["italic","true"],"3":["underline","true"]},"nextNum":4},"head":120,"chatHead":-1,"publicStatus":false,"savedRevisions":[]}|
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
In addition to the pad metadata, the author is also referenced in the corresponding revision:
+----------------+----------------------------------------------------------------------------------------------------+
|key |value |
+----------------+----------------------------------------------------------------------------------------------------+
|pad:test:revs:10|{"changeset":"Z:u>3|3=l=8*0+3$es ","meta":{"author":"a.laKubcKE8c8MXbgR","timestamp":1717845374599}}|
+----------------+----------------------------------------------------------------------------------------------------+
This means that:
- global author information will NOT BE DELETED
- the reference in the pad metadata will NOT BE DELETED
- if the revision/change that the author made is deleted, then this reference will also BE DELETED
and this results in this behaviour:
- In the timeslider you will only see author information of authors that have text in the latest version
- If you export the pad in "Etherpad" format, then you still get all author information (name, color, etc.)
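To illustrate the three places where an author ID shows up, here is a small sketch that scans key/value records shaped like the dumps above. The sample values are taken from those dumps, with the `padIDs` list abridged and the pad text left elided as in the original. The function name `keys_referencing_author` is hypothetical, and how the records are read from the database is not shown.

```python
import json
from typing import Dict, List

AUTHOR_ID = "a.laKubcKE8c8MXbgR"

# Sample records modelled on the dumps above (padIDs abridged, text elided).
records: Dict[str, str] = {
    "globalAuthor:a.laKubcKE8c8MXbgR": json.dumps({
        "colorId": "#c7d5ff", "name": "John Doe", "timestamp": 1717928115518,
        "padIDs": {"abcdef": 1, "test": 1},
    }),
    "pad:test": json.dumps({
        "atext": {"text": "...\n", "attribs": "..."},
        "pool": {"numToAttrib": {"0": ["author", AUTHOR_ID], "1": ["bold", "true"],
                                 "2": ["italic", "true"], "3": ["underline", "true"]},
                 "nextNum": 4},
        "head": 120, "chatHead": -1, "publicStatus": False, "savedRevisions": [],
    }),
    "pad:test:revs:10": json.dumps({
        "changeset": "Z:u>3|3=l=8*0+3$es ",
        "meta": {"author": AUTHOR_ID, "timestamp": 1717845374599},
    }),
}


def keys_referencing_author(records: Dict[str, str], author_id: str) -> List[str]:
    """Return the keys of all records that mention the given author ID."""
    hits: List[str] = []
    for key, raw in records.items():
        value = json.loads(raw)
        if key == f"globalAuthor:{author_id}":
            hits.append(key)  # the global author record itself
        elif ":revs:" in key:
            if value.get("meta", {}).get("author") == author_id:
                hits.append(key)  # a revision written by this author
        elif key.startswith("pad:"):
            pool = value.get("pool", {}).get("numToAttrib", {})
            if ["author", author_id] in pool.values():
                hits.append(key)  # the pad's attribute pool
    return sorted(hits)


print(keys_referencing_author(records, AUTHOR_ID))
```

For the sample data this lists all three keys, which matches the explanation above: dropping the revision record removes only one of the three references.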
and this results in this behaviour:
- In the timeslider you will only see author information of authors that have text in the latest version
Hmm, okay, that wouldn't quite fix the issue for us... Is it possible to auto-delete global author information (or neutralize it, e.g. change the name to "inactive author" or something) after a certain amount of time has passed since their last associated contribution timestamp? Because that, in combination with the other proposed method, would resolve practically all the issues we have.
Or alternatively, something that lets you auto-change the authors on any timestamp older than X to a global "dummy" author called "inactive author" or something like that?
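A minimal sketch of the "neutralize" variant could look like the following. It works on globalAuthor records shaped like the one shown earlier in the thread and assumes, without confirmation, that the record's `timestamp` field reflects the author's last activity. The function name, the `max_idle_days` parameter and the in-memory `records` dictionary are hypothetical stand-ins for whatever a real plugin would use.

```python
import json
from typing import Dict

DAY_MS = 24 * 60 * 60 * 1000


def neutralize_inactive_authors(
    records: Dict[str, str],
    now_ms: int,
    max_idle_days: int,
    placeholder: str = "inactive author",
) -> Dict[str, str]:
    """Return a copy of records with the names of stale authors replaced."""
    cutoff_ms = now_ms - max_idle_days * DAY_MS
    result = dict(records)
    for key, raw in records.items():
        if not key.startswith("globalAuthor:"):
            continue  # only global author records carry the name
        author = json.loads(raw)
        # Assumption: "timestamp" is the author's last activity time.
        if author.get("timestamp", now_ms) < cutoff_ms:
            author["name"] = placeholder
            result[key] = json.dumps(author)
    return result
```

This replaces only the display name. The author ID, the color and the references inside pads and revisions stay as they are, so it is a partial measure rather than a full deletion.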
I'll add an inactive author dummy variable. That should get rid of all the globalAuthors.