django-dbbackup
Deciding the Project's Future
Summary
Right now we're at a bit of an impasse. The original readme notes that django-dbbackup "...tries to use the traditional dump & restore mechanisms". Historically, this may have been true. However, the current implementation appears to rely heavily on custom connectors to perform data dumps.
My proposal is to implement some breaking changes in order to adhere to the original project description, and to limit breaking issues caused by django-dbbackup internals.
Suggested Path Forward
The suggestions below would convert django-dbbackup to more of an "upgraded" dumpdata/loaddata rather than a completely different kind of backup engine.
- Utilize Django's dumpdata and loaddata for doing the heavy lifting in terms of serializing data
  - Using dumpdata without -o outputs a character stream to stdout that we can utilize.
- Pass through all of Django's integrated dump/load features, such as multiple compression types and export formats
- Add encryption support on top of all this
- Add in "bonus features", such as...
  - Natively backing up to remote storage locations.
  - Post-processing scripts (probably an array within settings.py, similar to Django middleware)
  - Parallel execution of backup/restore on multiple databases by using subprocesses/threads
  - Back up/restore all databases by default, but also allow for backing up specific databases
  - Convenient helper functions for supporting scheduled backups (via Celery/Huey)
  - Automatically delete old backups beyond the configured maximum number of backups
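As a rough illustration of the first and last bullets, the sketch below captures dumpdata output in memory, gzips it to a timestamped file, and prunes older files. It is not the project's implementation: the function names, the `.json.gz` naming scheme, and the `max_backups` parameter are all hypothetical. Only `call_command("dumpdata", ...)` is Django's own API, and it is imported lazily because it needs a configured Django project at call time.

```python
import gzip
import io
from datetime import datetime, timezone
from pathlib import Path


def dump_to_text(database="default"):
    """Return the dumpdata JSON for one database alias as a string."""
    # Imported lazily: requires a configured Django project at call time.
    from django.core.management import call_command

    buffer = io.StringIO()
    # Without --output, dumpdata writes to stdout, which is redirected here.
    call_command("dumpdata", database=database, format="json", stdout=buffer)
    return buffer.getvalue()


def write_backup(text, backup_dir, database="default", now=None):
    """Gzip `text` into a timestamped file (hypothetical naming scheme)."""
    now = now or datetime.now(timezone.utc)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    path = backup_dir / f"{database}-{now:%Y%m%dT%H%M%S}.json.gz"
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        handle.write(text)
    return path


def prune_backups(backup_dir, database="default", max_backups=5):
    """Delete the oldest backups beyond `max_backups`; return removed paths."""
    # Timestamped names sort chronologically, so a plain sort is enough.
    backups = sorted(Path(backup_dir).glob(f"{database}-*.json.gz"))
    excess = backups[:-max_backups] if max_backups > 0 else backups
    for path in excess:
        path.unlink()
    return excess
```

Inside a Django project, a backup run under these assumptions might look like `write_backup(dump_to_text("default"), "backups/")` followed by `prune_backups("backups/", max_backups=5)`.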
Thoughts, Comments, and Remarks
I'm opening this up for anyone to voice their opinion on the project direction. This would be a breaking change, so if there's a general consensus that this isn't the ideal project direction then we can reassess.
@Archmonger I think this is a great path forward for django-dbbackup
. I think the reality is that the project hasn't been maintained for a while, so if you have a vision and the time to execute in a manner that keeps the project healthy and lets us build upon existing libraries, it would be great for this project.
I likely won't have time to develop this until somewhere around April, so until then this ticket will remain open for people to voice their opinions.
New future user here: I'd like to use the project to back up/restore my db/media. A completely new approach fully integrated with Django itself feels OK to me.
Hi! I used to maintain this project years back - I think that the proposals sound great. I'm only here to show some sign of life since I'm receiving emails from Read the Docs with a warning about the project being abandoned, which I think it isn't. No one has contacted me in this regard.
@benjaoming Thank you for posting! Could you please add @Archmonger and myself to have permissions to push to RTD? We are the current maintainers and would like to get out a new stable release.
Hey @benjaoming sorry about that. I tried reaching out to jonathan-s and ZuluPro to gain access to the RTD. Neither has been responsive to providing access so I reached out to RTD themselves to assist.
As I found out this morning, RTD put in an abandonment check to see if they could give access to the docs for johnthagen and myself.
Let me know if you can add us as RTD maintainers, as it would be much appreciated.
@johnthagen @Archmonger - absolutely! I'll just need your RTD usernames to do that :+1:
Aha! It seems that this is already in order, there was already a jonathan-s added as maintainer? And I've added Archmonger supposing it's you?
Thanks! I've confirmed I've been added as a maintainer. I'll add in the Jazzband bot and johnthagen as soon as I get a chance.
Wishing you the best with this project, thanks for being in Jazzband :100:
@benjaoming Thank you for posting and your work to help this project move along.
@Archmonger I like the ideas you've presented and wanted to voice my encouragement to act with boldness and not be too weighed down by breaking changes.
I made it here because I was notified I no longer am a collaborator on the PyPI project I created! No big deal, I don't think I had any significant contribution to this for many years now. It's been amazing to see the project grow beyond anything I imagined, and I owe a huge thanks to @benjaoming and @ZuluPro for taking over when I had moved on. I'm personally fine with any direction the current maintainer wants to take this package. Since I don't really consider myself a maintainer anymore, my voice shouldn't carry much weight.
To clarify, the line "tries to use the traditional dump & restore mechanisms" was originally meant to mean that we use pg_dump for Postgres, mysqldump for MySQL, etc., rather than Django's loaddata and dumpdata. The reason is that the database-specific tools are generally better tailored to their own databases, especially when those databases reach much larger sizes. This project was originally created because I needed an easy solution to back up database files that were several hundred GB in size, and Django's serializer was not up for the task at the time. Admittedly, I do not know if Django made improvements here or not. But I have to imagine that using pg_dump is still far superior to Django's dumpdata (and likewise for other databases and their own tools).
Again, I'll state that I am totally fine with any direction, but I suspect we both may have interpreted the phrase "traditional dump & restore" to mean opposite things.
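To make the "traditional" approach described in this comment concrete, here is a minimal sketch of turning a Django-style DATABASES entry into a pg_dump invocation. The helper names are hypothetical and this is not how django-dbbackup's connectors are written. It only relies on pg_dump's documented `--format`, `--file`, `--host`, `--port`, `--username`, and `--dbname` options and on libpq's `PGPASSWORD` environment variable.

```python
import os


def build_pg_dump_command(db_settings, output_path):
    """Build a pg_dump argv list from a Django-style DATABASES entry."""
    # The custom format is compressed and can be restored with pg_restore.
    command = ["pg_dump", "--format=custom", f"--file={output_path}"]
    if db_settings.get("HOST"):
        command.append(f"--host={db_settings['HOST']}")
    if db_settings.get("PORT"):
        command.append(f"--port={db_settings['PORT']}")
    if db_settings.get("USER"):
        command.append(f"--username={db_settings['USER']}")
    command.append(f"--dbname={db_settings['NAME']}")
    return command


def build_pg_env(db_settings, base_env=None):
    """Copy the environment and pass the password via PGPASSWORD."""
    env = dict(os.environ if base_env is None else base_env)
    if db_settings.get("PASSWORD"):
        env["PGPASSWORD"] = db_settings["PASSWORD"]
    return env
```

The result could then be run with `subprocess.run(command, env=env, check=True)`. The dump is produced by the database's own tooling rather than by Django's serializers, which is the distinction being drawn here.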
@pkkid Thank you for sharing this valuable historical context!
Apologies @pkkid!
I'm trying to get all active maintainers funneled through the Jazzband org.
Within this GitHub org, PyPI & RTD access is really only needed for emergencies; everything else is handled by the Jazzband-Bot. To limit potential security vulnerabilities (e.g. hacked PyPI accounts), I'm trying to keep that list short.
If you'd like to maintain control over the project I can put you in as a project lead. Just let me know!
Also, thanks for the context and clarification! Dumpdata is pretty solid for my use cases, but admittedly I haven't tried it on giant datasets. I'll take a stab at a side-by-side performance comparison.
Ha, no need to apologize. The project is in great hands, and I appreciate you and @johnthagen taking the reins to keep this project alive. Thank you!
I've just seen and read this pinned issue after raising #468 yesterday (where I suggested a generic Python backup package that returned to using "traditional dump & restore mechanisms").
Is there a place for such a package at JazzBand? This would be a fork of the current project under a different name that went in a different direction.
Jazzband typically only hosts Django related packages, so I would doubt it.
Technically, it is fully possible to move the current connectors out of Django-DBBackup and have them be standalone.
However, if we implement the changes I suggested in this issue we would be firmly tied to Django and unable to separate any of that functionality.
@isedwards I don't think you can start a new project in Jazzband based on a concept. See especially the section Viability here: https://jazzband.co/about/guidelines#viability
@johnthagen I'm thinking of spinning this repo out of Jazzband. I've been hesitant to make major changes or test/CI changes due to how slowly things are moving in Jazzband, which has spiraled into practically nothing getting done.
What are your thoughts on this, and would you assist in maintaining the package under either a new or old org?
I think spinning it out would be a fine idea. I'd help out with basic maintenance under a new org.
It would be nice to transfer the repo so we keep the stars.
It is fascinating that in this case jazzband as an organization seems to have the same challenges that independent OS projects do.
Jazzband is currently a centralized org with only one admin. So naturally, if that one admin becomes busy then things don't move forward.
👋🏻 Hey folks, wondering if any progress has been made towards using Django's dumpdata/loaddata with this plugin?
May I suggest that, if we want this to work with dumpdata/loaddata, we create a new database type instead of replacing what we have? As mentioned above, the original intent of the project was specifically to make it easy to use pg_dump and its variants. Dumpdata is a more generic solution developed by Django, but it also comes with downsides for larger projects. However, I think it might slot nicely into a new db type (maybe generically at db/django.py). That way we keep the old behavior, and using Django's way of doing things becomes the user's choice.
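A very rough sketch of what such an optional db type could look like is below. Everything here is an assumption for illustration: the class name, its method names, and the injectable `run_command` hook are hypothetical and do not mirror django-dbbackup's actual connector base class. The only real API used is Django's `call_command` with the dumpdata and loaddata commands, imported lazily.

```python
import io


def _django_call_command(*args, **kwargs):
    # Lazy import so this module loads without a configured Django project.
    from django.core.management import call_command

    return call_command(*args, **kwargs)


class DjangoSerializerConnector:
    """Hypothetical connector: dumps via dumpdata, restores via loaddata."""

    extension = "json"

    def __init__(self, database="default", run_command=_django_call_command):
        self.database = database
        # Injectable so the connector can be exercised without Django.
        self.run_command = run_command

    def create_dump(self):
        """Return the serialized database contents as a text stream."""
        buffer = io.StringIO()
        self.run_command(
            "dumpdata",
            database=self.database,
            format=self.extension,
            stdout=buffer,
        )
        buffer.seek(0)
        return buffer

    def restore_dump(self, fixture_path):
        """Load a fixture file; loaddata infers the format from its extension."""
        self.run_command("loaddata", str(fixture_path), database=self.database)
```

Keeping this as a separate connector would leave the existing pg_dump-style connectors untouched, so choosing the Django serializer route stays opt-in.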
That's a really good idea. I agree that it's best represented as an optional DB type.
I don't know when I'll have time to develop this, I've been stretched pretty thin lately.
@WillNilges I agree with both @pkkid and @Archmonger: dumpdata/loaddata could be just an additional DB type in dbbackup.