yarsync
Add a setting or command option to automatically delete old commits and logs
For example, make a config option that auto-deletes commits and logs older than 1 year. Make it optional, as it may look like bad practice to some. It will surely remove the folder housekeeping burden for others, since rsync does have limitations with large numbers of files.
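Such pruning could be sketched in Python roughly as follows. This is a minimal illustration only, assuming a hypothetical layout where each commit is a directory named by its Unix timestamp under `.ys/commits/`; yarsync's actual repository layout may differ, so nothing like this should be run against a real repository without checking first:

```python
import os
import shutil
import time

def prune_old_commits(repo, max_age_days=365):
    """Remove commit directories older than max_age_days.

    Assumes a hypothetical layout where each commit lives in
    <repo>/.ys/commits/<unix_timestamp>/ -- check the actual
    repository structure before using anything like this.
    """
    commits_dir = os.path.join(repo, ".ys", "commits")
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for name in sorted(os.listdir(commits_dir)):
        if not name.isdigit():
            continue  # skip anything that is not a timestamped commit
        if int(name) < cutoff:
            shutil.rmtree(os.path.join(commits_dir, name))
            removed.append(name)
    return removed
```

A config option would then just decide whether (and with which `max_age_days`) this runs after each commit.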
In many of my repositories I create commits only once in several years (I try to split them, as with git projects). Can I ask about your usage?
- Is rsnapshot bad for that?
- Do you create many commits in a repository with many files? What kind of files are these?
- Maybe a manual command, not an automatic deletion?
I mainly use it as a safer syncthing replacement for syncing small files, including pdf, doc and some binary files. Snapshots are overkill for that.
I am curious: if you don't make a commit, how do you know which files are different between two repos? I know there is "yarsync push" and "yarsync pull", but they will cause files to be overwritten, which is not safe. For example, "yarsync pull remote --new" will fetch the remote version of a changed file, so local modifications to the same file are lost. I really like git-annex: it fetches the remote version and adds a variant suffix to conflicting file names, so the user can merge the diff manually.
Ideally, a good practice for a personal git repo is to commit as much as possible. But a yarsync commit is too heavy: a whole new set of hard links is created, even for files that don't change. Instead of creating so many duplicated hard links, maybe keep hard links only for files that do change; then you could easily see which files changed in the commit folder, which would make commits more useful, like git-annex.
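The "delta commit" idea could be sketched like this. This is purely illustrative and not how yarsync actually stores commits; "changed" is approximated here by comparing size and mtime against the previous commit:

```python
import os

def delta_commit(workdir, prev_commit, new_commit):
    """Hard-link into new_commit only files that changed since prev_commit.

    "Changed" means: absent from prev_commit, or different size/mtime.
    A sketch of the requested behaviour, not yarsync's implementation.
    """
    changed = []
    for root, _dirs, files in os.walk(workdir):
        if ".ys" in root.split(os.sep):
            continue  # never snapshot the metadata directory itself
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, workdir)
            old = os.path.join(prev_commit, rel)
            st = os.stat(src)
            if os.path.exists(old):
                ost = os.stat(old)
                if (ost.st_size, ost.st_mtime) == (st.st_size, st.st_mtime):
                    continue  # unchanged: no hard link in the delta commit
            dst = os.path.join(new_commit, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            os.link(src, dst)  # hard link, so no file data is duplicated
            changed.append(rel)
    return sorted(changed)
```

The commit directory then contains exactly the files that changed, which also answers "what changed in this commit" at a glance.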
A manual command for merging commits is enough for me, for example a command like:
yarsync rebase -c0:100 -m"rebase"
which would combine the first 100 commits and replace them with a new one. A continuous range in the middle should also work:
yarsync rebase -c10:100 -m"rebase"
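Such a rebase could merge the selected range by hard-linking every file from those commits into one directory, with newer versions overwriting older ones. The following is a sketch only; `merge_commits` and the directory layout are my assumptions, not an existing yarsync feature:

```python
import os

def merge_commits(commit_dirs, merged_dir):
    """Collapse several commit directories into one.

    commit_dirs must be ordered oldest to newest; for a path present in
    several commits the newest version wins. Files are hard-linked, so
    no file data is copied. Illustrative sketch only.
    """
    for commit in commit_dirs:  # oldest first, so newer versions overwrite
        for root, _dirs, files in os.walk(commit):
            for name in files:
                src = os.path.join(root, name)
                rel = os.path.relpath(src, commit)
                dst = os.path.join(merged_dir, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                if os.path.exists(dst):
                    os.remove(dst)  # drop the older version's link
                os.link(src, dst)
    # The original commits could then be removed, e.g. with
    # shutil.rmtree(commit) for each merged commit.
```

Since only hard links are created and removed, the merge costs almost no disk space and frees the inodes of the superseded links once the old commits are deleted.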
To summarize, there are three feature requests:
- add a variant suffix (like _variant + md5) to files that differ on both sides during push or pull.
- shrink the commit dir to include hardlinks only for files that have changed.
- add a rebase command so that several commit folders can be merged.
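For the first request, the naming could work roughly like git-annex's variant files. A sketch (the `_variant-` naming scheme is an assumption modeled on git-annex, not an existing yarsync feature):

```python
import hashlib
import os

def variant_name(path):
    """Return a git-annex-style variant name for a conflicting file.

    The suffix is "_variant-" plus the first 8 hex digits of the file's
    MD5, inserted before the extension, so both versions can coexist
    and be merged manually. Sketch of the first feature request.
    """
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()[:8]
    root, ext = os.path.splitext(path)
    return f"{root}_variant-{digest}{ext}"
```

On a conflict, the incoming remote version would be written under `variant_name(...)` instead of overwriting the local file.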
- What do you mean by a safer syncthing replacement? That it is not constantly running, but called on request?
- As I write in its description, yarsync is for synchronizing unchanged files (that is, repositories). You probably don't expect external pdfs or binaries to change? I would definitely use a VCS (like git) for changing text files. For doc files I have few ideas: I would keep them unchanged, or use simple text files for changed ones (and put them into a VCS, if they are precious). There was an option for safer synchronization, but at the moment it depends on rsync (https://github.com/ynikitenko/yarsync/issues/4). This is very different from this issue. You can check the yarsync option --backup (see the pull and push options in the manual) to be safe.
- yarsync commits are snapshots of whole repositories. Yes, they store all files, but at the same time you can delete old commits completely, without caring about "merging" them or other things (this also improves safety).
Ideally, a good practice for personal git repo is to commit as much as possible.
This is true, and this is different for yarsync. I usually create a commit when I add/move many files. A commit is necessary only if you are going to synchronize changes with a remote; I don't do it very often when I work on one machine. And note that rsync is reported to be bad with millions of files, so if you have 100 commits, you will get problems only if there are about 10 thousand files in each of them (which would take a long time to collect manually).
I shall implement removal of commits, but this is not straightforward. Maybe I will need to keep track of deleted commits, because at the moment yarsync encourages saving older commits (which may still be present in old repositories).
Do you need commits at all? Maybe you could mostly sync the working directory (I'll work on that then)? Do you just want to duplicate the current state to a remote storage, or do you also need to manage repository conflicts (if you, say, work from two machines each day; this will be really hard without commits)?
Syncthing needs to run as a daemon and needs a network connection. Some of my computers are offline, and I've heard that syncthing has had data loss problems.
Pdfs and office documents do change, like adding a bookmark or deleting a page. For example, I found your project when I was looking for a way to sync my Zotero storage dirs.
Keeping hardlink snapshots of whole repositories doesn't make it safer; adding dir pruning commands does, so the user never needs to touch the commit dirs manually. Also, delta backups save inodes.
I mainly use it to sync working directories on different machines, using a portable drive as the transfer medium. But sometimes I leave the portable drive at home, and sometimes I forget to sync, which brings the storage out of sync and requires manual intervention.
Even for my use case, I think commit messages do help bring back memories of file additions and deletions. I really like git-annex, but its future doesn't look good since it is programmed in Haskell. So I am looking for a less complicated solution like yarsync; if not for the git-annex-like workflow, pyfisync is more straightforward:
https://github.com/Jwink3101/PyFiSync
There are other projects that are not actively maintained:
https://github.com/pfalcon/git-pynex https://github.com/jcftang/python-annex
Pdfs and office documents do change, like adding a bookmark or deleting a page, for example
This is right. The yarsync --backup option works great for that; did you try it? In the end you want to have only one pdf file with your bookmarks (not two). If you really need two files, you can just rename them (otherwise having files both in the working directory and in BACKUPS/ will eventually become a mess).
Also delta backups saves inode.
This should not be an issue: a typical ext4 filesystem has millions of free inodes. You should not need to create so many commits.
I don't think this is an issue. Haskell is a rather stable and professional language. Everything depends on the maintainer of git-annex, whether he continues to support it. I don't use it for other reasons.
"I am looking for a less complicated solution like yarsync"... "the user never need to manually touch the commit dirs"
There is a small contradiction here. yarsync is simple because the user can do whatever he/she wants. It happens that one may need to fix a repository manually. Some functionality is still missing (like removing commits), so it should be fine for a user to remove them manually.
I think that your use case is absolutely legitimate for this issue; thanks for all these details (and yes, commits will help you a lot). At the moment you can just manually remove old commits (don't forget to sort them by date) and use the --force option to push/pull these changes. This need not be done every time, so it should not be a big problem.
Another (though less technical) approach might be to split your repository into a small quickly changing one (like recent work) and into a more stable one (but with lots of files). Then you create smaller commits in the smaller repository and fewer commits in the larger one.
yarsync pull --backup-dir is exactly what I am looking for; it should be added to push too, and maybe backing up to .ys/backups/ should be the default.
Thank you so much!
I'm glad that it was helpful!
For your use case, if your files are really precious and can be easily corrupted, make this your default option (through a shell alias; maybe I will add a configuration option for that later).
The difference between pull and push is that at the local repository you can make needed changes and new commits. I'm not sure that pulling from a repository with backups/ (if you push backups to the flash drive and then pull them to another machine) will work fine (I need to test that; rsync excludes the backup directory from the transfer set, if I remember correctly). Ideally, I would resolve all changes as soon as they appear (this is what is encouraged today), not push them through several repositories.
I think that --backup-dir is better specified by the user (probably in the configuration, to be done), because it is more explicit (no one will be surprised by a suddenly appearing unknown directory).
I'm reopening this, because there should be really a way to remove older commits with a command!
@QiangF I think I found a solution to this automatic deletion of older commits. Now one can set a maximum number of commits (for example, with yarsync commit --limit 20; after this is done once, the limit will be persistent in the .ys directory).
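Enforcing such a limit could look roughly like this. A sketch only: the commits-directory layout and timestamp naming are my assumptions, and the actual --limit implementation may work differently:

```python
import os
import shutil

def enforce_commit_limit(commits_dir, limit=20):
    """Keep only the newest `limit` commit directories.

    Commits are assumed to sort oldest-to-newest by integer name
    (e.g. Unix timestamps); adjust the key if yours sort differently.
    Sketch of what a persistent `commit --limit` could do.
    """
    commits = sorted(
        (d for d in os.listdir(commits_dir)
         if os.path.isdir(os.path.join(commits_dir, d))),
        key=lambda d: int(d) if d.isdigit() else 0,
    )
    for old in commits[:-limit]:  # everything but the newest `limit`
        shutil.rmtree(os.path.join(commits_dir, old))
    return commits[-limit:]
```

Run after each commit, this keeps the repository bounded without any manual housekeeping.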
What do you think about that?
Many thanks for your input! I really appreciate it and will probably return to it at some point. The safety/backup feature does not work as intended now ( #4 ), but I hope it will be implemented in rsync (it was too good to be true); in any case the program is no less secure than rsync (I hope), which is still rather good.