
Is rmlint still maintained?

Open StatusCode404 opened this issue 1 year ago • 5 comments

Hi All, Just checking (again but this year) if rmlint is still maintained and supported by anyone?

There aren't that many responses to issues and there hasn't been a new tagged release since August last year.

StatusCode404 avatar Jun 14 '24 03:06 StatusCode404

Good question. I see that @cebtenzzre (https://github.com/cebtenzzre) has approved pull requests in the last 6 months so he is one maintainer. Let's ask if he needs help with the backlog of issues and pull requests.

The develop branch is years out of date, but has had some useful features added to it in the past that I'd like to use. How can we get them into the main branch?

RichLewis007 avatar Jun 22 '24 23:06 RichLewis007

Ping on this @sahib and @cebtenzzre: a number of PRs seem to be trivial and could be merged really fast and please all rmlint users. Got here after seeing @mih showing rmlint in action to deduplicate annex keys.

Cheers and let me know if we could be of some quick help ;-)

yarikoptic avatar Jul 24 '24 03:07 yarikoptic

If the maintainers will not reply, and are not maintaining this project, what are our options? Should we make a fork of it, and try to get the community to move there to actively make the fixes and changes needed?

RichLewis007 avatar Oct 08 '24 17:10 RichLewis007

Original author here. I gave over maintenance to @cebtenzzre some time ago. As far as I can tell he's not responding either, which I can't blame him for. Open source work is seldom rewarding. I can give somebody else access to the repo to sift through the PRs, but this person should already have a track record of doing some open source work, as I don't want another xz incident. Ideally even more than one person.

This person won't be me though, since I don't possess the time and motivation to do so and would also write rmlint a lot differently today. Please answer in this issue if somebody steps up to do the job.

Forking is an option of course too, but many upstream packages have this repo as source.

sahib avatar Oct 13 '24 14:10 sahib

I could probably give a hand with some trivial PRs and issue triage, maybe an occasional upload to Debian, but not much beyond that. I have some track record of FOSS development/maintenance (and also, unfortunately, of abandoning projects as well ;) )

yarikoptic avatar Oct 14 '24 13:10 yarikoptic

@yarikoptic Much appreciated. I would prefer if there's one additional person to reduce the risk of getting into the current situation.

sahib avatar Oct 16 '24 20:10 sahib

Hey there.

Need another person? I'm an Arch Linux user and would be available to help maintain.

CodingWithAnxiety avatar Dec 04 '24 01:12 CodingWithAnxiety

> Hey there.
>
> Need another person? I'm an arch Linux user and would be available to help maintain.

Cool! Thanks for raising your hand. Do you have some experience maintaining C applications? I see some Python experience which will be helpful for the test suite and UI.

@yarikoptic Still in? If yes I could give you guys access.

@cebtenzzre Please raise your hand if you do not want your access to be revoked.

sahib avatar Dec 04 '24 07:12 sahib

yes, but only to a very limited degree as described above.

yarikoptic avatar Dec 04 '24 14:12 yarikoptic

Hey guys! Another Arch Linux user here (impressive what a deleted AUR package can do!). I'm willing to put in some time helping maintain this, maybe reviewing PRs and helping keep the test suite running :)

As a side note, I think that probably the sanest idea for now is to try to keep things working and focus on fixing known bugs rather than trying to add new features, mostly because we're all new on this project :)

fermino avatar Dec 04 '24 15:12 fermino

It might also be worth it for @sahib to establish some "gatekeeping", e.g. that every PR must be approved by some other contributor before it can be merged. (Although likely such rules might not be "hard enforced", or I am a super user everywhere... dang... but an example could be https://github.com/citeproc-py/citeproc-py)

yarikoptic avatar Dec 04 '24 15:12 yarikoptic

It's a good idea! A 2-3 contributor approval + CI required to pass in the repo should be a pretty robust filter.

fermino avatar Dec 04 '24 15:12 fermino

I did set up those rules for master and develop. I would recommend, though, that there is only 1 required approval, as this can easily lead to stagnation otherwise. I also recommend enabling a required passing CI check before merging, but this requires some work first, as the original CI site (TravisCI) vanished.
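For anyone who wants to script rules like these instead of clicking through the settings page, here is a minimal sketch of a request body for GitHub's "update branch protection" REST endpoint. The field names are as I understand that API and should be checked against the current docs; the check name `build-and-test` is hypothetical, and the sketch only builds and prints the payload, it does not send anything.

```python
import json


def branch_protection_payload(required_approvals=1, ci_checks=(), enforce_admins=True):
    """Build a request body for branch protection (sketch, not sent anywhere).

    ci_checks holds hypothetical status-check names; an empty tuple means
    no CI check is required yet (e.g. while CI is still being migrated).
    """
    return {
        "required_pull_request_reviews": {
            # One approval keeps review mandatory without stalling merges.
            "required_approving_review_count": required_approvals,
        },
        # None (JSON null) disables the status-check requirement entirely.
        "required_status_checks": (
            {"strict": True, "contexts": list(ci_checks)} if ci_checks else None
        ),
        # True applies the rules to repository admins as well.
        "enforce_admins": enforce_admins,
        # None means no extra restriction on who may push.
        "restrictions": None,
    }


for branch in ("master", "develop"):
    payload = branch_protection_payload(ci_checks=("build-and-test",))
    print(branch, json.dumps(payload))
```

Setting `enforce_admins` is what would address the "super user everywhere" concern raised above, since otherwise admins can bypass the required review.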

@yarikoptic @CodingWithAnxiety @fermino: You should have collaboration invites now.

sahib avatar Dec 04 '24 19:12 sahib

Awesome, thank you!

Regarding CI, I will look into that. I'm guessing it shouldn't be too hard to migrate it from Travis. The free GitHub Actions minutes should probably be plenty for now!

fermino avatar Dec 05 '24 00:12 fermino

@sahib probably a silly question but anyways: master is the latest branch, right? (Just making sure because I see the develop branch has other stuff but it's one year behind master).

fermino avatar Dec 05 '24 00:12 fermino

> > Hey there.
> >
> > Need another person? I'm an arch Linux user and would be available to help maintain.
>
> Cool! Thanks for raising your hand. Do you have some experience maintaining C applications? I see some Python experience which will be helpful for the test suite and UI.
>
> @yarikoptic Still in? If yes I could give you guys access.
>
> @cebtenzzre Please raise your hand if you do not want your access to be revoked.

Hi,

I mostly have Python experience under my belt, though I'm still learning C and C++. I'd mostly be interested in helping with testing and squashing bugs.

I'll keep my eyes on PRs and issues and see if I can't occasionally lend a hand.

I will accept the invitation once I am home. <3

CodingWithAnxiety avatar Dec 05 '24 04:12 CodingWithAnxiety

> @sahib probably a silly question but anyways: master is the latest branch, right? (Just making sure because I see the develop branch has other stuff but it's one year behind master).

develop is supposed to be the current working version with the newest features and fixes. master is usually the one with the latest stable, released software. PRs would go to develop first; on release you rebase or merge to master. You are of course free to use a different branching model, but I think it is worth reviving and streamlining the develop branch.

sahib avatar Dec 05 '24 07:12 sahib

@sahib thanks for the info! I'm trying to figure out what to do with develop, mostly because I would not like to ship and release something not deemed stable (given user data is at stake :p).

I see that most of the commits (or at least the ones I looked up) are new features, am I right? So in that case maybe the best way would be to start off master (especially regarding some build fixes for rolling release distros I've been looking at) and then go about integrating the things from develop into master. Any thoughts?

fermino avatar Dec 07 '24 22:12 fermino

@fermino Sorry, bit late. Yes, seems like most features landed on develop, but some fixes are also on master, so the two need to be merged. The first step would be to put this merged state on a separate branch, as most people compiling from source will use master, but the docs mention develop. Once that new branch seems stable it can be moved to develop.
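As a rough sketch of that first step, the helper below only lists the git commands one might run; it does not execute them. The scratch branch name `merge-master-develop` and the remote name `origin` are assumptions for illustration, not something agreed on in this thread.

```python
def integration_plan(stable="master", feature="develop", work="merge-master-develop"):
    """Return git commands that merge the stable branch into a scratch branch
    created from the feature branch. `work` is a hypothetical branch name."""
    return [
        ["git", "fetch", "origin"],
        # Start the scratch branch from the feature branch ...
        ["git", "switch", "-c", work, f"origin/{feature}"],
        # ... and bring in the fixes that only exist on the stable branch.
        ["git", "merge", "--no-ff", f"origin/{stable}"],
        # Publish it once conflicts are resolved and the test suite passes.
        ["git", "push", "-u", "origin", work],
    ]


for command in integration_plan():
    print(" ".join(command))
```

Only after the scratch branch has proven stable would develop be moved to it; master stays untouched until a release is tagged.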

Upstream distros will not update until a new tag/release is pushed.

sahib avatar Dec 11 '24 17:12 sahib

> would write rmlint a lot differently today

This may be off-topic here, but I would be very interested to have a bit more detail on what rmlint could have been if you had started it in 2025.

And thanks a lot for rmlint, this is a useful software that I trust and have found useful. I also wrote my own software from scratch several times that share similar features. That is why I am interested in your own feedback after several years of experience of a larger and well-written project like rmlint.

vassilit avatar Jan 03 '25 13:01 vassilit

> > would write rmlint a lot differently today
>
> This may be off-topic here, but I would be very interested to have a bit more detail on what rmlint could have been if you had started it in 2025.
>
> And thanks a lot for rmlint, this is a useful software that I trust and have found useful. I also wrote my own software from scratch several times that share similar features. That is why I am interested in your own feedback after several years of experience of a larger and well-written project like rmlint.

Hmm, this probably would deserve a longer post, but here's what comes to mind:

  • Do one thing well: Remove all the other lint stuff, like empty files, nonstripped etc. This appeared useful to me back then, but is more annoying in hindsight, especially since it makes the implementation harder.
  • Also allow integration of tools that find similar images, or other specialized use cases (i.e. make the hashing core exchangeable in the architecture).
  • Do not offer a UI: That was only because I got interested in GTK and cairo if I'm honest. I do not mind it as separate project, but this attracts the wrong user base to a power tool.
  • Only offer one checksum implementation and remove paranoia mode. Modern checksums like blake are fast enough and very sturdy. There is really no need for an additional paranoia mode, considering it made things rather complex.
  • Remove xattr support. That stuff never worked right.
  • Heavily reduce the amount of knobs in the command line interface. More is not better. Especially if you run out of letters.
  • Base the new implementation on io_uring. The current multi-threaded hasher has some needless complexity.
  • Write it in a modern language that allows easy cross-compilation. This would reduce bugs, make developer life less miserable during debugging and speed would be pretty much the same with Go/Rust/Zig. I would pick Go. We had so many bugs because we made the memory management awkward and I want that life time back. I only knew C back then and Go was not yet available.
  • Fewer outputs. The progress bar should be a proper ncurses or bubbletea one. No Python output or CSV. There are tools for that.
  • Write the benchmarks before optimizing. ;-) This has too often turned into an exercise in optimizing uncritical paths. Also do not care so much about rotational disks anymore. They are already an edge case now, but were the norm back then.

There were good ideas though:

  • Keeping it non-interactive.
  • Merging directories (should be default)
  • Have a JSON output for scripting
  • Replay as an idea is actually not bad.
  • The test suite using black box tests of the actual binary is nice.
  • Distinction between original and duplicate and tooling to decide.

Maybe this can also serve as inspiration for the current maintainers.
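To make the "one checksum, JSON output, do one thing well" points concrete, here is a minimal sketch under stated assumptions: it is not how rmlint works internally, it uses Python's `hashlib.blake2b` as the single checksum, and the JSON shape is made up for illustration. It also ignores hard links, so two names for the same inode would be reported as duplicates.

```python
import hashlib
import json
import os
import tempfile
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Hash a file with BLAKE2b, reading it in chunks."""
    h = hashlib.blake2b()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group regular files under root by size first, then by digest."""
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for size, paths in by_size.items():
        if size == 0 or len(paths) < 2:
            continue  # empty files are "other lint"; unique sizes cannot match
        by_digest = defaultdict(list)
        for path in paths:
            by_digest[file_digest(path)].append(path)
        for digest, same in by_digest.items():
            if len(same) > 1:
                groups.append({"size": size, "digest": digest, "paths": sorted(same)})
    return groups


# Small demo on a throwaway directory: two identical files and one different.
with tempfile.TemporaryDirectory() as demo_root:
    for name, data in (("a.txt", b"same"), ("b.txt", b"same"), ("c.txt", b"diff")):
        with open(os.path.join(demo_root, name), "wb") as f:
            f.write(data)
    demo_groups = find_duplicates(demo_root)
    demo_names = [[os.path.basename(p) for p in g["paths"]] for g in demo_groups]
    print(json.dumps(demo_names))
```

Grouping by size before hashing means files with a unique size are never read at all, which is where most of the speed of this kind of tool tends to come from.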

sahib avatar Jan 03 '25 16:01 sahib

I like the --replay ability, but it should be the main mode rather than an odd add-on. I'd like the app to always produce a database on the first scan that can be used for subsequent clean passes. The reasoning is speed of operation.
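A minimal sketch of that idea, assuming a made-up JSON "database" of path, size and modification time (this is not rmlint's actual --replay format): the first pass records every file, and a later pass trusts a cached entry only if the file on disk still matches it.

```python
import json
import os
import tempfile


def scan(root):
    """Record path, size and mtime for every file under root."""
    records = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path, follow_symlinks=False)
            records.append({"path": path, "size": st.st_size, "mtime_ns": st.st_mtime_ns})
    return records


def save_scan(records, db_path):
    """Write the scan result to a JSON file (hypothetical format)."""
    with open(db_path, "w", encoding="utf-8") as f:
        json.dump(records, f)


def still_valid(db_path):
    """Load a saved scan and keep only entries whose file is unchanged on disk."""
    with open(db_path, encoding="utf-8") as f:
        records = json.load(f)
    valid = []
    for rec in records:
        try:
            st = os.stat(rec["path"], follow_symlinks=False)
        except FileNotFoundError:
            continue  # file vanished since the scan
        if st.st_size == rec["size"] and st.st_mtime_ns == rec["mtime_ns"]:
            valid.append(rec)
    return valid


# Demo: scan two files, change one, and see which cached entry survives.
with tempfile.TemporaryDirectory() as root, tempfile.TemporaryDirectory() as state:
    kept = os.path.join(root, "kept.txt")
    changed = os.path.join(root, "changed.txt")
    for path in (kept, changed):
        with open(path, "w") as f:
            f.write("v1")
    db = os.path.join(state, "scan.json")
    save_scan(scan(root), db)
    with open(changed, "w") as f:
        f.write("version 2")  # different size, so the cached entry is stale
    demo_valid = [os.path.basename(r["path"]) for r in still_valid(db)]
    print(demo_valid)
```

The revalidation step is the important part: acting on a stale database without re-checking the files is exactly the kind of thing that can destroy data.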

Otherwise, as of today rmlint is not maintained and contains data-destroying bugs, see #672. That's bad enough that it should be removed from distro repositories.

misieck avatar Feb 04 '25 21:02 misieck

> I like the --replay ability, but it should be the main modus rather than an odd add-on. Id like the app to always produce a database at first scan, that can be use for subsequent clean passes. Reasoning is the speed of operation.
>
> Otherwise as of today rmlint is not maintained, and contains data-destroying bugs, see #672. Thats bad enough it should be removed from distro repositories.

I've never used "--replay". But it is clearly stated in the rmlint manual page that it will not honour hardlinks if you choose --replay. https://github.com/sahib/rmlint/issues/672#issuecomment-2638201349

By design, some options will not have any effect. Those are:

    --followlinks
    --algorithm
    --paranoid
    --clamp-low
    --hardlinked
    --write-unfinished
    … and all other caching options below.

Read the manual and you won't destroy your data @misieck by eagerly using an option you don't fully understand.

This is not a bug since it is by design. This should go under a feature request for the next version.

@fermino has volunteered to maintain. There's no need to remove this from distros as it works.

StatusCode404 avatar Feb 05 '25 22:02 StatusCode404

No, --hardlinked means: report hard links as duplicates. The negative of that is not to report them as duplicates and thus to preserve both files. rmlint deletes all hard-linked files. There is no world in which this could be by design. People are losing data with this.
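To spell out the semantics described here, a small sketch (not rmlint's code; the `report_hardlinks` parameter and the "first path is the original" policy are assumptions for illustration). Hard links are recognised by sharing a device and inode number; when they are not to be reported, every name of the original's inode must stay off the removal list.

```python
import os
import tempfile
from collections import defaultdict


def group_by_inode(paths):
    """Map (device, inode) to the list of paths that are hard links to the same data."""
    groups = defaultdict(list)
    for path in paths:
        st = os.lstat(path)
        groups[(st.st_dev, st.st_ino)].append(path)
    return groups


def removable_duplicates(paths, report_hardlinks=False):
    """Given paths with identical content, return the ones to list for removal.

    The first path is treated as the original (hypothetical policy). With
    report_hardlinks=False, hard links are collapsed to one representative
    per inode, so other names of the same inode are never listed.
    """
    candidates = []
    for index, names in enumerate(group_by_inode(paths).values()):
        listed = names if report_hardlinks else names[:1]
        if index == 0:
            listed = listed[1:]  # keep the original itself
        candidates.extend(listed)
    return candidates


# Demo: a and b are hard links to one inode, c is an independent copy.
with tempfile.TemporaryDirectory() as d:
    a, b, c = (os.path.join(d, n) for n in ("a", "b", "c"))
    with open(a, "w") as f:
        f.write("data")
    os.link(a, b)
    with open(c, "w") as f:
        f.write("data")
    demo_default = [os.path.basename(p) for p in removable_duplicates([a, b, c])]
    demo_hardlinked = [
        os.path.basename(p) for p in removable_duplicates([a, b, c], report_hardlinks=True)
    ]
    print(demo_default, demo_hardlinked)
```

In the default case both names of the hard-linked file survive and only the independent copy is listed, which is the behaviour argued for in this comment.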

misieck avatar Feb 06 '25 04:02 misieck

> * Do one thing well: Remove all the other lint stuff, like empty files, nonstripped etc. This appeared useful to me back then, but is more annoying in hindsight, especially since it makes the implementation harder.

I'm just a lurker and new to rmlint, but heartily endorse this idea. I recently got caught by this. I wanted to dedup a USB backup drive from a Windows system on my Ubuntu box (similar to a use case mentioned in the tutorial), and got 0 duplicates and thousands of empty files & directories, apparently due to "bad GIDs". If I have correctly understood the empty file/directory handling, running the shell script would have deleted almost everything. The reason I sought out rmlint (it wasn't that easy to find) is its excellent options for prioritizing (i.e. defining "originals").

rsbrux avatar Feb 08 '25 05:02 rsbrux

@sahib What is the status of the maintenance pick-up? I noticed @fermino 👏 being busy but also being blocked by required reviews? I could provide some assistance, but I am aware that my GitHub pedigree does not show that much, as I have mostly worked on/for closed source projects. I started in C/C++, but nowadays I'm all over the place depending on the need at hand (embedded/back-end/mobile).

RayOei avatar Feb 27 '25 17:02 RayOei

Hey guys, been a bit busy lately :)

@sahib I agree with all the ideas you mentioned up there. Would you be against simplifying this codebase? Or do you think it would be better to start from scratch in another language? (Personally it would be a great time to dive into Zig, but Go would be great too.) I'm mentioning this because I don't think I understand the current codebase enough to replicate it without messing up on some of the same things 😆

I'm thinking about maybe thinking of it as a library and then a separate CLI interface which would allow for a GUI in the future.

Also, having a good user base testing a new implementation of the project would be great, so if people are willing to chime in with the most used features and what pitfalls they see I'd be willing to start a rewrite on my free time.

Anyways, just some random thoughts as I'm reading this :)

fermino avatar Feb 28 '25 17:02 fermino

@fermino I would advise against rewriting from scratch. That would be something for a successor of rmlint that does not have the restrictions imposed by following how `rmlint` does things. But for this project the value comes from a user base relying on it, even on the weirder features. From my point of view, the most important thing right now would be a release that fixes some of those dangerous and outstanding bugs. This would greatly help the user base and reduce maintenance effort.

@RayOei Sure, the more people the better I guess. 😄 Please pick a ticket/review to work on and once you have figured out a fix I will give you access.

@CodingWithAnxiety @yarikoptic Reminder that you still have collaborator access. Since you did not pick something to work on yet I would remove access after some days, unless you tell me not to.

Thanks to anyone helping.

sahib avatar Feb 28 '25 21:02 sahib

Hi @sahib, first action was to check/comment a PR (#678) as that one is very simple 😁

RayOei avatar Feb 28 '25 21:02 RayOei

Hi @sahib, FYI: I added #683 and I think I have a fix for #673 too (needs more checking though).

RayOei avatar Mar 01 '25 11:03 RayOei