darling icon indicating copy to clipboard operation
darling copied to clipboard

Recusive wipe following symlinks and crossing file system boundaries

Open HinTak opened this issue 5 years ago • 32 comments

Sort of following up to my most recent comment to #848 . Neither the two recursive wipe checks for sym-links or crossing file system boundaries. Either of these would have protected against that sort of bug - it looks like /var/run sym-linked to /run is followed, and recurses down to /run/media/myusername/diskname, then crossed file system boundary, to the external disk.

This is exploitable - I might be able to cause that bit of code to recursively wipe anything, since /var/tmp (which is also recursively wiped) is writable by anybody...

HinTak avatar Jul 29 '20 17:07 HinTak

Neither the two recursive wipe checks for sym-links

I don't know what you're talking about

https://github.com/darlinghq/darling/blob/fa5348c8a9b338746b91a53a801343a8d046a66d/src/startup/darling.c#L933-L939

And you're yet to demonstrate that your data being wiped was actually caused by Darling...

bugaevc avatar Jul 29 '20 17:07 bugaevc

Quite a few studio routines automatically follow symlinks , unless you check lstats...

HinTak avatar Jul 29 '20 18:07 HinTak

This function uses readdir(), not "stdio routines".

readdir() reads directory entries; it sets DT_LNK in d_type for symlinks, and DT_DIR only for actual directories.

bugaevc avatar Jul 29 '20 18:07 bugaevc

opendir use a path name without checking it is a sym-link. At least the bottom ones.

HinTak avatar Jul 29 '20 22:07 HinTak

opendir() doesn't check if the path is a symlink, why would it. That means, if the whole directory being wiped (ex. $DPREFIX/var/tmp) is a symlink, it's gonna follow that one symlink. But you're claiming it follows symlinks recursively.

bugaevc avatar Jul 30 '20 04:07 bugaevc

/var/run is symlink to /run. To be honest, instead of arguing and being defensive, if you don't want to look at it, please just doing something else, make some unrelated code improvement, fix some other bug, etc. This is not constructive.

HinTak avatar Jul 30 '20 08:07 HinTak

/var/run is symlink to /run

But is $DPREFIX/var/run a symlink to anything? (No, it isn't). Wiping host's /var/run indeed would be disastrous, but we're not doing that.

instead of arguing and being defensive, if you don't want to look at it

This is not constructive.

Well, you have filed this bug report saying the recursive wipe is following symlinks. So we have to investigate, and either fix the bug if we confirm it, or explain you where you're mistaken. That's how it works, right? 😀

If that's not constructive enough, perhaps you could lead the way by demonstrating a situation where wipeDir() actually follows symlinks contained inside the directory being wiped?

bugaevc avatar Jul 30 '20 09:07 bugaevc

I'd prefer not to spend time on darling for a few weeks at least - not least because I need to spend time to recovering the 350 GB loss of a 1TB drive.

The 2nd loss of 20 GB (of a 32 GB drive) wasn't too bad - half of it was simply darling's cmake build directory (it was on that external drive).

And also because I no longer have a darling build directory - that 10 GB was half of the 2nd disk wipe.

HinTak avatar Jul 30 '20 09:07 HinTak

@bugaevc and others: I'd suggest you the main developers, even if you work inside a VM or virtualised environment, just make/copy some dummy files under /run/media/username/somedir/somefile as a matter of routine usage; and check that they are there before and after you start darling, as a matter of daily practice. That's simple enough to try to observe the bug and go towards fixing it. To be honest I'm a bit disappointed that nothing has been done to safe-guard against this kind of disaster.

HinTak avatar Jun 12 '21 19:06 HinTak

So I take it you're able to reproduce files under /run/media/username/ getting deleted around the same time you run Darling?

If so, that's great news! Please use strace to find out how that happens. Start strace on a separate terminal like so: sudo strace -fp PID -o darling-strace.log, where PID in PID of the shell from which you'll run Darling in the next step, then reproduce the bug, then grep the log for unlink/rmdir.

bugaevc avatar Jun 13 '21 05:06 bugaevc

@bugaevc I meant you should do it on a daily basis. Seeing it destroying two disks is enough for me.

HinTak avatar Jun 13 '21 14:06 HinTak

Let me ask you directly and unambiguously: have you reproduced this?

If you have, please attach strace logs (instructions above). You could also add some explicit logging to the code that's clearing $DPREFIX/var/run, to see if it actually deletes anything extra. Then we could go about investigating what happens and fixing it. Don't you want us to be able to help you?

If you haven't, there's no point in keeping this issue open.

In any case, please stop spamming unrelated issues with references to here (or https://github.com/darlinghq/darling/issues/848, or your other issues on the subject).

bugaevc avatar Jun 14 '21 13:06 bugaevc

I want to react to what happened yesterday in https://github.com/darlinghq/darling/issues/1006. First of all, thank you everyone for supporting us, it hasn't been easy to have to play the bad guy all alone :slightly_smiling_face:

As suggested by @tresf, I have now hidden the off-topic comments from that page (you can still read them if you want to).

@HinTak, I'll repeat what has been said before:

  • It's sad that you have lost your important files.
  • You said this could be an issue in Darling (in fact, you almost asserted that it certainly is). We have checked the relevant parts of our code, and concluded they look alright and shouldn't cause what you're saying has happened to you.
  • We have tried to reproduce this and have been unable to.
  • Based on this, we say that it's unlikely (but still possible) that your issue was caused by Darling, and that you shouldn't blame it on Darling unless you have other evidence (that you have not presented).
  • We have repeatedly asked you to reproduce the issue and provide us with logs of what actually happens; you haven't done that.
  • You've been spamming unrelated issue reports with references here and to your other issues on the topic, even after I have explicitly asked you to stop doing that.

Look, we don't want to ban you. Can we handle this as adults? Don't spam unrelated issues (as @TheBrokenRail has said, this is an issue tracker, not a review site), post constructive comments here (like actually attempt to reproduce it and tell us if you've been able to), and everything will be fine.

And please keep the politics (Trump, guns, whatever) out of it.


Now, let's take a second to talk about what (and whom) issue reports are for. There are two sides to an issue report:

  • A user helping the developer by letting the developer know that there is an issue with their software (the developer may have not known about that issue before).
  • A user wanting the developer to help them, by fixing the issue they are facing (and reporting).

In your case, you can rest assured that we have heard about your issue loud and clear. You can stop trying to pursue the former bullet point. If you want us to help you, you've got to work with us; we've told you what to do.


I wouldn't say that (quoting @TheBrokenRail) "As the reporter, the burden of proof is on you, you have to prove the issue was caused by Darling". We're not in a court either; nobody has to prove anything to anyone.

But as explained above, we're unconvinced that your issue was caused by Darling; yet you're welcome to present evidence (and not just more claims based on misunderstanding of how things work) to the contrary. While we're not happy about the mess this whole situation has been, we would certainly be happy to fix the issue, if it's indeed a Darling bug that causes it to wipe host files.

Finally, I want to respond to this rhetorical question:

Open-source or not, run by volunteers or not, if you put a piece of software out there which destroys data, do you have a responsibility?

The license makes exactly this question very clear:

https://github.com/darlinghq/darling/blob/9393db2c6ed530acaa2a4a933c391f1363fea1e8/LICENSE#L589-L610

While you may believe your data loss was caused by Darling, you absolutely cannot blame it on us the developers. You and only you are responsible for the software you chose to run. And it's not like we haven't warned you that running Darling outside of a VM may be dangerous.

bugaevc avatar Sep 01 '21 11:09 bugaevc

For those who are confident it is not a darling issue, you can try to 'fail to reproduce' by putting some files under /run/media//someplace on a daily basis while you work with/use darling. For me, I would prefer not to experience it again. I would not even try using darling again, until I have seen changes towards avoiding it happening. So far that had not happened, and proposed changes had been shot down. So I think I am waiting for another person to experience it (and I wish some of the developer's drives get wiped...), while warning people that it could happen.

HinTak avatar Sep 01 '21 13:09 HinTak

@HinTak

For the last time, it is not our responsibility to reproduce your bug.

This hasn't happened to anyone else, and I bet you're not the only one that uses external drives, not by a long shot.

If you want your bug fixed, you need to provide enough information so we can fix it. Without that information, we cannot fix it.

So either provide that information, or stop complaining.

TheBrokenRail avatar Sep 01 '21 14:09 TheBrokenRail

So, I continue to wait for it to happen to somebody else. And, if it should happen to somebody else, I hope that it happens to one of the developers, instead of one of the new users.

And, I think it is important to warn new users that it might happen to them, even though I am waiting for it to happen to somebody else.

HinTak avatar Sep 01 '21 14:09 HinTak

Is this issue at least reproducable by yourself

demhademha avatar Sep 01 '21 15:09 demhademha

I am more than okay to wait for it to happen to somebody else, and preferably to one of the developers.

HinTak avatar Sep 01 '21 15:09 HinTak

@HinTak

So, I continue to wait for it to happen to somebody else.

Sure, go ahead. I don't care.

And, I think it is important to warn new users that it might happen to them, even though I am waiting for it to happen to somebody else.

This is what I (and everyone else) has issue with. Your definition of "warning" seems to 100% match everyone else's definition of "spamming everywhere in an irritating fashion". This issue has only happened to you and you only last year! No one else has ever had this issue, you have provided no information whatsoever that might actually help fix it, honestly, I'm inclined to believe this issue has nothing to do with Darling simply based on how reluctant you are to provide the bare minimum of information. If you really care about this issue, give us the information that might actually help us fix it, then we might actually fix it and never have to read about it ever again.

TheBrokenRail avatar Sep 01 '21 17:09 TheBrokenRail

Populate /run/media//someplace (do it within the VM if you run darling in a VM) with some files. Check on them from time to time while you use darling in the next few weeks / months. Is that so difficult?

HinTak avatar Sep 01 '21 17:09 HinTak

I have been using Linux for about 25 years - since a patched kernel 1.2.14 (if I remember correctly) and quickly switched to 1.3.94 I think because I had a rather new computer then, 25 years ago. Have not had such big data loss (~250GB) for 25 years. Twice is two times too many.

HinTak avatar Sep 01 '21 17:09 HinTak

@HinTak

I have been using Linux for about 25 years - since a patched kernel 1.2.14 (if I remember correctly) and quickly switched to 1.3.94 I think because I had a rather new computer then, 25 years ago. Have not had such big data loss (~250GB) for 25 years. Twice is two times too many.

Wooooow... I don't care. You accepted the risks when you downloaded Darling, the license is quite clear, there is no warranty.

I get being annoyed that you lost data, I can also get being very frustrated about it, but complaining about it in irrelevant issues for the better part of a year isn't going to help new users, isn't going to help me, and certainly isn't going to help you.

The big problem with this issue is that it hasn't been confirmed and we don't have any logs. Darling is full of hundreds of moving pieces, many of which deal with files. Without any logs or traces, we are stuck looking for a needle in a haystack the size of Mt. Everest. And according to you, the only way to reproduce this is to essentially use Darling as a daily driver for a few weeks to a few months! This project is run by volunteers, and volunteers don't have that time for an issue only experienced by 1 person.

I wouldn't be surprised if Darling caused this issue, and it would be a major one if it did. The issue is that we don't know, and we don't have the time it would take to find out. So, essentially, us developers are also just stuck waiting for it to happen to someone again. There is a reason the developers have recommended running Darling in a VM previously on the Discord, it could break things.

In conclusion, stop spamming irrelevant issues, it won't help anyone.

TheBrokenRail avatar Sep 01 '21 19:09 TheBrokenRail

For those who are confident that it is not due to daring, it is easy enough to just make some files under /run/media//some-file-to-keep-an-eye-on, and try to "fail to reproduce" for a month or two. The effort of writing lengthly sacastic replies seems extraordinary by comparison. (and perhaps dishonest).

A general disclaimer "You use this at your own risk", is quite different from "we are aware that there is a small possibility that it would destroy your data under this specific circumstances". Being aware of specific possibilities and don't issue specific warnings is unethical.

HinTak avatar Sep 02 '21 21:09 HinTak

While the replies have been rather pointed, they aren't sarcastic.

The disclaimer of no warranty means that literally anything could happen. Literally. And there are other disclaimers that Darling can do significant damage, and there are warnings to run it in a VM.

I don't know exactly how Darling manages the filesystem, but it's possible that deleting in /run is intentional; someone would have to look into that. But the fact remains that the developers don't have time to monitor logs for weeks straight for an issue that has only been experienced by one person; for all we know, this is an issue with your system and not with Darling.

In any case, I get that losing data is more than frustrating; it's probably the scariest thing in the modern era. But this was something you could have prevented, and you could help everyone else by sharing logs from when the issue happened.

I will do what the other developers are not doing and try this for myself on a VM and keep logs around. If I see this, I will be back here with logs of what was going on at the time, and this can be resolved. If I check back in a month and the file is still there, I am declaring this solved or a non-issue.

rdrpenguin04 avatar Sep 02 '21 22:09 rdrpenguin04

Okay, I did a bit more research; yes, /var/run (and consequently /run) is absolutely meant to be cleared. I'm declaring this a non-issue; could someone close this?

rdrpenguin04 avatar Sep 03 '21 01:09 rdrpenguin04

Linux's automounter mounts user disk under /run/media//. I think you are confusing darling's /run vs the host's /run ...

HinTak avatar Sep 03 '21 01:09 HinTak

Darling has its own /var/run, which lives under ~/.darling/var/run for most people, which is not a symlink.

HinTak avatar Sep 03 '21 02:09 HinTak

Okay, I did a bit more research; yes, /var/run (and consequently /run) is absolutely meant to be cleared.

/var/run is expected to be clean on boot-up — and on Fedora, where @HinTak has experienced the issue, /var/run is a symlink to /run which itself is a tmpfs mount (so it naturally doesn't persist over reboots, no need to clear it explicitly).

What's definitely not expected is some program deleting files (and not the ones it has created itself) from /run in normal usage. So for instance your early boot-up or late shutdown procedure may include clearing the directory (after having unmounted all nested mountpoints, of course), but while you actually use your system nothing should be clearing it.

Inside a Darling prefix, there's its own /var/run, which is also cleared on container startup. So one possibility of how Darling potentially could be responsible is if that code misbehaves and instead deletes the real /var/run (but it seems to be written and work just fine). In particular, @HinTak has suspected that the code would follow symlinks from container's /var/run into the real one and delete files there (as evidenced by the title of this issue). So perhaps test that, too 🙂

bugaevc avatar Sep 03 '21 06:09 bugaevc

Okay, that's fair. I'll have to do some more setup to test that, but I certainly can do that

rdrpenguin04 avatar Sep 03 '21 11:09 rdrpenguin04

On Friday, 3 September 2021, 12:44:07 BST, Ray Redondo @.***> wrote:

Okay, that's fair. I'll have to do some more setup to test that, but I certainly can do that

Thanks. darling itself keeps a ~/.darling/var/ , ~/.darling/proc/ and does clean those. The suspicion - for what it is - is under some circustances, some of the apple-written components, or darling team written ones, would get confused, and escape ~/.darling/ and mess up files outside of ~/.darling/ . ("/run/media///" being outside... ). And how can one make sure it does not happen in the future when darling gets more or newer apple-written components .

HinTak avatar Sep 03 '21 16:09 HinTak