[filelib] Fix wildcard function to not follow symlinks
Avoid a possible DoS when two or more symlinks create a loop inside the (current) working directory
CT Test Results
2 files, 86 suites, 31m 49s :stopwatch:
1 784 tests: 1 736 :heavy_check_mark: passed, 48 :zzz: skipped, 0 :x: failed
2 066 runs: 2 016 :heavy_check_mark: passed, 50 :zzz: skipped, 0 :x: failed
Results for commit 4c97f9ac.
:recycle: This comment has been updated with latest results.
To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.
See the TESTING and DEVELOPMENT HowTo guides for details about how to run tests locally.
Artifacts
// Erlang/OTP Github Action Bot
Unfortunately this is a backwards incompatible change and it also does not match the behaviour of wildcards on Unix (which do traverse symlinks). Perhaps we could store the expanded symlink in an accumulator and stop traversal if we visit a previously visited node?
Honestly, my first attempt at a fix was along the lines you are proposing. But since I saw more trouble than benefit in following symlinks, I came up with this simpler approach. One problem that can arise is the computational cost for a big tree of files. The question is whether we can just note in the docs that symlinks are not followed and close our eyes to the backward incompatibility. I'm really not sure whether that could cause harm somewhere, but I've run into big trouble with the current behaviour in a production system :). In the meantime I'll try to bring a solution that detects loops, and you can decide in the end.
I should probably explain the whole point of this PR better. Let me illustrate the problem with the current implementation of the wildcard function on an example. Consider the following directory structure, with symbolic links pointing to:
- /here/is/target/symlink/a -> /here/is/target
- /here/is/target/symlink/b -> /here/is/target
Then, after the call filelib:wildcard("/here/is/**"), the function do_double_star accumulates path combinations such as /here/is/target/symlink/a, /here/is/target/symlink/a/symlink, /here/is/target/symlink/a/symlink/b, ..., where the final number of combinations is bounded only by OS or FS limits. That number (2^n, where n > 100 on my Linux with PATH_MAX=4096) is easily large enough to exhaust available memory on standard Linux distributions or Windows, even on powerful machines with tens of gigabytes of RAM or unrealistically more.
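The looping structure above can be recreated with a short module; this is a minimal sketch (the module name and the /tmp path are my own choices for illustration):

```erlang
-module(loop_demo).
-export([setup/0]).

%% Recreates the structure from the example above: two symlinks
%% under target/symlink that both point back at target.
setup() ->
    Base   = "/tmp/wildcard_loop_demo",
    Target = filename:join(Base, "target"),
    Links  = filename:join(Target, "symlink"),
    %% ensure_dir/1 creates all parent directories of its argument.
    ok = filelib:ensure_dir(filename:join(Links, "placeholder")),
    ok = ensure_symlink(Target, filename:join(Links, "a")),
    ok = ensure_symlink(Target, filename:join(Links, "b")),
    %% On an unpatched filelib, filelib:wildcard(Base ++ "/**") would
    %% now try to enumerate an exponential number of paths, so do not
    %% call it here.
    Base.

ensure_symlink(Existing, New) ->
    case file:make_symlink(Existing, New) of
        ok              -> ok;
        {error, eexist} -> ok
    end.
```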
The testcase wildcard_symlink_loop/1
https://github.com/erlang/otp/pull/6179/files#diff-82e5eb1f83e28b1235e24a05ae2c2f4d115a19695b67415b8f18eadf50eb7601 can easily be applied to the current codebase so you can quickly reproduce the issue.
I generally consider following symlinks by default to be unsafe (and maybe even unexpected), but I realize that your point about backward compatibility is crucial, so 4c97f9a is really a bad idea.
The change 6c2110e implements your idea of loop detection. In fact, that was my first local fix, before I convinced myself that it:
- makes the function slower
- follows symlinks, which I consider a bad idea
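For readers following along, a minimal sketch of the loop-detection idea (the actual 6c2110e change may differ; the module and function names here are mine): follow symlinks, but identify every directory by its {device, inode} pair and refuse to re-enter a directory that is already on the current traversal path. Diamond-shaped links are still visited from each side, matching wildcard semantics; only genuine cycles are cut.

```erlang
-module(loop_walk).
-export([walk/1]).

-include_lib("kernel/include/file.hrl").

walk(Dir) ->
    lists:reverse(walk(Dir, sets:new(), [])).

walk(Dir, OnPath, Acc0) ->
    case dir_id(Dir) of
        {ok, Id} ->
            case sets:is_element(Id, OnPath) of
                true ->
                    Acc0;                        % cycle detected: stop here
                false ->
                    OnPath1 = sets:add_element(Id, OnPath),
                    {ok, Names} = file:list_dir(Dir),
                    lists:foldl(
                      fun(Name, Acc) ->
                              Path = filename:join(Dir, Name),
                              case filelib:is_dir(Path) of
                                  true  -> walk(Path, OnPath1, [Path | Acc]);
                                  false -> [Path | Acc]
                              end
                      end, Acc0, Names)
            end;
        error ->
            Acc0
    end.

%% Canonical identity of a directory; read_file_info/1 follows
%% symlinks, so all aliases of a directory get the same id.
dir_id(Dir) ->
    case file:read_file_info(Dir) of
        {ok, #file_info{major_device = Dev, inode = Ino}} ->
            {ok, {Dev, Ino}};
        {error, _} ->
            error
    end.
```

Keeping the seen-set scoped to the current path (rather than global) is what preserves the "revisit via a diamond, but never via a cycle" behaviour.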
There is still another solution for this problem we could discuss, which is to extend filelib:wildcard/1 and filelib:wildcard/2 with a FollowSymlinks = boolean() argument. That would cause a build-time error, so users of these functions would quickly become aware of the incompatibility and could fix it according to their needs. I would expect such a change in the next major version, of course. The main benefit would be better performance, because there would be no need to check for loops.
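For concreteness, a sketch of the interface shape being proposed. This is not the real filelib API: wildcard/3 and the follow_symlinks option are invented here, and the false-branch below merely filters the result list to illustrate the contract (on an unpatched wildcard it would still loop; a real fix would skip symlinked directories during traversal itself).

```erlang
-module(wildcard_opts).
-export([wildcard/3]).

-include_lib("kernel/include/file.hrl").

wildcard(Pattern, Cwd, #{follow_symlinks := true}) ->
    filelib:wildcard(Pattern, Cwd);              %% today's behaviour
wildcard(Pattern, Cwd, #{follow_symlinks := false}) ->
    [P || P <- filelib:wildcard(Pattern, Cwd),
          not via_symlink(Cwd, P)].

%% True if any prefix of RelPath (resolved against Cwd) is a symlink.
via_symlink(Cwd, RelPath) ->
    Step = fun(Part, Acc) ->
                   P = filename:join(Acc, Part),
                   {P, P}
           end,
    {Prefixes, _} = lists:mapfoldl(Step, Cwd, filename:split(RelPath)),
    lists:any(fun is_symlink/1, Prefixes).

is_symlink(Path) ->
    case file:read_link_info(Path) of            %% does NOT follow links
        {ok, #file_info{type = symlink}} -> true;
        _                                -> false
    end.
```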
But please seriously consider 6c2110e for the very next release, because there is a real risk of exploitation.
As there has been no activity here for a significant amount of time, I'd like to ask if there is anything I can do to help resolve this issue?
Could you give some more details on how the current implementation of filelib could be exploited in a DoS attack?
It seems to me that for an attack to be possible, you must be allowing untrusted users to specify their own wildcards and/or to create symlinks in the file system. That seems dangerous in other ways.
I've pushed the required changes.
As an example of a possible DoS, imagine an email service that accepts mail from the wild and whose tasks include filtering the content of attachments (archived data, etc.). Using this function to list all files of an already extracted archive gives an attacker an opening: they just prepare an archive containing symlink loops and send that content periodically. The problem is that the process heap is exhausted, which crashes the whole application.
> As a chance for possible DoS, imagine e.g. an email service that accepts emails from the online wild and one of its task is filtration of content of attachments (archived data etc.). Using this function to get all files from e.g. already extracted archive ...
Extracting unsanitized filenames from an untrusted archive sounds like a recipe for disaster. In your example, a safer way would be to assign the filenames yourself and just keep track of the mapping between your filenames and the originals for reporting the results.
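A minimal sketch of that suggestion (module and the "entry-N" scheme are invented for illustration): never write untrusted archive entries under their original names; assign sequential safe names and keep a map from safe name back to the original for reporting scan results.

```erlang
-module(safe_names).
-export([assign/1]).

%% Map each untrusted original name to a generated safe name.
%% The originals are kept only as values, never used as paths.
assign(OriginalNames) ->
    {Pairs, _N} =
        lists:mapfoldl(
          fun(Orig, N) ->
                  Safe = "entry-" ++ integer_to_list(N),
                  {{Safe, Orig}, N + 1}
          end, 1, OriginalNames),
    maps:from_list(Pairs).
```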
Feel free to reopen this pull request if you come up with a solution.