otp icon indicating copy to clipboard operation
otp copied to clipboard

[filelib] Fix wildcard function to not follow symlinks

Open jendis opened this issue 2 years ago • 9 comments

Avoid possible DOS when more than 1 symlinks create loop inside the (current) working directory

jendis avatar Jul 28 '22 05:07 jendis

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Jul 28 '22 05:07 CLAassistant

CT Test Results

       2 files       86 suites   31m 49s :stopwatch: 1 784 tests 1 736 :heavy_check_mark: 48 :zzz: 0 :x: 2 066 runs  2 016 :heavy_check_mark: 50 :zzz: 0 :x:

Results for commit 4c97f9ac.

:recycle: This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

github-actions[bot] avatar Jul 28 '22 05:07 github-actions[bot]

Unfortunately this is a backwards incompatible change and it also does not match the behaviour of wildcards on Unix (which do traverse symlinks). Perhaps we could store the expanded symlink in an accumulator and stop traversal if we visit a previously visited node?

josevalim avatar Aug 10 '22 05:08 josevalim

Honestly my first intention to fix it was in a way you are proposing. But as I have seen more troubles than benefits in following symlinks, I came with this easy approach. One of the problems that can arise is the computational time in case of a big tree of files. The question is if we can't just make a note in the doc that symlinks are not followed and close eyes regarding to the backward compatibility - I'm really not sure if this can cause a harm somewhere but I faced a big troubles with the current state in production system :). In the meantime I'll try to bring the solution that detects loops and you can finally decide.

jendis avatar Aug 10 '22 11:08 jendis

I should probably better explain the whole point of this PR. I'll explain the problem with currently available implementation of wildcard funtion on the example. Let's have the following directory structure with symbolic links pointing to:

  • "/here/is/target/symlink/a -> /here/is/target

  • "/here/is/target/symlink/b -> /here/is/target

Then after the function call filelib:wildcard("/here/is/**") the function do_double_star accumulates combinations of paths "/here/is/target/symlink/a, /here/is/target/symlink/a/symlink, /here/is/target/symlink/a/symblink/b ... " where the final number of path combinations is limited by the OS or FS limits. The number of combinations is definitely huge enough (2^n where n > 100 on my Linux where PATH_MAX=4096) to exhaust available memory on standard Linux distributions or Windows OS on powerful machines with tens of gigabytes but even unrealistically more. The testcase wildcard_symlink_loop/1 https://github.com/erlang/otp/pull/6179/files#diff-82e5eb1f83e28b1235e24a05ae2c2f4d115a19695b67415b8f18eadf50eb7601 can be easily applied on the current codebase so you can quickly raise the issue.

I generally consider to follow symlinks by default as unsafe (and maybe even unexpected) but I realize that your point about backward compatibility is crucial, so the 4c97f9a is really bad idea.

The change 6c2110e fulfills your idea about detection of loop. In fact that was my first local fix before I argued to myself that it:

  1. makes the function slower
  2. follow symlink is bad idea

There is still another solution for this problem we can discuss, which is to extend each filelib:wildcard/1 and filelib:wildcard/2 for a FollowSymlinks = boolean() argument. That will cause build time error so users of this functions will be quickly aware of the incompatibility and can fix it according to their needs. I would expect such change in the next major version of course. The main benefit would be the better performance because there will be no need to check the loops.

But please consider seriously the 6c2110e for the really next release because there is a real risk of exploitation.

jendis avatar Aug 29 '22 12:08 jendis

As no activity here for a significant amount of time, I'd like to ask if I can do anything to help to solve this issue?

jendis avatar Oct 10 '22 08:10 jendis

Could you give some more details how the current implementation of filelib could be exploited in a DOS attack?

It seems to me, that for an attack to be possible, you are allowing untrusted users to specify their own wildcards and/or to create symlinks in the file system. That seems to be dangerous in other ways.

bjorng avatar Oct 11 '22 04:10 bjorng

I've pushed the required changes.

As a chance for possible DoS, imagine e.g. an email service that accepts emails from the online wild and one of its task is filtration of content of attachments (archived data etc.). Using this function to get all files from e.g. already extracted archive can give a chance to attacker if he just prepare such archive with symlink loops and then periodically sends this content. The problem is that process heap is exhausted which means the crash of the whole application.

jendis avatar Oct 11 '22 11:10 jendis

As a chance for possible DoS, imagine e.g. an email service that accepts emails from the online wild and one of its task is filtration of content of attachments (archived data etc.). Using this function to get all files from e.g. already extracted archive ...

Extracting unsanitized filenames from an untrusted archive sounds like a recipe for disaster. In your example, a safer way would be to assigning the filenames yourself and just keep track of the mapping between your filenames and the originals for reporting the results.

frej avatar Oct 11 '22 12:10 frej

Feel free to reopen this pull request if you come up with a solution.

bjorng avatar Jan 16 '23 05:01 bjorng