distro
distro copied to clipboard
Manually resolves links relatively to `root_dir`, and prevent escape
Dear Sebastian, dear maintainers,
This PR drafts a first implementation to address concerns raised in #311 review.
Currently, paths are not manually resolved, and by following symbolic links, this may lead to outside specified root_dir
.
By using os.path.realpath
and os.path.commonprefix
, we can verify that important paths are still prefixed by root_dir
, once they have been resolved.
It turned out enforcing those checks was not as easy as I thought :
- some paths are provided on purpose as parameters, others automatically derived
- some paths are known during instantiation, others will be "later"
- I'm not sure
PermissionError
actually fits, but I was out of option - I'm not sure the error message itself is relevant either
- New
__prevent_root_dir_escape
function might require a renaming too
At least, the new functional test actually enlightens the issue.
--> A deep review is required, ideas are welcome and commits on this branch too :upside_down_face:
Have a nice WE all :bow:
Dear Sebastian, sorry for the delay.
I got your point and understood what you were trying to achieve : I did miss the part about resolving symbolic links relatively to root_dir
.
So I tweaked your function a bit in order to :
- ease any future Windows integration task (see #177)
- detect symbolic links loop (as
os.path.realpath
does) - fix
os.readlink
usage on regular files (it actually raisesOSError: "Invalid argument"
) - fix
os.path.commonprefix
usage by removing redundant "up-level references"
You will see that I opted for links resolution on the verge of calling open
. This definitely simplifies the constructor.
I've implemented three short test cases to check our new logic (note : relative symbolic links resolving is already tested by regular rootfs under tests/resources/distros
).
I also took the liberty to update our .gitignore
file, as my working directory used not to be clean since #315.
Bye ! Thanks for your time :wave:
Hi @HorlogeSkynet kudos for the new tests and the loop detection! +1 I believe a few details need more work, please see my comments below
:pray: I've answered/resolved your comments Sebastian. ~~Although, I'm concerned about the CI failure, as tests pass "on my machine". I think this is due to an issue related to GitHub Actions and symbolic links resolving to directories (I might be wrong).~~
My mistake ! There was an empty directory that Git didn't track, I've just fixed this :wink:
Waiting for your inputs, thanks for your time :wave:
EDIT : If you are tired of these round trips, feel free to directly amend the branch !
PR's (finally) ready @hartwork !
Dear @python-distro/maintainers, Dear @hartwork (if you haven't muted notifications from this repository :upside_down_face:),
I've finally taken the time to rework this feature (fix ?).
What has been done :
-
os.path.commonpath
has been preferred overcommonprefix
(first relative path normalization has been added to prevent standard library exception ; - 100% (99,99%) code coverage (for newly added code) ;
-
os.path.realpath
have been removed from iteration to prevent OS links resolution, but there is one left for final path to detect some escape cases ; - Three new tests (two for
os_release_file
as well as another one for above chroot escape case) ; -
root_dir
is resolved, mainly to deal with..
(as you correctly pointed out Sebastian) ; - link check before resolution instead of
try..except
(indeed simpler) ; -
OSError
has been replaced byFileNotFoundError
(symlinks loop corresponds to a missing file, at least from the caller PoV) ; - a "normalization" for relative paths has been added, to strip
../
that would legitimately resolve to chroot in a real-world situation.
This branch as also been rebased, and hence now "ready".
Thanks for your time, Bye :wave:
Dear @python-distro/maintainers, Dear @hartwork (if you haven't muted notifications from this repository 🙃),
Hi @HorlogeSkynet, back in 2022-08 — soon two years ago — I implemented a solution on branch security/prevent_rootfs_escape_2
that still exists today. I remember you were unsure if the approach was the best way forward and called out to all maintainers for feedback (it's archived in https://github.com/orgs/python-distro/teams/maintainers/members/archived_team_posts.json for everyone with read permissions); the topic and my branch has since been ignored altogether, until your new commits here 5 days ago. 2022 is a long time ago, and I'm not interested in getting back into this topic again in 2024, myself. Thanks for your understanding, I wish this pull request the best.
Hey @hartwork, and thanks for your answer. Indeed, I completely forgot about this (these) discussions, since GitHub archived them...
I took a quick glimpse of your branch which I find very complex (but this bug/feature is not trivial at all, and my new algorithm is possibly sill broken). Although, I've updated a test case according to your commit 238925a54e91a995b923b2936c6504c262a5f638.
So I'll sum up the situation here again, so @python-distro/maintainers will have the full context to properly review this (these) branches :
The idea is to "fix" the --root-dir
option (brought in python-distro/distro#161), which allows a separate rootfs to be specified (from CLI, or through LinuxDistribution
API). As noticed first during #241, and then #311, multiple issues are related to this parameter in it's current implementation :
- [fixed during #311] this led to "unknown" binary executions, which possibly depend on host information (like the running kernel with
uname
) or even unsupported architecture (e.g. an ARM rootfs on an x86_64 host) - paths under
root_dir
are mis-resolved to paths outsideroot_dir
: this means that opening$root_dir/etc/os-release
which links to../../../../../etc/os-release
(or even to/etc/os-release
) would result in host' own OS-RELEASE file being read (and distro info inferred from there) - as a consequence, symbolic link loops are not detected (a kind of DoS could happen in some cases)
So we (@hartwork and I) worked on these issues without opting for solutions as requiring system privileges or using a third-party dependency. There are two "final" implementations :
-
The one proposed in this MR, which eventually addresses Sebastian's concerns, and which :
- Recursively resolves links relatively to
root_dir
until a non-link file is found (in a best-effort manner) ; - Assert this file is actually under
root_dir
before returning it ; - Any directory symlink encountered is not supported (they won't be "bent" in chroot).
- Recursively resolves links relatively to
-
Sebastian's (not rebased onto
master
, which manually resolves file paths relatively toroot_dir
(with full NT paths support ?)
From here, we see multiple options that I'd like to propose/discuss with you :
- Choose and keep one of these implementations, as is ;
- Implement one of these algorithms internally, as a separate module in the
distro
package, and fully-optional (try..except ModuleNotFoundError
for projects that vendordistro.py
script, disabling--root-dir
option when missing) ; - Implement this algorithm internally, as a separate module in the
distro
package, and required (import [...]
) ; - ~~Implement this algorithm externally (a third package that we [both of us] would maintain on PyPI, this would adding a [extra ?] dependency to this project)~~ ;
- Drop the
--root-dir
parameter completely, close this MR and release distro v2 ; - Use a feature/trick that we do not know about, safely resolving links relatively to a specified directory (and cross platform ?).
Thanks for your time, see you all :wave: