Overly restrictive permissions set on tmp/data folder due to python mkdtemp()
We recently began bagging on domain-joined Windows workstations as opposed to 'Forensic Workstations' which are off the network. We have noticed on Windows that the 'data' folder is only accessible to the user that created the bag. We are able to replicate this outside of bagit itself on a domain-joined PC with:
import os
import tempfile
cwd = os.getcwd()
temp_data = tempfile.mkdtemp(dir=cwd)
os.chmod(temp_data, os.stat(cwd).st_mode)
In advanced security settings, Inheritance is disabled. Once it is manually enabled, the permissions behave as expected. However, this issue can slip through the cracks as if the bag is transferred to another location, inheritance is enabled on the copy.
This behaviour is not present in a 'regular' windows account that isn't attached to a domain.
I believe that this relates specifically to how mkdtemp works, and I think the issue is somewhat articulated here: https://github.com/python/cpython/issues/86050
I have tested a pull request that just uses os.mkdir (pretty sure pathlib is preferred nowadays but it isn't used in the current bagit codebase) instead, and the permissions then behave correctly and inheritance is enabled by default. Again, as with https://github.com/LibraryOfCongress/bagit-python/issues/191 this is an environment issue so I understand that there might be reticence to add in this change. It would be great to know if anyone else has encountered it!
I will follow up with the pull request.
Best,
Kieran O'Leary Digital Repository Services Manager National Library of Ireland
Hmm, yes, I don't know that I want to make a global change which could potentially expose someone's data. What I'd like to test is whether you get the expected results if you call os.stat() on bag_dir and then use os.chmod(temp_data, parent_stat.st_mode) to set the same permissions on the data directory.
Absolutely, i don't think my PR is the best way to fix this. I'm not sure if I understood your testing scenario correctly, I added bits of code and print statements but it looks like the os.chmod() calls aren't changing anything as the stats are the same before and after... Let me know if this was the testing you meant or if you wish me to check something else.
# FIXME: if we calculate full paths we won't need to deal with changing directories
os.chdir(bag_dir)
print(os.stat(bag_dir))
os.stat_result(st_mode=16895, st_ino=4222124651434116, st_dev=631896474022735094, st_nlink=1, st_uid=0, st_gid=0, st_size=4096, st_atime=1755678154, st_mtime=1755678076, st_ctime=1732543273)
cwd = os.getcwd()
temp_data = tempfile.mkdtemp(dir=cwd)
print(os.stat(temp_data))
os.stat_result(st_mode=16895, st_ino=6473924464434896, st_dev=631896474022735094, st_nlink=1, st_uid=0, st_gid=0, st_size=0, st_atime=1755678154, st_mtime=1755678154, st_ctime=1755678154)
os.chmod(temp_data, os.stat(bag_dir).st_mode)
print(os.stat(temp_data))
os.stat_result(st_mode=16895, st_ino=6473924464434896, st_dev=631896474022735094, st_nlink=1, st_uid=0, st_gid=0, st_size=0, st_atime=1755678154, st_mtime=1755678154, st_ctime=1755678154)
@kieranjol I wonder if this small change alters the behavior on your domain-joined Windows machines?
https://github.com/LibraryOfCongress/bagit-python/compare/master...edsu:bagit:t193-perms
The use of mkdtemp() was really only introduced in order to avoid problems when there was already a file or directory named data in the directory to be bagged. So I guess I'm less concerned about security issues, although I guess there could be users of bagit that depend on the current undocumented behavior?