pysaml2 icon indicating copy to clipboard operation
pysaml2 copied to clipboard

Fixing temporary file creation for Windows

Open theunraveler opened this issue 5 years ago • 17 comments

We were having issues with the temporary files created by this library on Windows not being readable by the xmlsec process. I don't totally understand the problem, but it seems like IIS was retaining some kind of lock on the created files, so the xmlsec process couldn't open them for reading. This resulted in 2 different errors:

  • First, xmlsec --verify failed because it could not read the temporary PEM file that was created for the operation
  • After that was fixed, the xmlsec --verify hung indefinitely because it could not access the file created for --output to go into.

I also see that NamedTemporaryFile has some odd behavior on Windows as noted in the Python documentation: https://docs.python.org/3/library/tempfile.html#tempfile.NamedTemporaryFile

This PR changes the way tempfiles are written on Windows. It still writes them to the temp directory as determined by tempfile.gettempdir(), but just uses regular open instead of NamedTemporaryFile. uuid4() is used to generate a unique-ish filename. This fix seems to work well on our Windows server and locally on Windows machines. I'm not sure if this is the best way to do it though---open to other implementations if you don't like this one.

Thanks!

theunraveler avatar Feb 05 '20 18:02 theunraveler

Codecov Report

Merging #665 into master will increase coverage by <.01%. The diff coverage is 88.88%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #665      +/-   ##
==========================================
+ Coverage   65.05%   65.06%   +<.01%     
==========================================
  Files         102      102              
  Lines       25670    25676       +6     
==========================================
+ Hits        16700    16705       +5     
- Misses       8970     8971       +1
Impacted Files Coverage Δ
src/saml2/sigver.py 70.32% <88.88%> (+0.08%) :arrow_up:
src/saml2/validate.py 73.64% <0%> (ø) :arrow_up:
src/saml2/request.py 70.71% <0%> (ø) :arrow_up:
src/saml2/s2repoze/plugins/sp.py 3.17% <0%> (ø) :arrow_up:
src/saml2/mdstore.py 70.92% <0%> (ø) :arrow_up:
src/saml2/config.py 84.09% <0%> (ø) :arrow_up:
src/saml2/assertion.py 86.8% <0%> (ø) :arrow_up:
src/saml2/response.py 74.62% <0%> (ø) :arrow_up:
src/saml2/entity.py 85.13% <0%> (ø) :arrow_up:
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 4e759a4...b489b43. Read the comment docs.

codecov[bot] avatar Feb 05 '20 19:02 codecov[bot]

Oh also, the method for creating tempfiles on Windows does not do anything with the delete_tmpfiles flag, but my though was that it's probably OK to just leave them and let IIS clean them up, no?

theunraveler avatar Feb 05 '20 19:02 theunraveler

You can even run a schedular to delete .xml files older than 10minutes

peppelinux avatar Feb 05 '20 19:02 peppelinux

the delete_tmpfiles is simply a flag that is passed to Python's NamedTemporaryFile function in order to trigger the automatic deletion. In your case it seems like NamedTemporaryFile is not working at all the way it should be.

You mentioned two errors:

First, xmlsec --verify failed because it could not read the temporary PEM file that was created for the operation After that was fixed, the xmlsec --verify hung indefinitely because it could not access the file created for --output to go into.

Can you identify what was that fixed the first error and led to the second error? The first issue sounds like file permissions and the second sound like a strange race condition with file locks.

On the diff per se, using uuid may sound like a fail proof solution but there is no mechanism to control what will happen if the uuid named file already exists. And this is even more possible since there is no automatic deletion mechanism so the files on the TEMP folder will be deleted when the server restarts (or some cron job deletion script runs).

Both of the above two scenarios are handled by NamedTemporaryFile. But there seems to be an issue with Windows NT and later versions that probably affects us mentioned in Python's docs:

Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).

This might probably affect us since the file is opened, written, and then passed around. It seems from reading this issue that the real problem is that when the temp files are created, because the system is on windows the O_TEMPORARY flag must be set. This is mentioned in the issue and you can also see it in the source code of the NamedTemporaryFile:

# Setting O_TEMPORARY in the flags causes the OS to delete
# the file when it is closed.  This is only supported by Windows.
if _os.name == 'nt' and delete:
    flags |= _os.O_TEMPORARY

So I would suggest if possible to test if the issue still exists with the O_TEMPORARY flag set on Windows and modifying the _run_xmlsec function here to also include this flag.

If this works what we have to do is update the documentation and set the _run_xmlsec function when using Popen to have this env flag set to whatever the system has it set. If this doesn't work, I suggest we use tempfile.mkstemp (if it works) instead of creating files with uuid. This would solve the name clashing possibility but we will still have to invent a way to delete these files. So my hopes are that by setting the O_TEMPORARY flag we could avoid intricate OS dependent mechanisms.

ioparaskev avatar Feb 06 '20 09:02 ioparaskev

Thanks for doing the research and the detailed response.

I don't see a way to pass the O_TEMPORARY flag anywhere in the code (I think you're suggesting to pass it to Popen()? I don't think that will work, and NamedTemporaryFile does not provide a way to pass flags).

If we use mkstemp, do we really need to delete the tempfiles? Most OSes have some mechanism for deleting tempfiles on a scheduled basis, right? It just seems really difficult to do with the way that open file objects are passed around and read from in the code. I guess I could write a context manager class that does this though?

theunraveler avatar Feb 06 '20 16:02 theunraveler

Can you identify what was that fixed the first error and led to the second error? The first issue sounds like file permissions and the second sound like a strange race condition with file locks.

The first error was fixed by removing NamedTemporaryFile from make_temp, which was used to make the temporary PEM and XML files. The second was fixed by using the new _make_temp function (the same mechanism) in _run_xmlsec, which was used to make the file that xmlsec uses for --output.

theunraveler avatar Feb 06 '20 16:02 theunraveler

Can you identify what was that fixed the first error and led to the second error? The first issue sounds like file permissions and the second sound like a strange race condition with file locks.

The first error was fixed by removing NamedTemporaryFile from make_temp, which was used to make the temporary PEM and XML files. The second was fixed by using the new _make_temp function (the same mechanism) in _run_xmlsec, which was used to make the file that xmlsec uses for --output.

When you say "removing NamedTemporaryFile" you mean that you replaced it with some other file creation function? Now I'm a bit confused as to what the first issue was about exactly. Was it that the file got automatically deleted (if so did you try setting the delete_tmpfiles option to False to see if this error would still exist?) or was it because Windows held a lock on the file and didn't allow it to be opened from another app (but if that was the case it would mean that your solution would also not work, right?)

Thanks for doing the research and the detailed response.

I don't see a way to pass the O_TEMPORARY flag anywhere in the code (I think you're suggesting to pass it to Popen()? I don't think that will work, and NamedTemporaryFile does not provide a way to pass flags).

Pass it to Popen by writing Popen(com_list, stderr=PIPE, stdout=PIPE,env={'O_TEMPORARY': 1}). It should also be set to the environment that is running/using pysaml

If we use mkstemp, do we really need to delete the tempfiles? Most OSes have some mechanism for deleting tempfiles on a scheduled basis, right? It just seems really difficult to do with the way that open file objects are passed around and read from in the code. I guess I could write a context manager class that does this though?

Since we provide a way to *nix systems to automatically delete these temp files (and don't rely to some third party script/cron job to do it) for the sake of consistency I think we should also provide a way to do it in Windows as well

ioparaskev avatar Feb 07 '20 09:02 ioparaskev

When you say "removing NamedTemporaryFile" you mean that you replaced it with some other file creation function?

This does not function as expected in Windows and this issue is referenced in the python documentation.

Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later). https://docs.python.org/3.8/library/tempfile.html

Essentially, we could not open this file again on our windows environment, which meant that xmlsec was unable to use the file for verifying the signature.

natehawkboss avatar Feb 25 '20 20:02 natehawkboss

This is very nasty on Python's part. I am really sad for this state of things.


The option being discussed, is to implement "NamedTemporaryFiles" some other way. To be complete, this should support deleting/cleaning-up those files, when closed. Is there something like that out there? What do other projects do?


Another option is to stop using files and use actual bindings to the xmlsec lib. I would prefer we invest time in this option.

There is already a wrapper for xmlsec: https://github.com/mehcode/python-xmlsec We must make sure the API coverage is enough to cover our use-cases. One thing that I notice is that it seems that it only works along lxml; this is fine with me - as part of the task to refactor all xml operations I had though of only supporting lxml, as it seems to provide way better tools to handle xml, than the builtin parsers.

Another lib we can look into is signxml: https://github.com/XML-Security/signxml

Would someone like to see how we can use this to sign/verify/encrypt/decrypt an xml document?

c00kiemon5ter avatar Feb 28 '20 20:02 c00kiemon5ter

I've encountered this issue as well, it seems to be entirely a Windows problem in so much that temporary files created with the O_TEMPORARY flag seem to be only accessible by the creating process and any subprocesses (i.e. xmlsec) are denied access. I ended up solving it by monkey patching tempfile._TemporaryFileCloser and tempfile.NamedTemporaryFile so that O_TEMPORARY was never set and that closing the file would manually delete it, instead.

Naturally, I wouldn't recommend that solution to everyone, though.

chanderso avatar Jul 24 '20 20:07 chanderso

I've encountered this issue as well, it seems to be entirely a Windows problem in so much that temporary files created with the O_TEMPORARY flag seem to be only accessible by the creating process and any subprocesses (i.e. xmlsec) are denied access. I ended up solving it by monkey patching tempfile._TemporaryFileCloser and tempfile.NamedTemporaryFile so that O_TEMPORARY was never set and that closing the file would manually delete it, instead.

Naturally, I wouldn't recommend that solution to everyone, though.

thank you for the feedback since I didn't have a windows vm to try this solution out. As it seems, the fact that we use a subprocess call to xmlsec beats the windows flag workaround. So we'll have to find some other way to do this -not monkeypatching internal libraries of course but something different in terms of handling these files. I'll have a look and see what else we can do

ioparaskev avatar Sep 01 '20 13:09 ioparaskev

Having this same issue on windows, any ETA when this will be merged?

talebi1 avatar Sep 09 '20 09:09 talebi1

Having this same issue on windows, any ETA when this will be merged?

Not yet to be honest. This needs some discussion on how to handle and when to free up these files. We should come up (if possible) with a common solution for all OS instead of adding flags to check if it's windows or some other *nix OS

ioparaskev avatar Nov 03 '20 13:11 ioparaskev

i ran into this problem. NamedTemporaryFile should never be used on windows or linux. it is fully bugged and can even result in an an infinite loop. uuid mechanism is the best mechanism. check out: https://github.com/python/cpython/issues/66305 and https://stackoverflow.com/a/58955530/627042

earonesty avatar Jan 25 '23 15:01 earonesty

hi @earonesty, and thanks for commenting back on this. I think using uuid is a viable way forward. The code needs to be reworked to clean up the files. But your gist is also interesting to hide the inner details behind a familiar interface.

The alternative choice would be to start using the xmlsec bindings and avoid creating files altogether, but that would go into a separate backend.

c00kiemon5ter avatar Jan 31 '23 10:01 c00kiemon5ter

python 3.12 finally has a better fix for this one;

  • https://github.com/python/cpython/issues/58451
  • https://github.com/python/cpython/pull/97015

c00kiemon5ter avatar Jun 08 '23 21:06 c00kiemon5ter