oletools
oletools copied to clipboard
Multiple test fails on Python 3.9
Affected tool: tests
Describe the bug Multiple tests are failing:
-
test_macros
:AssertionError
-
test_empty_behaviour
,test_rtf_behaviour
,test_text_behaviour
,test_xlm
: AssertionError due to a correct syntax warning (is
must not be used to compare values)
File/Malware sample to reproduce the bug none
How To Reproduce the bug
pytest-3.9 -v -k 'not test_rough_doctype and not test_encrypted and not test_crypt_return and not test_all'
Expected behavior tests succeed
Console output / Screenshots
[ 23s] =================================== FAILURES ===================================
[ 23s] __________________________ TestOleIDBasic.test_macros __________________________
[ 23s]
[ 23s] self = <tests.oleid.test_basic.TestOleIDBasic testMethod=test_macros>
[ 23s]
[ 23s] def test_macros(self):
[ 23s] """Test indicator for macros."""
[ 23s] for filename, value_dict in self.oleids:
[ 23s] # TODO: we need a sample file with xlm macros
[ 23s] before_dot, suffix = splitext(filename)
[ 23s] if suffix == '.zip':
[ 23s] suffix = splitext(before_dot)[1]
[ 23s] # print('Debugging: {1}, {2} for {0}'
[ 23s] # .format(filename, value_dict['vba'], value_dict['xlm']))
[ 23s]
[ 23s] # xlm detection does not work in-memory (yet)
[ 23s] # --> xlm is "unknown" for excel files, except some encrypted files
[ 23s] self.assertIn(value_dict['xlm'], ('Unknown', 'No'))
[ 23s]
[ 23s] # "macro detection" in text files leads to interesting results:
[ 23s] if filename in ('ooxml/dde-in-excel2003.xml', # not really
[ 23s] 'encrypted/autostart-encrypt-standardpassword.xls',
[ 23s] 'msodde/dde-in-csv.csv', # "Windows" "calc.exe"
[ 23s] 'msodde/dde-in-excel2003.xml', # same as above
[ 23s] 'oleform/oleform-PR314.docm',
[ 23s] 'basic/empty', # WTF?
[ 23s] 'basic/text'): # no macros!
[ 23s] self.assertEqual(value_dict['vba'], 'Yes')
[ 23s] else:
[ 23s] > self.assertEqual(value_dict['vba'], 'No')
[ 23s] E AssertionError: 'Yes' != 'No'
[ 23s] E - Yes
[ 23s] E + No
[ 23s]
[ 23s] tests/oleid/test_basic.py:126: AssertionError
[ 23s] ------------------------------ Captured log call -------------------------------
[ 23s] WARNING olevba:_logger_adapter.py:32 For now, VBA stomping cannot be detected for files in memory
[ 23s] WARNING olevba:_logger_adapter.py:32 For now, VBA stomping cannot be detected for files in memory
[ 23s] _____________________ TestOlevbaBasic.test_empty_behaviour _____________________
[ 23s]
[ 23s] self = <tests.olevba.test_basic.TestOlevbaBasic testMethod=test_empty_behaviour>
[ 23s]
[ 23s] def test_empty_behaviour(self):
[ 23s] """Test behaviour of olevba when presented with pure text file."""
[ 23s] > self.do_test_behaviour('empty')
[ 23s]
[ 23s] tests/olevba/test_basic.py:24:
[ 23s] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[ 23s] tests/olevba/test_basic.py:45: in do_test_behaviour
[ 23s] raise self.fail('Found "warn" in output line: "{}"'
[ 23s] E AssertionError: Found "warn" in output line: "/home/abuild/rpmbuild/BUILD/oletools-0.60.1/oletools/oleobj.py:580: SyntaxWarning: "is" with a literal. Did you mean "=="?"
[ 23s] ______________________ TestOlevbaBasic.test_rtf_behaviour ______________________
[ 23s]
[ 23s] self = <tests.olevba.test_basic.TestOlevbaBasic testMethod=test_rtf_behaviour>
[ 23s]
[ 23s] def test_rtf_behaviour(self):
[ 23s] """Test behaviour of olevba when presented with an rtf file."""
[ 23s] input_file = join(DATA_BASE_DIR, 'msodde', 'RTF-Spec-1.7.rtf')
[ 23s] output, ret_code = call_and_capture('olevba', args=(input_file, ),
[ 23s] accept_nonzero_exit=True)
[ 23s]
[ 23s] # check that return code is olevba.RETURN_OPEN_ERROR
[ 23s] self.assertEqual(ret_code, 5)
[ 23s]
[ 23s] # check output:
[ 23s] self.assertIn('FileOpenError', output)
[ 23s] self.assertIn('is RTF', output)
[ 23s] self.assertIn('rtfobj', output)
[ 23s] # TODO: I disabled this test because we do not log "not encrypted" as warning anymore
[ 23s] # to avoid other issues.
[ 23s] # If we really want to test this, then the test should be run with log level INFO:
[ 23s] # self.assertIn('not encrypted', output)
[ 23s]
[ 23s] # check warnings
[ 23s] for line in output.splitlines():
[ 23s] if line.startswith('WARNING ') and 'encrypted' in line:
[ 23s] continue # encryption warnings are ok
[ 23s] elif 'warn' in line.lower():
[ 23s] > raise self.fail('Found "warn" in output line: "{}"'
[ 23s] .format(line.rstrip()))
[ 23s] E AssertionError: Found "warn" in output line: "/home/abuild/rpmbuild/BUILD/oletools-0.60.1/oletools/oleobj.py:580: SyntaxWarning: "is" with a literal. Did you mean "=="?"
[ 23s]
[ 23s] tests/olevba/test_basic.py:75: AssertionError
[ 23s] _____________________ TestOlevbaBasic.test_text_behaviour ______________________
[ 23s]
[ 23s] self = <tests.olevba.test_basic.TestOlevbaBasic testMethod=test_text_behaviour>
[ 23s]
[ 23s] def test_text_behaviour(self):
[ 23s] """Test behaviour of olevba when presented with pure text file."""
[ 23s] > self.do_test_behaviour('text')
[ 23s]
[ 23s] tests/olevba/test_basic.py:20:
[ 23s] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[ 23s] tests/olevba/test_basic.py:45: in do_test_behaviour
[ 23s] raise self.fail('Found "warn" in output line: "{}"'
[ 23s] E AssertionError: Found "warn" in output line: "/home/abuild/rpmbuild/BUILD/oletools-0.60.1/oletools/oleobj.py:580: SyntaxWarning: "is" with a literal. Did you mean "=="?"
[ 23s] ___________________________ TestOlevbaBasic.test_xlm ___________________________
[ 23s]
[ 23s] self = <tests.olevba.test_basic.TestOlevbaBasic testMethod=test_xlm>
[ 23s]
[ 23s] def test_xlm(self):
[ 23s] """Test that xlm macros are found."""
[ 23s] XLM_DIR = join(DATA_BASE_DIR, 'excel4-macros')
[ 23s] ADD_ARGS = ['-j']
[ 23s]
[ 23s] for filename in os.listdir(XLM_DIR):
[ 23s] full_name = join(XLM_DIR, filename)
[ 23s] suffix = splitext(filename)[1]
[ 23s] out_str, ret_code = call_and_capture('olevba',
[ 23s] args=[full_name, ] + ADD_ARGS,
[ 23s] accept_nonzero_exit=True)
[ 23s] > output = json.loads(out_str)
[ 23s]
[ 23s] tests/olevba/test_basic.py:122:
[ 23s] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[ 23s] /usr/lib64/python3.9/json/__init__.py:346: in loads
[ 23s] return _default_decoder.decode(s)
[ 23s] /usr/lib64/python3.9/json/decoder.py:337: in decode
[ 23s] obj, end = self.raw_decode(s, idx=_w(s, 0).end())
[ 23s] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[ 23s]
[ 23s] self = <json.decoder.JSONDecoder object at 0x7fbf9fb24700>
[ 23s] s = '/home/abuild/rpmbuild/BUILD/oletools-0.60.1/oletools/oleobj.py:580: SyntaxWarning: "is" with a literal. Did you mean ...ALERT(\\"This is a sample Excel 4 macro\\")\\n\' Formula or Macro: HALT()"\n }\n ]\n }\n]\n'
[ 23s] idx = 0
[ 23s]
[ 23s] def raw_decode(self, s, idx=0):
[ 23s] """Decode a JSON document from ``s`` (a ``str`` beginning with
[ 23s] a JSON document) and return a 2-tuple of the Python
[ 23s] representation and the index in ``s`` where the document ended.
[ 23s]
[ 23s] This can be used to decode a JSON document from a string that may
[ 23s] have extraneous data at the end.
[ 23s]
[ 23s] """
[ 23s] try:
[ 23s] obj, end = self.scan_once(s, idx)
[ 23s] except StopIteration as err:
[ 23s] > raise JSONDecodeError("Expecting value", s, err.value) from None
[ 23s] E json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[ 23s]
[ 23s] /usr/lib64/python3.9/json/decoder.py:355: JSONDecodeError
[ 23s] =============================== warnings summary ===============================
[ 23s] oletools/rtfobj.py:271
[ 23s] /home/abuild/rpmbuild/BUILD/oletools-0.60.1/oletools/rtfobj.py:271: DeprecationWarning: invalid escape sequence \d
[ 23s] DECIMAL_GROUP = b'(\d{1,250})'
[ 23s]
[ 23s] ../../../../../usr/lib/python3.9/site-packages/colorclass/codes.py:5
[ 23s] /usr/lib/python3.9/site-packages/colorclass/codes.py:5: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.10 it will stop working
[ 23s] from collections import Mapping
[ 23s]
[ 23s] oletools/oleobj.py:536
[ 23s] /home/abuild/rpmbuild/BUILD/oletools-0.60.1/oletools/oleobj.py:536: DeprecationWarning: invalid escape sequence \-
[ 23s] sane_fname = re.sub(u'[^a-zA-Z0-9.\-_ ]', replacement, basepath)
[ 23s]
[ 23s] oletools/oleobj.py:580
[ 23s] /home/abuild/rpmbuild/BUILD/oletools-0.60.1/oletools/oleobj.py:580: SyntaxWarning: "is" with a literal. Did you mean "=="?
[ 23s] if idx is -1:
[ 23s]
[ 23s] -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
[ 23s] =========================== short test summary info ============================
[ 23s] FAILED tests/oleid/test_basic.py::TestOleIDBasic::test_macros - AssertionErro...
[ 23s] FAILED tests/olevba/test_basic.py::TestOlevbaBasic::test_empty_behaviour - As...
[ 23s] FAILED tests/olevba/test_basic.py::TestOlevbaBasic::test_rtf_behaviour - Asse...
[ 23s] FAILED tests/olevba/test_basic.py::TestOlevbaBasic::test_text_behaviour - Ass...
[ 23s] FAILED tests/olevba/test_basic.py::TestOlevbaBasic::test_xlm - json.decoder.J...
[ 23s] =========== 5 failed, 70 passed, 6 deselected, 4 warnings in 15.09s ============
Version information:
- OS: openSUSE Tumbleweed
- OS version: 20220520 -64 bits
- Python version: 3.9 - 64 bits
- oletools version: 0.60.1
Additional context
working on it, have fixes for tests in a related branch https://github.com/christian-intra2net/oletools/tree/unittest-automation
(Actually, that only addresses some of the issues you mention. I did not see e.g. the first, how did you obtain this output?)
I run these tests in a build environment, packaging it for openSUSE. The build env is neutral and isolated to prevent any interferences and guarantee reproducibility. Here is a complete log of the build/test: localbuild.log
Unfortunately I can't apply your branch as patch for testing purposes because tests/olevba/test_basic.py
has windows line endings and the patch generated by GitHub does not. Having consistent LF line endings in the future would be awesome.
Ah, pytest, ok. I only run the tests as "python -m unittest discover" or similar commands.
I do not understand where those windows line-endings come from. In my checkout of oletools there are none in tests/olevba/test_basic.py nor in olevba.py itself. I am also working on linux, so I do not think that my environment could introduce them.
In any case, I created an extra branch with just the 2 relevant commits and created a PR from this (PR #769), hope that simplifies things
Ah, pytest, ok. I only run the tests as "python -m unittest discover" or similar commands.
Ok, I'll switch to the unittest runner instead.
I do not understand where those windows line-endings come from. In my checkout of oletools there are none in tests/olevba/test_basic.py nor in olevba.py itself. I am also working on linux, so I do not think that my environment could introduce them.
> file oletools-0.60.1/tests/olevba/test_basic.py
oletools-0.60.1/tests/olevba/test_basic.py: Python script, ASCII text executable, with CRLF line terminators
In any case, I created an extra branch with just the 2 relevant commits and created a PR from this (PR #769), hope that simplifies things
Thanks, I'll take a look.
Ah, maybe the CRLF come from the shitty zip archive: https://pypi.org/project/oletools/#files (No .tar.gz sdist is provided there, which is the usual way for sdists)
> file tests/olevba/test_basic.py
tests/olevba/test_basic.py: Python script, ASCII text executable
Guess you are right. I recommend cloning the repo and working with that.
Cloning during the build is not a option as fetching data from the internet won't be reproducible, but can switch to the sources provided by GitHub.
Let me know if your build fails again, happy to assist in distributing oletools :) (...but I am only available until 17:00, then on holidays for a week :) )
(...but I am only available until 17:00, then on holidays for a week :) )
We're not in a hurry. I wish you pleasant holidays!
I think I know where the CRLF line endings come from: I generate the PyPI archives on my Windows machine, from my local copy of the repository. On Windows, git automatically converts all line endings to CRLF when checking out/pulling, and converts them back to LF when pushing to GitHub. It's the default behaviour: https://docs.github.com/en/get-started/getting-started-with-git/configuring-git-to-handle-line-endings I could set autocrlf to false to force all files to use LF locally, but not sure about the side effects. I think I had some issues with it several years ago.
What is the actual problem with CRLF line endings? Python accepts both LF and CRLF, so I'd like to better understand the issue before changing the distribution process.
Moreover, I used to upload sdist both in zip and tar.gz formats to PyPI, but since they contain the same files I thought zip was enough as it's supported on all platforms. What is the specific need for a tar.gz sdist instead of zip?
What is the actual problem with CRLF line endings?
Python alone may accept LF and CRLF equally, but it gets interesting when the sdist is not just used for execution, but for either development, re-packaging and similar stuff. In my case, I had the sdist with CRLF and intended to apply a PR as patch downloaded from GitHub (with LF). That doesn't work. You don't need to change your process for me (or actually the packages that I create), I find ways to work around it. But I wanted to bring this issue to your attention.
Moreover, I used to upload sdist both in zip and tar.gz formats to PyPI, but since they contain the same files I thought zip was enough as it's supported on all platforms. What is the specific need for a tar.gz sdist instead of zip?
As you found out, the CRLF line endings come from git on Windows, not from zip. So .tar.gz files generated on Windows probably have the same issue.
The five unit tests are still failing on current Python 3.9 versions.