oletools icon indicating copy to clipboard operation
oletools copied to clipboard

Multiple test fails on Python 3.9

Open sebix opened this issue 2 years ago • 12 comments

Affected tool: tests

Describe the bug Multiple tests are failing:

  • test_macros: AssertionError
  • test_empty_behaviour, test_rtf_behaviour, test_text_behaviour, test_xlm: AssertionError due to a correct syntax warning (is must not be used to compare values)

File/Malware sample to reproduce the bug none

How To Reproduce the bug

pytest-3.9 -v -k 'not test_rough_doctype and not test_encrypted and not test_crypt_return and not test_all'

Expected behavior tests succeed

Console output / Screenshots

[   23s] =================================== FAILURES ===================================
[   23s] __________________________ TestOleIDBasic.test_macros __________________________
[   23s] 
[   23s] self = <tests.oleid.test_basic.TestOleIDBasic testMethod=test_macros>
[   23s] 
[   23s]     def test_macros(self):
[   23s]         """Test indicator for macros."""
[   23s]         for filename, value_dict in self.oleids:
[   23s]             # TODO: we need a sample file with xlm macros
[   23s]             before_dot, suffix = splitext(filename)
[   23s]             if suffix == '.zip':
[   23s]                 suffix = splitext(before_dot)[1]
[   23s]             # print('Debugging: {1}, {2} for {0}'
[   23s]             #       .format(filename, value_dict['vba'], value_dict['xlm']))
[   23s]     
[   23s]             # xlm detection does not work in-memory (yet)
[   23s]             # --> xlm is "unknown" for excel files, except some encrypted files
[   23s]             self.assertIn(value_dict['xlm'], ('Unknown', 'No'))
[   23s]     
[   23s]             # "macro detection" in text files leads to interesting results:
[   23s]             if filename in ('ooxml/dde-in-excel2003.xml',    # not really
[   23s]                             'encrypted/autostart-encrypt-standardpassword.xls',
[   23s]                             'msodde/dde-in-csv.csv',     # "Windows" "calc.exe"
[   23s]                             'msodde/dde-in-excel2003.xml',   # same as above
[   23s]                             'oleform/oleform-PR314.docm',
[   23s]                             'basic/empty',                   # WTF?
[   23s]                             'basic/text'):                   # no macros!
[   23s]                 self.assertEqual(value_dict['vba'], 'Yes')
[   23s]             else:
[   23s] >               self.assertEqual(value_dict['vba'], 'No')
[   23s] E               AssertionError: 'Yes' != 'No'
[   23s] E               - Yes
[   23s] E               + No
[   23s] 
[   23s] tests/oleid/test_basic.py:126: AssertionError
[   23s] ------------------------------ Captured log call -------------------------------
[   23s] WARNING  olevba:_logger_adapter.py:32 For now, VBA stomping cannot be detected for files in memory
[   23s] WARNING  olevba:_logger_adapter.py:32 For now, VBA stomping cannot be detected for files in memory
[   23s] _____________________ TestOlevbaBasic.test_empty_behaviour _____________________
[   23s] 
[   23s] self = <tests.olevba.test_basic.TestOlevbaBasic testMethod=test_empty_behaviour>
[   23s] 
[   23s]     def test_empty_behaviour(self):
[   23s]         """Test behaviour of olevba when presented with pure text file."""
[   23s] >       self.do_test_behaviour('empty')
[   23s] 
[   23s] tests/olevba/test_basic.py:24: 
[   23s] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[   23s] tests/olevba/test_basic.py:45: in do_test_behaviour
[   23s]     raise self.fail('Found "warn" in output line: "{}"'
[   23s] E   AssertionError: Found "warn" in output line: "/home/abuild/rpmbuild/BUILD/oletools-0.60.1/oletools/oleobj.py:580: SyntaxWarning: "is" with a literal. Did you mean "=="?"
[   23s] ______________________ TestOlevbaBasic.test_rtf_behaviour ______________________
[   23s] 
[   23s] self = <tests.olevba.test_basic.TestOlevbaBasic testMethod=test_rtf_behaviour>
[   23s] 
[   23s]     def test_rtf_behaviour(self):
[   23s]         """Test behaviour of olevba when presented with an rtf file."""
[   23s]         input_file = join(DATA_BASE_DIR, 'msodde', 'RTF-Spec-1.7.rtf')
[   23s]         output, ret_code = call_and_capture('olevba', args=(input_file, ),
[   23s]                                             accept_nonzero_exit=True)
[   23s]     
[   23s]         # check that return code is olevba.RETURN_OPEN_ERROR
[   23s]         self.assertEqual(ret_code, 5)
[   23s]     
[   23s]         # check output:
[   23s]         self.assertIn('FileOpenError', output)
[   23s]         self.assertIn('is RTF', output)
[   23s]         self.assertIn('rtfobj', output)
[   23s]         # TODO: I disabled this test because we do not log "not encrypted" as warning anymore
[   23s]         # to avoid other issues.
[   23s]         # If we really want to test this, then the test should be run with log level INFO:
[   23s]         # self.assertIn('not encrypted', output)
[   23s]     
[   23s]         # check warnings
[   23s]         for line in output.splitlines():
[   23s]             if line.startswith('WARNING ') and 'encrypted' in line:
[   23s]                 continue   # encryption warnings are ok
[   23s]             elif 'warn' in line.lower():
[   23s] >               raise self.fail('Found "warn" in output line: "{}"'
[   23s]                                 .format(line.rstrip()))
[   23s] E               AssertionError: Found "warn" in output line: "/home/abuild/rpmbuild/BUILD/oletools-0.60.1/oletools/oleobj.py:580: SyntaxWarning: "is" with a literal. Did you mean "=="?"
[   23s] 
[   23s] tests/olevba/test_basic.py:75: AssertionError
[   23s] _____________________ TestOlevbaBasic.test_text_behaviour ______________________
[   23s] 
[   23s] self = <tests.olevba.test_basic.TestOlevbaBasic testMethod=test_text_behaviour>
[   23s] 
[   23s]     def test_text_behaviour(self):
[   23s]         """Test behaviour of olevba when presented with pure text file."""
[   23s] >       self.do_test_behaviour('text')
[   23s] 
[   23s] tests/olevba/test_basic.py:20: 
[   23s] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[   23s] tests/olevba/test_basic.py:45: in do_test_behaviour
[   23s]     raise self.fail('Found "warn" in output line: "{}"'
[   23s] E   AssertionError: Found "warn" in output line: "/home/abuild/rpmbuild/BUILD/oletools-0.60.1/oletools/oleobj.py:580: SyntaxWarning: "is" with a literal. Did you mean "=="?"
[   23s] ___________________________ TestOlevbaBasic.test_xlm ___________________________
[   23s] 
[   23s] self = <tests.olevba.test_basic.TestOlevbaBasic testMethod=test_xlm>
[   23s] 
[   23s]     def test_xlm(self):
[   23s]         """Test that xlm macros are found."""
[   23s]         XLM_DIR = join(DATA_BASE_DIR, 'excel4-macros')
[   23s]         ADD_ARGS = ['-j']
[   23s]     
[   23s]         for filename in os.listdir(XLM_DIR):
[   23s]             full_name = join(XLM_DIR, filename)
[   23s]             suffix = splitext(filename)[1]
[   23s]             out_str, ret_code = call_and_capture('olevba',
[   23s]                                                  args=[full_name, ] + ADD_ARGS,
[   23s]                                                  accept_nonzero_exit=True)
[   23s] >           output = json.loads(out_str)
[   23s] 
[   23s] tests/olevba/test_basic.py:122: 
[   23s] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[   23s] /usr/lib64/python3.9/json/__init__.py:346: in loads
[   23s]     return _default_decoder.decode(s)
[   23s] /usr/lib64/python3.9/json/decoder.py:337: in decode
[   23s]     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
[   23s] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[   23s] 
[   23s] self = <json.decoder.JSONDecoder object at 0x7fbf9fb24700>
[   23s] s = '/home/abuild/rpmbuild/BUILD/oletools-0.60.1/oletools/oleobj.py:580: SyntaxWarning: "is" with a literal. Did you mean ...ALERT(\\"This is a sample Excel 4 macro\\")\\n\' Formula or Macro: HALT()"\n              }\n          ]\n      }\n]\n'
[   23s] idx = 0
[   23s] 
[   23s]     def raw_decode(self, s, idx=0):
[   23s]         """Decode a JSON document from ``s`` (a ``str`` beginning with
[   23s]         a JSON document) and return a 2-tuple of the Python
[   23s]         representation and the index in ``s`` where the document ended.
[   23s]     
[   23s]         This can be used to decode a JSON document from a string that may
[   23s]         have extraneous data at the end.
[   23s]     
[   23s]         """
[   23s]         try:
[   23s]             obj, end = self.scan_once(s, idx)
[   23s]         except StopIteration as err:
[   23s] >           raise JSONDecodeError("Expecting value", s, err.value) from None
[   23s] E           json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[   23s] 
[   23s] /usr/lib64/python3.9/json/decoder.py:355: JSONDecodeError
[   23s] =============================== warnings summary ===============================
[   23s] oletools/rtfobj.py:271
[   23s]   /home/abuild/rpmbuild/BUILD/oletools-0.60.1/oletools/rtfobj.py:271: DeprecationWarning: invalid escape sequence \d
[   23s]     DECIMAL_GROUP = b'(\d{1,250})'
[   23s] 
[   23s] ../../../../../usr/lib/python3.9/site-packages/colorclass/codes.py:5
[   23s]   /usr/lib/python3.9/site-packages/colorclass/codes.py:5: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.10 it will stop working
[   23s]     from collections import Mapping
[   23s] 
[   23s] oletools/oleobj.py:536
[   23s]   /home/abuild/rpmbuild/BUILD/oletools-0.60.1/oletools/oleobj.py:536: DeprecationWarning: invalid escape sequence \-
[   23s]     sane_fname = re.sub(u'[^a-zA-Z0-9.\-_ ]', replacement, basepath)
[   23s] 
[   23s] oletools/oleobj.py:580
[   23s]   /home/abuild/rpmbuild/BUILD/oletools-0.60.1/oletools/oleobj.py:580: SyntaxWarning: "is" with a literal. Did you mean "=="?
[   23s]     if idx is -1:
[   23s] 
[   23s] -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
[   23s] =========================== short test summary info ============================
[   23s] FAILED tests/oleid/test_basic.py::TestOleIDBasic::test_macros - AssertionErro...
[   23s] FAILED tests/olevba/test_basic.py::TestOlevbaBasic::test_empty_behaviour - As...
[   23s] FAILED tests/olevba/test_basic.py::TestOlevbaBasic::test_rtf_behaviour - Asse...
[   23s] FAILED tests/olevba/test_basic.py::TestOlevbaBasic::test_text_behaviour - Ass...
[   23s] FAILED tests/olevba/test_basic.py::TestOlevbaBasic::test_xlm - json.decoder.J...
[   23s] =========== 5 failed, 70 passed, 6 deselected, 4 warnings in 15.09s ============

Version information:

  • OS: openSUSE Tumbleweed
  • OS version: 20220520 -64 bits
  • Python version: 3.9 - 64 bits
  • oletools version: 0.60.1

Additional context

sebix avatar May 22 '22 10:05 sebix

working on it, have fixes for tests in a related branch https://github.com/christian-intra2net/oletools/tree/unittest-automation

(Actually, that only addresses some of the issues you mention. I did not see e.g. the first, how did you obtain this output?)

christian-intra2net avatar Jun 01 '22 10:06 christian-intra2net

I run these tests in a build environment, packaging it for openSUSE. The build env is neutral and isolated to prevent any interferences and guarantee reproducibility. Here is a complete log of the build/test: localbuild.log

Unfortunately I can't apply your branch as patch for testing purposes because tests/olevba/test_basic.py has windows line endings and the patch generated by GitHub does not. Having consistent LF line endings in the future would be awesome.

sebix avatar Jun 01 '22 11:06 sebix

Ah, pytest, ok. I only run the tests as "python -m unittest discover" or similar commands.

I do not understand where those windows line-endings come from. In my checkout of oletools there are none in tests/olevba/test_basic.py nor in olevba.py itself. I am also working on linux, so I do not think that my environment could introduce them.

In any case, I created an extra branch with just the 2 relevant commits and created a PR from this (PR #769), hope that simplifies things

christian-intra2net avatar Jun 03 '22 10:06 christian-intra2net

Ah, pytest, ok. I only run the tests as "python -m unittest discover" or similar commands.

Ok, I'll switch to the unittest runner instead.

I do not understand where those windows line-endings come from. In my checkout of oletools there are none in tests/olevba/test_basic.py nor in olevba.py itself. I am also working on linux, so I do not think that my environment could introduce them.

> file oletools-0.60.1/tests/olevba/test_basic.py
oletools-0.60.1/tests/olevba/test_basic.py: Python script, ASCII text executable, with CRLF line terminators

In any case, I created an extra branch with just the 2 relevant commits and created a PR from this (PR #769), hope that simplifies things

Thanks, I'll take a look.

sebix avatar Jun 03 '22 10:06 sebix

Ah, maybe the CRLF come from the shitty zip archive: https://pypi.org/project/oletools/#files (No .tar.gz sdist is provided there, which is the usual way for sdists)

sebix avatar Jun 03 '22 10:06 sebix

> file tests/olevba/test_basic.py
tests/olevba/test_basic.py: Python script, ASCII text executable

Guess you are right. I recommend cloning the repo and working with that.

christian-intra2net avatar Jun 03 '22 11:06 christian-intra2net

Cloning during the build is not a option as fetching data from the internet won't be reproducible, but can switch to the sources provided by GitHub.

sebix avatar Jun 03 '22 11:06 sebix

Let me know if your build fails again, happy to assist in distributing oletools :) (...but I am only available until 17:00, then on holidays for a week :) )

christian-intra2net avatar Jun 03 '22 11:06 christian-intra2net

(...but I am only available until 17:00, then on holidays for a week :) )

We're not in a hurry. I wish you pleasant holidays!

sebix avatar Jun 03 '22 13:06 sebix

I think I know where the CRLF line endings come from: I generate the PyPI archives on my Windows machine, from my local copy of the repository. On Windows, git automatically converts all line endings to CRLF when checking out/pulling, and converts them back to LF when pushing to GitHub. It's the default behaviour: https://docs.github.com/en/get-started/getting-started-with-git/configuring-git-to-handle-line-endings I could set autocrlf to false to force all files to use LF locally, but not sure about the side effects. I think I had some issues with it several years ago.

What is the actual problem with CRLF line endings? Python accepts both LF and CRLF, so I'd like to better understand the issue before changing the distribution process.

Moreover, I used to upload sdist both in zip and tar.gz formats to PyPI, but since they contain the same files I thought zip was enough as it's supported on all platforms. What is the specific need for a tar.gz sdist instead of zip?

decalage2 avatar Jun 03 '22 19:06 decalage2

What is the actual problem with CRLF line endings?

Python alone may accept LF and CRLF equally, but it gets interesting when the sdist is not just used for execution, but for either development, re-packaging and similar stuff. In my case, I had the sdist with CRLF and intended to apply a PR as patch downloaded from GitHub (with LF). That doesn't work. You don't need to change your process for me (or actually the packages that I create), I find ways to work around it. But I wanted to bring this issue to your attention.

Moreover, I used to upload sdist both in zip and tar.gz formats to PyPI, but since they contain the same files I thought zip was enough as it's supported on all platforms. What is the specific need for a tar.gz sdist instead of zip?

As you found out, the CRLF line endings come from git on Windows, not from zip. So .tar.gz files generated on Windows probably have the same issue.

sebix avatar Jun 04 '22 10:06 sebix

The five unit tests are still failing on current Python 3.9 versions.

sebix avatar Feb 02 '24 14:02 sebix