splitSources.py can't handle invalid UTF-8 sequences

Open cameel opened this issue 3 years ago • 0 comments

Steps to Reproduce

scripts/splitSources.py test/libsolidity/syntaxTests/string/invalid_utf8_sequence.sol
Traceback (most recent call last):
  File "scripts/splitSources.py", line 51, in <module>
    lines = open(filePath, mode='r', encoding='utf8').read().splitlines()
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 36: invalid start byte

Description

The command above is executed by scripts/ASTImportTest.sh as a part of our command line tests. It's just preprocessing before the actual test.

A multi-source file can still be split even if it contains invalid UTF-8 sequences, so the script should do that and let the compiler report the error later. Otherwise there are two different layers where the same input can fail in different ways.
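One possible way to make the read on line 51 tolerant is sketched below. It is not the actual fix, only an illustration that assumes Python's standard `surrogateescape` error handler is acceptable here. The helper names (`read_lines_tolerant`, `write_lines_tolerant`, `demo`) and the sample file content are hypothetical and do not come from `splitSources.py`.

```python
import os
import tempfile


def read_lines_tolerant(file_path):
    # errors='surrogateescape' maps each undecodable byte to a lone surrogate
    # (0xC0 becomes '\udcc0') instead of raising UnicodeDecodeError.
    with open(file_path, mode='r', encoding='utf8', errors='surrogateescape') as f:
        return f.read().splitlines()


def write_lines_tolerant(file_path, lines):
    # The same error handler turns the surrogates back into the original bytes,
    # so the invalid sequence reaches the compiler unchanged.
    with open(file_path, mode='w', encoding='utf8', errors='surrogateescape', newline='\n') as f:
        f.write('\n'.join(lines) + '\n')


def demo():
    # Illustrative input: a source header line followed by a line with an invalid byte.
    original = b'==== Source: a.sol ====\ncontract C { string s = "\xc0"; }\n'
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'in.sol')
        dst = os.path.join(tmp, 'out.sol')
        with open(src, 'wb') as f:
            f.write(original)
        lines = read_lines_tolerant(src)
        write_lines_tolerant(dst, lines)
        with open(dst, 'rb') as f:
            return lines, f.read() == original
```

With this approach the text-based splitting logic can stay as it is, but every file the script writes would need the same error handler. Encoding a string that contains lone surrogates with the default strict handler raises `UnicodeEncodeError`.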

Environment

  • Compiler version: 0.7.1-develop.2020.8.31+commit.8c8eca3e.Linux.g++

cameel, Aug 31 '20 15:08