run_syntax_tests fails if a syntax test can't be loaded
This happens whenever sublime.load_resource fails, which may also be the case if syntax_test_... files
- are too large (benchmarking files with 100k lines of code)
- do not use utf-8 encoding
Running syntax tests on a sublime-syntax file then prints the following error to the console, and no test is executed.
    Traceback (most recent call last):
      File "C:\Apps\Sublime Text\Lib\python38\sublime_plugin.py", line 1473, in run_
        return self.run(**args)
      File "C:\Apps\Sublime Text\Packages\Default.sublime-package\run_syntax_tests.py", line 41, in run
      File "C:\Apps\Sublime Text\Lib\python38\sublime.py", line 347, in load_resource
        s = sublime_api.load_resource(name)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 142: invalid start byte
ST 4093 only fixes an issue with empty test files (see #3760).
I vendored the original Default/run_syntax_tests.py and modified the is_syntax()... branch as follows:
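    if is_syntax(relative_path):
        tests = []
        # compile the header pattern once, before walking through the test files
        header = re.compile('^.*SYNTAX TEST "(.*?)"')
        for t in sublime.find_resources('syntax_test*'):
            try:
                # split off the first line only
                first_line = sublime.load_resource(t).split('\n', 1)[0]
                syntax = header.match(first_line).group(1)
                if syntax == relative_path or syntax == file_name:
                    tests.append(t)
            except Exception:
                # skip test files which can't be loaded or lack a valid header
                continue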
- Avoid compiling the regexp pattern multiple times by calling re.compile() before walking through all the test files.
- Make use of Python's try...except to handle all error situations in one place instead of chaining many if checks. That's the idea behind exceptions, isn't it?
- Split the file into 2 pieces only instead of creating a list with probably thousands of lines. We are interested in the first one only.
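The first and last points can be tried outside of Sublime Text. The following is only a minimal sketch, not the plugin's code: the helper name `syntax_from_header` and the sample text are made up for illustration, while the pattern is the one from the snippet above.

```python
import re

# Compiled once at import time instead of once per test file.
HEADER = re.compile('^.*SYNTAX TEST "(.*?)"')


def syntax_from_header(text):
    """Return the syntax named in the first line of `text`, or None."""
    # maxsplit=1 yields at most two pieces: the first line and the rest.
    first_line = text.split('\n', 1)[0]
    match = HEADER.match(first_line)
    return match.group(1) if match else None


sample = '// SYNTAX TEST "Packages/Java/Java.sublime-syntax"\nclass Foo {}\n'
print(syntax_from_header(sample))
```

With `maxsplit=1` the remainder of the file stays a single string, so no list with one entry per line is built.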
> are too large (benchmarking files with 100k lines of code)
I can't reproduce this. After duplicating the contents of syntax_test_d.d from the default packages to over 300k lines, the tests still run.
Steps to reproduce the "large file" issue
OK, I must confess I was wrong about the number of lines. My failing test file actually has 1.2 million.
- Create a new syntax_test_bench_arc.arc file with UTF-8 encoding and paste the following content 600k times:
    %_N_CYCLE0815_SPF
    ;$PATH=/_N_CMA_DIR
- You may or may not need to install the "CNC Sinumerik language support" package, which provides the syntax for ARC files. Either way, the syntax is not applied to that file automatically. It has been opened as Plain Text since ST 4090 or so.
- Run syntax tests
It fails with
resource "Packages/CNC Sinumerik 840D SDK/bench/syntax_test_perf_arc.arc" not found
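Instead of pasting by hand, a file of that size can be generated with a short script. This is a sketch under one assumption: the snippet from the first step consists of two lines, which would match 600k repetitions adding up to 1.2 million lines. The function name `write_bench_file` is made up for illustration.

```python
# Assumption: the pasted snippet is two lines long.
SNIPPET = "%_N_CYCLE0815_SPF\n;$PATH=/_N_CMA_DIR\n"


def write_bench_file(path, repeats=600_000):
    """Write SNIPPET `repeats` times to `path` as UTF-8; return the line count."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for _ in range(repeats):
            f.write(SNIPPET)
    return repeats * SNIPPET.count("\n")


# Usage (writes the full-size file into the current directory):
# write_bench_file("syntax_test_bench_arc.arc")
```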
Notes:
- Someone may argue that the number of lines is insane. Yes, but this is not a file I want to run a normal syntax test for anyway.
- It is insane to load the whole file in order to catch the first line only.
- Those benchmarking files need to be named with the syntax_test_ prefix in order to run performance benchmarks against them, even though they are not meant to be ordinary test files.
Steps to reproduce the encoding issue
- Create a new syntax_test_hmi.com file with Windows 1252 encoding and the following content:
    ; Änderung
- Open any sublime-syntax file
- Run syntax tests
run_syntax_tests fails with a UnicodeDecodeError at position 3, because load_resource() assumes the file to be UTF-8 encoded and therefore can't decode the Windows 1252 encoded Ä.
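The decoding failure itself can be reproduced in plain Python without Sublime Text. The sketch below only mimics what happens to the bytes; it does not call sublime.load_resource, and the variable names are made up for illustration.

```python
# The bytes as a Windows 1252 encoded file would store them on disk.
raw = "; Änderung\n".encode("cp1252")

try:
    raw.decode("utf-8")  # strict decoding, as in the failing case
    decoded_ok = True
except UnicodeDecodeError as error:
    decoded_ok = False
    print("decoding failed:", error.reason)

# Lenient decoding replaces the offending byte and keeps the ASCII part.
lenient = raw.decode("utf-8", errors="replace")
print(repr(lenient))
```

Since the test header line only needs its ASCII part to be matched, a lenient decode or a check on the raw bytes would be sufficient to find the header.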
Conclusion
There may be more edge cases causing issues. Hence wrapping the whole branch in a try...except, to catch them gracefully and continue with the next file, is the most robust solution, IMHO, especially as we don't need to stop looking for test files just because one of them failed to load.
Maybe something like a sublime.load_resource(maxnumchars) could help to speed up the syntax test lookup.
Created a proof of concept which
- includes only files matching the syntax definition via sublime.find_syntax_for_file(t)
- reads the first 2k of data directly from the filesystem first in order to match the first line
    if is_syntax(relative_path):
        tests = []
        header = re.compile('^.*SYNTAX TEST "(.*?)"')
        # directory which contains "Packages/" (requires `import os`)
        data_path = os.path.dirname(sublime.packages_path())
        for t in sublime.find_resources('syntax_test*'):
            try:
                # ignore tests with unmatching syntax; find_syntax_for_file()
                # may return None, in which case the file is skipped below
                if sublime.find_syntax_for_file(t).path != relative_path:
                    continue
                try:
                    # read at most 2048 characters of the first line from disk
                    with open(os.path.join(data_path, t), "r") as file:
                        first_line = file.readline(2048)
                except FileNotFoundError:
                    # not an unpacked file, fall back to the resource loader
                    first_line = sublime.load_resource(t).split('\n', 1)[0]
                syntax = header.match(first_line).group(1)
                if syntax == relative_path or syntax == file_name:
                    tests.append(t)
            except Exception:
                continue
Running the syntax tests for Java went from 8 s down to 9 ms, just by not loading syntax tests of other languages and by limiting the number of characters loaded for matching the first line to 2k.
Syntax test files are loaded and checked in binary mode as of ST4175, which fixes this issue.
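How ST implements this internally is not shown in this thread. As a rough sketch of the general idea only: reading the first line in binary mode and matching it with a bytes pattern never decodes the file, so its encoding can't raise an error. The function name `header_syntax` and the 2048 byte limit are assumptions made for illustration.

```python
import re

# Bytes pattern: `.` matches any byte except a newline, whatever the encoding.
HEADER = re.compile(rb'^.*SYNTAX TEST "(.*?)"')


def header_syntax(path, limit=2048):
    """Return the syntax named in the header of the file at `path`, or None."""
    with open(path, "rb") as f:
        # Reads up to the first newline, but never more than `limit` bytes.
        first_line = f.readline(limit)
    match = HEADER.match(first_line)
    if match is None:
        return None
    return match.group(1).decode("utf-8", errors="replace")
```

Because only one bounded line is read, the size of the file no longer matters either.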