sublime_text run_syntax_tests fails if a syntax test can't be loaded

It happens whenever sublime.load_resource fails, which may for instance be the case if syntax_test_... files

- are too large (benchmarking files with 100k lines of code)
- do not use UTF-8 encoding

Running the syntax tests on a sublime-syntax file then prints the following error to the console, and no test is executed.

Traceback (most recent call last):
  File "C:\Apps\Sublime Text\Lib\python38\sublime_plugin.py", line 1473, in run_
    return self.run(**args)
  File "C:\Apps\Sublime Text\Packages\Default.sublime-package\run_syntax_tests.py", line 41, in run
  File "C:\Apps\Sublime Text\Lib\python38\sublime.py", line 347, in load_resource
    s = sublime_api.load_resource(name)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 142: invalid start byte

ST 4093 fixes an issue with empty test files (see #3760) only.

I vendored the original Default/run_syntax_tests.py and modified the is_syntax()... branch as follows:

            if is_syntax(relative_path):
                tests = []
                header = re.compile('^.*SYNTAX TEST "(.*?)"')
                for t in sublime.find_resources('syntax_test*'):
                    try:
                        first_line = sublime.load_resource(t).split('\n', 1)[0]
                        syntax = header.match(first_line).group(1)
                        if syntax == relative_path or syntax == file_name:
                            tests.append(t)
                    except Exception:
                        # skip test files which fail to load or have no valid header
                        continue

- Compile the regexp pattern once via re.compile() before walking through all the test files, instead of compiling it again for each file.
- Make use of Python's try...except to catch all error situations in one place and avoid a long chain of if checks. That's the idea behind exceptions, isn't it?
- Split the file into 2 pieces only instead of creating a list with probably thousands of lines. We are interested in the first one only.
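
The three points above can be tried outside of Sublime Text. The following is only a minimal sketch, not the plugin code itself: `header_syntax` is a hypothetical helper name and the sample text is made up for illustration.

```python
import re

# Compiled once, outside of any loop over test files.
HEADER = re.compile(r'^.*SYNTAX TEST "(.*?)"')


def header_syntax(text):
    """Return the syntax named in the first line's header, or None."""
    # maxsplit=1 yields at most two pieces; only the first line is needed.
    first_line = text.split('\n', 1)[0]
    match = HEADER.match(first_line)
    return match.group(1) if match else None


sample = '// SYNTAX TEST "Packages/Java/Java.sublime-syntax"\nclass Foo {}\n'
print(header_syntax(sample))
print(header_syntax('no header here\nsecond line'))
```

The sketch checks the match result explicitly instead of relying on an exception, which is just one of the two possible styles; the snippet above takes the other one.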

Nov 18 '20 15:11 deathaxe

> are too large (benchmarking files with 100k lines of code)

I can't reproduce this. After duplicating the content of syntax_test_d.d from the default packages to over 300k lines, it's still able to run the tests.

Nov 20 '20 04:11 BenjaminSchaaf

Steps to reproduce the "large file" issue

OK, I must confess I was wrong about the number of lines. My failing test file actually has 1.2 million.

Create a new syntax_test_bench_arc.arc file with UTF-8 encoding and paste the following content 600k times
```
%_N_CYCLE0815_SPF
;$PATH=/_N_CMA_DIR
```
You may or may not need to install the "CNC Sinumerik language support" package, which provides the syntax for ARC files. In any case, the syntax is not applied to that file automatically. It has been opened as Plain Text since ST 4090 or so.
Run syntax tests

It fails with resource "Packages/CNC Sinumerik 840D SDK/bench/syntax_test_perf_arc.arc" not found
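
Pasting the content 600k times by hand is tedious, so the file could also be generated by a small script. This is a sketch only; `write_bench_file` is a hypothetical helper, and the demo writes a tiny file to a temporary folder rather than into a package.

```python
import os
import tempfile

# The two lines from the steps above.
SNIPPET = "%_N_CYCLE0815_SPF\n;$PATH=/_N_CMA_DIR\n"


def write_bench_file(path, repeats):
    """Write SNIPPET `repeats` times to `path` as UTF-8 and return the line count."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for _ in range(repeats):
            f.write(SNIPPET)
    return repeats * SNIPPET.count("\n")


# Small demo run; repeats=600_000 would give the 1.2 million lines mentioned above.
demo_path = os.path.join(tempfile.mkdtemp(), "syntax_test_bench_arc.arc")
print(write_bench_file(demo_path, 3))
```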

Notes:

- Someone may argue that this number of lines is insane. Yes, but this is not a file I want to run a normal syntax test for.
- It is insane to load the whole file just to read its first line.
- Those benchmarking files need to be named syntax_test_ so that performance benchmarks can be run against them, even though they are not meant to be ordinary test files.

Steps to reproduce the encoding issue

Create a new syntax_test_hmi.com file with Windows 1252 encoding and the following content:
```
; Änderung
```
Open any sublime-syntax file
Run syntax tests

run_syntax_tests fails with a UnicodeDecodeError at position 3, because load_resource() can't decode Ä: it assumes the file to be UTF-8 encoded, which it is not.
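
The decoding failure itself does not depend on Sublime Text and can be shown with plain Python. A minimal sketch, assuming the file content is exactly the line from the steps above followed by a newline:

```python
text = "; Änderung\n"
raw = text.encode("cp1252")  # bytes as written with Windows 1252 encoding

try:
    raw.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError as exc:
    decoded_ok = False
    print("decode failed:", exc.reason)

# Decoding with the encoding that was actually used round-trips cleanly.
print(raw.decode("cp1252") == text)
```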

Conclusion

There may be more edge cases causing issues. Hence wrapping the whole branch in a try...except, to gracefully catch them and continue with the next file, is the most robust solution IMHO, especially as we don't need to stop looking for test files just because one of them failed to load.
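
To illustrate the "skip and continue" behaviour without Sublime Text, here is a rough sketch with a fake loader. `matching_tests`, `fake_load` and the resource names in `FAKE` are made up for this example. The real plugin would use sublime.find_resources() and sublime.load_resource() instead.

```python
import re

HEADER = re.compile(r'^.*SYNTAX TEST "(.*?)"')


def matching_tests(names, load, wanted):
    """Return the names whose header refers to `wanted`, skipping unloadable ones."""
    tests = []
    for name in names:
        try:
            first_line = load(name).split('\n', 1)[0]
            if HEADER.match(first_line).group(1) == wanted:
                tests.append(name)
        except Exception:
            # Decode errors, missing resources and missing headers all end up here.
            continue
    return tests


# Fake resources: one valid, one that is not UTF-8, one without a header.
FAKE = {
    "syntax_test_a.java": b'// SYNTAX TEST "Java.sublime-syntax"\n',
    "syntax_test_hmi.com": "; Änderung\n".encode("cp1252"),
    "syntax_test_plain.txt": b"no header\n",
}


def fake_load(name):
    # Mimics a loader that assumes UTF-8 and raises otherwise.
    return FAKE[name].decode("utf-8")


names = list(FAKE) + ["syntax_test_missing"]
print(matching_tests(names, fake_load, "Java.sublime-syntax"))
```

Only the valid file ends up in the result; the other three are skipped without aborting the lookup.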

Nov 20 '20 19:11 deathaxe

Maybe something like a sublime.load_resource(maxnumchars) would help to speed up syntax test lookup.

I created a proof of concept which

- includes only files matching the syntax definition, via sublime.find_syntax_for_file(t)
- reads only the first 2k of data directly from the filesystem to match the first line, falling back to sublime.load_resource() if the file is not found there.

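            if is_syntax(relative_path):
                tests = []
                header = re.compile('^.*SYNTAX TEST "(.*?)"')
                # resource paths start with "Packages/", so resolve them against
                # the parent of the packages folder (requires `import os`)
                data_path = os.path.dirname(sublime.packages_path())
                for t in sublime.find_resources('syntax_test*'):
                    try:
                        # ignore tests with unmatching syntax; find_syntax_for_file()
                        # may return None, which is caught below as well
                        if sublime.find_syntax_for_file(t).path != relative_path:
                            continue
                        try:
                            # read at most 2k characters of the first line from disk
                            with open(os.path.join(data_path, t), "r", encoding="utf-8") as file:
                                first_line = file.readline(2048)
                        except FileNotFoundError:
                            # not found on disk, e.g. packed in a sublime-package
                            first_line = sublime.load_resource(t).split('\n', 1)[0]
                        syntax = header.match(first_line).group(1)
                        if syntax == relative_path or syntax == file_name:
                            tests.append(t)
                    except Exception:
                        continue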

Running the syntax tests for Java went down from 8s to 9ms, just by not loading syntax tests of other languages and by limiting the amount of characters loaded to match the first line to 2k.

Nov 22 '20 17:11 deathaxe

Syntax test files are loaded and checked in binary mode as of ST4175, which fixes this issue.
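
How ST 4175 implements this is not shown in this thread. As a rough sketch of why matching in binary mode avoids the encoding problem: a bytes pattern needs no decoding step at all. `first_line_syntax`, the 2048 byte limit and the "HMI.sublime-syntax" name are assumptions made for illustration.

```python
import re

# A bytes pattern: there is no decoding step, so non-UTF-8 content cannot raise.
HEADER = re.compile(rb'^.*SYNTAX TEST "(.*?)"')


def first_line_syntax(data, limit=2048):
    """Return the header's syntax name as bytes, or None if there is no header."""
    first_line = data[:limit].split(b'\n', 1)[0]
    match = HEADER.match(first_line)
    return match.group(1) if match else None


with_header = '; Änderung SYNTAX TEST "HMI.sublime-syntax"\n; more\n'.encode("cp1252")
without_header = "; Änderung\n".encode("cp1252")
print(first_line_syntax(with_header))
print(first_line_syntax(without_header))
```

A caller reading from disk could open the file with mode "rb" and read only `limit` bytes, which would also keep very large files cheap to check.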

Jul 05 '24 16:07 deathaxe