grammarinator
Grammarinator crashes when generating sqlite test cases
I am trying to use grammarinator to generate test cases for sqlite. I am using the ANTLR grammar for sqlite that is available at the official antlr repo.
First I run:
grammarinator-process examples/grammars/SQLiteLexer.g4 examples/grammars/SQLiteParser.g4 -o examples/fuzzer
Which works fine.
But when I run:
grammarinator-generate SQLiteGenerator.SQLiteGenerator -r sql_stmt -d 20 -o examples/tests/test_%d.sql -n 100 -s SQLiteGenerator.html_space_serializer --sys-path examples/fuzzer/
I often get the following error (Note that it does not always crash but 9/10 times it will):
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/ahmad/anaconda3/envs/grammarinator/lib/python3.12/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/home/ahmad/anaconda3/envs/grammarinator/lib/python3.12/site-packages/grammarinator/generate.py", line 78, in create_test
    return generator_tool.create(index)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ahmad/anaconda3/envs/grammarinator/lib/python3.12/site-packages/grammarinator/tool/generator.py", line 255, in create
    f.write(test)
  File "<frozen codecs>", line 727, in write
  File "<frozen codecs>", line 377, in write
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud85f' in position 878: surrogates not allowed
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/ahmad/anaconda3/envs/grammarinator/bin/grammarinator-generate", line 8, in <module>
    sys.exit(execute())
             ^^^^^^^^^
  File "/home/ahmad/anaconda3/envs/grammarinator/lib/python3.12/site-packages/grammarinator/generate.py", line 158, in execute
    for _ in pool.imap_unordered(parallel_create_test, count(0) if args.n == inf else range(args.n)):
  File "/home/ahmad/anaconda3/envs/grammarinator/lib/python3.12/multiprocessing/pool.py", line 873, in next
    raise value
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud85f' in position 878: surrogates not allowed
From my understanding, this means that grammarinator is generating values that can't be encoded as a utf-8 string. Is this an issue with grammarinator or is there a way to handle this that I am not aware of?
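For reference, the same error can be reproduced without grammarinator. The snippet below is only a minimal sketch: the SQL string and the `try_encode` helper are made up for illustration, and only the `\ud85f` character is taken from the traceback above.

```python
# A string containing a lone surrogate code point, the same one
# that appears in the traceback (the SQL text itself is hypothetical).
text = "SELECT '\ud85f';"


def try_encode(s, errors="strict"):
    """Return the UTF-8 bytes of s, or the error message if encoding fails."""
    try:
        return s.encode("utf-8", errors=errors)
    except UnicodeEncodeError as exc:
        return str(exc)


# With the default "strict" handler, a lone surrogate cannot be encoded.
print(try_encode(text))
```

A Python `str` can hold a lone surrogate, but the strict UTF-8 codec refuses to write it out, which appears to be what happens when the generated test is saved.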
Here are some environment details if needed:
$ pip show grammarinator
Name: grammarinator
Version: 23.7.post76+gf3ffa71.d20240427
Summary: Grammarinator: Grammar-based Random Test Generator
Home-page: https://github.com/renatahodovan/grammarinator
Author: Renata Hodovan, Akos Kiss
Author-email: [email protected], [email protected]
License: BSD
Location: /home/ahmad/anaconda3/envs/grammarinator/lib/python3.12/site-packages
Requires: antlerinator, antlr4-python3-runtime, autopep8, inators, jinja2, regex
Required-by:
$ python -V
Python 3.12.3
The problem is that the grammar allows surrogates to be generated as part of some tokens, but the test generator is not prepared to encode them when saving the output to a file. To configure the encoding and its error handler, you can use the --encoding and --encoding-errors CLI options of grammarinator-generate. These values are passed to the encoding and errors parameters of codecs.open, so you can set them accordingly. In this case, I think the simplest solution is to pass the --encoding-errors=surrogatepass argument to grammarinator-generate.
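To illustrate what the `surrogatepass` error handler does at the `codecs.open` level, here is a rough sketch. The `write_test` helper, the file name, and the SQL string are hypothetical and are not grammarinator's actual code; only the forwarding of `encoding` and `errors` to `codecs.open` follows the description above.

```python
import codecs
import os
import tempfile

# Hypothetical test content with the lone surrogate from the traceback.
text = "SELECT '\ud85f';"


def write_test(path, test, encoding="utf-8", errors="strict"):
    # Assumed stand-in for the generator's file writing: both values
    # are simply forwarded to codecs.open.
    with codecs.open(path, "w", encoding=encoding, errors=errors) as f:
        f.write(test)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_0.sql")
    # "surrogatepass" lets the surrogate through instead of raising.
    write_test(path, text, errors="surrogatepass")
    with open(path, "rb") as f:
        raw = f.read()

# The bytes round-trip only when decoded with the same error handler.
print(raw.decode("utf-8", errors="surrogatepass") == text)
```

Note that a file written this way is not strictly valid UTF-8, so anything reading it back in Python would need `errors="surrogatepass"` as well. If that is a problem for the target, other standard handlers such as `replace` or `ignore` could be passed to --encoding-errors instead, at the cost of altering the generated content.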