simplecpp
#line directive results in exponential memory usage depending on its value
I noticed that simplecpp shows exponential memory allocation and processing time depending on the value of the #line directive. For example, this piece of code allocates more than a gigabyte of memory:
#line 333333333333
a
Evaluating it in a memory-limited environment results in a std::bad_alloc error:
$ (ulimit -v 1000000; time simplecpp input.c | dd bs=1024 count=1024)
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
0+0 records in
0+0 records out
0 bytes (0 B) copied, 33.9387 s, 0.0 kB/s
real 0m33.941s
user 0m32.936s
sys 0m0.820s
I'm not sure if you want to do anything about this, but I thought it would be a good idea to report this behavior. Fixing it likely requires optimizing the data structures that the #line directive eventually produces. You might also consider limiting the values a #line number can have, as the C99 standard (section 6.10.4, Line control) does not allow a value greater than 2147483647.
For comparison, GNU cpp produces the following warning and, after the #line directive, outputs the line number wrapped modulo 2^32:
input.c:1:7: warning: line number out of range [enabled by default]
#line 333333333333
^
# 2620851541 "input.c"
a
This happens with revision d1c995c03515d289c7aa7246a74d666fd012c4eb
@Barro Did you perform an explicit test or did you run into this with a real-world example?
Does it matter if you expose a service using simplecpp as its processing front-end to the world wide web where anyone can submit this type of data?
I take that reply to mean this was not taken from real code. IMHO it does matter, but a crash that affects ordinary users analyzing their real code is worse.
It is possible to make cppcheck crash with garbage input, as fuzzing shows. We are fixing such crashes, but we also want to do other things. Our fixes sometimes do cause problems for real users, so I think there is a risk of focusing on this "too much".
About the web server: I would like us to actively try to prevent misuse. As far as I remember, the input code is not allowed to be larger than 1000 bytes. I think that limiting the code size after preprocessing would also make sense. I'm not sure how to do that cleanly; ideally we would just tweak the democlient code a little. I would also like to have an "aggressive" ulimit. The democlient is only intended for small samples, not for real code.
We could tweak the simplecpp output a little. It could output #line directives when the line numbers "jump" by more than 1000 lines or so. But I think of this tool as a demo tool for testing the library. The textual output is not 100% perfect, and that is fine.
Removed the bug label now, because as far as I can see this is not a bug in the simplecpp library. It's the debug output that overflows.