Fuzzing implementation that doesn't require libraries
I propose a function that generates syntactically correct strings, to test the interpreter:
enum Simbol {
SIGMA, DECLARATION, VAR_NAME, NUMBER,
assign, end,
letterA, letterB,
number1, number2
};
const std::set<Simbol> terminalSimbols {
assign, end, letterA, letterB, number1, number2
};
const std::unordered_map<Simbol, std::string> finalReplacement {
{assign, " := " },
{end, ";" },
{letterA, "a"},
{letterB, "b"},
{number1, "1"},
{number2, "2"},
};
const std::unordered_map<Simbol, std::vector<std::vector<Simbol>>> intermidiateReplacement {
{SIGMA, { { DECLARATION } } },
{DECLARATION, { { end }, { VAR_NAME, assign, NUMBER, end }, { DECLARATION, DECLARATION } } },
{VAR_NAME, { { letterA }, { letterB }, { VAR_NAME, VAR_NAME } } },
{NUMBER, { { number1} , { number2 }, { NUMBER, NUMBER } } }
};
const std::unordered_map<Simbol, std::discrete_distribution<int>> intermidiateReplacementDist {
{SIGMA, { 1 } },
{DECLARATION, { 0.5, 1, 0.5} },
{VAR_NAME, { 1, 1, 1 } },
{NUMBER, { 1, 1, 0.5 } }
};
std::list<Simbol> generate() {
std::random_device rd;
std::mt19937 gen(rd());
std::list<Simbol> list;
list.push_back(SIGMA);
int terminalsInList = 0;
auto i = list.begin();
while (terminalsInList < list.size()) {
if(i == list.begin()) {
terminalsInList = 0;
}
bool terminal = terminalSimbols.find(*i) != terminalSimbols.end();
if (!terminal) {
auto replacementDistribution = intermidiateReplacementDist.at(*i);
std::vector<std::vector<Simbol>> currentReplacementPossibilities = intermidiateReplacement.at(*i);
assert(currentReplacementPossibilities.size() == replacementDistribution.probabilities().size());
auto replacement = currentReplacementPossibilities[replacementDistribution(rd)];
list.insert(i, replacement.begin(), replacement.end());
i = list.erase(i);
} else {
terminalsInList++;
++i;
}
if(i == list.end()) {
i = list.begin();
}
}
return list;
}
int main() {
for (int i = 0; i < 100; ++i) {
for (auto v : generate()) {
assert(terminalSimbols.find(v) != terminalSimbols.end());
std::cout << finalReplacement.at(v);
}
std::cout << std::endl;
}
return 0;
}
Output:
/home/emmanuel/Documentos/temp/testregularthing/cmake-build-debug/testregularthing
a := 1;b := 2;
;
a := 1;
bbab := 2;b := 1;
a := 2;
;
bab := 12;
b := 1;
b := 2;
a := 12;
;
;
;b := 1;
b := 1;
a := 1;
baa := 1;
a := 2;
bbaab := 1;
;
b := 2;
aba := 2;
;
a := 2;
a := 1;
bb := 2;
a := 2;
b := 111;
a := 1;
baab := 21;a := 2;bb := 121;bbbabaabab := 1;;b := 2;;
;
;
bb := 1;aa := 2;;
;
bbaa := 2;a := 1;
ba := 1;
baababaaababaabaaabaaba := 2;
aabbaabaabaaaaaabb := 12;
a := 1;
;;
;
ab := 2;
b := 2;
;
;;
a := 1;;;
bbb := 211;;;
;
aa := 1;
a := 1;
ba := 1;
;
abb := 1;
b := 1;
b := 2;b := 12;;b := 2;;a := 2;b := 2;;b := 111112;
b := 12;
;bba := 1;ab := 2;;
a := 1;
b := 1;
ba := 2;
a := 2;
ababb := 12;
aba := 1;
;
abaa := 1;
;a := 1;;
a := 2;
;
a := 1;
;
abaabb := 1;b := 1;a := 1;
b := 2;
;
;
a := 1;
;;
b := 1;
;
;
b := 12;
;
a := 2;
;a := 2;
;
;
;
;
b := 2;
bb := 1;b := 11122;
;
a := 1;
aabbb := 122;
b := 2121;
ba := 1;;
bb := 2;
ba := 2;
;
b := 1;
;
;
a := 11;
That's pretty cool. I had tried something similar a while back, but I never actually filled out the tables according to Jasper's syntax. Is that something you could do?
We don't have a full BNF for Jasper's syntax, but I could try to write it up, if that'd help.
I can try to make production rules that fit the language.
How should redefinitions of the same variable be handled? like: a:= 10; a:=1;
I can try to make production rules that fit the language.
Would having a BNF help?
How should redefinitions of the same variable be handled? like:
a:= 10; a:=1;
It shouldn't be allowed.
That is not what actually happens in the compiler, which could be considered a bug. (EDIT: addressed in #188)
Would having a BNF help?
Reading about BNF, it seems it constructs a context free language, the issue right now for me would be dealing with (variable) names and how to not repeat them.
the issue right now for me would be dealing with (variable) names and how to not repeat them.
The way we handle variables in the compiler is with a stack of hash tables.
Whenever a new scope is introduced (either by a for-statement, a block-statement, or a function-expression), we push a new table onto the stack, and pop it when the scope ends.
When producing a new name, one could generate random (valid) strings until one that is not "in the current scope" is generated.