Jasper icon indicating copy to clipboard operation
Jasper copied to clipboard

Fuzzing implementation that doesn't require libraries

Open EmmanuelMess opened this issue 5 years ago • 6 comments

I propose a function that generates syntactically correct strings, to test the interpreter:

enum Simbol {
	SIGMA, DECLARATION, VAR_NAME, NUMBER,
	assign, end,
	letterA, letterB,
	number1, number2
};

const std::set<Simbol> terminalSimbols {
	assign, end, letterA, letterB, number1, number2
};

const std::unordered_map<Simbol, std::string> finalReplacement {
	{assign,  " := " },
	{end,     ";" },
	{letterA, "a"},
	{letterB, "b"},
	{number1, "1"},
	{number2, "2"},
};

const std::unordered_map<Simbol, std::vector<std::vector<Simbol>>> intermidiateReplacement {
	{SIGMA,          { { DECLARATION } } },
	{DECLARATION,   { { end }, { VAR_NAME, assign, NUMBER, end }, { DECLARATION, DECLARATION } } },
	{VAR_NAME,      { { letterA }, { letterB }, { VAR_NAME, VAR_NAME } } },
	{NUMBER,        { { number1} , { number2 }, { NUMBER, NUMBER } } }
};

const std::unordered_map<Simbol,  std::discrete_distribution<int>> intermidiateReplacementDist {
	{SIGMA,          { 1 } },
	{DECLARATION,   { 0.5, 1, 0.5} },
	{VAR_NAME,       { 1, 1, 1 } },
	{NUMBER,        { 1, 1, 0.5 } }
};

std::list<Simbol> generate() {
	std::random_device rd;
	std::mt19937 gen(rd());
	std::list<Simbol> list;
	list.push_back(SIGMA);

	int terminalsInList = 0;

	auto i = list.begin();
	while (terminalsInList < list.size()) {
		if(i == list.begin()) {
			terminalsInList = 0;
		}

		bool terminal = terminalSimbols.find(*i) != terminalSimbols.end();

		if (!terminal) {
			auto replacementDistribution = intermidiateReplacementDist.at(*i);
			std::vector<std::vector<Simbol>> currentReplacementPossibilities = intermidiateReplacement.at(*i);
			assert(currentReplacementPossibilities.size() == replacementDistribution.probabilities().size());

			auto replacement = currentReplacementPossibilities[replacementDistribution(rd)];

			list.insert(i, replacement.begin(), replacement.end());
			i = list.erase(i);
		} else {
			terminalsInList++;
			++i;
		}

		if(i == list.end()) {
			i = list.begin();
		}
	}

	return list;
}

int main() {

	for (int i = 0; i < 100; ++i) {
		for (auto v : generate()) {
			assert(terminalSimbols.find(v) != terminalSimbols.end());
			std::cout << finalReplacement.at(v);
		}
		std::cout << std::endl;
	}


	return 0;
}

Output:

/home/emmanuel/Documentos/temp/testregularthing/cmake-build-debug/testregularthing
a := 1;b := 2;
;
a := 1;
bbab := 2;b := 1;
a := 2;
;
bab := 12;
b := 1;
b := 2;
a := 12;
;
;
;b := 1;
b := 1;
a := 1;
baa := 1;
a := 2;
bbaab := 1;
;
b := 2;
aba := 2;
;
a := 2;
a := 1;
bb := 2;
a := 2;
b := 111;
a := 1;
baab := 21;a := 2;bb := 121;bbbabaabab := 1;;b := 2;;
;
;
bb := 1;aa := 2;;
;
bbaa := 2;a := 1;
ba := 1;
baababaaababaabaaabaaba := 2;
aabbaabaabaaaaaabb := 12;
a := 1;
;;
;
ab := 2;
b := 2;
;
;;
a := 1;;;
bbb := 211;;;
;
aa := 1;
a := 1;
ba := 1;
;
abb := 1;
b := 1;
b := 2;b := 12;;b := 2;;a := 2;b := 2;;b := 111112;
b := 12;
;bba := 1;ab := 2;;
a := 1;
b := 1;
ba := 2;
a := 2;
ababb := 12;
aba := 1;
;
abaa := 1;
;a := 1;;
a := 2;
;
a := 1;
;
abaabb := 1;b := 1;a := 1;
b := 2;
;
;
a := 1;
;;
b := 1;
;
;
b := 12;
;
a := 2;
;a := 2;
;
;
;
;
b := 2;
bb := 1;b := 11122;
;
a := 1;
aabbb := 122;
b := 2121;
ba := 1;;
bb := 2;
ba := 2;
;
b := 1;
;
;
a := 11;

EmmanuelMess avatar Nov 05 '20 00:11 EmmanuelMess

That's pretty cool. I had tried something similar a while back, but I never actually filled out the tables according to Jasper's syntax. Is that something you could do?

We don't have a full BNF for Jasper's syntax, but I could try to write it up, if that'd help.

SebastianMestre avatar Nov 05 '20 18:11 SebastianMestre

I can try to make production rules that fit the language.

EmmanuelMess avatar Nov 05 '20 19:11 EmmanuelMess

How should redefinitions of the same variable be handled? like: a:= 10; a:=1;

EmmanuelMess avatar Nov 06 '20 00:11 EmmanuelMess

I can try to make production rules that fit the language.

Would having a BNF help?

How should redefinitions of the same variable be handled? like: a:= 10; a:=1;

It shouldn't be allowed.

That is not what actually happens in the compiler, which could be considered a bug. (EDIT: addressed in #188)

SebastianMestre avatar Nov 06 '20 13:11 SebastianMestre

Would having a BNF help?

Reading about BNF, it seems it constructs a context free language, the issue right now for me would be dealing with (variable) names and how to not repeat them.

EmmanuelMess avatar Nov 06 '20 16:11 EmmanuelMess

the issue right now for me would be dealing with (variable) names and how to not repeat them.

The way we handle variables in the compiler is with a stack of hash tables.

Whenever a new scope is introduced (either by a for-statement, a block-statement, or a function-expression), we push a new table onto the stack, and pop it when the scope ends.

When producing a new name, one could generate random (valid) strings until one that is not "in the current scope" is generated.

SebastianMestre avatar Nov 06 '20 17:11 SebastianMestre