compiler
compiler copied to clipboard
refactoring of the codebase
This issue is for listing down possible improvements for the compiler's codebase.
Current list of suggestions:
grow literal queue exponentially (currently linear)
In chk_grow_litq, the compiler does the following to resize the literal queue:
litmax +=sDEF_LITMAX;
This causes O(n^2) cells to be copied but this method is space efficient: in the worst case, max wastage will be sDEF_LITMAX - 1 cells.
The alternative solution is to:
litmax *= 2;
This causes O(n) cells to be copied but is less space efficient: in the worst case, half of the literal queue will be wasted.
Tests:
static arr[10000000] = {1, 2, 3, ...};
main () {
arr[0] = 123;
}
litmax+=sDEF_LITMAX; takes on average around 13s (compiled continuously)
litmax*=2; takes on average around 1.5s (compiled continuously)
The space efficiency is probably not a problem as gamemodes doesn't exceed few MBs (at most maybe 100MB). The amount of memory wasted in the worst case would be 100MB in that scenario. I am not sure if someone would ever have a situation like this though.
#define flgDEPRECATED 0x01 /* symbol is deprecated (avoid use) */
#define flagNAKED 0x10 /* function is naked */
#define flagPREDEF 0x20 /* symbol is pre-defined; successor of uPREDEF */
The flags member of s_symbol use just the above 3 bits. Is there a reason why they aren't continuous? (why isn't 0x02, 0x04, etc. there?)
unsigned char hasdefault; /* bit0: is there a default value? bit6: "tagof"; bit7: "sizeof" */
What happened to bits 1-5? Nowhere are those bits used as far as I could check.
What happened to bits 1-5? Nowhere are those bits used as far as I could check.
Maybe they were never even been used because they were supposed to be reserved backwards (7, 6, 5, ...) for some reason. At least, in Small 1.8.4 only bits 0 and 7 are used, and bit 6 isn't
char hasdefault; /* bit0: is there a default value? bit7: "sizeof" */
Also, since low memory use doesn't seem to be the first priority for this project, I think we could probably get rid from the numerous #if defined SC_LIGHT branches in order to clean up the codebase, just like with the packed string data in sc5.scp and sc7.scp.
inconsistent naming of preprocessor tokens
Just noticed that the constant for the #include directive is defined as tINCLUDE, with a t prefix, while the names of the other directive-related constants are prefixed with tp
#define tpERROR 323
#define tpFILE 324
#define tpIF 325 /* #if */
#define tINCLUDE 326
#define tpLINE 327
#define tpPRAGMA 328
so we should probably rename tINCLUDE to tpINCLUDE.
Symbol flags are named inconsistently too: while flgDEPRECATED starts with flg, flagNAKED and flagPREDEF are prefixed with flag.
use enum instead of #define for assign token numbers
#define taMULT 256 /* *= */
#define taDIV 257 /* /= */
#define taMOD 258 /* %= */
#define taADD 259 /* += */
#define taSUB 260 /* -= */
.
.
.
enum {
taMULT = 256, /* *= */
taDIV, /* /= */
taMOD, /* %= */
taADD, /* += */
taSUB /* -= */
};
The constants should not be hard-coded. Not hard-coding makes adding new entry easier. As of now, to add a new entry, all the defines which follow from the place of insertion has to be manually updated. It also feels insecure because people wouldn't know if there is code which relies on the numbers being the way they are.
scope mechanism
We could use a new concept of scope number instead of the file number. This would be useful for implementing the #section directive (and probably new features). It also makes the code legible (the current system is very confusing and highly error-prone) and easier to maintain.
unused error IDs
Error IDs 11 ("invalid outside functions") and 210 ("possible use of symbol before initialization") don't seem to be used anywhere within the compiler. I even searched for them in Pawn 3.0, Small 2.7.3 and Small 1.8.4 - they aren't used in there either (the IDs and descriptions for those errors haven't been changed through all of the listed versions). We could either restore their functionality or reuse their IDs for new errors.
There are many more unused items in the code.
iREFFUNC(appears to be a synonym foriFUNCTNbut has a different value)tDBLCOLON
pushstk crash (copying to null pointer)
SC_FUNC void pushstk(stkitem val)
{
assert(stkidx<=stktop);
if (stkidx==stktop) {
stkitem *newstack;
int newsize= (stktop==0) ? 16 : 2*stktop;
/* try to resize the stack */
assert(newsize>stktop);
newstack=(stkitem*)malloc(newsize*sizeof(stkitem));
if (newstack==NULL)
error(102,"parser stack"); /* stack overflow (recursive include?) */
/* swap the stacks */
memcpy(newstack,stack,stkidx*sizeof(stkitem));
if (stack!=NULL)
free(stack);
stack=newstack;
stktop=newsize;
} /* if */
assert(stkidx<stktop);
stack[stkidx]=val;
stkidx+=1;
}
It appears like the compiler is checking for an allocation failure. In such a case, it throws a compiler error and then continues copying to NULL which would crash the compiler.
I think realloc was meant to be used. The original author has used it at several places but not here. I am not able to see why realloc might cause an issue here.
plungequalifiedfile problems
path&real_pathmemory leak in early return- copying to null pointer
path/real_pathwould crash the compiler
SC_FUNC int plungequalifiedfile(char *name)
{
static char extensions[][6] = { "", ".inc", ".p", ".pawn" };
int found;
struct stat st;
FILE *fp;
char *path;
char *real_path;
char *ext;
int ext_idx;
fp=NULL;
ext_idx=0;
path=(char *)malloc(strlen(name)+sizeof(extensions[0]));
if (path==NULL) {
error(103); /* insufficient memory */
return;
} /* if */
strcpy(path,name);
real_path=(char *)malloc(strlen(name)+sizeof(extensions[0]));
if (real_path==NULL) {
error(103); /* insufficient memory */
free(path);
return;
}
do {
found=TRUE;
ext=strchr(path,'\0'); /* save position */
strcpy(ext,extensions[ext_idx]);
strcpy(real_path,path);
#if DIRSEP_CHAR!='\\'
if (pc_compat) {
char *ptr;
/* convert backslashes to native directory separators for maximum
* compatibility with the Windows compiler
*/
for (ptr=real_path; *ptr!='\0'; ptr++)
if (*ptr=='\\')
*ptr=DIRSEP_CHAR;
}
#endif
stat(real_path, &st);
if (!S_ISDIR(st.st_mode)) /* ignore directories with the same name */
fp=(FILE*)pc_opensrc(real_path);
if (fp==NULL) {
*ext='\0'; /* on failure, restore filename */
found=FALSE;
}
ext_idx++;
} while (!found && ext_idx<(sizeof extensions / sizeof extensions[0]));
if (!found) {
*ext='\0'; /* restore filename */
free(path);
free(real_path);
return FALSE;
} /* if */
PUSHSTK_P(inpf);
PUSHSTK_P(inpfname); /* pointer to current file name */
PUSHSTK_P(curlibrary);
PUSHSTK_I(iflevel);
assert(!SKIPPING);
assert(skiplevel==iflevel); /* these two are always the same when "parsing" */
PUSHSTK_I(sc_is_utf8);
PUSHSTK_I(icomment);
PUSHSTK_I(fcurrent);
PUSHSTK_I(fline);
inpfname=path; /* set name of include file */
if (inpfname==NULL)
error(103); /* insufficient memory */
inpf=fp; /* set input file pointer to include file */
fnumber++;
fline=0; /* set current line number to 0 */
fcurrent=fnumber;
icomment=0; /* not in a comment */
insert_dbgfile(inpfname);
setfiledirect(inpfname);
setfileconst(inpfname);
listline=-1; /* force a #line directive when changing the file */
sc_is_utf8=(short)scan_utf8(inpf,real_path);
free(real_path);
return TRUE;
}
iREFFUNC is unused
The compiler only checks against ident being iREFFUNC but never assigns it.
Just stumbled upon this tiny piece of code in function funcstub() (sc1.c):
sym->usage=(char)(uNATIVE | uRETVALUE | uDEFINE | (sym->usage & uPROTOTYPED));
Note that the assigned value is casted to char, but sym->usage is of type short. Good thing there's no uFORWARD among the flags on that line, otherwise it would be truncated.
This incorrect type cast might be a leftover from Small, where sym->usage indeed was of type char.
It appears like the compiler is checking for an allocation failure. In such a case, it throws a compiler error and then continues copying to NULL which would crash the compiler.
@YashasSamaga Did you really test this case or is this only your assumption that it would crash? Because normally error() is supposed to return directly to pc_compile() (by using longjmp()) on fatal errors (and error 102 is indeed a fatal one), avoiding any invalid operations with NULL pointers.
This piece of code from pushstk() looks interesting though:
memcpy(newstack,stack,stkidx*sizeof(stkitem));
if (stack!=NULL)
free(stack);
It seems that the stack pointer gets dereferenced first, then checked if it's NULL. Also, free() correctly handles NULL pointers (it does nothing if the pointer is NULL), so there's absolutely no reason to check if the pointer is NULL before using free().
uPREDEF is unused
Much like iREFFUNC, the compiler checks symbols for this flag to be set in the usage field, but never actually sets it.
While I was analyzing all uses of lexpush() (in case there are more misuses of this function in combination with matchtoken() or needtoken(), as described in https://github.com/pawn-lang/compiler/issues/524#issuecomment-625232959), I found this suspicious code in sc3.c:
https://github.com/pawn-lang/compiler/blob/ec5b7460dc67a9fbbf1b80a042501c608479d268/source/compiler/sc3.c#L2416-L2418
Normally needtoken() returns either the token index, or FALSE if the token didn't match; it's not supposed to return 1.
I think the author of this code intended to use matchtoken() instead; this function returns either 0 or 1 if the token matches or not, but if the requested token is tTERM and the compiler is in the "optional semicolons" mode, then matchtoken() can also return 2 (which means that instead of ; the lexer found a newline or EOF) and push back the next token so it can be analyzed later:
https://github.com/pawn-lang/compiler/blob/ec5b7460dc67a9fbbf1b80a042501c608479d268/source/compiler/sc2.c#L2435-L2440
This would explain the comparison of the returned value with 1 in the mentioned code (if (needtoken(tTERM)==1)), because if the result of matchtoken(tTERM) is 2, then we don't need to call lexpush() as the token is already pushed back.
Apparently, sOPTIMIZE_DEFAULT, despite its name, isn't the default optimization mode - sOPTIMIZE_NOMACRO is, as this value is assigned to pc_optimize by default in initglobals().
https://github.com/pawn-lang/compiler/blob/289cfeb1268f833ae1796debbab1e464306296ca/source/compiler/sc1.c#L924
I think we should rename sOPTIMIZE_DEFAULT into something less misleading (sOPTIMIZE_FULL, probably?)