compiler icon indicating copy to clipboard operation
compiler copied to clipboard

refactoring of the codebase

Open YashasSamaga opened this issue 7 years ago • 16 comments

This issue is for listing down possible improvements for the compiler's codebase.

Current list of suggestions:

YashasSamaga avatar Mar 25 '18 05:03 YashasSamaga

grow literal queue exponentially (currently linear)

In chk_grow_litq, the compiler does the following to resize the literal queue:

litmax +=sDEF_LITMAX;

This causes O(n^2) cells to be copied but this method is space efficient: in the worst case, max wastage will be sDEF_LITMAX - 1 cells.

The alternative solution is to:

litmax *= 2;

This causes O(n) cells to be copied but is less space efficient: in the worst case, half of the literal queue will be wasted.

Tests:

static arr[10000000] = {1, 2, 3, ...};
main () {
    arr[0] = 123;
}

litmax+=sDEF_LITMAX; takes on average around 13s (compiled continuously) litmax*=2; takes on average around 1.5s (compiled continuously)

The space efficiency is probably not a problem as gamemodes doesn't exceed few MBs (at most maybe 100MB). The amount of memory wasted in the worst case would be 100MB in that scenario. I am not sure if someone would ever have a situation like this though.

YashasSamaga avatar Mar 25 '18 05:03 YashasSamaga

#define flgDEPRECATED 0x01  /* symbol is deprecated (avoid use) */
#define flagNAKED     0x10  /* function is naked */
#define flagPREDEF    0x20  /* symbol is pre-defined; successor of uPREDEF */

The flags member of s_symbol use just the above 3 bits. Is there a reason why they aren't continuous? (why isn't 0x02, 0x04, etc. there?)


unsigned char hasdefault; /* bit0: is there a default value? bit6: "tagof"; bit7: "sizeof" */

What happened to bits 1-5? Nowhere are those bits used as far as I could check.

YashasSamaga avatar Mar 25 '18 06:03 YashasSamaga

What happened to bits 1-5? Nowhere are those bits used as far as I could check.

Maybe they were never even been used because they were supposed to be reserved backwards (7, 6, 5, ...) for some reason. At least, in Small 1.8.4 only bits 0 and 7 are used, and bit 6 isn't char hasdefault; /* bit0: is there a default value? bit7: "sizeof" */


Also, since low memory use doesn't seem to be the first priority for this project, I think we could probably get rid from the numerous #if defined SC_LIGHT branches in order to clean up the codebase, just like with the packed string data in sc5.scp and sc7.scp.

Daniel-Cortez avatar Mar 25 '18 14:03 Daniel-Cortez

inconsistent naming of preprocessor tokens

Just noticed that the constant for the #include directive is defined as tINCLUDE, with a t prefix, while the names of the other directive-related constants are prefixed with tp

#define tpERROR     323
#define tpFILE      324
#define tpIF        325 /* #if */
#define tINCLUDE    326
#define tpLINE      327
#define tpPRAGMA    328

so we should probably rename tINCLUDE to tpINCLUDE.

Symbol flags are named inconsistently too: while flgDEPRECATED starts with flg, flagNAKED and flagPREDEF are prefixed with flag.

Daniel-Cortez avatar Jul 22 '18 03:07 Daniel-Cortez

use enum instead of #define for assign token numbers

#define taMULT      256 /* *= */
#define taDIV       257 /* /= */
#define taMOD       258 /* %= */
#define taADD       259 /* += */
#define taSUB       260 /* -= */
.
.
.

enum {
     taMULT = 256, /* *= */
     taDIV, /* /= */
     taMOD, /* %= */
     taADD, /* += */
     taSUB  /* -= */
};

The constants should not be hard-coded. Not hard-coding makes adding new entry easier. As of now, to add a new entry, all the defines which follow from the place of insertion has to be manually updated. It also feels insecure because people wouldn't know if there is code which relies on the numbers being the way they are.

YashasSamaga avatar Aug 01 '18 10:08 YashasSamaga

scope mechanism

We could use a new concept of scope number instead of the file number. This would be useful for implementing the #section directive (and probably new features). It also makes the code legible (the current system is very confusing and highly error-prone) and easier to maintain.

YashasSamaga avatar Aug 01 '18 10:08 YashasSamaga

unused error IDs

Error IDs 11 ("invalid outside functions") and 210 ("possible use of symbol before initialization") don't seem to be used anywhere within the compiler. I even searched for them in Pawn 3.0, Small 2.7.3 and Small 1.8.4 - they aren't used in there either (the IDs and descriptions for those errors haven't been changed through all of the listed versions). We could either restore their functionality or reuse their IDs for new errors.

Daniel-Cortez avatar Aug 01 '18 13:08 Daniel-Cortez

There are many more unused items in the code.

  • iREFFUNC (appears to be a synonym for iFUNCTN but has a different value)
  • tDBLCOLON

YashasSamaga avatar Aug 01 '18 13:08 YashasSamaga

pushstk crash (copying to null pointer)

SC_FUNC void pushstk(stkitem val)
{
  assert(stkidx<=stktop);
  if (stkidx==stktop) {
    stkitem *newstack;
    int newsize= (stktop==0) ? 16 : 2*stktop;
    /* try to resize the stack */
    assert(newsize>stktop);
    newstack=(stkitem*)malloc(newsize*sizeof(stkitem));
    if (newstack==NULL)
      error(102,"parser stack");  /* stack overflow (recursive include?) */
    /* swap the stacks */
    memcpy(newstack,stack,stkidx*sizeof(stkitem));
    if (stack!=NULL)
      free(stack);
    stack=newstack;
    stktop=newsize;
  } /* if */
  assert(stkidx<stktop);
  stack[stkidx]=val;
  stkidx+=1;
}

It appears like the compiler is checking for an allocation failure. In such a case, it throws a compiler error and then continues copying to NULL which would crash the compiler.

I think realloc was meant to be used. The original author has used it at several places but not here. I am not able to see why realloc might cause an issue here.

YashasSamaga avatar Dec 26 '18 04:12 YashasSamaga

plungequalifiedfile problems

  1. path & real_path memory leak in early return
  2. copying to null pointer path/real_path would crash the compiler
SC_FUNC int plungequalifiedfile(char *name)
{
static char extensions[][6] = { "", ".inc", ".p", ".pawn" };
  int found;
  struct stat st;
  FILE *fp;
  char *path;
  char *real_path;
  char *ext;
  int ext_idx;

  fp=NULL;
  ext_idx=0;
  path=(char *)malloc(strlen(name)+sizeof(extensions[0]));
  if (path==NULL) {
    error(103);                 /* insufficient memory */
    return;
  } /* if */
  strcpy(path,name);
  real_path=(char *)malloc(strlen(name)+sizeof(extensions[0]));
  if (real_path==NULL) {
    error(103);                 /* insufficient memory */
    free(path);
    return;
  }
  do {
    found=TRUE;
    ext=strchr(path,'\0');      /* save position */
    strcpy(ext,extensions[ext_idx]);
    strcpy(real_path,path);
    #if DIRSEP_CHAR!='\\'
      if (pc_compat) {
        char *ptr;
        /* convert backslashes to native directory separators for maximum
         * compatibility with the Windows compiler
         */
        for (ptr=real_path; *ptr!='\0'; ptr++)
          if (*ptr=='\\')
            *ptr=DIRSEP_CHAR;
      }
    #endif
    stat(real_path, &st);
    if (!S_ISDIR(st.st_mode))   /* ignore directories with the same name */
      fp=(FILE*)pc_opensrc(real_path);
    if (fp==NULL) {
      *ext='\0';                /* on failure, restore filename */
      found=FALSE;
    }
    ext_idx++;
  } while (!found && ext_idx<(sizeof extensions / sizeof extensions[0]));
  if (!found) {
    *ext='\0';                  /* restore filename */
    free(path);
    free(real_path);
    return FALSE;
  } /* if */
  PUSHSTK_P(inpf);
  PUSHSTK_P(inpfname);          /* pointer to current file name */
  PUSHSTK_P(curlibrary);
  PUSHSTK_I(iflevel);
  assert(!SKIPPING);
  assert(skiplevel==iflevel);   /* these two are always the same when "parsing" */
  PUSHSTK_I(sc_is_utf8);
  PUSHSTK_I(icomment);
  PUSHSTK_I(fcurrent);
  PUSHSTK_I(fline);
  inpfname=path;            /* set name of include file */
  if (inpfname==NULL)
    error(103);             /* insufficient memory */
  inpf=fp;                  /* set input file pointer to include file */
  fnumber++;
  fline=0;                  /* set current line number to 0 */
  fcurrent=fnumber;
  icomment=0;               /* not in a comment */
  insert_dbgfile(inpfname);
  setfiledirect(inpfname);
  setfileconst(inpfname);
  listline=-1;              /* force a #line directive when changing the file */
  sc_is_utf8=(short)scan_utf8(inpf,real_path);
  free(real_path);
  return TRUE;
}

YashasSamaga avatar Dec 26 '18 04:12 YashasSamaga

iREFFUNC is unused

The compiler only checks against ident being iREFFUNC but never assigns it.

YashasSamaga avatar Dec 30 '18 17:12 YashasSamaga

Just stumbled upon this tiny piece of code in function funcstub() (sc1.c):

sym->usage=(char)(uNATIVE | uRETVALUE | uDEFINE | (sym->usage & uPROTOTYPED));

Note that the assigned value is casted to char, but sym->usage is of type short. Good thing there's no uFORWARD among the flags on that line, otherwise it would be truncated. This incorrect type cast might be a leftover from Small, where sym->usage indeed was of type char.

Daniel-Cortez avatar Dec 31 '18 12:12 Daniel-Cortez

It appears like the compiler is checking for an allocation failure. In such a case, it throws a compiler error and then continues copying to NULL which would crash the compiler.

@YashasSamaga Did you really test this case or is this only your assumption that it would crash? Because normally error() is supposed to return directly to pc_compile() (by using longjmp()) on fatal errors (and error 102 is indeed a fatal one), avoiding any invalid operations with NULL pointers.

This piece of code from pushstk() looks interesting though:

    memcpy(newstack,stack,stkidx*sizeof(stkitem));
    if (stack!=NULL)
      free(stack);

It seems that the stack pointer gets dereferenced first, then checked if it's NULL. Also, free() correctly handles NULL pointers (it does nothing if the pointer is NULL), so there's absolutely no reason to check if the pointer is NULL before using free().

Daniel-Cortez avatar Dec 31 '18 12:12 Daniel-Cortez

uPREDEF is unused

Much like iREFFUNC, the compiler checks symbols for this flag to be set in the usage field, but never actually sets it.

Daniel-Cortez avatar Aug 01 '19 18:08 Daniel-Cortez

While I was analyzing all uses of lexpush() (in case there are more misuses of this function in combination with matchtoken() or needtoken(), as described in https://github.com/pawn-lang/compiler/issues/524#issuecomment-625232959), I found this suspicious code in sc3.c:

https://github.com/pawn-lang/compiler/blob/ec5b7460dc67a9fbbf1b80a042501c608479d268/source/compiler/sc3.c#L2416-L2418

Normally needtoken() returns either the token index, or FALSE if the token didn't match; it's not supposed to return 1. I think the author of this code intended to use matchtoken() instead; this function returns either 0 or 1 if the token matches or not, but if the requested token is tTERM and the compiler is in the "optional semicolons" mode, then matchtoken() can also return 2 (which means that instead of ; the lexer found a newline or EOF) and push back the next token so it can be analyzed later:

https://github.com/pawn-lang/compiler/blob/ec5b7460dc67a9fbbf1b80a042501c608479d268/source/compiler/sc2.c#L2435-L2440

This would explain the comparison of the returned value with 1 in the mentioned code (if (needtoken(tTERM)==1)), because if the result of matchtoken(tTERM) is 2, then we don't need to call lexpush() as the token is already pushed back.

Daniel-Cortez avatar May 08 '20 16:05 Daniel-Cortez

Apparently, sOPTIMIZE_DEFAULT, despite its name, isn't the default optimization mode - sOPTIMIZE_NOMACRO is, as this value is assigned to pc_optimize by default in initglobals(). https://github.com/pawn-lang/compiler/blob/289cfeb1268f833ae1796debbab1e464306296ca/source/compiler/sc1.c#L924 I think we should rename sOPTIMIZE_DEFAULT into something less misleading (sOPTIMIZE_FULL, probably?)

Daniel-Cortez avatar Oct 03 '20 06:10 Daniel-Cortez