Unicode support in character classes

Open dolik-rce opened this issue 4 years ago • 4 comments

Hello,

I'd like to ask whether this parser generator supports Unicode in character classes. From my testing so far, Unicode escapes (e.g. \u1234) seem to work fine in strings (at least for UTF-8 encoding), but if I use one in a character class, wrong code is generated.

For example, if my grammar contains the rule TEST <- [\u1234], this is generated:

static pcc_thunk_chunk_t *pcc_evaluate_rule_TEST(pkotlin_context_t *ctx) {
    pcc_thunk_chunk_t *chunk = pcc_thunk_chunk__create(ctx->auxil);
    chunk->pos = ctx->pos;
    {
        char c;
        if (pcc_refill_buffer(ctx, 1) < 1) goto L0000;
        c = ctx->buffer.buf[ctx->pos];
        if (!(
            c == '\xe1' ||
            c == '\x88' ||
            c == '\xb4'
        )) goto L0000;
        ctx->pos++;
    }
    return chunk;
L0000:;
    pcc_thunk_chunk__destroy(ctx->auxil, chunk);
    return NULL;
}

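It looks like the character class specification is passed to the unescape_string function and the matching is then done on the individual bytes it returns (E1 88 B4 is the UTF-8 encoding of \u1234). As a result, the generated code accepts any single one of those three bytes instead of the character U+1234 itself.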
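To illustrate the difference, here is a rough sketch of what matching on code points rather than bytes could look like, assuming the input is UTF-8. Both decode_utf8 and match_class_u1234 are hypothetical names made up for this example; they are not part of packcc and not a proposal for how the generator should actually emit its code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of packcc): decodes one UTF-8 sequence
 * from buf, where n bytes are available. On success stores the code point
 * in *out and returns the number of bytes consumed; returns 0 if the input
 * is truncated or malformed. For brevity this sketch does not reject
 * overlong encodings or surrogate code points. */
static size_t decode_utf8(const unsigned char *buf, size_t n, uint32_t *out) {
    size_t len, i;
    uint32_t cp;
    if (n == 0) return 0;
    if (buf[0] < 0x80) { *out = buf[0]; return 1; }                      /* ASCII */
    else if ((buf[0] & 0xE0) == 0xC0) { len = 2; cp = buf[0] & 0x1F; }  /* 110xxxxx */
    else if ((buf[0] & 0xF0) == 0xE0) { len = 3; cp = buf[0] & 0x0F; }  /* 1110xxxx */
    else if ((buf[0] & 0xF8) == 0xF0) { len = 4; cp = buf[0] & 0x07; }  /* 11110xxx */
    else return 0;                         /* stray continuation or invalid lead byte */
    if (n < len) return 0;                 /* sequence is cut off */
    for (i = 1; i < len; i++) {
        if ((buf[i] & 0xC0) != 0x80) return 0;   /* expected 10xxxxxx */
        cp = (cp << 6) | (uint32_t)(buf[i] & 0x3F);
    }
    *out = cp;
    return len;
}

/* Hypothetical matcher for the class [\u1234]: returns the number of bytes
 * to advance on a match, or 0 if the next character is not in the class. */
static size_t match_class_u1234(const unsigned char *buf, size_t n) {
    uint32_t cp;
    size_t len = decode_utf8(buf, n, &cp);
    if (len == 0) return 0;
    if (!(cp == 0x1234)) return 0;   /* compare whole code points, not bytes */
    return len;
}
```

With this approach the whole three-byte sequence E1 88 B4 is consumed on a match, while a lone 88 or B4 byte is rejected, which is the opposite of what the generated code above does.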

dolik-rce avatar Jan 02 '21 12:01 dolik-rce

Yes, this definitely does not work correctly. Thank you for reporting! Unfortunately, I don't have access to the necessary software to check and test your pull request right now, but I'll try to commit it as soon as possible.

lil-lila avatar Jan 03 '21 14:01 lil-lila

Sure, take your time, there is no rush. By the way, would you be interested in adding tests to the repository? It would make future changes and/or fixes easier (both to develop and to verify).

dolik-rce avatar Jan 03 '21 15:01 dolik-rce

This would be very useful, I think. Unfortunately, writing tests is still a completely unknown field to me (I guess there are some guidelines and established practices), but I'll do my best to help you if you would like to add them.

lil-lila avatar Jan 04 '21 16:01 lil-lila

I have thought about writing test cases for packcc at https://github.com/universal-ctags/packcc. I think there are two ways to test packcc:

  1. whether packcc generates the expected .c file.
  2. whether an executable built from the .c file generated by packcc runs as expected.

However, before thinking about testing, I would like to hear your comments on the issue I opened. @enechaev, @dolik-rce, what do you think about https://github.com/arithy/packcc/issues/13?

masatake avatar Jan 04 '21 22:01 masatake