
Intermediate bytecode files

Open 10d9e opened this issue 4 years ago • 4 comments

Hello,

First of all, outstanding book and sample code. You really bring together a challenging concept and deliver it in an easily consumable format.

I was just wondering if anyone in the "Lox community" (for lack of a better term) has implemented bytecode serialization to file. In other words, instead of parsing, compiling, and executing the code in one step, I would like to compile Lox files into intermediate bytecode files and then load and execute them at a later time, in the same way the JVM does.

Thanks again for a really cool project.

10d9e avatar Oct 28 '20 22:10 10d9e

Pretty easy to implement.

C's standard library lets you write bytes to a file with fopen and fwrite. Decouple the VM from the compile chain (this will require passing a few pointers around) and compile to a dynamic array. Then, in the VM init method, accept a file name, open the file with fopen to get a FILE*, and read the binary data back with fread. You can then change the interpret function to read from that data instead of the result of compile...
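As a rough illustration of the file I/O part only, here is a minimal sketch that writes a byte buffer to disk and reads it back with the standard C stdio calls. The function names `writeBytes` and `readBytes` are hypothetical and not part of the book's code; wiring the result into the VM is left out.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper: write `count` bytes to `path`.
// Returns 0 on success, -1 on failure.
int writeBytes(const char *path, const uint8_t *bytes, size_t count) {
    FILE *fp = fopen(path, "wb");
    if (fp == NULL) return -1;
    size_t written = fwrite(bytes, 1, count, fp);
    int closed = fclose(fp);
    return (written == count && closed == 0) ? 0 : -1;
}

// Hypothetical helper: read a whole file into a malloc'd buffer.
// The caller frees the result. Returns NULL on failure.
uint8_t *readBytes(const char *path, size_t *outCount) {
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return NULL;
    if (fseek(fp, 0L, SEEK_END) != 0) { fclose(fp); return NULL; }
    long size = ftell(fp);
    if (size < 0) { fclose(fp); return NULL; }
    rewind(fp);
    // Allocate at least one byte so an empty file still yields a valid pointer.
    uint8_t *buffer = malloc(size > 0 ? (size_t)size : 1);
    if (buffer == NULL) { fclose(fp); return NULL; }
    size_t got = fread(buffer, 1, (size_t)size, fp);
    fclose(fp);
    if (got != (size_t)size) { free(buffer); return NULL; }
    *outCount = got;
    return buffer;
}
```

The buffer returned by `readBytes` would then be what a modified interpret function consumes in place of freshly compiled bytecode.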

That's the rough idea though.

I'll take some time and see whether I can create a fork of the project that implements this.

mcfriend99 avatar Nov 03 '20 10:11 mcfriend99

@mcfriend99 Yeah, it seems like the key might be to try to serialize/deserialize the ObjFunction struct in the compiler.

https://github.com/munificent/craftinginterpreters/blob/master/c/compiler.c#L1542
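Serializing a function means serializing its constant table, and constants that refer to heap objects (strings, for instance) cannot simply be written out as raw values, because a pointer from one run means nothing in the next. One common approach is to write a type tag followed by the actual data. The sketch below assumes a simplified, hypothetical `Const` type rather than the VM's real `Value`, and handles only numbers and strings.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-in for a VM constant: a number or an owned string.
typedef enum { CONST_NUMBER = 1, CONST_STRING = 2 } ConstTag;

typedef struct {
    ConstTag tag;
    double number;  // valid when tag == CONST_NUMBER
    char *string;   // valid when tag == CONST_STRING (NUL-terminated)
} Const;

// Write one constant as: 1-byte tag, then the payload for that tag.
int writeConst(FILE *fp, const Const *c) {
    uint8_t tag = (uint8_t)c->tag;
    if (fwrite(&tag, 1, 1, fp) != 1) return -1;
    if (c->tag == CONST_NUMBER) {
        return fwrite(&c->number, sizeof(double), 1, fp) == 1 ? 0 : -1;
    }
    if (c->tag == CONST_STRING) {
        // Strings are stored as a 4-byte length followed by the characters.
        uint32_t length = (uint32_t)strlen(c->string);
        if (fwrite(&length, sizeof length, 1, fp) != 1) return -1;
        return fwrite(c->string, 1, length, fp) == length ? 0 : -1;
    }
    return -1;  // unknown tag
}

// Read one constant written by writeConst. String payloads are copied
// into fresh heap memory that the caller must free.
int readConst(FILE *fp, Const *out) {
    uint8_t tag;
    if (fread(&tag, 1, 1, fp) != 1) return -1;
    out->number = 0;
    out->string = NULL;
    if (tag == CONST_NUMBER) {
        out->tag = CONST_NUMBER;
        return fread(&out->number, sizeof(double), 1, fp) == 1 ? 0 : -1;
    }
    if (tag == CONST_STRING) {
        uint32_t length;
        if (fread(&length, sizeof length, 1, fp) != 1) return -1;
        char *chars = malloc((size_t)length + 1);
        if (chars == NULL) return -1;
        if (fread(chars, 1, length, fp) != length) { free(chars); return -1; }
        chars[length] = '\0';
        out->tag = CONST_STRING;
        out->string = chars;
        return 0;
    }
    return -1;  // unknown tag
}
```

In a real VM the loader would presumably hand the characters to the VM's own string allocation routine instead of keeping a plain `char *`, and nested function constants would need a recursive case with their own tag.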

10d9e avatar Nov 04 '20 02:11 10d9e

Disclaimer: this is non-functional code

My first pass at it is still failing, but here's the code:

I am calling the loadChunk function below like so (after saving with saveChunk):

InterpretResult interpret(const char* source) {
    ...
    // uncomment this code to test saving to file
    // ObjFunction *function = compile(vm, module, source);
    // saveChunk("test.bin", &function->chunk);

    // Create a new Function
    ObjFunction *function = newFunction(vm, module, TYPE_TOP_LEVEL);
    loadChunk("test.bin", &function->chunk, vm);
    ...

saveChunk and loadChunk (chunk.c)

void saveChunk(const char *name, Chunk *chunk) {
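    FILE *fp = fopen(name, "wb");
    if (fp == NULL) {
        perror(name);  // could not open the output file
        return;
    }

    // write the constants
    // NOTE: each Value is stored as its raw 64 bits. That round-trips numbers,
    // but an object constant (e.g. a string) holds a pointer that is
    // meaningless in a later run, which is a likely cause of the failure.
    fwrite(&chunk->constants.count, sizeof(chunk->constants.count), 1, fp);
    for(int i = 0; i < chunk->constants.count; i++) {
        uint64_t val = chunk->constants.values[i];
        fwrite(&val, sizeof(uint64_t), 1, fp);
    }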

    printf("Constants: \n");
    for(int i = 0; i < chunk->constants.count; i++) {
        printf("{ %llu: ", (unsigned long long)chunk->constants.values[i]);
        printValue(chunk->constants.values[i]);
        printf(" }, ");
    }
    printf("\n");

    // write the code
    fwrite(&chunk->count, sizeof(chunk->count), 1, fp);
    for(int i = 0; i < chunk->count; i++) {
        fwrite(&chunk->code[i], sizeof(uint8_t), 1, fp);
        fwrite(&chunk->lines[i], sizeof(int), 1, fp);
    }

    fclose(fp); 
}

void loadChunk(const char *name, Chunk *chunk, VM *vm) {
    // Chunk chunk;
    chunk->count = 0;
    chunk->capacity = 0;
    chunk->code = NULL;
    chunk->lines = NULL;
    initValueArray(&chunk->constants);

    FILE *fp = fopen(name, "rb");
    if (fp == NULL) {
        perror(name);  // could not open the input file
        return;
    }

    // read the constants
    int constCount = 0;
    fread(&constCount, sizeof(chunk->constants.count), 1, fp);
    for(int i = 0; i < constCount; i++) {
        uint64_t val;
        fread(&val, sizeof(uint64_t), 1, fp);
        //writeValueArray(vm, &chunk->constants, val);
        addConstant(vm, chunk, val );
    }

    printf("Constants: \n");
    for(int i = 0; i < chunk->constants.count; i++) {
        printf("{ %llu: ", (unsigned long long)chunk->constants.values[i]);
        printValue(chunk->constants.values[i]);
        printf(" }, ");
    }

    // read the code
    int codeCount = 0;
    fread(&codeCount, sizeof(chunk->count), 1, fp);
    for(int i = 0; i < codeCount; i++) {
        uint8_t codeValue;
        fread(&codeValue, sizeof(uint8_t), 1, fp);
        int lineValue;
        fread(&lineValue, sizeof(int), 1, fp);
        writeChunk(vm, chunk, codeValue, lineValue);
    }

    fclose(fp);
    // debug
    disassembleChunk(chunk, "test chunk");
}

10d9e avatar Nov 11 '20 18:11 10d9e

This saveChunk method kind of mangles everything together, which will make deserialising more difficult.

You may want to consider adding markers and a specialised file header, as well as recording the constant count in its own segment of the file. That way you can be sure you load exactly the right number of constants.
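A minimal sketch of what such a header could look like follows. The magic bytes `LOXB`, the version number, and the names `BytecodeHeader`, `writeHeader`, and `readHeader` are all made up for illustration. The sketch also assumes the file is read on a machine with the same byte order as the one that wrote it.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Hypothetical magic bytes and format version, chosen for this example only.
#define LOXB_MAGIC "LOXB"
#define LOXB_VERSION 1u

typedef struct {
    char magic[4];           // identifies the file type
    uint32_t version;        // lets the loader reject incompatible files
    uint32_t constantCount;  // number of constants that follow
    uint32_t codeCount;      // number of bytecode bytes that follow
} BytecodeHeader;

// Write the header field by field so struct padding never reaches the file.
int writeHeader(FILE *fp, uint32_t constantCount, uint32_t codeCount) {
    uint32_t version = LOXB_VERSION;
    if (fwrite(LOXB_MAGIC, 1, 4, fp) != 4) return -1;
    if (fwrite(&version, sizeof version, 1, fp) != 1) return -1;
    if (fwrite(&constantCount, sizeof constantCount, 1, fp) != 1) return -1;
    if (fwrite(&codeCount, sizeof codeCount, 1, fp) != 1) return -1;
    return 0;
}

// Read and validate the header. Returns 0 on success, -1 if the file is
// truncated, has the wrong magic bytes, or has an unsupported version.
int readHeader(FILE *fp, BytecodeHeader *out) {
    if (fread(out->magic, 1, 4, fp) != 4) return -1;
    if (memcmp(out->magic, LOXB_MAGIC, 4) != 0) return -1;
    if (fread(&out->version, sizeof out->version, 1, fp) != 1) return -1;
    if (out->version != LOXB_VERSION) return -1;
    if (fread(&out->constantCount, sizeof out->constantCount, 1, fp) != 1) return -1;
    if (fread(&out->codeCount, sizeof out->codeCount, 1, fp) != 1) return -1;
    return 0;
}
```

Checking the magic bytes and version up front means a stale or unrelated file fails immediately with a clear error, rather than being disassembled as garbage.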

Also, you may need to create a segment for the code itself. For an example of how bytecode and constants can be represented at a low level, look at the low-level IR that Clang generates with the -emit-llvm flag.
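One way to give the code its own segment is to frame every segment as an id, a length, and a payload, so a loader can skip what it does not need and the bytecode is no longer interleaved with line numbers. The sketch below is one possible framing, not an established format. The segment ids and the `writeSegment` and `readSegment` names are hypothetical, and it assumes the file consists of nothing but such segments.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical segment identifiers.
enum { SEG_CONSTANTS = 1, SEG_CODE = 2, SEG_LINES = 3 };

// Write one segment: 1-byte id, 4-byte payload length, then the payload.
int writeSegment(FILE *fp, uint8_t id, const void *payload, uint32_t length) {
    if (fwrite(&id, 1, 1, fp) != 1) return -1;
    if (fwrite(&length, sizeof length, 1, fp) != 1) return -1;
    if (length > 0 && fwrite(payload, 1, length, fp) != length) return -1;
    return 0;
}

// Scan the file from the start for the first segment with id `wanted`.
// Returns a malloc'd copy of its payload (caller frees), or NULL if the
// segment is missing or the file is malformed.
uint8_t *readSegment(FILE *fp, uint8_t wanted, uint32_t *outLength) {
    rewind(fp);
    for (;;) {
        uint8_t id;
        uint32_t length;
        if (fread(&id, 1, 1, fp) != 1) return NULL;  // end of file
        if (fread(&length, sizeof length, 1, fp) != 1) return NULL;
        if (id != wanted) {
            // Not the segment we want: skip over its payload.
            if (fseek(fp, (long)length, SEEK_CUR) != 0) return NULL;
            continue;
        }
        uint8_t *payload = malloc(length > 0 ? length : 1);
        if (payload == NULL) return NULL;
        if (fread(payload, 1, length, fp) != length) {
            free(payload);
            return NULL;
        }
        *outLength = length;
        return payload;
    }
}
```

With this layout the whole `code` array can be written in a single `writeSegment` call and the `lines` array in another, instead of alternating one opcode byte and one line number per iteration.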

You may also want to look at the JPEG format to see how it organises metadata, dimensions and pixel data...

Hope you get the basic idea! :)

mcfriend99 avatar Mar 16 '21 18:03 mcfriend99