craftinginterpreters
Intermediate bytecode files
Hello,
First of all, outstanding book and sample code. You really bring together a challenging concept and deliver it in an easily consumable format.
I was just wondering if anyone in the "lox community" (for lack of a better term) has implemented bytecode serialization to file. In other words instead of parsing, compiling and executing the code, I would like to be able to compile lox files into intermediate bytecode files and then load/execute them at a later time, in the same way the JVM does.
Thanks again for a really cool project.
Pretty easy to implement.
C's standard library lets you write bytes to a file with fopen and fwrite (fstream/ofstream are the C++ stream classes, not C functions). Simply decouple the VM from the compile-and-run chain (this will require passing a few pointers around) and compile to a dynamic array. Then, in the VM init method, you can accept a file name, open it with fopen to get a FILE*, and read the binary data back with fread. You can then change the interpret function to run the loaded data instead of the result of compile
...
That's the rough idea though.
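To make the rough idea concrete, here is a minimal sketch of the file round trip in plain C. The helper names writeBytes and readBytes are hypothetical, not part of clox; they only show fopen, fwrite and fread moving a byte array to disk and back, and say nothing about how the VM would consume the result.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper: write `count` bytes to `path`. Returns 0 on success.
int writeBytes(const char *path, const uint8_t *bytes, size_t count) {
    assert(path != NULL && (bytes != NULL || count == 0));
    FILE *fp = fopen(path, "wb");
    if (fp == NULL) return -1;
    size_t written = count > 0 ? fwrite(bytes, 1, count, fp) : 0;
    // fclose flushes buffered data, so its result matters as well.
    if (fclose(fp) != 0 || written != count) return -1;
    return 0;
}

// Hypothetical helper: read a whole file into a malloc'd buffer.
// Returns NULL on failure; the caller frees the buffer.
uint8_t *readBytes(const char *path, size_t *outCount) {
    assert(path != NULL && outCount != NULL);
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return NULL;
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long size = ftell(fp);
    if (size < 0) { fclose(fp); return NULL; }
    rewind(fp);
    // Allocate at least one byte so an empty file still yields a valid pointer.
    uint8_t *buffer = malloc(size > 0 ? (size_t)size : 1);
    if (buffer == NULL) { fclose(fp); return NULL; }
    size_t got = fread(buffer, 1, (size_t)size, fp);
    fclose(fp);
    if (got != (size_t)size) { free(buffer); return NULL; }
    *outCount = got;
    return buffer;
}
```

A caller would pass the compiled code array to writeBytes after compiling, and later hand the buffer returned by readBytes to whatever replaces the compile step.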
I'll take some time and see if I can create a fork of the project that implements this.
@mcfriend99 Yeah, it seems like the key might be to try to serialize/deserialize the ObjFunction struct in the compiler.
https://github.com/munificent/craftinginterpreters/blob/master/c/compiler.c#L1542
Disclaimer: this is non-functional code
My first pass at it is still failing, but here's the code:
I am calling the loadChunk function below like so (after saving with saveChunk):
InterpretResult interpret(const char* source) {
    ...
    // uncomment this code to test saving to file
    // ObjFunction *function = compile(vm, module, source);
    // saveChunk("test.bin", &function->chunk);

    // Create a new Function
    ObjFunction *function = newFunction(vm, module, TYPE_TOP_LEVEL);
    loadChunk("test.bin", &function->chunk, vm);
    ...
saveChunk and loadChunk (chunk.c)
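#include <stdio.h>  // FILE, fopen, fwrite, printf

void saveChunk(const char *name, Chunk *chunk) {
    FILE *fp = fopen(name, "wb");
    if (fp == NULL) {
        perror(name);
        return;
    }

    // write the constants: count first, then each Value as raw 64 bits
    // NOTE: this assumes NaN-boxed Values; an object constant (e.g. a string)
    // is written as a raw pointer, which is not valid in a later run
    fwrite(&chunk->constants.count, sizeof(chunk->constants.count), 1, fp);
    for (int i = 0; i < chunk->constants.count; i++) {
        uint64_t val = chunk->constants.values[i];
        fwrite(&val, sizeof(uint64_t), 1, fp);
    }

    // debug: dump the constants that were written
    printf("Constants: \n");
    for (int i = 0; i < chunk->constants.count; i++) {
        printf("{ %llu: ", (unsigned long long)chunk->constants.values[i]);
        printValue(chunk->constants.values[i]);
        printf(" }, ");
    }
    printf("\n");

    // write the code: count first, then (opcode byte, line number) pairs
    fwrite(&chunk->count, sizeof(chunk->count), 1, fp);
    for (int i = 0; i < chunk->count; i++) {
        fwrite(&chunk->code[i], sizeof(uint8_t), 1, fp);
        fwrite(&chunk->lines[i], sizeof(int), 1, fp);
    }
    fclose(fp);
}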
void loadChunk(const char *name, Chunk *chunk, VM *vm) {
    // start from an empty chunk
    chunk->count = 0;
    chunk->capacity = 0;
    chunk->code = NULL;
    chunk->lines = NULL;
    initValueArray(&chunk->constants);

    FILE *fp = fopen(name, "rb");
    if (fp == NULL) {
        perror(name);
        return;
    }

    // read the constants: count first, then each Value as raw 64 bits
    int constCount = 0;
    if (fread(&constCount, sizeof(chunk->constants.count), 1, fp) != 1) {
        fclose(fp);
        return;
    }
    for (int i = 0; i < constCount; i++) {
        uint64_t val;
        if (fread(&val, sizeof(uint64_t), 1, fp) != 1) break;
        addConstant(vm, chunk, val);
    }

    // debug: dump the constants that were read
    printf("Constants: \n");
    for (int i = 0; i < chunk->constants.count; i++) {
        printf("{ %llu: ", (unsigned long long)chunk->constants.values[i]);
        printValue(chunk->constants.values[i]);
        printf(" }, ");
    }

    // read the code: count first, then (opcode byte, line number) pairs
    int codeCount = 0;
    if (fread(&codeCount, sizeof(chunk->count), 1, fp) != 1) {
        fclose(fp);
        return;
    }
    for (int i = 0; i < codeCount; i++) {
        uint8_t codeValue;
        int lineValue;
        if (fread(&codeValue, sizeof(uint8_t), 1, fp) != 1) break;
        if (fread(&lineValue, sizeof(int), 1, fp) != 1) break;
        writeChunk(vm, chunk, codeValue, lineValue);
    }
    fclose(fp);

    // debug
    disassembleChunk(chunk, "test chunk");
}
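One thing a raw 64-bit dump of each constant cannot preserve is heap objects: assuming Values are NaN-boxed as in the book's clox, a string constant is stored as a pointer, and that address means nothing in a later run. A possible way around this is to write a type tag followed by the constant's actual contents. The sketch below uses a stand-in Constant struct and hypothetical writeConstant/readConstant helpers rather than the real Value type, and it writes in host byte order for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Stand-in for a VM constant; NOT the real clox Value type.
typedef enum { CONST_NUMBER = 1, CONST_STRING = 2 } ConstantTag;

typedef struct {
    ConstantTag tag;
    double number;    // valid when tag == CONST_NUMBER
    char *chars;      // valid when tag == CONST_STRING (heap, NUL-terminated)
    uint32_t length;  // string length in bytes, excluding the NUL
} Constant;

// Write one constant as: tag byte, then the payload for that tag.
int writeConstant(FILE *fp, const Constant *c) {
    assert(fp != NULL && c != NULL);
    uint8_t tag = (uint8_t)c->tag;
    if (fwrite(&tag, 1, 1, fp) != 1) return -1;
    if (c->tag == CONST_NUMBER) {
        return fwrite(&c->number, sizeof(double), 1, fp) == 1 ? 0 : -1;
    }
    if (c->tag == CONST_STRING) {
        // Store the characters themselves, never the pointer.
        if (fwrite(&c->length, sizeof(uint32_t), 1, fp) != 1) return -1;
        if (c->length == 0) return 0;
        return fwrite(c->chars, 1, c->length, fp) == c->length ? 0 : -1;
    }
    return -1;  // unknown tag
}

// Read one constant; a string payload is copied into fresh heap memory.
int readConstant(FILE *fp, Constant *out) {
    assert(fp != NULL && out != NULL);
    uint8_t tag;
    if (fread(&tag, 1, 1, fp) != 1) return -1;
    memset(out, 0, sizeof *out);
    if (tag == CONST_NUMBER) {
        out->tag = CONST_NUMBER;
        return fread(&out->number, sizeof(double), 1, fp) == 1 ? 0 : -1;
    }
    if (tag == CONST_STRING) {
        uint32_t length;
        if (fread(&length, sizeof length, 1, fp) != 1) return -1;
        char *chars = malloc((size_t)length + 1);
        if (chars == NULL) return -1;
        if (length > 0 && fread(chars, 1, length, fp) != length) {
            free(chars);
            return -1;
        }
        chars[length] = '\0';
        out->tag = CONST_STRING;
        out->chars = chars;
        out->length = length;
        return 0;
    }
    return -1;  // unknown tag
}
```

In the real VM, the string branch of the loader would presumably hand the characters to the VM's own string constructor so that the new object is allocated and interned in the current process. Function constants would need a recursive version of the same idea.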
This saveChunk method kind of mangles everything together, which will make deserialising more difficult.
You may want to consider adding markers and a specialised file header, and recording the constant count in its own segment of the file, so that the loader knows exactly how many constants to read.
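As a rough illustration of the header suggestion, the sketch below writes a small fixed-size header before any data: a signature, a format version, and the two counts. The "LOXC" signature, the version number, and the BytecodeHeader layout are all made up for this example; integers are written as explicit little-endian bytes so the file does not depend on the host's byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LOXC_MAGIC "LOXC"  // hypothetical 4-byte file signature
#define LOXC_VERSION 1u    // hypothetical format version

typedef struct {
    uint32_t version;
    uint32_t constantCount;  // number of constants that follow
    uint32_t codeCount;      // number of bytecode bytes that follow
} BytecodeHeader;

// Write a 32-bit value as four little-endian bytes.
static int writeU32(FILE *fp, uint32_t v) {
    uint8_t b[4] = {
        (uint8_t)(v & 0xff), (uint8_t)((v >> 8) & 0xff),
        (uint8_t)((v >> 16) & 0xff), (uint8_t)((v >> 24) & 0xff)
    };
    return fwrite(b, 1, 4, fp) == 4 ? 0 : -1;
}

// Read four little-endian bytes into a 32-bit value.
static int readU32(FILE *fp, uint32_t *out) {
    uint8_t b[4];
    if (fread(b, 1, 4, fp) != 4) return -1;
    *out = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    return 0;
}

// Write signature, version and counts. Returns 0 on success.
int writeHeader(FILE *fp, const BytecodeHeader *h) {
    assert(fp != NULL && h != NULL);
    if (fwrite(LOXC_MAGIC, 1, 4, fp) != 4) return -1;
    if (writeU32(fp, h->version) != 0) return -1;
    if (writeU32(fp, h->constantCount) != 0) return -1;
    return writeU32(fp, h->codeCount);
}

// Read and validate the header; rejects a wrong signature or version.
int readHeader(FILE *fp, BytecodeHeader *h) {
    assert(fp != NULL && h != NULL);
    char magic[4];
    if (fread(magic, 1, 4, fp) != 4) return -1;
    if (memcmp(magic, LOXC_MAGIC, 4) != 0) return -1;
    if (readU32(fp, &h->version) != 0) return -1;
    if (h->version != LOXC_VERSION) return -1;
    if (readU32(fp, &h->constantCount) != 0) return -1;
    return readU32(fp, &h->codeCount);
}
```

With a header like this, a loader can refuse a file that is not bytecode at all, or that was produced by an incompatible version, before it touches the constants or the code.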
Also, you may need to create a segment for the code itself. For an example of how bytecode and constants can be represented at a low level, look at the LLVM IR that Clang generates with the -emit-llvm flag.
You may also want to look at the JPEG format as an example of how a file can organise metadata, dimensions and pixel data...
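One way to read the segment suggestion is a tag-plus-length layout, loosely in the spirit of JPEG's marker segments: each section starts with a one-byte tag and a length, so a loader can find the part it wants and skip the rest. The tag values and function names below are hypothetical, and this is only a sketch of the layout, not a full file format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical segment tags; the values are arbitrary.
enum { SEG_CONSTANTS = 0x01, SEG_CODE = 0x02, SEG_LINES = 0x03 };

// Write one segment: 1-byte tag, 4-byte little-endian length, then payload.
int writeSegment(FILE *fp, uint8_t tag, const void *payload, uint32_t length) {
    assert(fp != NULL && (payload != NULL || length == 0));
    uint8_t head[5] = {
        tag,
        (uint8_t)(length & 0xff), (uint8_t)((length >> 8) & 0xff),
        (uint8_t)((length >> 16) & 0xff), (uint8_t)((length >> 24) & 0xff)
    };
    if (fwrite(head, 1, sizeof head, fp) != sizeof head) return -1;
    if (length > 0 && fwrite(payload, 1, length, fp) != length) return -1;
    return 0;
}

// Read the next segment's tag and length; the payload is left unread.
int readSegmentHead(FILE *fp, uint8_t *tag, uint32_t *length) {
    assert(fp != NULL && tag != NULL && length != NULL);
    uint8_t head[5];
    if (fread(head, 1, sizeof head, fp) != sizeof head) return -1;
    *tag = head[0];
    *length = (uint32_t)head[1] | ((uint32_t)head[2] << 8) |
              ((uint32_t)head[3] << 16) | ((uint32_t)head[4] << 24);
    return 0;
}

// Scan forward from the current position for the first segment with the
// wanted tag, skipping the payloads of all others. On success the stream
// is positioned at the start of that segment's payload.
int findSegment(FILE *fp, uint8_t wanted, uint32_t *length) {
    uint8_t tag;
    while (readSegmentHead(fp, &tag, length) == 0) {
        if (tag == wanted) return 0;
        if (fseek(fp, (long)*length, SEEK_CUR) != 0) return -1;
    }
    return -1;  // reached end of file without finding the tag
}
```

Because every segment carries its own length, code bytes and line numbers no longer have to be interleaved, and a newer writer can add segment types that an older loader simply skips.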
Hope you get the basic idea! :)