gopher-lua
gopher-lua copied to clipboard
Implements support for string.dump and loadstring of binary blob
Fixes #127
Changes proposed in this pull request:
- Support for "string.dump"
- Support for doing "loadstring" on a compiled "blob" string
Noteworthy things:
- Incompatible with LUA5.1 binary format
- Need to decide on a GopherLua binary header signature
- Need to decide on a GopherLua binary header version
- Need to decide on how to handle LittleEndian/BigEndian
- Allow for the removal of debug data.
Coverage decreased (-0.6%) to 90.043% when pulling 4bd64b49d207d2f1c9238165bee449c2714a0118 on ganehag:master into b402f3114ec730d8bddb074a6c137309f561aa78 on yuin:master.
Thanks for your great contribution!
You must not change state.go
directly. As I wrote in the contribution guideline, state.go
is an auto generated file. You should change _state.go
and then run a make build
command.
Well, sine I do not stick to the original binary format, I would like to suggest a simpler approach using the builtin gob package.
I've implemented a PoC in the feature-dump-func branch. I would like to get your thoughts on it.
Sorry for missing the thing about _state.go
, it is easily fixed.
I also took the road using encoding/gob
from the start. Then I realized that the output from encoding/gob
is far from optimal from a size perspective. So I decided to go with a known file format. I say known here and not well known, because the Lua binary format is to my knowledge not documented anywhere.
I did a small test comparing the output from a compiled chunk of print("hello world")
comparing the size difference between the two solutions.
GOB gives a file size of 758 bytes. GOL gives a file size of 152 bytes. LUA script file size is 21 bytes.
The difference will not be as large when comparing bigger files. But GOB does carry with it a very large overhead as it declares every singe part of the structure as a string. Resulting in a file filled with StringConstants, []uint32
etc. It is a tremendously flexible way of dumping and loading data to and from structs. But in my opinion it does a little bit to much magic
.
Coverage decreased (-0.6%) to 90.043% when pulling 87808e91402c301b2a31ee95c782f2374c59eca3 on ganehag:master into b402f3114ec730d8bddb074a6c137309f561aa78 on yuin:master.
Coverage decreased (-0.6%) to 90.043% when pulling e870414c357fe86c6d6f101936059f0a007614bc on ganehag:master into b402f3114ec730d8bddb074a6c137309f561aa78 on yuin:master.
I realize the background of this PR, thanks.
gob
is just a PoC.
I do not stick to the binary format and 'extreme' performance about dumps, so I think using general serializer may have a good balance of the maintenancebility(easily adapt to change of FunctionProto
etc) and the performance(size and speed).
I've updated the feature-dump-func branch to use the MessagePack as a trial.
MessagePack gives a file size of 260bytes( print("hello world")
).
Your proposal is very attractive, but I will have to think about it...
The problem I see with a general serializer is just that it is, general. It will always add additional overhead to allow for any type of structure. I am not arguing against you here, there are several good reasons for choosing a common format for serialization.
But do you really need that for FunctionProto
? Will it really ever change for as long as GopherLua is designed with Lua 5.1 as the base?
I see two possible solutions here:
- Choosing a single encoder solution for dumps in GopherLua and sticking with it. May it be DIY or an existing well defined format.
- GopherLua is primarily a library, so why not allow the implementer to choose their own encoder...?
What are your thoughts on that?
It is a big reason that I am not motivated to string.dump
. This kind of issues have been repeated before( #111 #65 #57) . I always said "I do not stop you to implement it yourself.", and your are the first person truly do it.
My thoughts
- I can not guarantee that opcodes and
FunctionProto
will never be changed(of cource dumped binary contains opcodes).- I consider Golang as the language that is not suitable for implementing fast scripting language(see #52). Many interpreter optimization methods can not be applied to Golang based interpreters.
- So, GopherLua needs GopherLua-specific opecodes and GopherLua-specific
FunctionProto
for Golang-specific optimizations. - For optimizations, I want to change it without any influence to library users.
- Library users may store Dumped binaries to storages like Databases and files. In this situation, If I change opecodes and(or)
FunctionProto
, I will affect users. - Thus, If you want to save lua functions, I will recommend you to save lua functions as a script.
- Lua bytecode has security issues. Nor original Lua or GopherLua do not have a bytecode velifier like the JAVA. This means your host program may be crashed when loading maliciously modified bytecode.
For above reasons, I thought that dump and load lua function in GopherLua should be done by user's own risk.
However, as the same topic is repeated often and your contribution is now here, I have come to think that implementing string.dump
with some restrictions is now okay .
But honestly I'm not so motivated about it and want to do it in a simple way... so I am uncertain about which choice might be best.
How about doing option number 2 then. As this will keep all doors open.
I can throw together a new solution which leaves the encoding and decoding out of the main GopherLua library. And only providing a way for developers to add their own solution. While maybe keeping a very basic solution in a sublibrary github.com/yuin/gopher-lua/dump
. Possibly using gob to lessen the dependencies.
Something like:
L.RegisterDumper(&CodeWrangler{
Encoder: myEncoderFunc,
Decoder: myDecoderFunc
})