gopher-lua icon indicating copy to clipboard operation
gopher-lua copied to clipboard

Implements support for string.dump and loadstring of binary blob

Open ganehag opened this issue 7 years ago • 10 comments

Fixes #127

Changes proposed in this pull request:

  • Support for "string.dump"
  • Support for doing "loadstring" on a compiled "blob" string

Noteworthy things:

  • Incompatible with LUA5.1 binary format
  • Need to decide on a GopherLua binary header signature
  • Need to decide on a GopherLua binary header version
  • Need to decide on how to handle LittleEndian/BigEndian
  • Allow for the removal of debug data.

ganehag avatar Jun 20 '17 07:06 ganehag

Coverage Status

Coverage decreased (-0.6%) to 90.043% when pulling 4bd64b49d207d2f1c9238165bee449c2714a0118 on ganehag:master into b402f3114ec730d8bddb074a6c137309f561aa78 on yuin:master.

coveralls avatar Jun 20 '17 07:06 coveralls

Thanks for your great contribution!

You must not change state.go directly. As I wrote in the contribution guideline, state.go is an auto generated file. You should change _state.go and then run a make build command.

Well, sine I do not stick to the original binary format, I would like to suggest a simpler approach using the builtin gob package.

I've implemented a PoC in the feature-dump-func branch. I would like to get your thoughts on it.

yuin avatar Jun 21 '17 04:06 yuin

Sorry for missing the thing about _state.go, it is easily fixed.

I also took the road using encoding/gob from the start. Then I realized that the output from encoding/gob is far from optimal from a size perspective. So I decided to go with a known file format. I say known here and not well known, because the Lua binary format is to my knowledge not documented anywhere.

I did a small test comparing the output from a compiled chunk of print("hello world") comparing the size difference between the two solutions.

GOB gives a file size of 758 bytes. GOL gives a file size of 152 bytes. LUA script file size is 21 bytes.

The difference will not be as large when comparing bigger files. But GOB does carry with it a very large overhead as it declares every singe part of the structure as a string. Resulting in a file filled with StringConstants, []uint32 etc. It is a tremendously flexible way of dumping and loading data to and from structs. But in my opinion it does a little bit to much magic.

ganehag avatar Jun 21 '17 07:06 ganehag

Coverage Status

Coverage decreased (-0.6%) to 90.043% when pulling 87808e91402c301b2a31ee95c782f2374c59eca3 on ganehag:master into b402f3114ec730d8bddb074a6c137309f561aa78 on yuin:master.

coveralls avatar Jun 21 '17 07:06 coveralls

Coverage Status

Coverage decreased (-0.6%) to 90.043% when pulling e870414c357fe86c6d6f101936059f0a007614bc on ganehag:master into b402f3114ec730d8bddb074a6c137309f561aa78 on yuin:master.

coveralls avatar Jun 21 '17 07:06 coveralls

I realize the background of this PR, thanks.

gob is just a PoC. I do not stick to the binary format and 'extreme' performance about dumps, so I think using general serializer may have a good balance of the maintenancebility(easily adapt to change of FunctionProto etc) and the performance(size and speed).

I've updated the feature-dump-func branch to use the MessagePack as a trial.

MessagePack gives a file size of 260bytes( print("hello world") ).

Your proposal is very attractive, but I will have to think about it...

yuin avatar Jun 21 '17 10:06 yuin

The problem I see with a general serializer is just that it is, general. It will always add additional overhead to allow for any type of structure. I am not arguing against you here, there are several good reasons for choosing a common format for serialization. But do you really need that for FunctionProto? Will it really ever change for as long as GopherLua is designed with Lua 5.1 as the base?

I see two possible solutions here:

  • Choosing a single encoder solution for dumps in GopherLua and sticking with it. May it be DIY or an existing well defined format.
  • GopherLua is primarily a library, so why not allow the implementer to choose their own encoder...?

What are your thoughts on that?

ganehag avatar Jun 22 '17 07:06 ganehag

It is a big reason that I am not motivated to string.dump . This kind of issues have been repeated before( #111 #65 #57) . I always said "I do not stop you to implement it yourself.", and your are the first person truly do it.

My thoughts

  • I can not guarantee that opcodes and FunctionProto will never be changed(of cource dumped binary contains opcodes).
    • I consider Golang as the language that is not suitable for implementing fast scripting language(see #52). Many interpreter optimization methods can not be applied to Golang based interpreters.
    • So, GopherLua needs GopherLua-specific opecodes and GopherLua-specific FunctionProto for Golang-specific optimizations.
    • For optimizations, I want to change it without any influence to library users.
    • Library users may store Dumped binaries to storages like Databases and files. In this situation, If I change opecodes and(or) FunctionProto, I will affect users.
    • Thus, If you want to save lua functions, I will recommend you to save lua functions as a script.
  • Lua bytecode has security issues. Nor original Lua or GopherLua do not have a bytecode velifier like the JAVA. This means your host program may be crashed when loading maliciously modified bytecode.

For above reasons, I thought that dump and load lua function in GopherLua should be done by user's own risk.

However, as the same topic is repeated often and your contribution is now here, I have come to think that implementing string.dump with some restrictions is now okay .

But honestly I'm not so motivated about it and want to do it in a simple way... so I am uncertain about which choice might be best.

yuin avatar Jun 22 '17 12:06 yuin

How about doing option number 2 then. As this will keep all doors open.

I can throw together a new solution which leaves the encoding and decoding out of the main GopherLua library. And only providing a way for developers to add their own solution. While maybe keeping a very basic solution in a sublibrary github.com/yuin/gopher-lua/dump. Possibly using gob to lessen the dependencies.

Something like:

L.RegisterDumper(&CodeWrangler{
  Encoder: myEncoderFunc,
  Decoder: myDecoderFunc
})

ganehag avatar Jun 23 '17 05:06 ganehag

Coverage Status

Coverage decreased (-3.6%) to 87.005% when pulling 754bc3935067b074af16fcd4e0ccbce938ace970 on ganehag:master into b402f3114ec730d8bddb074a6c137309f561aa78 on yuin:master.

coveralls avatar Jun 25 '17 08:06 coveralls