ImHex-Patterns icon indicating copy to clipboard operation
ImHex-Patterns copied to clipboard

Fixed a bug in the Lua 5.1 bytecode pattern

Open Guest257351 opened this issue 1 year ago • 14 comments

Lua 5.1 bytecode upvalue fields were incorrectly defined as a Vector<u32> instead of Vector<LuaString>.

Guest257351 avatar Aug 27 '24 05:08 Guest257351

I know very little about Lua byte code, but having separate patterns for each version of Lua makes dealing with problems like this a pain. Instead of checking every other Lua to see if the error exists there I unified all the Lua versions into one. The resulting pattern has this definition for the upValues

struct UpValues {
    if (minorVersion == 1)
        Vector<u32> upvalues;
    else
        Vector<LuaString> upvalues;
}[[inline]];

Not only shows a potential error but also suggests what the fix would be and all of this without having to visit each and every version there is. I am attaching the unified version (change the extension that github cant deal with to .hexpat) I wrote which I tested successfully with all the tests for lua versions ImHex has, but probably needs more rigorous tests given that this error was not found. lua.txt

paxcut avatar Aug 27 '24 09:08 paxcut

I know very little about Lua byte code, but having separate patterns for each version of Lua makes dealing with problems like this a pain. Instead of checking every other Lua to see if the error exists there I unified all the Lua versions into one. The resulting pattern has this definition for the upValues

struct UpValues {
    if (minorVersion == 1)
        Vector<u32> upvalues;
    else
        Vector<LuaString> upvalues;
}[[inline]];

Not only shows a potential error but also suggests what the fix would be and all of this without having to visit each and every version there is. I am attaching the unified version (change the extension that github cant deal with to .hexpat) I wrote which I tested successfully with all the tests for lua versions ImHex has, but probably needs more rigorous tests given that this error was not found. lua.txt

As far as I'm aware lua does not use u32s or integers of any kind for it's upvalue vectors. I don't know of any lua version that does. The upvalue vector is only there for debug information. As for having multiple version being a pain, I might look into how lua creates bytecode for versions 5.1 onward to see if they can easily be put into a single pattern.

Guest257351 avatar Aug 29 '24 00:08 Guest257351

Just added a commit for a separate similar bug. I forgot to include this in the initial pull request. It's also worth noting that the number of bytes in the string size is not static. By default lua uses 32-bit integers and just about every implementation of it also does since there's no good reason to change it. The number of bytes is controlled by size_of_size_t in LuaBinaryHeader so it might be worth changing the pattern to support this value being anything other than 4.

Guest257351 avatar Aug 29 '24 00:08 Guest257351

As for having multiple version being a pain, I might look into how lua creates bytecode for versions 5.1 onward to see if they can easily be put into a single pattern.

I put a link to the unified version of the lua pattern. Did you look at it? Ill post it here in case you cant get the link

#pragma description Lua 5.1, 5.2, 5.3 and 5.4 bytecode

import std.io;
import std.mem;
import type.base;

u8 minorVersion = std::mem::read_unsigned(4,1)&15;

namespace impl {
    fn transform_Size_array(auto array) {
        u128 res = 0;
        for(u8 i = 0, (array[i] & 0x80 == 0) && (i < 9), i+=1) {
            res <<= 7;
            res |= array[i] & 0x7f;
        }
        res <<= 7;
        res |= array[sizeof(array)-1] & 0x7f;
        return res;
    };
    
    fn format_Size(auto leb128) {
        if (minorVersion == 4) {
            u128 res = impl::transform_Size_array(leb128.array);
            return std::format("{} ({:#x})", res, res);
        } else
            return std::format("{}",leb128.size);
    };

    fn format_StringSize(auto leb128) {
        if (minorVersion == 4) {
            u128 res = impl::transform_Size_array(leb128.size.array);
            return std::format("{} ({:#x})", res, res);
        } else
            return std::format("{}",leb128.size);
    };
    fn transform_Size(auto leb128) {
        if (minorVersion == 4) 
            return impl::transform_Size_array(leb128.array);
         else
            return leb128.size;
    };

    fn transform_StringSize(auto leb128) {
         if (minorVersion == 4) 
            return impl::transform_Size_array(leb128.size.array);
         else
            return leb128.size;
    };

    fn format_LuaString(auto string) {
       if (string.size == 0) {
           return "None";
        }
        return std::format("\"{}\"", string.data);
    };
    
    fn format_Constant(auto constant) {
       return constant.data;
    };
    
    fn format_Version(auto ver) {
       return std::format("Ver. {}.{}", ver.major, ver.minor);
    };
}

using LuaFunction;

struct Size {
    if (minorVersion == 4)
        u8 array[while(addressof(this) == $ || ((addressof(this)-$ < 9) && (std::mem::read_unsigned($-1, 1) & 0x80 == 0)))] [[hidden]];
    else
        u32 size[[hidden]];
} [[sealed, format("impl::format_Size"), transform("impl::transform_Size")]];
    
    
struct LuaStringSize {
   if (minorVersion == 4)
        Size size;
    else if (minorVersion == 3)
        u8 size;
    else
        u64 size;
}[[sealed, format("impl::format_StringSize"), transform("impl::transform_StringSize")]];



bitfield Version {
    minor : 4;
    major : 4;
} [[format("impl::format_Version")]];

struct LuaBinaryHeader {
    char magic[4];
    Version version_number;
    u8 format_version;
    if (minorVersion == 3 || minorVersion == 4)
        char LUAC_DATA[6];
    else
        u8 endianness;

    u8 size_of_int;
    u8 size_of_size_t;
    match (minorVersion) {
        (1 | 2) : {
            u8 size_of_Instruction;
            u8 lua_Number;
            u8 is_lua_Number_integral;
            if (minorVersion==2)
                char LUAC_TAIL[6];
        }
        (3 | 4) : { 
            if (minorVersion == 3) {
                u8 size_of_Instruction;
                u8 size_of_lua_Integer;
            }
            u8 sizeof_lua_Number;
            type::Hex<u64> LUAC_INT;
            double LUAC_NUM;
        }
    }
};

struct LuaString {
    LuaStringSize size;
    bool correction = (minorVersion == 3 || minorVersion == 4);
    if (size > 0) 
        char data[size-correction];
}[[format("impl::format_LuaString")]];

struct SourceOnDebug {
    if (minorVersion == 2)
        LuaString source;
}[[inline,format("format_source_on")]];

struct SourceOnFunction {
    if (minorVersion != 2)
        LuaString source;
}[[inline,format("format_source_on")]];

fn format_source_on(auto v) {
    return std::format("{}",v.source);
};

enum LUA : u8 {
    TNIL = 0,
    TBOOLEAN,
    TNUMFLT = minorVersion==4 ? 3 : 0x13,
    TSHRSTR = 4,
    TNUMINT = minorVersion==4 ? 0X13 : 3,
    TLNGSTR = 0x14
};


struct LuaConstant {
    LUA type;
    match (type) {
        //(LUA::TNIL)             : NULL
        (LUA::TBOOLEAN) : u8 data;
        (LUA::TNUMFLT)  : double data;
        (LUA::TNUMINT)  : u64 data;
        (LUA::TSHRSTR | LUA::TLNGSTR)   : LuaString data;
    }
}[[format("impl::format_Constant")]];

struct LuaUpvalue {
    u8 instack;
    u8 idx;
    if (minorVersion == 4)
        u8 kind;
};

struct Vector<T> {
    Size size;
    if (size > 0)
        T values[size];
};

struct AbsLineInfo {
    Size pc;
    Size line;
};

struct LocalVar {
    LuaString varname;
    Size startpc;
    Size endpc;
};

struct LineInfo {
    if (minorVersion == 4) {
        Vector<s8> lineInfo;
        Vector<AbsLineInfo> abslineinfo;
    } else
        Vector<u32> lineInfo;
}[[inline]];

struct UpValues {
    if (minorVersion == 1)
        Vector<u32> upvalues;
    else
        Vector<LuaString> upvalues;
}[[inline]];

struct LuaDebugInfo {
    SourceOnDebug source;
    LineInfo lineInfo;
    Vector<LocalVar> local_vars;
    UpValues upvalues;
};

struct LuaCUP{ //Constants,Upvalues and Protos
    Vector<LuaConstant> constants;
    if (minorVersion == 1 || minorVersion==2) {
        u32 sizep;  // size of proto
        if (sizep > 0) {
            LuaFunction protos[sizep];
        }
    }
    if (minorVersion != 1)
        Vector<LuaUpvalue> upvalues;
    if (minorVersion == 3 || minorVersion == 4)
        Vector<LuaFunction> protos;
}[[inline]];

struct LuaFunction {
    SourceOnFunction source;
    Size line_defined;
    Size last_line_defined;
    if (minorVersion == 1)
        u8 nups;  // number of upvalues
    u8 number_of_parameters;
    u8 is_vararg;
    u8 maxstacksize;
    Vector<u32> code;
    LuaCUP cup; 
    LuaDebugInfo debugInfo;
};

struct LuaFile {
    LuaBinaryHeader header;
    if (minorVersion == 3 || minorVersion == 4)
        u8 sizeupvalues;
    LuaFunction func;
};

LuaFile file @ 0;

paxcut avatar Aug 29 '24 01:08 paxcut

I just checked the sample Lua 5.1 sample that is used in unit tests and the string is preceded by a 64 bit integer holding its size. The change that you propose turns a successful test into a failing one of reading past the end of the file. Make sure to test the changes on actual files and if you think the ones we use are invalid provided valid ones.

image

paxcut avatar Aug 29 '24 02:08 paxcut

I just checked the sample Lua 5.1 sample that is used in unit tests and the string is preceded by a 64 bit integer holding its size. The change that you propose turns a successful test into a failing one of reading past the end of the file. Make sure to test the changes on actual files and if you think the ones we use are invalid provided valid ones.

image

I did test the patterns before making these changes. After looking into how lua defines size_t, the test samples were created from a lua 64 bit compiler, where as I was using a 32 bit compiler. The unified pattern you sent does seem to account for size_t's variable size, so I guess this was an issue specific to the lua 5.1 pattern. I'll update my pull request to account for size_t's variable size.

Guest257351 avatar Aug 29 '24 03:08 Guest257351

The unified pattern you sent does seem to account for size_t's variable size, so I guess this was an issue specific to the lua 5.1 pattern. I'll update my pull request to account for size_t's variable size.

I don't follow. The unified pattern is exactly equivalent to each of the separate versions, or at least , it aims to be. If one of the versions has a problem, then the unified version will have the exact same problem. For example, for string size type it defines:

struct LuaStringSize {
   if (minorVersion == 4)
        Size size;
    else if (minorVersion == 3)
        u8 size;
    else
        u64 size;
}[[sealed, format("impl::format_StringSize"), transform("impl::transform_StringSize")]];

So both versions 1 and 2 use u64 like the 5.1 and 5.2 versions do. Im looking into getting better samples so we can test things better.

paxcut avatar Aug 29 '24 03:08 paxcut

The unified pattern you sent does seem to account for size_t's variable size, so I guess this was an issue specific to the lua 5.1 pattern. I'll update my pull request to account for size_t's variable size.

I don't follow. The unified pattern is exactly equivalent to each of the separate versions, or at least , it aims to be. If one of the versions has a problem, then the unified version will have the exact same problem. For example, for string size type it defines:

struct LuaStringSize {
   if (minorVersion == 4)
        Size size;
    else if (minorVersion == 3)
        u8 size;
    else
        u64 size;
}[[sealed, format("impl::format_StringSize"), transform("impl::transform_StringSize")]];

So both versions 1 and 2 use u64 like the 5.1 and 5.2 versions do. Im looking into getting better samples so we can test things better.

Sorry I've made a mistake again.😅The unified version does not account for 32-bit vs 64-bit compilers, and assumes 64-bit. I probably should have read through it more thoroughly.

Guest257351 avatar Aug 29 '24 03:08 Guest257351

Also I'm not sure how I'd easily make either the unified version or the 5.1 version account for this. Ideally LuaString should use a u32 if LuaBinaryHeader.size_of_size_t is 4, and a u64 if it's 8. However I'm not experienced enough with the pattern language to implement this.

Guest257351 avatar Aug 29 '24 03:08 Guest257351

something like this should work: create a global

u8 global_size_of_size_t=4;

we assign a default just in case. Then inside header, right after size_of_size_t is created

global_size_of_size_t = size_of_size_t;

finally on StringSize

struct LuaStringSize {
   if (minorVersion == 4)
        Size size;
    else if (minorVersion == 3)
        u8 size;
    else {
        if (global_size_of_size_t == 4 )
            u32 size;
        else
            u64 size;
    }
}[[sealed, format("impl::format_StringSize"), transform("impl::transform_StringSize")]];

I am showing the code in unified pattern but it should be the same for 5.1. What about the other versions?

paxcut avatar Aug 29 '24 04:08 paxcut

something like this should work: create a global

u8 global_size_of_size_t=4;

we assign a default just in case. Then inside header, right after size_of_size_t is created

global_size_of_size_t = size_of_size_t;

finally on StringSize

struct LuaStringSize {
   if (minorVersion == 4)
        Size size;
    else if (minorVersion == 3)
        u8 size;
    else {
        if (global_size_of_size_t == 4 )
            u32 size;
        else
            u64 size;
    }
}[[sealed, format("impl::format_StringSize"), transform("impl::transform_StringSize")]];

I am showing the code in unified pattern but it should be the same for 5.1. What about the other versions?

That could work, I've managed to implement it differently. You can review that and see which works better.

Guest257351 avatar Aug 29 '24 04:08 Guest257351

Why are you only fixing lua 5.1? Are the other Luas also not possibly erroneous? Considering that we have to deal with 4 versions, there should be an attempt to keep them similar to each other to help debug issues that deal with more than just one version, so breaking structures that are kept together in the other versions doesnt seem like a good idea.

The main issue is why don't you think it is a good idea to get rid of the version files and use one instead? That avoids code duplication, reduces code management, makes it more clear what is the same and what isn't across versions and helps showcasing the pattern language ability to deal with special cases among other things. What are the downsides?

paxcut avatar Aug 29 '24 05:08 paxcut

A better solution than both

u8 stringSize  =  std::mem::read_unsigned((minorVersion>2) ? 13 : 8,1);

eliminates the need to copy from the pattern to a global or the need to break the main placed variable into pieces.

paxcut avatar Aug 29 '24 05:08 paxcut

Why are you only fixing lua 5.1? Are the other Luas also not possibly erroneous? Considering that we have to deal with 4 versions, there should be an attempt to keep them similar to each other to help debug issues that deal with more than just one version, so breaking structures that are kept together in the other versions doesnt seem like a good idea.

The main issue is why don't you think it is a good idea to get rid of the version files and use one instead? That avoids code duplication, reduces code management, makes it more clear what is the same and what isn't across versions and helps showcasing the pattern language ability to deal with special cases among other things. What are the downsides?

We should combine the versions, I didn't do that because it's not what the original issue was trying to fix. But yeah absolutely we should. I also have lots of experience with lua 5.1 bytecode, but haven't looked into the other versions much.

Guest257351 avatar Aug 29 '24 05:08 Guest257351