Unsupported opcode: <INVALID> (bytecode=A6h) at position 36.
I am trying to decompile a Python 3.12 .pyc file, but it fails for nearly all files at bytecode "A6h". How can I fix that? I wrote a Python 3.12 script that imports opcode and prints all opcodes, but it seems those are not all of them. What am I missing here, and how can I fix the decompiling process?
user@Windows-11-Pro:/mnt/c/Users/user/OneDrive/Desktop/oh-data/pycdc$ pycdc item_data_2.pyc
# Source Generated with Decompyle++
# File: item_data_2.pyc (Python 3.12)
Unsupported opcode: <INVALID> (bytecode=A6h) at position 36.
import bindict
# WARNING: Decompyle incomplete
I added it to python_3_12.cpp now, but the output looks like this when executing "pycdc item_data_2.pyc". Any ideas?
pycdas item_data_2.pyc outputs the following:
there is opcode 166 in your pyc, and it is not a legal one according to cpython Include/opcode.h (Python 3.12):
the direct answer here is: fixing <INVALID> from pycdc requires implementing a decompilation strategy in ASTree.cpp for the specific opcode/instruction, which is non-trivial. you can add the opcode to the case statement in ASTree.cpp just to get the tool to be quiet but it often results in incorrect/incomplete python output.
examples of opcodes blocking successful python code generation (from "OH" pycs) include:
- END_FOR
- JUMP_BACKWARD
- JUMP_BACKWARD_NO_INTERRUPT
- COPY
- END_SEND
- CALL_INTRINSIC_1
- CLEANUP_THROW
- DICT_MERGE
- DICT_UPDATE
- MAKE_CELL
- RERAISE
- SEND
- UNPACK_SEQUENCE_LIST
- UNPACK_SEQUENCE_TUPLE
- UNPACK_SEQUENCE_TWO_TUPLE
Since getting an ultra-trivial PR merged proved impossible (#511, nothing more than "testing the waters"), I forked and stopped trying to work with the pycdc devs. judging by the title, the message you are seeing comes from that fork. that means you're also going to battle the fact that the original repo doesn't have complete opcode maps (pycdas doesn't produce 100% correct results for 3.11 or 3.12), and you may be asking devs to implement/investigate something they don't support yet in the main repo.
for example, according to the pycdc main repo "166" is not a valid opcode, but we can see from the cpython source code that it is "UNPACK_SEQUENCE_TUPLE":
#define UNPACK_SEQUENCE_TUPLE 166
you can see the response from @greenozon illustrating the problem you are going to face here.
i'm trying to be kind about this problem. the fact is we have binaries in the wild which contain opcodes which the pycdc project denies exist.
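one likely reason a script that prints opcode.opmap never shows 166: in CPython 3.12, UNPACK_SEQUENCE_TUPLE is a specialized form of UNPACK_SEQUENCE, and specialized instructions are not listed in the public opmap. a minimal sketch for checking this, assuming it is run under the same Python version as the .pyc. the `_specialized_instructions` and `_specialized_opmap` attributes are private CPython details (as far as i know they exist in 3.11/3.12 and 3.13 respectively), so every lookup is guarded, and the helper names are made up for illustration:

```python
import opcode

def public_name(number):
    """Return the name opcode.opmap assigns to this number, or None."""
    for name, value in opcode.opmap.items():
        if value == number:
            return name
    return None

def specialized_names():
    """Best-effort list of specialized instruction names.

    Relies on private attributes that differ between CPython versions,
    so a missing attribute simply yields an empty list.
    """
    names = getattr(opcode, "_specialized_instructions", None)  # 3.11 / 3.12
    if names is None:
        names = getattr(opcode, "_specialized_opmap", {})  # 3.13 and later
    return sorted(names)

if __name__ == "__main__":
    print("166 in public opmap:", public_name(166))
    print("UNPACK_SEQUENCE_TUPLE is specialized:",
          "UNPACK_SEQUENCE_TUPLE" in specialized_names())
```

if the name only shows up in the specialized list, that would explain why printing opmap misses it while the header file still defines a number for it.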
as for the code you're reversing: in most cases the modules containing bindict have no useful code. they contain a bindict and a call out to a native bindict module that i've not been able to locate (possibly it is packed inside the 50MB main exe; it doesn't exist anywhere in the pyc's). the bindict format is essentially a table similar to NXFNs along with a trailing binary blob (which is not consistent between bindicts, which means it must be contextual). to illustrate what i mean, consider this pycdas result from another bindict file:
0 RESUME 0
2 LOAD_CONST 0: 0
4 LOAD_CONST 1: None
6 IMPORT_NAME 0: bindict
8 STORE_NAME 0: bindict
10 PUSH_NULL
12 LOAD_NAME 0: bindict
14 LOAD_ATTR 0: bindict
34 LOAD_CONST 2: b'\x01\x00\x00\x00\x00\x00\x00\x00\x13\x00\x00\x00abnormal_item_state\x0c\x00\x00\x00\x00\x01\x00\x00\x01\x96\x05\x02v\x01\x0b\x01\x0f\x17\xfd8\x18\x00\x00\x00\x89\xc0\x95\x12\t\x00'
36 UNPACK_SEQUENCE_TUPLE 1
40 CALL 1
50 STORE_NAME 1: data
52 LOAD_CONST 1: None
54 RETURN_VALUE
you can see this is basically just calling bindict.bindict(...) passing in the constant bytes/string shown in the disasm. this is basically the same in all files containing bindict data.
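looking only at the byte constants quoted in this thread, the start of a bindict blob appears to be a little-endian uint32 string count, followed by count+1 uint32 offsets, followed by the concatenated key strings. a rough sketch of reading that string table. the layout is inferred from these samples only, and `read_string_table` is a made-up name, not part of any bindict module:

```python
import struct

def read_string_table(blob):
    """Parse the leading string table of a bindict blob (inferred layout).

    Returns (strings, end), where end is the index of the first byte
    after the string data.
    """
    (count,) = struct.unpack_from("<I", blob, 0)
    offsets = struct.unpack_from(f"<{count + 1}I", blob, 4)
    base = 4 + 4 * (count + 1)  # string bytes start right after the offsets
    strings = [
        blob[base + offsets[i]:base + offsets[i + 1]].decode("utf-8")
        for i in range(count)
    ]
    return strings, base + offsets[count]

# Constant copied from the LOAD_CONST in the disassembly above.
sample = (b'\x01\x00\x00\x00\x00\x00\x00\x00\x13\x00\x00\x00abnormal_item_state'
          b'\x0c\x00\x00\x00\x00\x01\x00\x00\x01\x96\x05\x02v\x01\x0b\x01\x0f'
          b'\x17\xfd8\x18\x00\x00\x00\x89\xc0\x95\x12\t\x00')
strings, end = read_string_table(sample)
print(strings)  # ['abnormal_item_state']
```

everything after `end` is the trailing binary blob, which this sketch does not try to interpret.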
the approximate py output from pycdc (if it were actually implemented rather than being denied) would look something like this:
# WIP opcode: UNPACK_SEQUENCE_TUPLE (bytecode=A6h) at position 36.
# Source Generated with Decompyle++
# File: abnormal_capture_rate_data.do.pyc (Python 3.12)
import bindict
data = bindict.bindict(b'\x07\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00 \x00\x00\x000\x00\x00\x00?\x00\x00\x00O\x00\x00\x00_\x00\x00\x00h\x00\x00\x00settlement_rate2settlement_rate4settlement_rate3max_capture_nummust_succeed_numsettlement_rate1init_rateG\x01\x00\x00\x00\x00\x02\x02\x01\x01\x06\x03\x04\x05\n\x06\x00\x12\x00"\x02\x12\x02"\x01\x12\x01"\x06"\x03\x01\x04\x01\x05"\x96\x0e*\xfc\xa9\xf1\xd2Mb`?\xfa~j\xbct\x93h?{\x14\xaeG\xe1zt?\xfc\xa9\xf1\xd2MbP?\x04c\xfc\xa9\xf1\xd2MbP?\x96\x0e\x15\x00\x00\x80?\x00\x00\x00\x00\x00\x00\x00\x00ffffff\xe6?\x02\x02\x9a\x99\x99\x99\x99\x99\xe9?\x96\x0e*{\x14\xaeG\xe1z\x84?\xb8\x1e\x85\xebQ\xb8\x8e?\x9a\x99\x99\x99\x99\x99\x99?\xfa~j\xbct\x93h?\x04c{\x14\xaeG\xe1zt?\x96\x0e\x1a\x9a\x99\x99\x99\x99\x99\xe9?\xcd\xcc\xcc\xcc\xcc\xcc\xec?\x00\x00\x00\x00\x9a\x99\x99\x99\x99\x99\xd9?\x03c333333\xe3?\x96\x0e\x1a\x9a\x99\x99\x99\x99\x99\xa9?333333\xb3?\x00\x00\x00>{\x14\xaeG\xe1zt?\x04c\x9a\x99\x99\x99\x99\x99\x99?\x96\x0e*333333\xe3?\x9a\x99\x99\x99\x99\x99\xe9?\xcd\xcc\xcc\xcc\xcc\xcc\xec?\x9a\x99\x99\x99\x99\x99\xc9?\x04c\x9a\x99\x99\x99\x99\x99\xd9?\x96\x0e\x1a\x9a\x99\x99\x99\x99\x99\xc9?333333\xd3?\x00\x00\x00?\x9a\x99\x99\x99\x99\x99\xa9?\x04c\x9a\x99\x99\x99\x99\x99\xb9?f\x0b\x07\x00\x00\x00\x00\x93\x01\x00\x00\x1bc\r4\x97\x01\x00\x006\xc6\x1ah\x8f\x01\x00\x00\xc99\xe5\x97\x85\x01\x00\x00R)(\x9c\x88\x01\x00\x00\xe4\x9c\xf2\xcb\x8b\x01\x00\x00m\x8c5\xd0\x82\x01\x00\x00\x11\x07$\x01\x02Q\x11\x05r\x01\x01\x9f\x01\x11\x03\xc8\x01\x01\x00\xf1\x01\x11\x01\x9e\x02\x00')
anyway, the short answer is resolving the issue requires updating ASTree.cpp (after fixing the incomplete opcode maps.)
@greenozon you might find this of interest:
https://github.com/wilson0x4d/pycdc/blob/wip/bytes/python_3_11.cpp
https://github.com/wilson0x4d/pycdc/blob/wip/bytes/python_3_12.cpp
i see no reason not to have entries for every opcode appearing in official cpython. leaving them out works against pycdc maintainers and end-users, who then have to figure out what to keep and what to remove, and there is no harm in having entries that cpython's compile(...) would not produce. the mere fact that an opcode is represented in cpython source code at any point during the lifetime of a given version/branch is sufficient reason to include it (IMHO)
i also have ASTree implementation code for a half dozen ops not pushed to my wip branch. would love if i could work with people that understand how to work with the ast stack and frame logic better than i do.
Unsupported opcode: END_FOR (113). So is there any way to get it?
@jsrcode it is not clear what you wrote; please use English.
I'm getting the error "Unsupported opcode: END_FOR (113)". In other words, the END_FOR bytecode is not recognized. Is there any way I can make this tool recognize it? I want to fix this.
The binary part at the end consists of three data parts. The first is the entry construction part. The middle one is a set of unknown data (it cannot be used to construct any content of the entries); I think it may be some kind of index. The last one is a combination of id plus offset.
There is a 4-byte value at the end of the string area, which points to the beginning of the second data block. The second data block holds single-byte or multi-byte data, uses a varint to encode the length, and requires calculating a value that jumps to the last data part. The first data part locates its starting position through the offsets in the last data part, because there is some unknown content at the beginning of the first data part, perhaps related to entry construction.
Suppose you have a dictionary with 10 entries, where one entry contains five key-value pairs. If one entry occupies 8 bytes, this data requires 80 bytes to store the entries.
IDs that do not fit in a single byte are varint-encoded, which may take 2 or 3 bytes. ID values that fit in a single byte are mapped directly; for example, if an entry value is 25, the mapping is 19 (presumably hexadecimal, since 25 = 0x19).
Strings are referenced by index. The parsed string area is indexed starting from 0, so if the index value is 4, you need the string at position 4:
- 0: this is the first string
- 1: this is the second string
- 2: this is the third string
- 3: this is the fourth string
- 4: this is the fifth string
The strings are indexed in the order they appear in the string area, without rearrangement or obfuscation.
Values with decimals are stored as floating point. Values with parentheses are stored using a single byte plus varint encoding.
This is just a simple description of the basics, and some of the processing logic is specific to the game. I think this may differ from game to game.
For example, my game has an encoded value that records speed, and a specific operation is needed to decode the original value. Other games may not use it, but they may have their own special calculations for certain values, so if you want to decode such data you must know these methods. If you do not have the original dictionary as a reference, the parsed data and structure cannot be guaranteed to be completely correct unless you are completely confident.
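The description above does not say which varint variant is used. As an illustration only, here is the common little-endian base-128 scheme (the one used by Protocol Buffers), where the low 7 bits of each byte carry data and the high bit means that more bytes follow. Whether bindict uses exactly this encoding is an assumption, and `read_varint` is a made-up helper name:

```python
def read_varint(buf, pos=0):
    """Decode one base-128 varint from buf starting at pos.

    Returns (value, next_pos). Assumes little-endian groups of 7 bits
    with the high bit of each byte as a continuation flag.
    """
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return value, pos
        shift += 7

print(read_varint(b"\x19"))      # (25, 1): fits in a single byte
print(read_varint(b"\x96\x01"))  # (150, 2): needs two bytes
```

Under this scheme any value below 128 occupies one byte and equals the byte itself, which would match the "direct mapping" of single-byte ids.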
I also think this problem does not need to be solved by pycdc, and there is nothing to fix there, because this kind of data does not really belong to pyc data. Judging by the bindict-related code, the dictionaries are simply packaged into pyc files. You only need to strip the pyc-related parts; what remains is a binary dictionary, and there is no obfuscation.
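As a sketch of stripping the pyc part: a .pyc for Python 3.7+ is a 16-byte header followed by a marshalled code object, so the byte-string constant can be pulled out with the standard library alone, without a decompiler. This assumes the script runs under the same Python version that produced the file (marshal data is version-specific) and that the game did not alter the marshal format; the helper names are made up for illustration:

```python
import marshal
import types

PYC_HEADER_SIZE = 16  # magic, flags, then mtime+size or source hash (3.7+)

def load_code(pyc_bytes):
    """Unmarshal the top-level code object from the raw bytes of a .pyc."""
    return marshal.loads(pyc_bytes[PYC_HEADER_SIZE:])

def bytes_constants(code):
    """Yield every bytes constant in a code object, including nested ones."""
    for const in code.co_consts:
        if isinstance(const, bytes):
            yield const
        elif isinstance(const, types.CodeType):
            yield from bytes_constants(const)

def dump_blobs(pyc_path):
    """Return the bytes constants of a .pyc file, largest first."""
    with open(pyc_path, "rb") as fh:
        code = load_code(fh.read())
    return sorted(bytes_constants(code), key=len, reverse=True)
```

For a bindict module the first element returned by `dump_blobs(...)` should be the binary dictionary itself, which can then be handed to a dictionary parser.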
Here is one of the files I am trying to decompile.
You need to use a dictionary parser, not pycdc