[BUG] Encryption may generate functions that cause Python to crash after 1024 or 2048 calls.

Open irreg opened this issue 1 year ago • 22 comments

Encryption may generate functions that cause Python to crash with a memory access violation (0xc0000005) after 1024 or 2048 calls.

The exe is generated from source code containing about 5000 functions, and this has occurred 4 times in about 200 encryptions. (In other words, a problematic function is generated with low probability, and if we generate it again with the same py file, it will not occur.) The function that causes the problem appears to change randomly each time it is generated.

The command used for generation is as follows:

  • pyarmor pack -s main.spec -x "--restrict=101 --exclude .venv" main.py

The above command generates a single exe file, but the problem also occurs when the exe file is disassembled back into .py files and those files are executed.

I tried to find out as much as possible about one of the files where the problem occurred. It seemed that the error occurred at the moment of calling a particular encrypted function from a particular encrypted function in another file. I replaced one of the two files with an unencrypted one and the problem no longer occurs. However, I have not been able to get any detailed results because of the encryption.

Is there any possible known or resolved cause? I will add more if I find out anything else.

C:\Users\Admin>pyarmor info
INFO     PyArmor Version 6.8.0
INFO     Python 3.8.9
ERROR    [Errno 2] No such file or directory: '.pyarmor_config'

OS: Windows 10 x64

irreg avatar Sep 21 '23 02:09 irreg

Try upgrading to the latest Pyarmor 7.x version, 7.7.4.

It would also help if you could provide a sample script that reproduces the problem on my side.

jondy avatar Sep 21 '23 04:09 jondy

Also try removing the option --restrict=101.

jondy avatar Sep 22 '23 12:09 jondy

Thanks for the reply. I have tried what you pointed out.

  • Pyarmor 7.7.4: No change, error still occurs.
  • Remove --restrict=101: No change, error still occurs.

I've created reproduction code below.

Abstract

This is a procedure that creates 10000 files that may cause problems and then runs and tests them all.

File structure

root
|- run.py          # Entry point
|- main.py*        # Script to launch sub-processes from the main process
|- sub.py*         # Script that calls the files causing the problem
|- mod.py          # File that may cause problems after encryption
|- cls.py*         # Definition loaded from mod.py
|- duplicate.py*   # Script to copy mod.py to the suspects folder 10000 times before encryption

Files marked with * do not need to be encrypted; the problem reproduces even when they are left unencrypted.

File Contents

run.py

import main

if __name__ == '__main__':
    print("start")
    main.run()
    print("end")

main.py

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed
import sys
import re


def submit(n):
    import sub
    sub.run(n)


def execute(n):
    with ProcessPoolExecutor(max_workers=1) as e:
        future = e.submit(submit, n)
        future.result()


def run():
    max_workers = 12
    n_list = list(range(10000))
    continue_on_error = False
    for arg in sys.argv[1:]:
        if matched := re.search(re.compile(r"-*worker=(?P<num>\d+)"), arg):
            max_workers = int(matched.group("num"))
        elif matched := re.search(re.compile(r"-*range=(?P<begin>\d+)[,:-](?P<end>\d+)"), arg):
            n_list = n_list[int(matched.group("begin")):int(matched.group("end"))]
        elif matched := re.search(re.compile(r"-*continue-on-error"), arg):
            continue_on_error = True
        else:
            print(arg)
    e = ThreadPoolExecutor(max_workers=max_workers)
    futures = [e.submit(execute, n) for n in n_list]
    try:
        for future in as_completed(futures):
            n = n_list[futures.index(future)]
            try:
                future.result()
            except Exception as ex:
                print(f"{n},failed: {ex}")
                if not continue_on_error:
                    raise
            else:
                print(f"{n},success")
    finally:
        for future in futures:
            future.cancel()
        e.shutdown(wait=False)

sub.py

The reproduction probability decreases unless some complex processing runs before the function in mod.py is called. The sub.py below is a relatively short script that still reproduces the problem. (Updated on November 1, as a faster method was found.) If this complex processing runs at least 8 times at the end of the loop, the probability of occurrence seems to be almost the same.

import pandas as pd
import importlib


class A:
    def a(self, a, b):
        pass

    def b(self, t):
        pass


def run(n):
    mod = importlib.import_module(f"suspects.mod{n:04}")
    for i in range(1025):
        if i > 1015:
            aaa = [pd.DataFrame({j: [0.0] for j in range(97)}).astype("category") for _ in range(4)]
            for a in aaa:
                for s in "abcdefg":
                    a[s] = ""
            bbb = [a.to_numpy() for a in aaa]
            c = pd.DataFrame({"": [0.0]}).astype("category")
            d = c.to_numpy()
        mod.f(A(), A())
        if i % 100 == 0:
            print(f"{n},{i}")

mod.py

I couldn't find a way to reproduce it in a shorter description.

from cls import C


def f(a, c):
    l = list()
    n = ""
    l.append(n)
    if c is None:
        t = C(C(a=l).x({n: None}))
    else:
        t = c
    a.a(a=True, b=True)
    r = C(a.b(t))
    r.x = l
    return r

cls.py

class C:
    def __init__(self, a):
        pass

duplicate.py

import shutil
import os

os.makedirs("suspects", exist_ok=True)
for i in range(10000):
    shutil.copy(
        r"mod.py",
        rf"suspects\mod{i:04}.py",
    )

Reproduction environment

  • Windows 10/11 x64
  • Python 3.8.9
  • Pyarmor 6.8.0/7.7.4
  • (pandas 2.0.3)

Reproduction procedures

  1. Arrange files as written in the File structure
  2. Execute python duplicate.py
  3. Execute pyarmor obfuscate run.py --recursive
  4. Move to dist folder
  5. Execute python run.py

The following is no longer necessary because sub.py could be made faster (updated November 1). It takes about an hour. Since it takes more than 10 hours to test everything, it is recommended to divide the work among several PCs with the range=x:y option specified.

python run.py range=0:5000

The default setting is to stop when one problematic file is found, but if you want to get all results without stopping, specify the options as follows.

python -u run.py continue-on-error >result.csv
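One way to pull the failing file numbers out of result.csv is sketched below. It assumes the line formats printed by main.py above ("n,success" and "n,failed: ...") and ignores everything else, such as the "n,i" progress lines from sub.py and the start/end markers. The function name failed_ids is made up for illustration.

```python
def failed_ids(lines):
    """Return the sorted file numbers that appear in 'n,failed: ...' lines."""
    failed = set()
    for line in lines:
        head, sep, rest = line.strip().partition(",")
        # Keep only lines shaped like "<number>,failed..."; skip progress
        # lines ("12,100"), successes ("12,success") and free text.
        if sep and head.isdigit() and rest.startswith("failed"):
            failed.add(int(head))
    return sorted(failed)


sample = [
    "start",
    "12,0",
    "12,success",
    "620,failed: A process in the process pool was terminated abruptly",
    "end",
]
print(failed_ids(sample))  # [620]
```

For a real run, pass the open file instead of the sample list, e.g. failed_ids(open("result.csv")).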

In my environment, I found about 1 or 2 files per 1000 files with errors. The error does not always occur even in problematic files, and some files have only a 1% or less chance of error (for the current sub.py description). The probability of occurrence varies from file to file. Since some files only occur with low probability, it may be necessary to repeat the test hundreds of times to find all problematic files.
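As a rough guide to how many repetitions that means: if a problematic file crashes with probability p on each run, the chance of seeing at least one crash in k runs is 1 - (1 - p)^k. The sketch below solves that for k. It assumes runs are independent, which has not been verified here, and the function name runs_needed is hypothetical.

```python
import math


def runs_needed(p_fail, confidence=0.95):
    """Smallest k with 1 - (1 - p_fail)**k >= confidence (independent runs assumed)."""
    if not (0 < p_fail < 1 and 0 < confidence < 1):
        raise ValueError("p_fail and confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_fail))


print(runs_needed(0.01))  # 299
```

So a file that fails 1% of the time needs about 300 runs for a 95% chance of being caught, which is in line with the "hundreds of times" estimate.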

irreg avatar Oct 31 '23 05:10 irreg

@jondy Could you please reopen it?

irreg avatar Oct 31 '23 06:10 irreg

Sorry, the test scripts use the third-party library pandas. Generally I only debug against the official Python standard library, not other libraries.

Pyarmor-obfuscated scripts also have different frames; for example, sys._getframe(1) returns a different frame than it does in the original scripts.

Someone else has also reported issues with pandas using sys._getframe to query local variables.

So there are 2 solutions. The first is to patch pandas; there are some examples for Pyarmor 8, but they also work for Pyarmor 7: https://pyarmor.readthedocs.io/en/latest/how-to/third-party.html

The other solution is to use --obf-code 0 to obfuscate the related scripts and the default options for the other scripts.

jondy avatar Nov 01 '23 01:11 jondy

It seems that sys._getframe is never called in this code. I will continue to investigate.

irreg avatar Nov 01 '23 01:11 irreg

We have investigated the cause. Apparently, it occurs in many cases if the function is calling native code (written in C language etc.). I was able to reproduce it even with ThreadPoolExecutor and sqlite objects in the standard library as shown below.

sub.py ver.2

import importlib
from concurrent.futures import ThreadPoolExecutor

class A:
    def a(self, a, b):
        pass

    def b(self, t):
        pass

def a(c):
    b = ThreadPoolExecutor()
    return ThreadPoolExecutor()


def run(n):
    mod = importlib.import_module(f"suspects.mod{n:04}")
    for i in range(1025):
        if i > 1015:
            ccc = [a(ThreadPoolExecutor()) for _ in range(512)]
        mod.f(A(), A())

sub.py ver.3

import importlib
import sqlite3

class A:
    def a(self, a, b):
        pass

    def b(self, t):
        pass

def a(c):
    b = sqlite3.connect(":memory:")
    return sqlite3.connect(":memory:")


def run(n):
    mod = importlib.import_module(f"suspects.mod{n:04}")
    for i in range(1025):
        if i > 1015:
            ccc = [a(sqlite3.connect(":memory:")) for _ in range(512)]
        mod.f(A(), A())

No error seems to occur when using "--obf-code 0". However, considering the results so far, it is a little difficult because all codes must be set to "--obf-code 0". (If the conditions that cause the problem are met, a remote encrypted file that is completely unrelated to those conditions will cause an error.)

We also checked the behavior with version 8 (trial). It was reproduced when using the old command, but may not be reproduced when using the new command.

old: pyarmor-7 obfuscate run.py --recursive
new: pyarmor gen run.py . -r --private

If there is no problem with the new command, that's enough, but I would appreciate it if you could add it and find out what the problem was. Thank you for your response.

irreg avatar Nov 02 '23 07:11 irreg

Although the CPU is different, there appear to be many similarities in the conditions of occurrence and problems with issue #885. It may be the same case. (Confirmed that advanced 2 and restrict 2 options have nothing to do with reproducing the problem)

irreg avatar Nov 07 '23 01:11 irreg

@irreg If Python > 3.6, try it with Pyarmor 8.

jondy avatar Nov 08 '23 00:11 jondy

I have tried it on Pyarmor 8.4.3.

I can reproduce the problem by executing the encrypted file with the following command.

  • pyarmor-7 obfuscate run.py --recursive

I could not reproduce the problem when using the new command as shown below.

  • pyarmor gen run.py . -r --private

However, the new command does not work with the Restrict Mode 100+ that we were using. Because of this, we are seriously looking for a replacement for this feature.

irreg avatar Nov 08 '23 04:11 irreg

We further investigated the conditions under which the problem occurs in version 7 and found that it can be reproduced with any content in the mod.f function, as long as the bytecode before encryption is 16, 48, 64, 192, 448, ... instructions long.

Therefore, it will also occur in the following case:

mod.py (48 Instructions)

def f(a, c):
    if None is not None:
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
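To see what instruction count a candidate function has under a given interpreter, a small helper like the one below can be used. This is only a sketch: instruction_count and make_function are hypothetical names, and the count is taken as the number of 2-byte code units in co_code, which equals the number of instructions on Python 3.6 to 3.10 but also includes inline cache entries on 3.11 and later. The numbers it prints depend on the Python version, so it should be run with the same interpreter that the obfuscated code targets.

```python
def instruction_count(func):
    """Number of 2-byte code units in the function's bytecode."""
    return len(func.__code__.co_code) // 2


def make_function(n_assignments):
    """Build f(a, c) whose body is n_assignments dummy 'x = None' statements."""
    body = "".join("    x = None\n" for _ in range(n_assignments)) or "    pass\n"
    namespace = {}
    exec("def f(a, c):\n" + body, namespace)
    return namespace["f"]


# Print how the count grows as dummy statements are added.
for n in (0, 1, 2, 21):
    print(n, instruction_count(make_function(n)))
```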

irreg avatar Nov 22 '23 02:11 irreg

How about obfuscating only mod.py with --obf-code 0?

  1. First obfuscate all the scripts with the normal options and save them to dist/
  2. Then obfuscate mod.py with --obf-code 0 and save it to dist2/
  3. Next copy dist2/mod.py to dist/
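The final copy step above can be scripted. A minimal sketch, assuming both output folders already exist and that only the explicitly listed files should be replaced (so nothing else in dist2 is copied over). overlay_files and its parameters are hypothetical names, and the obfuscation commands themselves are not shown.

```python
import os
import shutil


def overlay_files(src_dir, dst_dir, relative_paths):
    """Replace the listed files in dst_dir with their copies from src_dir."""
    copied = []
    for rel in relative_paths:
        src = os.path.join(src_dir, rel)
        dst = os.path.join(dst_dir, rel)
        # Only overwrite files that are already present, so a typo in the
        # list cannot silently add a stray file to the output folder.
        if not os.path.isfile(dst):
            raise FileNotFoundError(f"{dst} does not exist; refusing to add a new file")
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

Example call for the layout described above: overlay_files("dist2", "dist", ["mod.py"]).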

Thanks for your efforts, but Pyarmor 8 still has open tasks (pyarmor.man, refining RFT mode, enhancements for BCC mode); Pyarmor 7 bugs come after those.

The preferred solution is to upgrade to Pyarmor 8+.

This may be treated as a known issue for Pyarmor 7.

jondy avatar Nov 23 '23 02:11 jondy

Thanks for the replies and suggestions. We agree with your priorities. I think it will work perfectly if we encrypt every affected mod.py with --obf-code 0, as you said. However, compiling a complete list of the files that match the problematic conditions is quite complicated and would be a bottleneck for us.

Given this, moving to a version 8 license seems the most likely path, although the transition poses some challenges. We are considering adopting version 8.

irreg avatar Nov 23 '23 05:11 irreg

@irreg, Hi. Have you managed to figure out a way to detect builds that will (or are likely to) crash? Also, if you have any new details to share about this bug, I'd be happy to know, as I'm having the same issue.

Thoufak avatar Feb 26 '24 07:02 Thoufak

@Thoufak It is difficult to check if a problem occurs after encryption. Following the example above, it is necessary to actually call all functions 1024 times to find out. Currently, no other method has been found.

Compared to the above, it is more practical to check before encryption whether a function meets the conditions for the problem. For a module that has not been encrypted yet, the instruction count of every callable object can be checked to some extent, as shown in the example below. If you insert a meaningless statement (e.g., _ = None) into each function that matches, you can avoid the problem, although it takes a lot of time and effort. Note that the code below does not handle decorators, async functions, inner functions, etc., so detection is incomplete.

Our team decided to move to pyarmor 8 before resolving these issues, so we have not investigated further. So far, version 8 (unless compatibility mode is used) has not caused any problems even after hundreds of thousands of encryptions, so statistically we believe the problem has been solved there.

import target # Modules you want to check


def check_callable(name, obj):
    if callable(obj) and hasattr(obj, "__code__"):
        inst_len = len(obj.__code__.co_code) // 2
        if inst_len in (16, 48, 64, 192, 448):
            print(f"found: {name}")
        print(inst_len)


def check_class(obj):
    if isinstance(obj, type):
        for inner_name, inner_obj in obj.__dict__.items():
            check_class(inner_obj)
            check_callable(inner_name, inner_obj)


for name, obj in target.__dict__.items():
    check_class(obj)
    check_callable(name, obj)
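To also cover decorated, async, and inner functions, one option is to scan compiled source instead of imported objects: compile() returns a module code object whose co_consts hold the code objects of everything defined in it, nested definitions included. The sketch below works under stated assumptions. The names are hypothetical, the default set of counts is just the values reported in this thread (not a complete list), and it has to be run with the same Python version that the obfuscated code targets, because instruction counts differ between versions. It also reports lambdas, class bodies, and the module body itself; whether those can trigger the crash has not been checked.

```python
import types

# Counts reported in this thread; the list is known to be incomplete.
SUSPECT_COUNTS = frozenset({16, 48, 64, 192, 448})


def iter_code_objects(code):
    """Yield a code object and, recursively, every code object nested in it."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


def find_suspects(source, filename="<string>", suspect_counts=SUSPECT_COUNTS):
    """Return (name, first_line, count) for code objects with a suspect length."""
    hits = []
    for code in iter_code_objects(compile(source, filename, "exec")):
        count = len(code.co_code) // 2  # same measure as the checker above
        if count in suspect_counts:
            hits.append((code.co_name, code.co_firstlineno, count))
    return hits
```

Typical use would be to read each .py file as text and call find_suspects(text, path), then pad the reported functions by hand.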


irreg avatar Feb 26 '24 15:02 irreg

@irreg, thank you very much for the response. I don't think I will switch to pyarmor 8 any time soon. So, it seems that I could write a tool that checks the instruction count across my whole codebase and, if it finds something with an unwanted count, appends one meaningless instruction. Quite tedious, but I'm glad there's finally at least some hope for this issue to be gone (I've been living with it for over a year now).

By the way, is this a complete list, or may there be more values to discover?

if inst_len in (16, 48, 64, 192, 448):

Thoufak avatar Feb 26 '24 17:02 Thoufak

@Thoufak We have not checked every instruction count that might trigger the problem. If necessary, the following can be used to find out.

search.bat

python -m venv .venv
call .venv\Scripts\activate.bat
pip install pyarmor==*.*.*
rem echo Change the search range if necessary
for /L %%i in (3,1,512) do (
  python gen_mod.py %%i
  python duplicate.py
  cd test
  rd dist /s /q
  pyarmor obfuscate run.py --recursive
  cd dist
  python run.py > ..\..\%%i_result.txt
  cd ../../
)
pause

gen_mod.py

import sys

i = int(sys.argv[1])

with open("mod.py", "w") as f:
    f.write("def f(a, c):\n")
    if i == 130:
        extend = True
        i -= 2
    else:
        if i > 130:
            i -= 1
        extend = False
    if i <= 3:
        f.write("    pass\n")
        sys.exit()
    i -= 2  # Correct the number of instructions due to the return statement
    if i < 6:
        if i > 3:
            f.write("    x = None\n")
            i -= 2
        if i == 3:
            f.write("    list()\n")
        else:
            f.write("    x = None\n")
        sys.exit()

    f.write("    if None is not None:\n")
    i -= 4

    while i > 0:
        if i == 3:
            f.write("        list()\n")
            break
        f.write("        x = None\n")
        i -= 2
    if extend:
        f.write("    x = None\n")

run.py, main.py, sub.py ver.2, and duplicate.py from the examples above are required for execution. In our results, about 3000 mod.py replicas were sufficient for detection, so reducing the number should speed up the process somewhat.
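A variant of duplicate.py with a configurable replica count might look like the sketch below. The function name and its parameters are hypothetical; the file naming (mod0000.py, mod0001.py, ...) follows the original script, and os.path.join is used instead of a hard-coded backslash.

```python
import os
import shutil


def duplicate(count, src="mod.py", dest_dir="suspects"):
    """Copy src into dest_dir as mod0000.py, mod0001.py, ... (count copies)."""
    os.makedirs(dest_dir, exist_ok=True)
    for i in range(count):
        shutil.copy(src, os.path.join(dest_dir, f"mod{i:04}.py"))
```

After calling duplicate(3000), the run has to be limited to the same range, for example with python run.py range=0:3000, because main.py defaults to 10000 files.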

irreg avatar Feb 27 '24 01:02 irreg

@irreg

I just ran a test with

  • Windows 7/x86_64
  • Python 3.8.3
  • Pyarmor 7

It broke with this exception:

$ python dist/run.py

...
617,success
620,failed: A process in the process pool was terminated abruptly while the future was running or pending.
Traceback (most recent call last):
  File "<dist/run.py>", line 3, in <module>
  File "<frozen run>", line 83, in <module>
  File "<frozen main>", line 39, in run
  File "C:\Python38\lib\concurrent\futures\_base.py", line 432, in result
    return self.__get_result()
  File "C:\Python38\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
  File "C:\Python38\lib\concurrent\futures\thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "<frozen main>", line 14, in execute
  File "C:\Python38\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "C:\Python38\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

Is it the same as your case?

jondy avatar Mar 07 '24 23:03 jondy

@jondy

The error is the same. However, when the number of instructions is 16 (and only then), it may stop with a different exception.

other than 16 instructions

2945,success
2696,failed: A process in the process pool was terminated abruptly while the future was running or pending.
Traceback (most recent call last):
  File "<run.py>", line 3, in <module>
  File "<frozen run>", line 74, in <module>
  File "<frozen main>", line 38, in run
  File "C:\Python\Python38\lib\concurrent\futures\_base.py", line 437, in result
    return self.__get_result()
  File "C:\Python\Python38\lib\concurrent\futures\_base.py", line 389, in __get_result
    raise self._exception
  File "C:\Python\Python38\lib\concurrent\futures\thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "<frozen main>", line 14, in execute
  File "C:\Python\Python38\lib\concurrent\futures\_base.py", line 444, in result
    return self.__get_result()
  File "C:\Python\Python38\lib\concurrent\futures\_base.py", line 389, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

16 instructions

401,success
400,success
265,failed: '' object has only read-only attributes (assign to ._idle_semaphore)
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Python\Python38\lib\concurrent\futures\process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "<frozen main>", line 8, in submit
  File "<frozen sub>", line 21, in run
  File "<frozen sub>", line 20, in <listcomp>
  File "C:\Python\Python38\lib\concurrent\futures\thread.py", line 148, in __init__
    self._idle_semaphore = threading.Semaphore(0)
TypeError: '' object has only read-only attributes (assign to ._idle_semaphore)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<run.py>", line 3, in <module>
  File "<frozen run>", line 74, in <module>
  File "<frozen main>", line 38, in run
  File "C:\Python\Python38\lib\concurrent\futures\_base.py", line 437, in result
    return self.__get_result()
  File "C:\Python\Python38\lib\concurrent\futures\_base.py", line 389, in __get_result
    raise self._exception
  File "C:\Python\Python38\lib\concurrent\futures\thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "<frozen main>", line 14, in execute
  File "C:\Python\Python38\lib\concurrent\futures\_base.py", line 444, in result
    return self.__get_result()
  File "C:\Python\Python38\lib\concurrent\futures\_base.py", line 389, in __get_result
    raise self._exception
TypeError: '' object has only read-only attributes (assign to ._idle_semaphore)

irreg avatar Mar 08 '24 04:03 irreg

Got it, thanks.

jondy avatar Mar 09 '24 11:03 jondy

I spent almost 2 days on this issue and still have not found the reason.

But with the Python option -X dev, the obfuscated script seems to work. I ran this test (using sub.py ver.2, but changing 10000 to 1000):

pyarmor obfuscate -r run.py

python3.8 -X dev dist/run.py

Without -X dev, dist/run.py fails quickly (No. 87 failed).

With -X dev, it completed at least 1000 runs successfully.

And when I tested the plain script with Python 3.9:

python3.9 run.py

9,failed: cannot import name 'Lock' from partially initialized module 'multiprocessing.synchronize' (most likely due to a circular import) (/Users/jondy/workspace/pytransform/python/lib/python3.9/multiprocessing/synchronize.py)

It seems to be related to a circular import.

There is no problem with Python 3.10:

python3.10 pyarmor/pyarmor.py obfuscate -O dist310 --exclude pyarmor -r run.py
python3.10 dist310/run.py

I'm putting this aside until I have a new idea; maybe it only fails in Python 3.8.

jondy avatar May 08 '24 15:05 jondy

In Python 3.10, the bytecode seems to have changed slightly from previous versions. Therefore, different code may be needed to get the desired bytecode length.

mod.py (16 Instructions for python 3.10 or later)

def f(a, c):
    if None is not None:
        x = None
        x = None
        x = None
        x = None
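To see where the difference between versions comes from, the opcodes of the same function can be listed under each interpreter. The sketch below uses the standard dis module; opcode_summary is a hypothetical helper name. One caveat: on Python 3.11 and later, dis.get_instructions does not show inline cache entries by default, so its count can be lower than len(co_code) // 2.

```python
import dis
import sys
from collections import Counter


def opcode_summary(func):
    """Return (instruction_count, Counter of opcode names) for func."""
    names = [ins.opname for ins in dis.get_instructions(func)]
    return len(names), Counter(names)


def f(a, c):
    if None is not None:
        x = None
        x = None
        x = None
        x = None


count, histogram = opcode_summary(f)
# Run this under each interpreter of interest and compare the output.
print(sys.version_info[:2], count, dict(histogram))
```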

gen_mod.py (for python3.10 or later)

import sys

i = int(sys.argv[1])

with open("mod.py", "w") as f:
    f.write("def f(a, c):\n")
    if i == 258:
        extend = True
        i -= 2
    else:
        if i > 258:
            i -= 1
        extend = False
    if i <= 3:
        f.write("    pass\n")
        sys.exit()
    i -= 2  # Correct the number of instructions due to the return statement
    if i < 8:
        if i > 5:
            f.write("    x = None\n")
            i -= 2
        if i > 3:
            f.write("    x = None\n")
            i -= 2
        if i == 3:
            f.write("    list()\n")
        else:
            f.write("    x = None\n")
        sys.exit()

    f.write("    if None is not None:\n")
    i -= 6

    while i > 0:
        if i == 3:
            f.write("        list()\n")
            break
        f.write("        x = None\n")
        i -= 2
    if extend:
        f.write("    x = None\n")
        f.write("    x = None\n")

irreg avatar May 10 '24 14:05 irreg