
Undesired access to file system by bip_utils

Open Anynomouss opened this issue 1 year ago • 3 comments

I have a very strange issue with bip_utils that I do not remember having in the past. When running bip_utils on Windows Subsystem for Linux (Ubuntu 20.04 and Ubuntu 22.04, both system-wide and in clean virtual Python environments), it triggers "Unlock drive" pop-ups from BitLocker, meaning that bip_utils or some component of it is trying to access the file system. This is strange behavior, and because there is a slight chance it is an exploit, I am reporting it.

It is also mighty irritating, since I am running Python scripts that use bip_utils in parallel, which triggers a huge number of pop-ups and notifications from BitLocker, so many that I run out of RAM. Below is some example code; I narrowed it down to the lines that trigger this strange behavior.


import binascii # for conversion between Hexa and bytes
from bip_utils import (P2PKHAddrEncoder, Bip32Slip10Secp256k1, Bip44, Bip49, Bip84, Bip86, Bip44Coins,Bip49Coins, Bip84Coins, Bip86Coins, Bip44Changes, Bip38Decrypter, Bip38Encrypter, CoinsConf,
                       ElectrumV1WordsNum, ElectrumV1MnemonicGenerator, ElectrumV1SeedGenerator, ElectrumV1, ## Electrum V1 dependencies only
                       ElectrumV2WordsNum, ElectrumV2MnemonicTypes, ElectrumV2MnemonicGenerator, ElectrumV2SeedGenerator, ElectrumV2Standard, ## Electrum V2 dependencies only
                       IPrivateKey, WifPubKeyModes, WifEncoder,WifDecoder,Bip32KeyData,Bip32KeyDeserializer)

from pybip39 import Mnemonic, Seed 
import csv
import os
import sys


mnemonics = sys.stdin.readlines() 

csvwriter = csv.writer(sys.stdout, delimiter=' ',lineterminator='\n') #os.linesep
mnemonic = Mnemonic() # This is slow, so do it only once
for words in mnemonics:
    words = words.strip()
    try:
        #seed_bytes = mnemo.to_seed(words)
        mnemonic.validate(words)
        seed = Seed(mnemonic.from_phrase(words), "")
        seed_bytes = bytes(seed)
    except:
        continue
    #csvwriter.writerow([words])
    
    ## Any of the lines below trigger these pop ups meaning there is an attempt to access the file system
    bip32_ctx_m = Bip32Slip10Secp256k1.FromSeedAndPath(seed_bytes, 'm') # Derive at master level
    bip49_mst_ctx = Bip49.FromSeed(seed_bytes, Bip49Coins.BITCOIN)
    bip86_mst_ctx = Bip86.FromSeed(seed_bytes, Bip86Coins.BITCOIN)

The input of the test script is many lines, each containing a single mnemonic with its words separated by spaces. Save the script above as test_parallel_error.py and run it in any Windows Subsystem for Linux command shell to reproduce this behavior:

printf "abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon about" {1..800000} | parallel --pipe -j 8 --blocksize 10000 --spreadstdin python test_parallel_error.py

It should trigger these pop-ups as long as you have at least one connected drive that is locked and encrypted with BitLocker. The BitLocker pop-ups are only the symptom, however; the real question is why any part of bip_utils is trying to access the file system in the first place.

Anynomouss avatar Oct 02 '24 09:10 Anynomouss

Hi, the library accesses the file system only to read the mnemonic word lists (BIP39, Electrum, etc.). A mnemonic file is loaded only once, i.e. the first time it is needed (e.g. when creating a mnemonic class), and is then kept in memory. The mnemonic files are deployed together with the library. You can find the paths in the source code, for example:

class Bip39MnemonicConst:
    ...

    # Language files
    LANGUAGE_FILES: Dict[MnemonicLanguages, str] = {
        Bip39Languages.ENGLISH: "wordlist/english.txt",
        Bip39Languages.ITALIAN: "wordlist/italian.txt",
        Bip39Languages.FRENCH: "wordlist/french.txt",
        Bip39Languages.SPANISH: "wordlist/spanish.txt",
        Bip39Languages.PORTUGUESE: "wordlist/portuguese.txt",
        Bip39Languages.CZECH: "wordlist/czech.txt",
        Bip39Languages.CHINESE_SIMPLIFIED: "wordlist/chinese_simplified.txt",
        Bip39Languages.CHINESE_TRADITIONAL: "wordlist/chinese_traditional.txt",
        Bip39Languages.KOREAN: "wordlist/korean.txt",
    }

This is the only function accessing files as you can verify from the source code:

class MnemonicWordsListFileReader:
    @staticmethod
    def LoadFile(file_path: str,
                 words_num: int) -> MnemonicWordsList:
        # Read file
        with open(file_path, "r", encoding="utf-8") as fin:
            words_list = [word.strip()
                          for word in fin.readlines()
                          if word.strip() != "" and not word.startswith("#")]

        # Check words list count
        if len(words_list) != words_num:
            raise ValueError(f"Number of loaded words list ({len(words_list)}) is not valid")

        return MnemonicWordsList(words_list)

But it has been like this since the very beginning; this is nothing new.
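To illustrate the "load once, then keep in memory" behavior described above, here is a minimal sketch. It is not the library's actual caching code: the function name `load_words` is hypothetical, and `functools.lru_cache` is swapped in purely for illustration.

```python
from functools import lru_cache
from pathlib import Path
from typing import Tuple


@lru_cache(maxsize=None)
def load_words(file_path: str, words_num: int) -> Tuple[str, ...]:
    """Hypothetical helper: read a word list from disk on the first call only.

    Later calls with the same arguments are served from the in-memory cache,
    so the file system is touched once per (file_path, words_num) pair.
    """
    lines = Path(file_path).read_text(encoding="utf-8").splitlines()
    # Skip blank lines and comment lines, as the LoadFile snippet above does.
    words = tuple(w.strip() for w in lines if w.strip() and not w.startswith("#"))
    if len(words) != words_num:
        raise ValueError(f"Number of loaded words ({len(words)}) is not valid")
    return words
```

With a cache like this, a second call for the same file returns the cached tuple even if the file has since been removed, which is one way to confirm that no further disk access happens.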

ebellocchia avatar Oct 02 '24 20:10 ebellocchia

I added a print in the function that loads the mnemonic files and tried your code snippet. As I expected, there is no file access (since you are not using any class for generating mnemonics or seeds). It could also be that some dependency is accessing the file system for some reason; maybe you can try downgrading versions to check whether that is the problem.
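A more general way to check for file access, including access made by dependencies, is Python's audit hook mechanism (`sys.addaudithook`, Python 3.8+). The sketch below is only a diagnostic aid under stated assumptions: the names `make_open_logger`, `opened_paths` and `dump_opened` are made up for this example, and the hook only sees opens that go through Python's own `open()` / `os.open()`. File access done directly by native code in a C extension would not show up here and would need an OS-level tracer instead.

```python
import sys


def make_open_logger(log):
    """Return an audit hook that appends every opened path to `log`."""
    def hook(event, args):
        # The "open" audit event is raised by the built-in open() and os.open();
        # its arguments are (path, mode, flags).
        if event == "open":
            log.append(str(args[0]))
    return hook


def dump_opened(paths):
    """Print the recorded paths to stderr so stdout stays clean for piping."""
    for path in paths:
        print(path, file=sys.stderr)


opened_paths = []
# Install the hook BEFORE importing the code under investigation, so that
# opens triggered at import time are recorded too. Hooks cannot be removed.
sys.addaudithook(make_open_logger(opened_paths))

# Then import and run the suspicious calls, for example:
#   from bip_utils import Bip49, Bip49Coins
#   Bip49.FromSeed(seed_bytes, Bip49Coins.BITCOIN)
#   dump_opened(opened_paths)
```

The recorded list will also contain the module files Python reads while importing, so the interesting entries are the paths outside the Python installation and the virtual environment.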

ebellocchia avatar Oct 02 '24 20:10 ebellocchia

I will do some more testing when I have time and will let you know if I can trace the root cause.

Anynomouss avatar Oct 04 '24 13:10 Anynomouss

Hi, did you find something?

ebellocchia avatar Nov 19 '24 10:11 ebellocchia

Sorry, I got distracted. I had planned to test it in some new, clean virtual Python environments to track down the cause. It is also worth mentioning that I use the code in combination with GNU Parallel from bash: https://www.gnu.org/software/parallel/ I will see if I can do some tests this week to determine whether the issue can be closed or requires further investigation.

Anynomouss avatar Nov 19 '24 15:11 Anynomouss