PersonalAnalytics icon indicating copy to clipboard operation
PersonalAnalytics copied to clipboard

Test and create Guide for Reading encrypted sqlite file in Python

Open casaout opened this issue 11 months ago • 1 comments

Note that work in progress is described below and in this branch.

casaout avatar Mar 18 '24 07:03 casaout

As briefly discussed yesterday, we think that it's better to not work with encrypted sqlite-databases to avoid making it too complicated for other researchers for their data analysis. Hence, we decided to create an encrypted zip-file.

I've started testing how reading an encrypted zip-file with a non-encrypted sqlite database works in Python, but am experiencing character encoding issues as summarized below. For time reasons, I'd like to ask you @SRichner to try and see if you can make it work.

What I tried:

  • extracting then opening the file (in explorer) works
  • read log/txt file from encrypted zip file works
  • read sqlite file from encrypted zip file results in charset issues (see code below)

Thoughts:

  • I tried with two charsets, both throwing different errors. Which charset is used for creating the sqlite file in PA through TS/node?
  • Maybe we should first create a zip-file via node/ts, just to ensure that's the exactly encrypted zip-file we're going to write the code for
  • Maybe there is also another way to read the sqlite db other than in-memory, but this might not matter for the problem at hand
import sqlite3
import pyzipper
import io

subject = "32XSB1"
zipPath = "C:\\TEMP\\PA\\PA_32XSB1.encrypted.zip"
pwd = "PersonalAnalytics_32XSB1"
dbfile = "database.sqlite"
tableName = "user_input"

charset = "iso-8859-1" # error: 'near "SQLite": syntax error'
# charset = "utf-8" # error: 'utf-8' codec can't decode byte 0xe7 in position 27: invalid continuation byte

with pyzipper.AESZipFile(zipPath) as zf:
    # try 1: extracting then opening the file (in explorer) works
    # zf.extractall(path='C:\\TEMP\\PA\\PA_32XSB1_decrypted', pwd = bytes(pwd, 'utf-8'))
    # print(zf.infolist())

    # try 2: read log/txt file from encrypted zip file works

    # try 3: read sqlite file from encrypted zip file results in charset issues
    for file_info in zf.infolist():
            print(file_info)
            if file_info.filename == dbfile:
                with zf.open(file_info, pwd=bytes(pwd, charset)) as sqlite_file:
                     
                    sqlite_data = sqlite_file.read()
                    conn = sqlite3.connect(':memory:')
                    cursor = conn.cursor()
                    sqlite_data_str = sqlite_data.decode(charset)

                    # print(sqlite_data_str) # printing the string actually shows decrypted content

                    # bug: 'near "SQLite": syntax error'
                    cursor.executescript(sqlite_data_str) # conn.executescript(sqlite_data_str) # same issue

                    # Query to list all tables in the database
                    cursor.execute("SELECT name FROM external.sqlite_master WHERE type='table';")

                    # Fetch and print the table names
                    tables = cursor.fetchall()
                    for table in tables:
                        print(f"Table: {table[0]}")
                        cursor.execute(f"PRAGMA table_info(external.{table[0]});")
                        columns = cursor.fetchall()
                        print("Columns:")
                        for column in columns:
                            print(column)

                    conn.close()

casaout avatar Mar 28 '24 08:03 casaout