msoffcrypto-tool
msoffcrypto-tool copied to clipboard
Capability to return a BytesIO/filelike even if it isn't encrypted?
Hello,
I understand that this is potentially out of scope for the project, but considering the existence of OfficeFile.is_encrypted()
I feel this would tie its usage up nicely.
I'll explain a use case via example:
I am using this to load up a set of usually-encrypted Excel files into pandas
, this is great, except a handful of these Excel files have randomly have not been password protected. I don't actually care whether or not they have a password, I just want to put them all into dataframes.
Right now, the argument I pass to pandas.read_excel()
is either a non-protected Excel file's Path
, or a BytesIO
objected retrieved using this library.
This is fine but it has resulted in this messy function:
def decrypt_office_file(file: Path, password: str = None) -> Union[io.BytesIO, Path]:
decrypted_file = io.BytesIO()
with open(file, 'rb') as f:
office_file = msoffcrypto.OfficeFile(f)
if office_file.is_encrypted():
office_file.load_key(password=password)
office_file.decrypt(decrypted_file)
else:
decrypted_file = file
return decrypted_file
excel_file = decrypt_office_file("my_file.xlsx")
df = pd.read_excel(excel_file, ...)
And then I just have to hope everything downstream is cool with taking either a BytesIO
or a str
/Path
, which is okay for pandas but I imagine is less okay for other libraries/use cases.
I'm not sure how it would be best to insert the functionality, but something like OfficeFile.to_bytes()
(I'm sure there are better ideas for function names available) would be great, then we can have consistent return types.
I also find it really odd that .decrypt()
takes the object you want to inject the file into as an argument, rather than returning a BytesIO object? It makes following the code flow feel awkward to me, but that's an issue for another day!
Thanks for your suggestion!
I think adding a utility function (as a context manager) similar to your decrypt_office_file()
to the library is a good idea and will do it.
In the case of your code example, I might replace decrypted_file = file
with decrypted_file = BytesIO(f)
so that the function always returns BytesIO.
I also find it really odd that
.decrypt()
takes the object you want to inject the file into as an argument, rather than returning a BytesIO object? It makes following the code flow feel awkward to me, but that's a for another day!
I understand. The problem is that always creating a BytesIO object can consume unnecessary memory, especially if the document is large in file size.
Thanks for your suggestion! I think adding a utility function (as a context manager) similar to your
decrypt_office_file()
to the library is a good idea and will do it. In the case of your code example, I might replacedecrypted_file = file
withdecrypted_file = BytesIO(f)
so that the function always returns BytesIO.
Brilliant, I just threw that together to get what I needed, I'm sure you'll make it better 😄
I understand. The problem is that always creating a BytesIO object can consume unnecessary memory, especially if the document is large in file size.
~ snip ~ I wrote a load of nonsense, I'm sure you know what you're doing! Plus, this way can support alternatives to just BytesIO which is nice, I guess you could with open as f
and pass f
as ofile
!
Thanks for your suggestion! I think adding a utility function (as a context manager) similar to your
decrypt_office_file()
to the library is a good idea and will do it. In the case of your code example, I might replacedecrypted_file = file
withdecrypted_file = BytesIO(f)
so that the function always returns BytesIO.Brilliant, I just threw that together to get what I needed, I'm sure you'll make it better 😄
I understand. The problem is that always creating a BytesIO object can consume unnecessary memory, especially if the document is large in file size.
~ snip ~ I wrote a load of nonsense, I'm sure you know what you're doing! Plus, this way can support alternatives to just BytesIO which is nice, I guess you could
with open as f
and passf
asofile
!
There will probably be a lot of people like me who forgot to f.seek(0)
😂.
with open(file_path, 'rb') as f:
file = msoffcrypto.OfficeFile(f)
if file.is_encrypted():
file.load_key(password=password) # Use password
file.decrypt(decrypted_io)
else:
f.seek(0) # don't forget this
decrypted_io.write(f.read())
return decrypted_io