olefile
improve olefile to write OLE files
Originally reported by: Philippe Lagadec (Bitbucket: decalage, GitHub: decalage2)
It would be a great improvement if OleFileIO was able to write OLE files, in order to modify or create them.
But this would require a lot of changes in the code. The main issues are the management of free sectors, the use of small vs. large streams, and the use of red-black trees to keep the stream names sorted (see reference documents about the OLE format).
I think the best order to add write features would be:
- [X] 1. write a single FAT sector in the file. => done in v0.32 (commit 54c820c)
- [X] 2. write stream data for an existing stream in FAT, same size. => done in v0.32 (commit ef34c24)
- [x] 3. write a single MiniFAT sector in the file => done in PR #59, merged in olefile 0.45
- [x] 4. write stream data for an existing stream in MiniFAT, same size. => done in PR #59, merged in olefile 0.45
- [ ] 5. write the header
- [ ] 6. write the FAT (and DIFAT), same size
- [ ] 7. write the MiniFAT, same size
- [ ] 8. write the directory, same size
- [ ] 9. rename a stream/storage (no change to red-black tree)
- [ ] 10. release a sector as unused
- [ ] 11. allocate a new sector, extending the OLE file size and/or the FAT/MiniFAT if necessary
- [ ] 12. trim down the file, removing unused sectors at the end
- [ ] 13. write stream data for an existing stream, changing its size
- [ ] 14. delete a stream/storage (requires updating the red-black tree)
- [ ] 15. add a new storage in the directory
- [ ] 16. add a new stream
- [ ] 17. create a new OLE file from scratch
- [ ] Also check #55 for inspiration
Please vote for this issue if you want to give it more priority.
- Bitbucket: https://bitbucket.org/decalage/olefileio_pl/issue/6
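To make steps 10 and 11 more concrete, here is a minimal sketch of free-sector management on an in-memory FAT. It is not olefile code: the FAT is modelled as a plain Python list, and the helper names (`follow_chain`, `release_chain`, `allocate_sector`, `append_to_chain`) are hypothetical. The `FREESECT` and `ENDOFCHAIN` values and the `(sect + 1) * sector_size` offset rule come from the OLE/CFB format. The sketch assumes the FAT can simply grow, and ignores the harder part of step 11 (allocating new FAT and DIFAT sectors when the table itself is full).

```python
FREESECT = 0xFFFFFFFF    # FAT entry value for an unused sector
ENDOFCHAIN = 0xFFFFFFFE  # FAT entry value marking the last sector of a chain


def sector_offset(sect, sector_size=512):
    """File offset of sector `sect`; sector 0 starts right after the header."""
    return (sect + 1) * sector_size


def follow_chain(fat, start):
    """Return the list of sector indexes in the chain starting at `start`."""
    chain = []
    sect = start
    while sect != ENDOFCHAIN:
        if sect >= len(fat) or sect in chain:
            raise ValueError("corrupt FAT chain")
        chain.append(sect)
        sect = fat[sect]
    return chain


def release_chain(fat, start):
    """Step 10: mark every sector of a chain as unused."""
    for sect in follow_chain(fat, start):
        fat[sect] = FREESECT


def allocate_sector(fat):
    """Step 11 (simplified): reuse a free sector, or grow the FAT by one entry."""
    try:
        sect = fat.index(FREESECT)
    except ValueError:
        fat.append(FREESECT)
        sect = len(fat) - 1
    fat[sect] = ENDOFCHAIN
    return sect


def append_to_chain(fat, last):
    """Allocate a sector and link it after `last`, the current end of a chain."""
    new = allocate_sector(fat)
    fat[last] = new
    return new
```

For example, with `fat = [1, 2, ENDOFCHAIN, FREESECT]`, calling `append_to_chain(fat, 2)` reuses sector 3, while a second call has to extend the list, which in a real file means extending the file as well.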
Hello @decalage2, is this hard to implement?
- delete a stream/storage (requires updating the red-black tree)
- write stream data for an existing stream, changing its size
- rename a stream/storage (no change to red-black tree)

Is this hard to do using the Windows APIs?
@blursight, you can edit OLE files using the Windows API (structured storage), it's not too complicated. The point of olefile is to provide a pure python implementation that does not require Windows.
I've only just come across this great library and saw the last update (2018-01-24 v0.45) mentions being able to write streams of any size. Was curious where along these steps the status is now?
We are now at step 4, I updated the comments.
Thanks for clarifying - was just checking since I was getting the error "write_stream: data must be the same size as the existing stream" and the update mentioned being able to write a stream of any size, so wasn't sure.
OK you're right, I just meant overwriting a stream of any size with data of the same size. :-) olefile did not overwrite streams <4KB, olefile 0.45 does.
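As a rough illustration of the "small vs. large streams" distinction behind that 4KB limit, the sketch below shows which allocation table a stream of a given size would use, and the same-size precondition for overwriting in place. The helper names are hypothetical and this is not olefile's code. The constants are the usual defaults (4096-byte cutoff, 512-byte sectors, 64-byte mini sectors); a real file stores these values in its header, and the root entry's mini stream container is a special case not covered here.

```python
MINI_STREAM_CUTOFF = 4096  # usual cutoff; the actual value is read from the header
SECTOR_SIZE = 512          # usual sector size; 4096 in newer-format files
MINI_SECTOR_SIZE = 64      # usual mini sector size


def describe_stream(size, cutoff=MINI_STREAM_CUTOFF):
    """Return (table, sector_count) for a stream of `size` bytes."""
    if size < cutoff:
        table, unit = "MiniFAT", MINI_SECTOR_SIZE
    else:
        table, unit = "FAT", SECTOR_SIZE
    # Round up: a partially filled last sector still occupies a whole sector.
    count = (size + unit - 1) // unit
    return table, count


def can_overwrite_in_place(old_size, new_data):
    """Same-size overwrite: no sector has to be allocated or released."""
    return len(new_data) == old_size
```

A 100-byte stream lands in the MiniFAT with 2 mini sectors, while a 4096-byte stream uses 8 regular sectors in the FAT. Changing a stream's size may therefore also mean moving it from one table to the other.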
It would be a great feature to be able to write streams with a new size. It could be used to build CI tools that write VBA project bins to package workbooks and documents!
would be really keen to see this implemented
For those who cannot wait for a potential implementation in olefile, another option is to use the pywin32 extensions on Windows (or maybe Linux with WINE). See my answer about this on StackOverflow: https://stackoverflow.com/questions/55008271/python-writing-a-bytestream-to-overwrite-an-existing-microsoft-structured-stora/55031666#55031666
nice idea - worked great on Windows, but couldn't get this working with Linux using WINE :(
Hi there, we can use decompress_stream(compressed_container) in olevba.py to decompress a stream. I wonder, is there any interface to compress VBA code into a stream?
@mkbl126 AFAIK there is no VBA compression function implemented yet in Python, simply because nobody took the time to write it. It is just a matter of taking the Microsoft specs (MS-OVBA) and implementing the algorithm in Python. However, if you can use C# instead of Python, other people have done it. See the EvilClippy, Kavod.Vba.Compression and OpenMCDF projects: https://github.com/outflanknl/EvilClippy https://github.com/rossknudsen/Kavod.Vba.Compression https://github.com/ironfede/openmcdf/
Thanks @decalage2, the C# works fine.
- write stream data for an existing stream, changing its size
Any timeline for implementing this feature?
A VBA compression algorithm has been implemented here: https://github.com/coldfusion39/excel-press
One more thing: before VBA macro writing is actually useful, the PCode should also be written: https://github.com/bontchev/pcodedmp
Hi @JJK96, thank you for mentioning excel-press, this is a great find. However it's not really relevant here, so I put the link in https://github.com/decalage2/oletools/issues/555 instead.
Wanted to add some details to this that I've discovered when I wrote a writer for OLE files. Renaming an entry has a high chance of requiring a modification to the red black tree. Unfortunately, many Microsoft programs are very strict about the red black tree and will throw a fit if there are any mistakes. If a rename would end up with the items all in the same place then the red-black tree can stay the same, but you MUST check that.
I also know just how much of a task many of these features will be to implement to modify the existing structure rather than simply creating a new one. It's for that exact reason that my own code starts from scratch.
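The rename check described above could be sketched as follows. This is an illustration, not code from olefile or from any of the writers mentioned in this thread, and the function names are hypothetical. It relies on the ordering rule from the OLE/CFB specification: shorter names sort first, and names of equal length are compared by their uppercased UTF-16 code units. Python's `str.upper()` is used as an approximation of the uppercase mapping in the specification, which is an assumption that may differ for some non-ASCII characters. The check only covers a name's position among its siblings in one storage.

```python
import struct


def cfb_name_key(name):
    """Sort key for a directory entry name: length first, then uppercased code units."""
    # Uppercase character by character, skipping mappings that would change the length.
    upper = "".join(c.upper() if len(c.upper()) == 1 else c for c in name)
    raw = upper.encode("utf-16-le")
    units = struct.unpack("<%dH" % (len(raw) // 2), raw)
    return (len(units), units)


def rename_keeps_position(siblings, old, new):
    """True if renaming `old` to `new` leaves it between the same two neighbours."""
    ordered = sorted(siblings, key=cfb_name_key)
    keys = [cfb_name_key(n) for n in ordered]
    i = ordered.index(old)
    new_key = cfb_name_key(new)
    # Strict comparisons also reject a new name that collides with a neighbour.
    lower_ok = i == 0 or keys[i - 1] < new_key
    upper_ok = i == len(keys) - 1 or new_key < keys[i + 1]
    return lower_ok and upper_ok
```

With siblings `["Data", "WordDocument", "1Table"]`, renaming `1Table` to `0Table` keeps its place between `Data` and `WordDocument`, so the tree could stay untouched. Renaming it to `Tab` would move it before `Data`, which means the red-black tree has to be updated.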
Thanks a lot Destiny, this is very helpful as I am currently looking at how to implement this. I looked at your code recently, and it is a great inspiration.
@decalage2
Here's another reference implementation that I've used for writing OLE/CFB that maybe you (or someone) will find helpful.
https://github.com/mdsteele/rust-cfb
Thanks Jonathan, I didn't know this one.