improve olefile to write OLE files

Open decalage2 opened this issue 13 years ago • 22 comments

Originally reported by: Philippe Lagadec (Bitbucket: decalage, GitHub: decalage2)


It would be a great improvement if OleFileIO was able to write OLE files, in order to modify or create them.

But this would require a lot of changes in the code. The main issues are the management of free sectors, the use of small vs. large streams, and the use of red-black trees to keep the stream names sorted (see reference documents about the OLE format).

I think the best order to add write features would be:

  • [X] 1. write a single FAT sector in the file. => done in v0.32 (commit 54c820c)
  • [X] 2. write stream data for an existing stream in FAT, same size. => done in v0.32 (commit ef34c24)
  • [x] 3. write a single MiniFAT sector in the file => done in PR #59, merged in olefile 0.45
  • [x] 4. write stream data for an existing stream in MiniFAT, same size. => done in PR #59, merged in olefile 0.45
  • [ ] 5. write the header
  • [ ] 6. write the FAT (and DIFAT), same size
  • [ ] 7. write the MiniFAT, same size
  • [ ] 8. write the directory, same size
  • [ ] 9. rename a stream/storage (no change to red-black tree)
  • [ ] 10. release a sector as unused
  • [ ] 11. allocate a new sector, extending the OLE file size and/or the FAT/MiniFAT if necessary
  • [ ] 12. trim down the file, removing unused sectors at the end
  • [ ] 13. write stream data for an existing stream, changing its size
  • [ ] 14. delete a stream/storage (requires to update the red-black tree)
  • [ ] 15. add a new storage in the directory
  • [ ] 16. add a new stream
  • [ ] 17. create a new OLE file from scratch
  • [ ] Also check #55 for inspiration
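
Steps 10 and 11 in the list above are mostly bookkeeping on the FAT, which maps each sector to the next sector in its chain. A minimal sketch, assuming the FAT has already been loaded into a plain Python list of integers. The helper names are hypothetical and are not part of olefile; the two marker values are the ones defined by the OLE/CFB format. Growing the list here stands in for the harder part of step 11 (adding a FAT sector and updating the DIFAT), which is not shown:

```python
FREESECT = 0xFFFFFFFF    # FAT marker: sector is unused
ENDOFCHAIN = 0xFFFFFFFE  # FAT marker: last sector of a chain


def get_chain(fat, start):
    """Return the list of sector indexes in the chain starting at `start`."""
    chain, seen, sect = [], set(), start
    while sect != ENDOFCHAIN:
        if sect >= len(fat) or sect in seen:  # out of range, or a loop
            raise ValueError("corrupt FAT chain")
        chain.append(sect)
        seen.add(sect)
        sect = fat[sect]
    return chain


def release_chain(fat, start):
    """Step 10: mark every sector of a chain as unused."""
    for sect in get_chain(fat, start):
        fat[sect] = FREESECT


def allocate_sector(fat, last_in_chain=None):
    """Step 11: reuse a free sector, or grow the FAT list by one entry.

    If `last_in_chain` is given, the new sector is linked after it.
    """
    try:
        new = fat.index(FREESECT)
    except ValueError:
        fat.append(FREESECT)  # hypothetical: a real file also grows on disk
        new = len(fat) - 1
    fat[new] = ENDOFCHAIN
    if last_in_chain is not None:
        fat[last_in_chain] = new
    return new
```

For example, with `fat = [1, 2, ENDOFCHAIN, FREESECT]`, calling `allocate_sector(fat, 2)` reuses sector 3 and extends the chain that starts at sector 0 to four sectors.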

Please vote for this issue if you want to give it more priority.


  • Bitbucket: https://bitbucket.org/decalage/olefileio_pl/issue/6

decalage2 avatar Nov 27 '11 09:11 decalage2

Hello @decalage2, Is this hard to implement?

delete a stream/storage (requires to update the red-black tree)

mguerreiro avatar Oct 03 '16 11:10 mguerreiro

write stream data for an existing stream, changing its size

rename a stream/storage (no change to red-black tree)

Is this hard to do using the Windows APIs?

blursight avatar Dec 06 '17 12:12 blursight

@blursight, you can edit OLE files using the Windows API (structured storage), it's not too complicated. The point of olefile is to provide a pure python implementation that does not require Windows.

decalage2 avatar Dec 06 '17 14:12 decalage2

I've only just come across this great library and saw the last update (2018-01-24 v0.45) mentions being able to write streams of any size. Was curious where along these steps the status is now?

McSpidey avatar Feb 04 '18 12:02 McSpidey

We are now at step 4, I updated the comments.

decalage2 avatar Feb 04 '18 13:02 decalage2

Thanks for clarifying - was just checking since I was getting the error "write_stream: data must be the same size as the existing stream" and the update mentioned being able to write a stream of any size, so wasn't sure.

McSpidey avatar Feb 05 '18 09:02 McSpidey

OK, you're right: I just meant overwriting a stream of any size with data of the same size. :-) Before 0.45, olefile could not overwrite streams smaller than 4 KB (those stored in the MiniFAT); olefile 0.45 can.
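
For readers wondering where the 4 KB limit comes from: in the OLE format, streams smaller than the mini stream cutoff (normally 4096 bytes) are stored in 64-byte mini sectors tracked by the MiniFAT, while larger ones use regular sectors tracked by the FAT. A rough sketch of that decision and of the same-size check, assuming the usual cutoff and a 512-byte sector size; the function names are hypothetical and this is not olefile's actual code:

```python
MINI_STREAM_CUTOFF = 4096  # usual value of the cutoff field in the header
SECTOR_SIZE = 512          # assumption: version 3 file (version 4 uses 4096)
MINI_SECTOR_SIZE = 64


def stream_layout(size, sector_size=SECTOR_SIZE):
    """Return (allocation table name, number of sectors) for a stream size."""
    if size < MINI_STREAM_CUTOFF:
        table, unit = "MiniFAT", MINI_SECTOR_SIZE
    else:
        table, unit = "FAT", sector_size
    sectors = -(-size // unit)  # ceiling division
    return table, sectors


def check_same_size(existing_size, data):
    """Refuse data whose length differs from the existing stream."""
    if len(data) != existing_size:
        raise ValueError("data must be the same size as the existing stream")
```

Under these assumptions a 100-byte stream occupies 2 mini sectors, and a 4096-byte stream occupies 8 regular sectors. Changing the size of a stream across the cutoff would mean moving it from one table to the other, which is part of why step 13 is harder than steps 2 and 4.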

decalage2 avatar Feb 05 '18 09:02 decalage2

It would be a great feature to be able to write streams with a new size. It could be used to build CI tools that write VBA project bins to package workbooks and documents!

robjampar avatar Mar 07 '19 15:03 robjampar

would be really keen to see this implemented

PNJenkinson avatar Mar 07 '19 15:03 PNJenkinson

For those who cannot wait for a potential implementation in olefile, another option is to use the pywin32 extensions on Windows (or maybe Linux with WINE). See my answer about this on StackOverflow: https://stackoverflow.com/questions/55008271/python-writing-a-bytestream-to-overwrite-an-existing-microsoft-structured-stora/55031666#55031666

decalage2 avatar Mar 07 '19 19:03 decalage2

nice idea - worked great on Windows, but couldn't get this working with Linux using WINE :(

robjampar avatar Mar 14 '19 15:03 robjampar

I see that we can use decompress_stream(compressed_container) in olevba.py to decompress a stream. I wonder whether there is any interface to compress VBA code into a stream?

mkbl126 avatar Jul 25 '19 11:07 mkbl126

@mkbl126 AFAIK there is no VBA compression function implemented yet in Python, simply because nobody took the time to write it. It is just a matter of taking the Microsoft specs (MS-OVBA) and implementing the algorithm in Python. However, if you can use C# instead of Python, other people have done it. See the EvilClippy, Kavod.Vba.Compression and OpenMCDF projects: https://github.com/outflanknl/EvilClippy https://github.com/rossknudsen/Kavod.Vba.Compression https://github.com/ironfede/openmcdf/

decalage2 avatar Jul 25 '19 12:07 decalage2

Thanks @decalage2 The c# works fine.

mkbl126 avatar Jul 26 '19 13:07 mkbl126

  13. write stream data for an existing stream, changing its size

Any timeline for implementing this feature?

X3msnake avatar Sep 30 '21 01:09 X3msnake

A VBA compression algorithm has been implemented here: https://github.com/coldfusion39/excel-press

JJK96 avatar Jan 21 '22 13:01 JJK96

One more thing: before VBA macro writing is actually useful, the P-code should also be written: https://github.com/bontchev/pcodedmp

JJK96 avatar Jan 21 '22 15:01 JJK96

Hi @JJK96, thank you for mentioning excel-press, this is a great find. However it's not really relevant here, so I put the link in https://github.com/decalage2/oletools/issues/555 instead.

decalage2 avatar Jan 22 '22 22:01 decalage2

Wanted to add some details to this that I've discovered when I wrote a writer for OLE files. Renaming an entry has a high chance of requiring a modification to the red black tree. Unfortunately, many Microsoft programs are very strict about the red black tree and will throw a fit if there are any mistakes. If a rename would end up with the items all in the same place then the red-black tree can stay the same, but you MUST check that.
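
The check described above can be sketched as follows. This assumes the directory entry ordering from the OLE/CFB format: names are compared first by length in UTF-16 code units, then by their uppercased text. Python's `str.upper()` is only an approximation of the simple case conversion the format calls for, and the function names are hypothetical, so treat this as an illustration rather than a reference implementation:

```python
def cfb_sort_key(name):
    """Approximate sort key for a directory entry name.

    Shorter names sort first; names of equal length are compared
    by their uppercased text.
    """
    utf16_len = len(name.encode("utf-16-le")) // 2
    return (utf16_len, name.upper())


def rename_keeps_order(siblings, old_name, new_name):
    """Return True if renaming leaves every sibling at the same sorted position.

    `siblings` holds the names of all entries in one storage,
    including `old_name`.
    """
    other_keys = [cfb_sort_key(n) for n in siblings if n != old_name]
    if cfb_sort_key(new_name) in other_keys:
        raise ValueError("new name collides with an existing sibling")
    before = sorted(siblings, key=cfb_sort_key)
    renamed = [new_name if n == old_name else n for n in before]
    return renamed == sorted(renamed, key=cfb_sort_key)
```

With siblings `["WordDocument", "1Table", "Data"]`, renaming `1Table` to `0Table` keeps the order, while renaming `Data` to `DataSpaces` does not, because the longer name has to move after `1Table`. In the second case the red-black tree would need to be updated.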

I also know just how much of a task many of these features will be to implement to modify the existing structure rather than simply creating a new one. It's for that exact reason that my own code starts from scratch.

Thanks a lot Destiny, this is very helpful as I am currently looking at how to implement this. I looked at your code recently, this is a great inspiration.

decalage2 avatar Dec 19 '23 22:12 decalage2

@decalage2

Here's another reference implementation that I've used for writing OLE/CFB that maybe you (or someone) will find helpful.

https://github.com/mdsteele/rust-cfb

jtran avatar Jan 05 '24 14:01 jtran

Thanks Jonathan, I didn't know this one.

decalage2 avatar Jan 05 '24 15:01 decalage2