Integration in PyCIRCLean

Status: Open. Rafiot opened this issue 7 years ago • 2 comments

I'm one of the core developers of PyCIRCLean and am looking for a way to integrate support for HWP files.

The goal of the library is to search files for active content and figure out whether they could be malicious. If you want to know more about it, please refer to this page.

Do you think pyhwp can help? Note that I know nothing about this file format: I have no idea what kind of active content I could find in these files, and I don't know what to look for.

The other issue I'll have is that PyCIRCLean is Python 3 only, while pyhwp seems to be Python 2 only. Is there a plan to support Python 3 in the future?

Thank you very much, Raphaël

Rafiot, Nov 09 '17 23:11

Thank you for the information and the suggestion. The USB stick sanitization on a Raspberry Pi looks especially brilliant and interesting. I think it could help many small organizations that lack proper security support. It would be helpful for users of the HWP word processor to have a malware-detection/sanitizing system like that, so I am definitely willing to help with such an integration.

pyhwp itself has no malware-detection mechanism, and that is out of the scope of the program. Another package built on top of pyhwp would be appropriate here, but I don't have much information about what kinds of attack vectors against the file format are out there, or what can be done to sanitize them. I will look into the other sanitizers in PyCIRCLean for hints.
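As a possible first step for such a package, the HWP 5.0 `FileHeader` stream already carries flags worth surfacing, such as whether scripts were saved with the document. The sketch below decodes that stream with only the standard library, independently of pyhwp's own API. The field layout (a NUL-padded 32-byte signature followed by little-endian version and properties DWORDs) and the bit positions in `PROPERTY_BITS` are assumptions based on our reading of the public HWP 5.0 specification and should be checked against it; `parse_file_header` is a hypothetical helper, not part of pyhwp.

```python
import struct
from typing import Dict, Union

# Assumption: layout of the HWP 5.0 "FileHeader" stream as we read the spec:
# bytes 0-31 signature (NUL-padded), 32-35 version, 36-39 properties,
# both stored as little-endian DWORDs.
HWP_SIGNATURE = b"HWP Document File"

# Assumed bit positions inside the properties DWORD; verify before relying on them.
PROPERTY_BITS = {
    "compressed": 0,
    "password_protected": 1,
    "distribution_document": 2,
    "script_saved": 3,
}


def parse_file_header(data: bytes) -> Dict[str, Union[str, bool]]:
    """Decode the version and selected property flags of a FileHeader stream."""
    if len(data) < 40:
        raise ValueError("FileHeader stream too short")
    if data[:32].rstrip(b"\x00") != HWP_SIGNATURE:
        raise ValueError("not an HWP 5.x FileHeader")
    version, properties = struct.unpack_from("<II", data, 32)
    # Version is packed as 0xMMnnPPrr: major, minor, patch, revision.
    version_str = ".".join(str((version >> shift) & 0xFF) for shift in (24, 16, 8, 0))
    result: Dict[str, Union[str, bool]] = {"version": version_str}
    for name, bit in PROPERTY_BITS.items():
        result[name] = bool((properties >> bit) & 1)
    return result


# Usage with a synthetic 256-byte header: version 5.0.3.0, bits 0 and 3 set.
sample = (
    HWP_SIGNATURE.ljust(32, b"\x00")
    + struct.pack("<II", 0x05000300, 0b1001)
    + bytes(216)
)
print(parse_file_header(sample))
```

Reading the raw stream out of the compound file is left to an OLE library; with olefile, for example, `olefile.OleFileIO(path).openstream("FileHeader").read()` would supply the `data` argument.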

Currently pyhwp has its own interoperability issues to resolve, such as API stability and Python 3 support, as you noticed. I will investigate further and try to improve the situation, but I don't expect any real progress soon :) For now, Python 3 support is the only item with high priority on my list.

Thanks!

mete0r, Nov 10 '17 11:11

You can find the types of indicators I look at for winoffice documents here: https://github.com/CIRCL/PyCIRCLean/blob/master/filecheck/filecheck.py#L340

In the same file, you will find the other indicators for PDFs, LibreOffice, ...

The idea is to find everything in the file that is potentially active (Flash, JavaScript, fonts, ...) or that could be used to embed active content (e.g. OpenAction in PDF). I'm not sure what that would mean for HWP.
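One possible translation of that idea to HWP 5.0 files, which are OLE compound documents, is to flag storages that can hold scripts or embedded objects. The storage and stream names used below (`Scripts`, `BinData`, `BINxxxx.eps`, `BINxxxx.OLE`) reflect our understanding of the format, and treating them as indicators is an assumption rather than a vetted threat model. `hwp_indicators` is a hypothetical function, not part of pyhwp or PyCIRCLean.

```python
from typing import Iterable, List, Sequence

# Hypothetical rules: which HWP 5.0 storages count as "potentially active".
# Names reflect our reading of the format; this is not a vetted threat model.
SCRIPT_STORAGE = "Scripts"
BINDATA_STORAGE = "BinData"
POSTSCRIPT_EXTENSIONS = (".eps", ".ps")


def hwp_indicators(stream_paths: Iterable[Sequence[str]]) -> List[str]:
    """Return PyCIRCLean-style indicator strings for an HWP stream listing.

    Each path is a sequence of storage/stream names, e.g. ["BinData", "BIN0001.jpg"],
    which is the shape olefile's OleFileIO.listdir() returns.
    """
    indicators: List[str] = []
    for path in stream_paths:
        if not path:
            continue  # ignore malformed (empty) entries
        full = "/".join(path)
        leaf = path[-1].lower()
        if path[0] == SCRIPT_STORAGE:
            # Document-level scripts (e.g. a DefaultJScript stream).
            indicators.append("Script stream: " + full)
        elif path[0] == BINDATA_STORAGE:
            if leaf.endswith(POSTSCRIPT_EXTENSIONS):
                indicators.append("Embedded PostScript/EPS: " + full)
            elif leaf.endswith(".ole"):
                indicators.append("Embedded OLE object: " + full)
    return indicators


listing = [
    ["FileHeader"],
    ["BodyText", "Section0"],
    ["Scripts", "DefaultJScript"],
    ["BinData", "BIN0001.jpg"],
    ["BinData", "BIN0002.eps"],
    ["BinData", "BIN0003.OLE"],
]
for line in hwp_indicators(listing):
    print(line)
```

With olefile installed, the input could come straight from `olefile.OleFileIO(path).listdir()`. Keeping the rules as plain name checks mirrors the "list what is potentially active" approach above and leaves the decision about what to block to the caller.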

Rafiot, Nov 10 '17 19:11