
process.py locking up on certain pdfs

Open mallorybobalice opened this issue 10 years ago • 26 comments

hi,

we seem to have a 'lucky streak' of corrupted PDFs (extracted from PCAPs) that cause processing to hang indefinitely (normally executables and other PDFs submit, process, and report OK). While I'll try to post a PCAP, that might not be possible (can Brad message me privately about that?).

Anyhow, we're running processing tasks not within the main cuckoo process, but via the utils/process.py helper,

e.g. /utils/process.py -d -p 7 auto

Eventually enough of these PDFs queue up that processing halts completely, as every process.py worker ends up hung:

```
2016-02-18 23:03:57,947 [modules.processing.static] DEBUG: Starting to load PDF
2016-02-18 23:04:00,365 [modules.processing.static] DEBUG: About to parse with PDFParser
2016-02-18 23:04:03,622 [modules.processing.static] DEBUG: About to parse with PDFParser
2016-02-18 23:04:06,989 [modules.processing.static] DEBUG: About to parse with PDFParser
2016-02-18 23:04:07,711 [modules.processing.static] DEBUG: About to parse with PDFParser
2016-02-18 23:04:18,115 [modules.processing.static] DEBUG: About to parse with PDFParser
2016-02-18 23:04:18,923 [modules.processing.static] DEBUG: About to parse with PDFParser
2016-02-18 23:04:19,794 [modules.processing.static] DEBUG: About to parse with PDFParser
```

...and then nothing happens. Without extra debug statements in static.py after the PDF parser is instantiated, I'm not sure whether it freezes there or during result processing. I suspect PDFParser/peepdf PDFCore.

Occasionally processing seems to restart (probably when I kick it over), and then the same set of tasks freezes again.

As far as I can tell, static analysis happens via peepdf (their repo has a slightly newer version, by the way; using that still makes processing freeze). While on the subject: did they merge the Google Summer of Code PDF malscore changes, and did we pull them into csb or brad-csb?

So anyhow, I was hoping someone (who knows a bit more Python, and about safely killing processing tasks and marking them as processing- or reporting-failed) could help with a quick workaround, for example a watchdog timer in process.py controlled by an extra config setting: if processing takes over 600 seconds, the task gets marked as failed and the helper process for that task ID terminates itself, preferably safely.
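A minimal sketch of what such a watchdog could look like. This is hypothetical, not code from process.py: `process_task` stands in for whatever function actually processes one task, and the caller would still need to mark the task failed in the database.

```python
import multiprocessing

def run_with_watchdog(process_task, task_id, limit=600):
    """Run one processing task in a child process and terminate it
    if it exceeds `limit` seconds; returns True on clean completion."""
    worker = multiprocessing.Process(target=process_task, args=(task_id,))
    worker.start()
    worker.join(limit)           # wait at most `limit` seconds
    if worker.is_alive():
        worker.terminate()       # kill the hung worker
        worker.join()
        return False             # caller would mark the task as failed
    return worker.exitcode == 0
```

The tricky part, as noted, is doing the "mark as failed" step safely so the scheduler does not re-queue the same hanging file forever.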

Thoughts/questions? Please help. Mb.

mallorybobalice avatar Feb 22 '16 10:02 mallorybobalice

PS: I was eyeing the CSB2 process.py, but it was sufficiently different for me to give up on that idea (just swapping it in for the spender-sandbox one doesn't work).

mallorybobalice avatar Feb 22 '16 10:02 mallorybobalice

We see this too, extremely frequently. Enough that I wrapped it in an InterruptingCow (https://pypi.python.org/pypi/interruptingcow) try/except to allow the rest of the processing module to proceed.
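For reference, InterruptingCow is built on SIGALRM. A rough stdlib-only equivalent of the pattern (a sketch, not Fryyyyy's actual code) looks like:

```python
import signal
from contextlib import contextmanager

@contextmanager
def time_limit(seconds):
    """Raise RuntimeError if the wrapped block runs longer than
    `seconds`. SIGALRM-based, so Unix-only and main-thread-only,
    which are the same constraints InterruptingCow has."""
    def _handler(signum, frame):
        raise RuntimeError("processing timed out")
    old_handler = signal.signal(signal.SIGALRM, _handler)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)  # cancel the pending alarm
        signal.signal(signal.SIGALRM, old_handler)
```

Usage would be `with time_limit(60): results = parse(path)` around the hang-prone call, catching RuntimeError to skip the file.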

Fryyyyy avatar Feb 22 '16 22:02 Fryyyyy

Hmm, the workaround tip is much appreciated. It looks like this way I don't necessarily have to fail the task.

Do you wrap the self._parse here? Or somewhere further out, and somehow indicate that part or all of static analysis failed? (Would you mind sharing?)

```python
log.debug("Starting to load PDF")
results = self._parse(self.file_path)
return results
```

I was thinking of wrapping the results bit, but that might be too deep, and things above might not like partially populated results.

Something like:

```python
from interruptingcow import timeout

log.debug("Starting to load PDF")
try:
    with timeout(40, exception=RuntimeError):
        results = self._parse(self.file_path)
except RuntimeError:
    log.error("PDF analysis failed, task id blaa, name bla")
return results
```

While on the subject of PDFs, @Fryyyyy: do you also often get corrupted PDFs locking up Reader on opening in the guest VM and producing large bson logs that consume a lot of memory during host processing? Any thoughts on that? (I just increased RAM in line with the number of parallel tasks and reduced the max log size, but that doesn't make me feel warm and fuzzy.)

mallorybobalice avatar Feb 23 '16 02:02 mallorybobalice

Unfortunately I'm not permitted to share code, though I am working on getting permission. I haven't noticed lockups or large logs, but then again we have a lot of hosts with a lot of guests doing a lot of analyses so it's not easy to keep an eye on individual submissions.

Fryyyyy avatar Feb 23 '16 02:02 Fryyyyy

I see. No worries and thanks again

mallorybobalice avatar Feb 23 '16 10:02 mallorybobalice

If someone gives me a hash for one of these PDFs, I can fix the actual problem instead of working around it.

-Brad

spender-sandbox avatar Feb 23 '16 12:02 spender-sandbox

Thanks Brad. I think Fryyyyy and I assumed, or Fry may have checked, that it's upstream in peepdf (I suppose checking is one break or print statement away), hence looking for workarounds.

I'll have a look tomorrow, and if one is already shared on malwr or HA, I'll supply a hash. If it's a public sample unrelated to my employer, I'm happy to share. I can probably get permission to share a work-related one, but it would have to go out of band with an archive password; is the latter an option? I'd also like to try to share the huge, Reader-freezing bson PDFs. Maybe those operations can be coalesced, or... I don't know, I haven't really looked closely at it.

mallorybobalice avatar Feb 23 '16 12:02 mallorybobalice

PS: the PDF-specific issue aside, are you sure there shouldn't be a timeout for processing overall, or per processing module, signature, and reporting step (or is it not worth it / does it normally just work)? I'd rather find out from syslog alerts saying "warning: task processing failed" than find all tasks hung next time. From an ops monitoring perspective, bounded run times feel all warm and fuzzy.

mallorybobalice avatar Feb 23 '16 13:02 mallorybobalice

I've modified static.py to dump the PDFs out to a temp directory when it triggers the timeout. Once we've got a few, I'll see if I can get some cleared for release or at least do some analysis on them to see if there's anything similar about them.
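That kind of dump hook could be as small as the following sketch (hypothetical names; the real change would live in static.py's timeout handler):

```python
import os
import shutil
import tempfile

def preserve_sample(file_path, dump_dir=None):
    """Copy a sample that hit the static-analysis timeout into a
    holding directory so it can be examined later."""
    if dump_dir is None:
        dump_dir = os.path.join(tempfile.gettempdir(), "hung_pdfs")
    if not os.path.isdir(dump_dir):
        os.makedirs(dump_dir)
    dest = os.path.join(dump_dir, os.path.basename(file_path))
    shutil.copy2(file_path, dest)  # copy2 keeps timestamps for triage
    return dest
```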

Fryyyyy avatar Feb 23 '16 23:02 Fryyyyy

3eaef2ca2c9d29e936919c7c6f8e5614aef6edf8cec6c92008291bafea0388d0 took more than a minute to statically analyse, which hits our timeout.

Edit for additional samples:

```
1e3db20bb77178cabe8e32a47510a027bb38bc585ed02a95052e3965ac9a9b26
2c2c956d74dcc245655a6c56aa052212ac1a933e22e5a41d63afc1aa9d2eccf5
3eaef2ca2c9d29e936919c7c6f8e5614aef6edf8cec6c92008291bafea0388d0
63e0063d43ae9578c328b4683b53c868497dd41c8c112f3365308907ad444a84
```

Fryyyyy avatar Feb 23 '16 23:02 Fryyyyy

I seem to stumble upon one of these every once in a while. Here's the latest one, where I turned up as much debug as I could. The jserror log doesn't have any timestamps, though. If you need the PDF, just let me know.

SHA256 2ab11d83ae2cbd12f0f6c30aacad8a8e16df5255646d08e923054b9f521c4b83 debug_log_pdf.txt

housemusic42 avatar Feb 24 '16 14:02 housemusic42

PS: if anyone's interested in the specifics of what Fryyyyy was suggesting, what we've done for now as a workaround is wrap the whole static-analysis dispatch in InterruptingCow (appreciate the tip, Fry). We could optionally copy the offending files off somewhere, but for now we just print task ID info in case it's needed later.

in case anyone wants it:

pip install interruptingcow

vi modules/processing/static.py

Add the new import: `from interruptingcow import timeout`

Then replace `class Static(Processing):` with a wrapped version:

```python
class Static(Processing):
    """Static analysis."""

    def run(self):
        """Run analysis.
        @return: results dict.
        """
        self.key = "static"
        static = {}
        thetype = None
        TOSeconds = 60  # you can replace this with a config value
        try:
            with timeout(TOSeconds, exception=RuntimeError):
                if self.task["category"] == "file":
                    thetype = File(self.file_path).get_type()
                    if HAVE_PEFILE and ("PE32" in thetype or "MS-DOS executable" in thetype):
                        static = PortableExecutable(self.file_path, self.results).run()
                        if static and "Mono" in thetype:
                            static.update(DotNETExecutable(self.file_path, self.results).run())
                    elif "PDF" in thetype or self.task["target"].endswith(".pdf"):
                        static = PDF(self.file_path).run()
                    elif "Word 2007" in thetype or "Excel 2007" in thetype or "PowerPoint 2007" in thetype or "MIME entity" in thetype:
                        static = Office(self.file_path).run()
                    elif "Composite Document File" in thetype:
                        static = Office(self.file_path).run()
                    elif self.task["target"].endswith((".doc", ".docx", ".rtf", ".xls", ".mht", ".mso", ".xlsx", ".ppt", ".pptx", ".pps", ".ppsx", ".pptm", ".potm", ".potx", ".ppsm")):
                        static = Office(self.file_path).run()
                    elif "Java Jar" in thetype or self.task["target"].endswith(".jar"):
                        decomp_jar = self.options.get("procyon_path", None)
                        if decomp_jar and not os.path.exists(decomp_jar):
                            log.error("procyon_path specified in processing.conf but the file does not exist.")
                        static = Java(self.file_path, decomp_jar).run()
                    # It's possible to fool libmagic into thinking our 2007+ file is a
                    # zip. So until we have static analysis for zip files, we can use
                    # oleid to fail us out silently, yielding no static analysis
                    # results for actual zip files.
                    elif "Zip archive data, at least v2.0" in thetype:
                        static = Office(self.file_path).run()
                elif self.task["category"] == "url":
                    enabled_whois = self.options.get("whois", True)
                    if HAVE_WHOIS and enabled_whois:
                        static = URL(self.task["target"]).run()
        except RuntimeError:
            log.error("Error performing static analysis for task %d within %d s, type: %s, file: %s",
                      self.task["id"], TOSeconds, thetype, self.file_path)
        return static
```

(Note: `thetype` is initialized to None up front so the except handler can't hit a NameError when a URL task times out.)



mallorybobalice avatar Mar 01 '16 10:03 mallorybobalice

Here's a PDF that I can consistently get PDFParser to stall out on. Let me know if you need more information.

ed2732bce8351b839e924a0cf5512ce90fcdfd4274796824bab40eb2d1850ff0.pdf

housemusic42 avatar Mar 18 '16 13:03 housemusic42

Garrr... I'm still having problems with this. @mallorybobalice, I tried your wrapper, but it doesn't seem to send an interrupt? Any help with this would be great. I'm thinking about just restarting the processing script every hour, or slapping something together to regenerate unprocessed tasks.

housemusic42 avatar Aug 26 '16 04:08 housemusic42

Hmm, processing log please (or the cuckoo log, if processing is not done via utils). Mine interrupts OK. Are you sure you installed interruptingcow and added the import, not just the wrapper? Try setting it to 20 s. It would be good if Brad or someone merged this in; it's getting a bit old merging it myself on each git pull. Has anyone bothered to report this upstream to peepdf? I haven't :(

Thing is, restarting processing won't help if it tries the same file again. Interrupting partially fails static analysis but lets processing continue.

mallorybobalice avatar Aug 26 '16 04:08 mallorybobalice

PS: and a copy of your modified modules/processing/static.py, please.

mallorybobalice avatar Aug 26 '16 04:08 mallorybobalice

And a sample freezing PDF to try, if different from the ones above.

mallorybobalice avatar Aug 26 '16 04:08 mallorybobalice

PS: but really, we need to open an issue here: https://github.com/jesparza/peepdf. Otherwise our csb servers needlessly chew CPU on these PDFs for the full timeout period, and for some of us quite often.

mallorybobalice avatar Aug 26 '16 04:08 mallorybobalice

Opened: https://github.com/jesparza/peepdf/issues/59

That said, personally I'd rather keep the interrupt code anyway, to make static analysis time-bounded.

mallorybobalice avatar Aug 26 '16 04:08 mallorybobalice

```python
class Static(Processing):
    """Static analysis."""

    def run(self):
        """Run analysis.
        @return: results dict.
        """
        self.key = "static"
        static = {}
        thetype = None
        TOSeconds = 60  # you can replace this with a config value
        try:
            with timeout(TOSeconds, exception=RuntimeError):
                if self.task["category"] == "file":
                    package = ""
                    if "info" in self.results and "package" in self.results["info"]:
                        package = self.results["info"]["package"]

                    thetype = File(self.file_path).get_type()
                    if HAVE_PEFILE and ("PE32" in thetype or "MS-DOS executable" in thetype):
                        static = PortableExecutable(self.file_path, self.results).run()
                        if static and "Mono" in thetype:
                            static.update(DotNETExecutable(self.file_path, self.results).run())
                    elif "PDF" in thetype or self.task["target"].endswith(".pdf"):
                        static = PDF(self.file_path).run()
                    elif package in ("doc", "ppt", "xls"):
                        static = Office(self.file_path).run()
                    elif "Java Jar" in thetype or self.task["target"].endswith(".jar"):
                        decomp_jar = self.options.get("procyon_path", None)
                        if decomp_jar and not os.path.exists(decomp_jar):
                            log.error("procyon_path specified in processing.conf but the file does not exist.")
                        static = Java(self.file_path, decomp_jar).run()
                    # It's possible to fool libmagic into thinking our 2007+ file is a
                    # zip. So until we have static analysis for zip files, we can use
                    # oleid to fail us out silently, yielding no static analysis
                    # results for actual zip files.
                    elif "Zip archive data, at least v2.0" in thetype:
                        static = Office(self.file_path).run()
                    elif package == "wsf" or thetype == "XML document text" or self.task["target"].endswith(".wsf") or package == "hta":
                        static = WindowsScriptFile(self.file_path).run()
                    elif package == "js" or package == "vbs":
                        static = EncodedScriptFile(self.file_path).run()
                elif self.task["category"] == "url":
                    enabled_whois = self.options.get("whois", True)
                    if HAVE_WHOIS and enabled_whois:
                        static = URL(self.task["target"]).run()
        except RuntimeError:
            log.error("Error performing static analysis for task %d within %d s, type: %s, file: %s",
                      self.task["id"], TOSeconds, thetype, self.file_path)
        return static
```

```
==> jserror.log <==
SyntaxError: Unexpected number ( @ 2 : 9 ) -> 0 >> BDC 0.357 0.608 0.835 rg
SyntaxError: Unexpected token ILLEGAL ( @ 1 : 0 ) ->
SyntaxError: Unexpected token ILLEGAL ( @ 1 : 0 ) ->
SyntaxError: Unexpected token ILLEGAL ( @ 1 : 0 ) ->
SyntaxError: Unexpected token ILLEGAL ( @ 1 : 0 ) ->
'ascii' codec can't decode byte 0xf3 in position 2443: ordinal not in range(128)
'ascii' codec can't decode byte 0xfe in position 0: ordinal not in range(128)
SyntaxError: Unexpected identifier ( @ 2 : 15 ) -> 0 >> BDC / GS5 gs
'ascii' codec can't decode byte 0xd6 in position 2: ordinal not in range(128)
SyntaxError: Unexpected identifier ( @ 2 : 9 ) -> 0 >> BDC BT / F1 9.6 Tf
SyntaxError: Unexpected number ( @ 1 : 3 ) -> 52 0 56 52 55 113 57 175 59 227 61 280 62 333 60 402 58 465 64 534 65 585 66 6
'ascii' codec can't decode byte 0xe9 in position 15: ordinal not in range(128)
'ascii' codec can't decode byte 0xd4 in position 17: ordinal not in range(128)
SyntaxError: Unexpected identifier ( @ 2 : 13 ) -> findresource begin
'ascii' codec can't decode byte 0xd4 in position 18: ordinal not in range(128)
SyntaxError: Unexpected identifier ( @ 2 : 13 ) -> findresource begin
'ascii' codec can't decode byte 0xe9 in position 15: ordinal not in range(128)
'ascii' codec can't decode byte 0xaf in position 16: ordinal not in range(128)
'ascii' codec can't decode byte 0xa5 in position 16: ordinal not in range(128)
SyntaxError: Unexpected identifier ( @ 2 : 13 ) -> findresource begin
'ascii' codec can't decode byte 0xd4 in position 17: ordinal not in range(128)
'ascii' codec can't decode byte 0xff in position 5: ordinal not in range(128)
```

```
==> errors.txt <==
Traceback (most recent call last):
  File "/opt/cuckoo-modified/utils/../lib/cuckoo/common/peepdf/JSAnalysis.py", line 105, in analyseJS
    escapedVars = re.findall('(\w*?)\s*?=\s*?(unescape\((.*?)\))', code, re.DOTALL)
  File "/usr/lib/python2.7/re.py", line 177, in findall
    return _compile(pattern, flags).findall(string)
  File "/usr/local/lib/python2.7/dist-packages/interruptingcow/__init__.py", line 24, in handler
    raise exception
RuntimeError
```

```
==> jserror.log <==
'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
'ascii' codec can't decode byte 0xf3 in position 1406: ordinal not in range(128)
'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
```

Additionally, and strangely, this seems to happen most when I submit things over the API. I didn't think that should matter, but I see the behavior a lot when I submit that way.

housemusic42 avatar Aug 26 '16 15:08 housemusic42

```
File "/usr/local/lib/python2.7/dist-packages/interruptingcow/__init__.py", line 24, in handler
    raise exception
RuntimeError
```

That sounds like InterruptingCow is working? The expectation is that it raises a RuntimeError after hitting the timeout.

mallorybobalice avatar Aug 27 '16 00:08 mallorybobalice

Although the line number is a bit weird:

https://bitbucket.org/evzijst/interruptingcow/src/17b67cf1105b1b1524e632c59d9940caf10e82a2/interruptingcow/init.py?at=master&fileviewer=file-view-default

Line 24. Same version?

mallorybobalice avatar Aug 27 '16 00:08 mallorybobalice

I analysed the PDF posted by housemusic42 a while ago.

Processing ( 60.978 seconds )

The system is Manjaro/Arch with the very latest versions of all modules installed. analyse.tar.gz

I uploaded the whole storage folder; maybe it can be helpful for you guys :)

muhviehstah avatar Aug 27 '16 07:08 muhviehstah

Same problem here. I have two cuckoos (supposedly with the same config and updates), but while on one of them processing ends with no problem, on the other the process locks up in pdfparser.py on 50.zip. I even tried using the most recent modules from https://github.com/DidierStevens/DidierStevensSuite, with no luck.

Nwinternights avatar Oct 29 '16 12:10 Nwinternights

Which bits are from Didier Stevens? pdfid?

mallorybobalice avatar Nov 02 '16 10:11 mallorybobalice

both: pdfparser and pdfid

Nwinternights avatar Nov 12 '16 15:11 Nwinternights