cadquery icon indicating copy to clipboard operation
cadquery copied to clipboard

Graceful exit on broke step file

Open qbamca opened this issue 1 year ago • 25 comments

We run Flask app that converts step files to stl for further processing. When we were testing the app we noticed that when supplied with manually broken step file, the cadquery library crash gracefully our Flask app. No Error is raised. We tried catching SystemExit with no avail. The app just stops. I've traced cadquery inner works up to this point: image at which the reader.TransferRoot(i + 1) closes our app. I don't know how to trace it further inside OCP, so I can't investigate further.

The problem is that, such files can close our service, we have no means to prevent it and results in service downtimes while it's being restarted. We need to find a way to prevent it from closing the app.

To Reproduce

repro repo It's a flask app. Run main.py and post file BROKE_FILE.STEP

Backtrace

no backtrace

image

Environment

OS: Ubuntu 22.04.2 LTS python 3.10.12 cadquery v2.3.1 cadquery is installed using pip.

Using: Python interpreter

qbamca avatar Feb 09 '24 10:02 qbamca

I tried loading your step directly via CQ and nothing crashes, but I do get an infinite loop it seems. You'll have to raise an issue with OCCT I'm afraid. Is the STEP actually malicious?

adam-urbanczyk avatar Feb 09 '24 17:02 adam-urbanczyk

Strange, OCCT Draw shows "Could not read file BROKEN_FILE.step , abandon" this corresponds to readstat != IFSelect_RetDone. I also tried with C++ and again ReadFile returns IFSelect_RetError (tested on OCCT 7.7.2, 7.8.0).

Edit: Ignore this comment. My mistake in testing (typo). Actually I do get the same IFSelect_RetDone status.

lorenzncode avatar Feb 11 '24 15:02 lorenzncode

Interesting, on win I get an infinite loop with 7.7.1 and a silent crash with 7.7.2.

adam-urbanczyk avatar Feb 11 '24 18:02 adam-urbanczyk

I can reproduce the crash on linux with CQ and 7.7.2.

OCCT Draw does not crash; it does report ERR messages in the output (7.7.2):

Reduced model for translation without additional info will be used 
**** ERR StepFile : Incorrect Syntax : Fails Count : 1 ****
*** ERR StepReaderData : Unresolved Reference : Fails Count : 15 ***

image

The same ERR messages are printed with CQ with:

from OCP.Message import Message, Message_Gravity
for printer in Message.DefaultMessenger_s().Printers():
    printer.SetTraceLevel(Message_Gravity.Message_Info)

lorenzncode avatar Feb 11 '24 22:02 lorenzncode

I wouldn't call STEP file malicious. We did remove a line or two by hand to simulate the broken file, though. The errors @lorenzncode mentioned, we've seen them already with files that were uploaded by users but it was in different environment where silent crashes were not a problem and we didn't notice the problem then. My point is, we suspect we might have to deal with such files in production and files broken in any way, should not result in silent crash.

qbamca avatar Feb 12 '24 09:02 qbamca

@lorenzncode interesting. I checked with draw from 7.5 I do get an infinite loop of sorts. Draw from 7.8 works fine (I don't have 7.7.2 at hand). Looking quickly at the code of OCCT I was not able to tell how to extract the errors programmatically. Would you know?

adam-urbanczyk avatar Feb 13 '24 20:02 adam-urbanczyk

@adam-urbanczyk I found the messages can be extracted using Message_PrinterToReport.

import io
from OCP.Message import Message, Message_Gravity, Message_PrinterToReport
from OCP.STEPControl import STEPControl_Reader

msger = Message.DefaultMessenger_s()
newprinter = Message_PrinterToReport()
report = newprinter.Report()
msger.AddPrinter(newprinter)

reader = STEPControl_Reader()
reader.ReadFile("BROKEN_FILE.STEP")

f = io.BytesIO()
alerts = report.GetAlerts(Message_Gravity.Message_Info)
for alert in alerts:
    alert.Attribute().DumpJson(f)

print(f.getvalue())
f.close()

lorenzncode avatar Feb 14 '24 02:02 lorenzncode

I get the same stacktrace in C++ and Draw with

OSD::SetSignal(OSD_SignalMode_Unset, Standard_False);

No crash with OSD::SetSignal();

In Draw, default is set here: https://github.com/Open-Cascade-SAS/OCCT/blob/cec1ecd0c9f3b3d2572c47035d11949e8dfa85e2/src/Draw/Draw_BasicCommands.cxx#L1004

lorenzncode avatar Feb 16 '24 02:02 lorenzncode

I opened an OCCT issue https://tracker.dev.opencascade.org/view.php?id=33603

@qbamca Can you also share the unmodified STEP file? Probably not useful but could see the diff.

lorenzncode avatar Feb 17 '24 02:02 lorenzncode

@lorenzncode what is the current conclusion regarding CQ? I.e. can we change some settings to mitigate the issue?

adam-urbanczyk avatar Feb 17 '24 11:02 adam-urbanczyk

@adam-urbanczyk Pending a fix in OCCT itself, yes it would be possible to mitigate the issue in CQ by conditionally aborting importStep based on monitoring the Message_Info level alerts.

The simplest way might be to provide an option to abort on any alert (ignoring the contents of the messages). If we find it results in false positives then it might be required to check for specific messages.

In C++ the issue can be mitigated with OSD::SetSignal(). I don't know if that is possible in OCP?

lorenzncode avatar Feb 17 '24 22:02 lorenzncode

Thank you for the report.

dpasukhi avatar Feb 18 '24 10:02 dpasukhi

This kind of issue is not stable. The result is undefined if you will disable catching signals. And there is no specific message information for that kind of access problem. If you have any kind of crash, we fix that fast. But for a long time, it happened rarely. And only on non-correct STEP files.

dpasukhi avatar Feb 18 '24 10:02 dpasukhi

@dpasukhi Thank you for your inputs!

@adam-urbanczyk Since crashes are expected to be rare perhaps CQ can separate the STEPControl_Reader().ReadFile() part of importStep and let the user implement a safer mode if they need it. This would provide the user the opportunity to monitor messages and decide to proceed with reading the step file. I'll share an example.

lorenzncode avatar Feb 18 '24 15:02 lorenzncode

@qbamca Can you also share the unmodified STEP file? Probably not useful but could see the diff.

I've added the original file in the repo. 3007091.STEP We also confirmed that we see same results with files uploaded by users, without any modification on our part so the problem is not artificial.

BTW, thank you all for looking into this.

qbamca avatar Feb 20 '24 14:02 qbamca

can you give me a short status report of this issue? i'm not sure how to proceed.

qbamca avatar Feb 27 '24 12:02 qbamca

Same here. Maybe take a look if #1525 helps you. It seems that there is no one liner that would solve this. You might need to handle such issues on your side using OCCT directly.

adam-urbanczyk avatar Feb 27 '24 12:02 adam-urbanczyk

The ticket is one of the first to fix. I wait for compleating another ticket for the developer. Crash is first priority for maintance activities. I hope we will prepare fix this week or beginning of the next week. Patch will be based on master branch. You will need to cherry pick or rebase after resolving issue.

dpasukhi avatar Feb 27 '24 22:02 dpasukhi

hi, it's been a while. Is there any progress with this issue?

qbamca avatar Jun 03 '24 13:06 qbamca

From OCCT side the issue was resolved a long time ago. (If you found new issues, please let me know)

dpasukhi avatar Jun 03 '24 13:06 dpasukhi

Probably when this lands in the OCCT release and when this release is wrapped and used in CQ. I assume 7.9, will take a while.

adam-urbanczyk avatar Jun 04 '24 19:06 adam-urbanczyk

Changes included into 7.8.1. 7.8.0 and 7.8.1 are easy to replace just by replacing dll or so files. No conflicts.

dpasukhi avatar Jun 04 '24 19:06 dpasukhi

Then stay tuned (#1589)

adam-urbanczyk avatar Jun 04 '24 19:06 adam-urbanczyk

7.7.2 -> 7.8.n is a little more complicated task. There was a reorganization of some things, see https://dev.opencascade.org/doc/overview/html/occt__upgrade.html#upgrade_occt780 But great to know :)

dpasukhi avatar Jun 04 '24 19:06 dpasukhi

thank you guys!

qbamca avatar Jun 11 '24 08:06 qbamca