spaCy icon indicating copy to clipboard operation
spaCy copied to clipboard

Fatal Python error: Aborted when deleting span from SpanGroup

Open NixBiks opened this issue 1 year ago • 5 comments

How to reproduce the behaviour

Unfortunately I haven't been able to make a minimal reproducible example. The error happens sometimes when I run my tests - the more tests I run the higher probability of running into the following critical error

Python(69256,0x1fa11cf40) malloc: Region cookie corrupted for region 0x12f800000 (value is c)[0x12f80407c]
Python(69256,0x1fa11cf40) malloc: *** set a breakpoint in malloc_error_break to debug
Fatal Python error: Aborted

I've found out that I don't encounter the error if I remove the following line.

del span.doc.spans["my_spans"][index]

The documentation states this is the way to do it though. Note there is a typo in the code in the docs though.

I realise you most likely can't help with this little info but maybe you can give me some ways to investigate further.

Info about spaCy

  • spaCy version: 3.7.4
  • Platform: macOS-14.6.1-arm64-arm-64bit
  • Python version: 3.12.5

NixBiks avatar Sep 27 '24 12:09 NixBiks

My current workaround is to create a custom span attribute and filter on that.

Span.set_extension("is_removed", default=False)

Instead of removing the object with del I simply change that custom attribute span.doc.spans["my_spans"][index]._.is_removed = True

NixBiks avatar Sep 30 '24 06:09 NixBiks

Thanks for the report. I'll have a look at this.

honnibal avatar Oct 02 '24 09:10 honnibal

Is it possible you're removing an element during iteration? From the implementation this wouldn't be expected to work: https://github.com/explosion/spaCy/blob/master/spacy/tokens/span_group.pyx#L113

honnibal avatar Oct 02 '24 10:10 honnibal

It's inside some iteration but I'm not iterating over the spans in the SpanGroup no. I've iterated earlier to find the index that needs to be removed though. I hope it makes sense.

NixBiks avatar Oct 02 '24 10:10 NixBiks

I think it makes sense yeah. The pattern you want to avoid is something like:

for span in span_group:
    if check_remove(span):
        del span_group[span]

Deleting during iteration like that won't work, and of course precalculating an index and not adjusting it after you delete will cause problems too. But the latter one shouldn't cause an out-of-bounds access. Unless there's a bug in how we check the indexing, it should only be possible to get the out-of-bounds access from within the class's C methods.

These are the tests we have for the __del__ method, which look okay to me: https://github.com/explosion/spaCy/blob/master/spacy/tests/doc/test_span_group.py#L191

Usually if we have a memory access problem in the tests we do get a crash, because we've got a lot of tests that run in various orders. So it must be some sort of access pattern that's not working. Are you able to share any code?

honnibal avatar Oct 02 '24 10:10 honnibal