Memory Access Out Of Bounds

Open BlackFuffey opened this issue 7 months ago • 10 comments

I wrote this to extract some vector graphics from PDFs. However, on the second page, the first function call involving doc throws this error:

RuntimeError: memory access out of bounds
    at wasm://wasm/028b9f5e:wasm-function[9107]:0x4707c9
    at wasm://wasm/028b9f5e:wasm-function[1930]:0x14f6f8
    at wasm://wasm/028b9f5e:wasm-function[1922]:0x14f585
    at wasm://wasm/028b9f5e:wasm-function[2058]:0x157ba8
    at wasm://wasm/028b9f5e:wasm-function[150]:0xd049
    at <anonymous> (<redacted>/node_modules/.pnpm/[email protected]/node_modules/mupdf/dist/mupdf-wasm.js:705:12)
    at FinalizationRegistry.cleanupSome (<anonymous>)

Here's the code. https://drive.google.com/file/d/1jleleL7AmEEl_igyj--x_9UnQizZ3PsB/view?usp=sharing

BlackFuffey • May 02 '25 11:05

@BlackFuffey Could you try importing the core MuPDF library instead, e.g. import * as MuPDF from "mupdf"; It's possible that importing at the /mupdfjs level causes the error. Please let me know if it makes a difference.

jamie-lemon • May 02 '25 12:05

@jamie-lemon Changing the import made no difference. Also, once this happens, every subsequent use of MuPDF throws a "memory access out of bounds" error until the app is restarted.

BlackFuffey • May 02 '25 12:05

Another question: should I call destroy() on the arguments passed to the rendering device functions?

BlackFuffey • May 02 '25 12:05

Calling page.destroy() once you've dealt with a page couldn't hurt. :)

jamie-lemon • May 02 '25 13:05

I'm not calling doc.destroy() or page.destroy() here because this is a single middleware in a PDF processing pipeline, and pages may be reused later. At the end, a cleanup middleware destroys every page and then the doc.

BlackFuffey • May 02 '25 13:05

It's hard to tell without seeing the whole context, but isn't the page object just a reference local to the function? If so, I don't think destroying that reference can do any harm.

jamie-lemon • May 02 '25 13:05

I may be wrong on this, but doesn't MuPDF cache the fetched pages and other data until they are destroyed or no longer accessible?

BlackFuffey • May 02 '25 14:05

Update: I moved the bounds extraction logic to PyMuPDF, and now it works perfectly.

BlackFuffey • May 02 '25 14:05

Great - a hybrid approach! Good luck with the project!

jamie-lemon • May 02 '25 16:05

Calling destroy should not be necessary (the garbage collector should reclaim the memory automatically as long as you don't hold references to objects that could be freed up).
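For anyone wondering how explicit destroy() calls and garbage-collector cleanup can coexist, here is a minimal sketch of the general pattern. It is an assumption-laden illustration, not the mupdf.js implementation: Handle, freeNative, and the pointer value are hypothetical names invented for this example. The idea is that destroy() unregisters the object from the FinalizationRegistry, so the native memory is never freed twice.

```javascript
// Hypothetical stand-in for a native allocator; NOT part of mupdf.js.
const freed = [];
function freeNative(pointer) {
  freed.push(pointer);
}

// The callback runs some time after a Handle becomes unreachable.
// The timing is entirely up to the garbage collector.
const registry = new FinalizationRegistry((pointer) => freeNative(pointer));

class Handle {
  constructor(pointer) {
    this.pointer = pointer;
    // The object doubles as its own unregister token,
    // so destroy() can cancel the pending finalizer.
    registry.register(this, pointer, this);
  }

  destroy() {
    if (this.pointer === 0) return; // already released: destroy() is idempotent
    registry.unregister(this);      // prevent a second free from the finalizer
    freeNative(this.pointer);
    this.pointer = 0;
  }
}

const page = new Handle(1234);
page.destroy();
page.destroy(); // no-op, the handle was already released
console.log(freed); // [ 1234 ]
```

With a guard like this, destroying early is harmless and forgetting to destroy is also safe, as long as no stale reference keeps the object alive.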

RuntimeError: memory access out of bounds indicates a more serious error. It may have caused the memory within the WASM context to be so corrupted that you cannot continue using it without further errors. I'll have a look at tracking this down next week.
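Until the root cause is found, an application could at least fail fast after the first trap instead of producing a cascade of follow-up errors. The sketch below is one possible approach under stated assumptions: createGuard, run, and isPoisoned are hypothetical names and not mupdf.js API, and the trap is simulated by throwing a WebAssembly.RuntimeError by hand.

```javascript
// Hypothetical guard; the names here are illustrative, not mupdf.js API.
function createGuard() {
  let poisoned = null;
  return {
    run(fn) {
      if (poisoned) {
        // Refuse to touch the instance again; the caller should restart or reload it.
        throw new Error("WASM instance is unusable after: " + poisoned.message);
      }
      try {
        return fn();
      } catch (err) {
        // Traps such as "memory access out of bounds" surface as WebAssembly.RuntimeError.
        if (err instanceof WebAssembly.RuntimeError) poisoned = err;
        throw err;
      }
    },
    isPoisoned() {
      return poisoned !== null;
    },
  };
}

const guard = createGuard();

let firstError = null;
try {
  // Simulated trap; a real one would come from a call into the WASM module.
  guard.run(() => {
    throw new WebAssembly.RuntimeError("memory access out of bounds");
  });
} catch (err) {
  firstError = err;
}

let secondError = null;
try {
  guard.run(() => 42); // rejected without running, because the guard is poisoned
} catch (err) {
  secondError = err;
}
```

Routing every call into the library through guard.run() would make the "restart required" state explicit rather than leaving it to be rediscovered on each later call.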

You could try a "memento" or ASAN build of the mupdf wasm, to see if you can track down where the corruption occurs.

PS. You can feed the raw RGBA pixel data from the pixmap (note the alpha argument set to true) directly into a canvas ImageData, without round-tripping through PNG.

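// Render with alpha = true so each pixel has the four RGBA components ImageData expects.
let pixmap = page.toPixmap(mupdf.Matrix.identity, mupdf.ColorSpace.DeviceRGB, true)
// slice() copies the pixels, so the ImageData does not alias the pixmap's memory.
let imageData = new ImageData(pixmap.getPixels().slice(), pixmap.getWidth(), pixmap.getHeight())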
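If the pixmap was rendered without alpha, the buffer holds three bytes per pixel and ImageData will reject it. A small helper can expand it first. This is only a sketch: rgbToRgba is a hypothetical name, and it assumes the rows are tightly packed (stride equal to width * 3), which should be checked against the actual pixmap before relying on it.

```javascript
// Hypothetical helper, not part of mupdf.js.
// Assumes tightly packed RGB rows with no padding between them.
function rgbToRgba(rgb, width, height) {
  if (rgb.length !== width * height * 3) {
    throw new RangeError("expected " + width * height * 3 + " bytes, got " + rgb.length);
  }
  const rgba = new Uint8ClampedArray(width * height * 4);
  for (let src = 0, dst = 0; src < rgb.length; src += 3, dst += 4) {
    rgba[dst] = rgb[src];         // red
    rgba[dst + 1] = rgb[src + 1]; // green
    rgba[dst + 2] = rgb[src + 2]; // blue
    rgba[dst + 3] = 255;          // fully opaque alpha
  }
  return rgba;
}

// Two pixels on one row: pure red, then pure green.
const rgba = rgbToRgba(new Uint8ClampedArray([255, 0, 0, 0, 255, 0]), 2, 1);
console.log(Array.from(rgba)); // [ 255, 0, 0, 255, 0, 255, 0, 255 ]
```

In a browser, the result can then be passed to new ImageData(rgba, width, height) in the same way as above.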

ccxvii • May 03 '25 09:05

Hey @ccxvii, are there perhaps any updates on this? I seem to be getting this on both the core & /mupdfjs levels while trying to create a trace device as per https://mupdf.readthedocs.io/en/latest/cookbook/javascript/advanced.html#trace-device.

martynasgz • Jul 01 '25 19:07