Memory Access Out Of Bounds
I wrote this to extract some vector graphics from PDFs. However, on the second page it throws this error on the first function call involving doc:
RuntimeError: memory access out of bounds
at wasm://wasm/028b9f5e:wasm-function[9107]:0x4707c9
at wasm://wasm/028b9f5e:wasm-function[1930]:0x14f6f8
at wasm://wasm/028b9f5e:wasm-function[1922]:0x14f585
at wasm://wasm/028b9f5e:wasm-function[2058]:0x157ba8
at wasm://wasm/028b9f5e:wasm-function[150]:0xd049
at <anonymous> (<redacted>/node_modules/.pnpm/[email protected]/node_modules/mupdf/dist/mupdf-wasm.js:705:12)
at FinalizationRegistry.cleanupSome (<anonymous>)
Here's the code. https://drive.google.com/file/d/1jleleL7AmEEl_igyj--x_9UnQizZ3PsB/view?usp=sharing
@BlackFuffey Could you try importing the core MuPDF library directly, e.g.:
import * as MuPDF from "mupdf";
It's possible that importing at the /mupdfjs level causes the error.
Please let me know if it makes a difference.
@jamie-lemon Changing the import made no difference. Also, after this happens, every later use of MuPDF throws a "memory access out of bounds" error until the app is restarted.
Another question: should I call destroy() on the arguments passed to the rendering device functions?
A page.destroy() once you've dealt with a page couldn't hurt. :)
I'm not calling doc.destroy() or page.destroy() here because this is a single middleware inside a PDF processing pipeline, and pages may be reused later. At the end there is a cleanup middleware that destroys every page and then the doc.
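For illustration, the cleanup middleware described above could look roughly like the sketch below. This is an assumption about the shape of the pipeline, not the actual code: `cleanup`, `ctx.pages` and `ctx.doc` are hypothetical names, and the only thing taken from the thread is that pages and the doc each expose destroy().

```javascript
// Hypothetical cleanup step for the end of the pipeline.
// `ctx.pages` and `ctx.doc` are assumed names for whatever the pipeline
// keeps around; only the destroy() calls come from the discussion.
function cleanup(ctx) {
  const errors = [];

  // Destroy every page first, then the document they were loaded from.
  for (const page of ctx.pages) {
    try {
      page.destroy();
    } catch (err) {
      errors.push(err); // keep going so one failure does not skip the rest
    }
  }
  ctx.pages.length = 0; // drop references so no later step reuses a destroyed page

  if (ctx.doc) {
    try {
      ctx.doc.destroy();
    } catch (err) {
      errors.push(err);
    }
    ctx.doc = null;
  }

  return errors; // empty array when everything was released cleanly
}
```

Collecting errors instead of rethrowing means a failure on one page does not leave the remaining pages and the doc undestroyed.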
It's hard to tell without seeing the whole context, but isn't the page object just a reference local to the function? If so, I don't think destroying it can do any harm.
I may be wrong about this, but doesn't MuPDF cache fetched pages and other data until they are destroyed or no longer reachable?
Update: I moved the bounds extraction logic to PyMuPDF, and now it works perfectly.
Great - a hybrid approach! Good luck with the project!
Calling destroy should not be necessary (the garbage collector should reclaim the memory automatically as long as you don't hold references to objects that could be freed up).
RuntimeError: memory access out of bounds indicates a more serious error. It may have caused the memory within the WASM context to be so corrupted that you cannot continue using it without further errors. I'll have a look at tracking this down next week.
You could try a "memento" or ASAN build of the mupdf wasm, to see if you can track down where the corruption occurs.
PS. You can feed the raw pixel data from the pixmap directly into a canvas ImageData, without round-tripping through PNG. With alpha enabled, as below, the pixmap holds RGBA samples:
let pixmap = page.toPixmap(mupdf.Matrix.identity, mupdf.ColorSpace.DeviceRGB, true)
let imageData = new ImageData(pixmap.getPixels().slice(), pixmap.getWidth(), pixmap.getHeight())
Hey @ccxvii, are there any updates on this? I'm getting the same error at both the core and /mupdfjs levels when trying to create a trace device as per https://mupdf.readthedocs.io/en/latest/cookbook/javascript/advanced.html#trace-device.