ref(processor): Use symbolic-sourcemapcache for JavaScript Sourcemap processing
This PR attempts to replace the currently used rust-sourcemap crate and its symbolic Python bindings with the symbolic-sourcemapcache crate.
It makes the whole processing pipeline easier to maintain, as it pushes some work directly to Symbolic. We also get better function names thanks to better scope resolution, and in some cases better file URLs.
Other than that, we don't use SourceView anymore, as it seemed like an unnecessary layer of abstraction for something that is used only for context_lines extraction. We now cache the UTF-8 decoded sources directly: this way we encode them only once, for SmCache instance initialization, and otherwise use the decoded source as-is for context lines extraction.
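To illustrate the idea, here is a minimal sketch of extracting context lines straight from a cached, already-decoded source string. The helper name `get_context_lines` and its `context` parameter are hypothetical and not taken from this PR; the actual code in the processor may differ.

```python
from typing import List, Optional, Tuple


def get_context_lines(
    source: str, lineno: int, context: int = 5
) -> Tuple[List[str], Optional[str], List[str]]:
    """Return (pre_context, context_line, post_context) for a 1-based line number.

    Hypothetical helper: it works on the decoded text, so no bytes-level
    wrapper is needed just to slice a few lines.
    """
    lines = source.splitlines()
    index = lineno - 1  # convert to a 0-based index
    if index < 0 or index >= len(lines):
        return [], None, []
    pre = lines[max(0, index - context):index]
    post = lines[index + 1:index + 1 + context]
    return pre, lines[index], post


# The decoded text is what gets cached; encoding happens once, only where
# bytes are required (here standing in for the cache initialization step).
source = "a = 1\nb = 2\nc = 3\nd = 4\n"
encoded_once = source.encode("utf-8")
```

Note that this sketch splits the source on every call; a real implementation would likely want to cache the split lines as well.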
Some tests had to be updated to express the current behavior.
The notable thing is useless_fn_names = ["<anonymous>", "__webpack_require__", "__webpack_modules__"]. It mostly targets webpack's production mode, which trims all function names by default; in those cases we decided to fall back to the minified names instead (this was already the old behavior).
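The fallback described above could look roughly like the sketch below. Only the `useless_fn_names` list comes from this PR; the function `pick_function_name` and its parameters are hypothetical, and treating an empty resolved name as useless is an assumption.

```python
from typing import Optional

# List taken from the PR description.
useless_fn_names = ["<anonymous>", "__webpack_require__", "__webpack_modules__"]


def pick_function_name(resolved_name: Optional[str], minified_name: str) -> str:
    """Prefer the name resolved via the sourcemap unless it carries no information.

    Hypothetical helper; an empty or missing resolved name is also treated
    as useless here, which is an assumption of this sketch.
    """
    if not resolved_name or resolved_name in useless_fn_names:
        return minified_name
    return resolved_name
```

For example, `pick_function_name("<anonymous>", "t")` keeps the minified `t`, while a resolved name such as `onClick` is returned unchanged.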
It should be possible to extract something better, but we'd need to parse all sourcesContent from the sourcemap to do that. The only way to get a better function name in the case mentioned above is to look at the right-hand side of the default Node export, in the form module.exports = function foo () {}. This should give us foo, yet the only thing we can extract is module.exports, as the minified form of this expression in webpack production mode is module.exports = function () {}.
NOTE: This PR requires https://github.com/getsentry/symbolic/tree/feat/sourcemapcache to be merged and deployed first.
For local testing, check out feat/sourcemapcache and build everything with make build (it will both compile the Rust code and create the Python bindings).
Then, in the getsentry/sentry repo, install the local symbolic instance: update the requirements- files to the version of your local repo and run SYMBOLIC_DEBUG=1 pip install --editable ../symbolic/py -I --no-warn-conflicts.
Regarding the SourceView discussion:
Keeping it around would avoid the decode/encode roundtrips, as the SourceView is, I think, bytes internally, so we can use that to create the sourcemap cache without an additional encode step. It also avoids the repeated str.split mentioned above.
I’m a bit unsure how the code handles the case when:
- we have a sourcemap
- the sourcemap does not have embedded sourceContent
- we can access the source files via the release artifacts
Is this even a thing? How is this handled right now?
And another thing I remember from our pairing sessions: What exactly is the purpose of the pre/process_frame split? Why do we pre-populate the cache in advance, and not just do it on-demand in process_frame?
I still need to review the code, but I can already share some input on the question by @Swatinem about missing sourcesContent. Today this is less common because sentry-cli rewrites source maps before bundling and embeds the sources, but historically the more common situation was that sources were not included and required separate lookups from the artifacts. We also currently permit source maps to reference files that are then fetched from the internet.
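As a rough illustration of the lookup described here, the sketch below prefers embedded sourcesContent and otherwise resolves the file by its URL. The function `resolve_source` and the `fetch_by_url` callback are hypothetical stand-ins for whatever artifact or HTTP lookup the processor actually performs, and the lookup order is an assumption. It relies only on the sourcemap format itself, where `sources` and `sourcesContent` are parallel arrays and a `sourcesContent` entry may be null.

```python
from typing import Callable, Optional


def resolve_source(
    sourcemap: dict,
    index: int,
    fetch_by_url: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the text of the source at `index`, or None if it cannot be found.

    Hypothetical helper: `fetch_by_url` stands in for a release-artifact
    lookup or an HTTP fetch.
    """
    contents = sourcemap.get("sourcesContent") or []
    # Embedded content wins when present and not null.
    if index < len(contents) and contents[index] is not None:
        return contents[index]
    sources = sourcemap.get("sources") or []
    if index < len(sources):
        return fetch_by_url(sources[index])
    return None


# Usage with an in-memory dict playing the role of the release artifacts.
artifacts = {"app.js": "console.log('hi');"}
smap = {"sources": ["app.js", "lib.js"], "sourcesContent": [None, "export {};"]}
```

Here `resolve_source(smap, 0, artifacts.get)` falls through to the artifact lookup, while index 1 is served from the embedded content.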