unipdf
[IMPROVEMENT] Optimize parsing functions in contentstream package
Is your feature request related to a problem? Please describe. contentstream parsing can be pretty slow, and profiling has revealed a lot of time spent in the basic parsing functions (parseNumber in particular).
Describe the solution you'd like
Parsing functions in the core package, such as parseNumber, have been optimized somewhat. The task involves profiling and optimizing the corresponding primitive parsing functions in contentstream. It would make sense to compare them against the core package implementations, which are similar but have been optimized more.
Describe alternatives you've considered N/A
Additional context N/A
I've run into a large (70MB) PDF that takes about 90 seconds to parse and rewrite. I noticed a large portion of that time is spent in contentstream parseNumber. By replacing that code with the version from core I can get it down to 70 seconds, which is quite a significant change. The one difference was that I wrapped the tracing in if common.Log.IsLogLevel(common.LogLevelTrace) { to avoid allocating a string on each character. I also needed to make sure that DummyLogger.IsLogLevel always returns false rather than true (not sure how best to handle that, but since it doesn't actually log anything it might as well return false?). That of course can be a separate change, since the user of the package has likely replaced the logger anyway.
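To illustrate the guarded-trace idea, here is a minimal sketch. It does not use unipdf's actual types: LogLevel, Logger, dummyLogger, countingLogger and scanNumber are hypothetical stand-ins written for this example, and scanNumber is a heavily simplified scanner, not the real parseNumber.

```go
package main

import "fmt"

// LogLevel and its constants are hypothetical stand-ins for the log
// levels in unipdf's common package; they are not the library's definitions.
type LogLevel int

const (
	LogLevelError LogLevel = iota
	LogLevelDebug
	LogLevelTrace
)

// Logger is a hypothetical minimal logging interface for this sketch.
type Logger interface {
	IsLogLevel(level LogLevel) bool
	Trace(format string, args ...interface{})
}

// dummyLogger discards everything. Returning false from IsLogLevel lets
// callers skip building log arguments that would be thrown away anyway.
type dummyLogger struct{}

func (dummyLogger) IsLogLevel(LogLevel) bool     { return false }
func (dummyLogger) Trace(string, ...interface{}) {}

// countingLogger counts Trace calls so the effect of the guard is observable.
type countingLogger struct {
	level LogLevel
	calls int
}

func (l *countingLogger) IsLogLevel(level LogLevel) bool { return l.level >= level }

func (l *countingLogger) Trace(format string, args ...interface{}) {
	l.calls++
	_ = fmt.Sprintf(format, args...)
}

// scanNumber consumes the leading run of digits, signs and dots from data
// and returns it as a string. It stands in for a real number parser.
func scanNumber(log Logger, data []byte) string {
	end := 0
	for end < len(data) {
		c := data[end]
		isNumChar := (c >= '0' && c <= '9') || c == '.' || c == '-' || c == '+'
		if !isNumChar {
			break
		}
		// The guard: string(c) is only built when trace logging is enabled.
		if log.IsLogLevel(LogLevelTrace) {
			log.Trace("parsing number char %q", string(c))
		}
		end++
	}
	return string(data[:end])
}
```

With dummyLogger the per-character string conversion and the Trace call are skipped entirely; with a logger set to trace level, scanNumber logs once per consumed character. Hoisting the IsLogLevel check out of the loop would be a further small saving, at the cost of diverging a little more from the core version.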
I would love to see this change make it upstream and can contribute a patch if you wish.
@samuel We would definitely be happy to accept a PR for that. Sounds like a great enhancement.