
[IMPROVEMENT] Optimize parsing functions in contentstream package

Open gunnsth opened this issue 4 years ago • 2 comments

Is your feature request related to a problem? Please describe. Parsing in the contentstream package can be pretty slow, and profiling has revealed a lot of time spent in the basic parsing functions (parseNumber in particular).

Describe the solution you'd like Some parsing functions in the core package, such as parseNumber, have already been optimized somewhat. The task involves profiling and optimizing the equivalent primitive parsing functions in contentstream. It would make sense to compare them against the core package implementations, which are similar but have been optimized more.

Describe alternatives you've considered N/A

Additional context N/A

gunnsth • Oct 21 '19 00:10

I've run into a large (70 MB) PDF that takes about 90 seconds to parse and rewrite, and I noticed that a large portion of that time is spent in contentstream parseNumber. Replacing that code with the version from core brings it down to 70 seconds, which is quite a significant change. The one difference is that I wrapped the tracing in if common.Log.IsLogLevel(common.LogLevelTrace) { to avoid allocating the string on each character. For that to help, I also needed to make sure that DummyLogger.IsLogLevel always returns false rather than true (not sure how best to handle that, but since it doesn't actually log anything it might as well return false?). That, of course, can be separate, since it's likely the user of the package replaced the logger anyway.

I would love to see this change make it upstream and can contribute a patch if you wish.

samuel • Dec 17 '19 00:12

@samuel We would definitely be happy to accept a PR for that. Sounds like a great enhancement.

gunnsth • Dec 17 '19 13:12