pdfbox potential memory leaks and small performance improvements

Mar 01 '21 13:03 valerybokov

tnx for fast respond. I'll try to fix it.

Mar 01 '21 17:03 valerybokov

Thanks, I committed everything except the memory leak. The problem is that I don't know for sure if the object is still used. The tests pass, but we don't really have much tests in FDF / XFDF. And the parser was modified to be "on demand" in the trunk, which means that input may still be accessed after parsing (however I don't know if this applies to FDF / XFDF, or only for PDF).

Mar 01 '21 19:03 THausherr

I understood. Best regards!

Mar 01 '21 20:03 valerybokov

@lehmi any opinion on this, i.e. does the "on demand" parsing logic also apply to FDF / XFDF, so that the input should not be closed until done?

Mar 02 '21 04:03 THausherr

The FDF parser uses the same on demand logic than the PDFParser so that we must not close the input file. The XFDF-stuff doesn't use the PDFBox parser at all. It uses DocumentBuilder and it is save to close the input after parsing it.

Mar 02 '21 06:03 lehmi

@lehmi Thank you.

@valerybokov I committed the FDF part, but used the interface type on the left side of the assignment, see here why (although not really important here)

Mar 02 '21 17:03 THausherr

THausherr, no problems.

Mar 02 '21 18:03 valerybokov

Thanks, I think we're done here. If yes, then please close the PR.

Mar 02 '21 19:03 THausherr

Hi! I've made a few changes to fontbox. I hope they will be useful to you.

Mar 04 '21 11:03 valerybokov

Thanks, this looks promising. However sb.setLength(0); won't work because it's the same object that is assigned to radix.

Mar 04 '21 17:03 THausherr

My bad. I will revoke it.

Mar 04 '21 17:03 valerybokov

May I ask you why the CFFTable class has a tag with a space? Code:

public static final string TAG = "CFF ";

Mar 04 '21 19:03 valerybokov

I suspect these tags must be 4 bytes long. CFFParser.readTagName() reads 4 bytes.

Mar 04 '21 19:03 THausherr

ok

Mar 05 '21 06:03 valerybokov

Thanks, I committed the last one, but differently.

Mar 05 '21 16:03 THausherr

I'm done for now. Thanks again. More later or tomorrow.

Mar 05 '21 17:03 THausherr

Thanks. Goodbye

Mar 05 '21 17:03 valerybokov

The one in TrueTypeCollection.java I don't agree. It looks kind of weird that two parser objects are generated but only one is used.

Mar 06 '21 17:03 THausherr

The one about the double-checked-locking I'll do after the next release. This is a very sensitive topic.

Mar 06 '21 17:03 THausherr

@THausherr: "The one in TrueTypeCollection.java I don't agree. It looks kind of weird that two parser objects are generated but only one is used." Me: Actually, it depends of algorithm. If we need the same parser every time, then we can add one parameter instead of two to the getFontAtIndex method. I don't know this algorithm well, so if you think we only need one parser for each method (getFontByName and processAllFonts), that's ok. The goal is to reduce the number of memory allocations because it is slow. Therefore it looks strange.

Mar 06 '21 18:03 valerybokov

@THausherr : The one about the double-checked-locking I'll do after the next release. This is a very sensitive topic. Me: this is a known pattern to reduce the use of synchronized context (since synchronization is slow). This is useful for methods that can be called many times. And all changed methods are public, so the programmer can call them multiple times.

Mar 06 '21 19:03 valerybokov

For a FileSystemFontProvider.readTrueTypeFont. If will be exception from ttc.getFontByName then ttc will not be closed. Same for getOTFFont.

Mar 08 '21 07:03 valerybokov

Thanks, I did several of them. I created two issues for the DeviceN stuff (PDFBOX-5121 and PDFBOX-5122) because of the amazing speed improvement.

Please merge again, so that I can see what I'll commit before the release.

Mar 08 '21 12:03 THausherr

COSDictionary.getObjectFromPath method. If the first object found is sufficient, you need to add two break operators to the loop.

Mar 08 '21 14:03 valerybokov

COSDictionary.getDictionaryString (row 1526). It looks like the toList method is unnecessary.

Mar 08 '21 15:03 valerybokov

The change in getDiskCacheFile() I don't want to do, inline assignments are considered messy. (Please don't fix any that we may have 😂)

Mar 08 '21 15:03 THausherr

I understood. I thought about triple allocation of memory for the File object. You can allocate memory before the if statement and when need to change file.

Mar 08 '21 15:03 valerybokov

Looks like a method ComplexPropertyContainer.isSameProperty need to move to AbstractField.equals. Maybe, move it partically. And it finally compares by references, so many previous operations inside this method are useless (imho).

Mar 09 '21 10:03 valerybokov

You are not forgotten, but I've been busy. Might take a few days.

Mar 10 '21 20:03 THausherr

Thanks. I've been busy too

Mar 11 '21 16:03 valerybokov

pdfbox pdfbox copied to clipboard

potential memory leaks and small performance improvements

pdfbox
pdfbox copied to clipboard