pdfbox icon indicating copy to clipboard operation
pdfbox copied to clipboard

potential memory leaks and small performance improvements

Open valerybokov opened this issue 3 years ago • 345 comments

valerybokov avatar Mar 01 '21 13:03 valerybokov

tnx for fast respond. I'll try to fix it.

valerybokov avatar Mar 01 '21 17:03 valerybokov

Thanks, I committed everything except the memory leak. The problem is that I don't know for sure if the object is still used. The tests pass, but we don't really have much tests in FDF / XFDF. And the parser was modified to be "on demand" in the trunk, which means that input may still be accessed after parsing (however I don't know if this applies to FDF / XFDF, or only for PDF).

THausherr avatar Mar 01 '21 19:03 THausherr

I understood. Best regards!

valerybokov avatar Mar 01 '21 20:03 valerybokov

@lehmi any opinion on this, i.e. does the "on demand" parsing logic also apply to FDF / XFDF, so that the input should not be closed until done?

THausherr avatar Mar 02 '21 04:03 THausherr

The FDF parser uses the same on demand logic than the PDFParser so that we must not close the input file. The XFDF-stuff doesn't use the PDFBox parser at all. It uses DocumentBuilder and it is save to close the input after parsing it.

lehmi avatar Mar 02 '21 06:03 lehmi

@lehmi Thank you.

@valerybokov I committed the FDF part, but used the interface type on the left side of the assignment, see here why (although not really important here)

THausherr avatar Mar 02 '21 17:03 THausherr

THausherr, no problems.

valerybokov avatar Mar 02 '21 18:03 valerybokov

Thanks, I think we're done here. If yes, then please close the PR.

THausherr avatar Mar 02 '21 19:03 THausherr

Hi! I've made a few changes to fontbox. I hope they will be useful to you.

valerybokov avatar Mar 04 '21 11:03 valerybokov

Thanks, this looks promising. However sb.setLength(0); won't work because it's the same object that is assigned to radix.

THausherr avatar Mar 04 '21 17:03 THausherr

My bad. I will revoke it.

valerybokov avatar Mar 04 '21 17:03 valerybokov

May I ask you why the CFFTable class has a tag with a space? Code:

public static final string TAG = "CFF ";

valerybokov avatar Mar 04 '21 19:03 valerybokov

I suspect these tags must be 4 bytes long. CFFParser.readTagName() reads 4 bytes.

THausherr avatar Mar 04 '21 19:03 THausherr

ok

valerybokov avatar Mar 05 '21 06:03 valerybokov

Thanks, I committed the last one, but differently.

THausherr avatar Mar 05 '21 16:03 THausherr

I'm done for now. Thanks again. More later or tomorrow.

THausherr avatar Mar 05 '21 17:03 THausherr

Thanks. Goodbye

valerybokov avatar Mar 05 '21 17:03 valerybokov

The one in TrueTypeCollection.java I don't agree. It looks kind of weird that two parser objects are generated but only one is used.

THausherr avatar Mar 06 '21 17:03 THausherr

The one about the double-checked-locking I'll do after the next release. This is a very sensitive topic.

THausherr avatar Mar 06 '21 17:03 THausherr

@THausherr: "The one in TrueTypeCollection.java I don't agree. It looks kind of weird that two parser objects are generated but only one is used." Me: Actually, it depends of algorithm. If we need the same parser every time, then we can add one parameter instead of two to the getFontAtIndex method. I don't know this algorithm well, so if you think we only need one parser for each method (getFontByName and processAllFonts), that's ok. The goal is to reduce the number of memory allocations because it is slow. Therefore it looks strange.

valerybokov avatar Mar 06 '21 18:03 valerybokov

@THausherr : The one about the double-checked-locking I'll do after the next release. This is a very sensitive topic. Me: this is a known pattern to reduce the use of synchronized context (since synchronization is slow). This is useful for methods that can be called many times. And all changed methods are public, so the programmer can call them multiple times.

valerybokov avatar Mar 06 '21 19:03 valerybokov

For a FileSystemFontProvider.readTrueTypeFont. If will be exception from ttc.getFontByName then ttc will not be closed. Same for getOTFFont.

valerybokov avatar Mar 08 '21 07:03 valerybokov

Thanks, I did several of them. I created two issues for the DeviceN stuff (PDFBOX-5121 and PDFBOX-5122) because of the amazing speed improvement.

Please merge again, so that I can see what I'll commit before the release.

THausherr avatar Mar 08 '21 12:03 THausherr

COSDictionary.getObjectFromPath method. If the first object found is sufficient, you need to add two break operators to the loop.

valerybokov avatar Mar 08 '21 14:03 valerybokov

COSDictionary.getDictionaryString (row 1526). It looks like the toList method is unnecessary.

valerybokov avatar Mar 08 '21 15:03 valerybokov

The change in getDiskCacheFile() I don't want to do, inline assignments are considered messy. (Please don't fix any that we may have 😂)

THausherr avatar Mar 08 '21 15:03 THausherr

I understood. I thought about triple allocation of memory for the File object. You can allocate memory before the if statement and when need to change file.

valerybokov avatar Mar 08 '21 15:03 valerybokov

Looks like a method ComplexPropertyContainer.isSameProperty need to move to AbstractField.equals. Maybe, move it partically. And it finally compares by references, so many previous operations inside this method are useless (imho).

valerybokov avatar Mar 09 '21 10:03 valerybokov

You are not forgotten, but I've been busy. Might take a few days.

THausherr avatar Mar 10 '21 20:03 THausherr

Thanks. I've been busy too

valerybokov avatar Mar 11 '21 16:03 valerybokov