pdfbox
pdfbox copied to clipboard
potential memory leaks and small performance improvements
tnx for fast respond. I'll try to fix it.
Thanks, I committed everything except the memory leak. The problem is that I don't know for sure if the object is still used. The tests pass, but we don't really have much tests in FDF / XFDF. And the parser was modified to be "on demand" in the trunk, which means that input may still be accessed after parsing (however I don't know if this applies to FDF / XFDF, or only for PDF).
I understood. Best regards!
@lehmi any opinion on this, i.e. does the "on demand" parsing logic also apply to FDF / XFDF, so that the input should not be closed until done?
The FDF parser uses the same on demand logic than the PDFParser so that we must not close the input file. The XFDF-stuff doesn't use the PDFBox parser at all. It uses DocumentBuilder and it is save to close the input after parsing it.
@lehmi Thank you.
@valerybokov I committed the FDF part, but used the interface type on the left side of the assignment, see here why (although not really important here)
THausherr, no problems.
Thanks, I think we're done here. If yes, then please close the PR.
Hi! I've made a few changes to fontbox. I hope they will be useful to you.
Thanks, this looks promising. However sb.setLength(0);
won't work because it's the same object that is assigned to radix
.
My bad. I will revoke it.
May I ask you why the CFFTable class has a tag with a space? Code:
public static final string TAG = "CFF ";
I suspect these tags must be 4 bytes long. CFFParser.readTagName() reads 4 bytes.
ok
Thanks, I committed the last one, but differently.
I'm done for now. Thanks again. More later or tomorrow.
Thanks. Goodbye
The one in TrueTypeCollection.java I don't agree. It looks kind of weird that two parser objects are generated but only one is used.
The one about the double-checked-locking I'll do after the next release. This is a very sensitive topic.
@THausherr: "The one in TrueTypeCollection.java I don't agree. It looks kind of weird that two parser objects are generated but only one is used." Me: Actually, it depends of algorithm. If we need the same parser every time, then we can add one parameter instead of two to the getFontAtIndex method. I don't know this algorithm well, so if you think we only need one parser for each method (getFontByName and processAllFonts), that's ok. The goal is to reduce the number of memory allocations because it is slow. Therefore it looks strange.
@THausherr : The one about the double-checked-locking I'll do after the next release. This is a very sensitive topic. Me: this is a known pattern to reduce the use of synchronized context (since synchronization is slow). This is useful for methods that can be called many times. And all changed methods are public, so the programmer can call them multiple times.
For a FileSystemFontProvider.readTrueTypeFont. If will be exception from ttc.getFontByName then ttc will not be closed. Same for getOTFFont.
Thanks, I did several of them. I created two issues for the DeviceN stuff (PDFBOX-5121 and PDFBOX-5122) because of the amazing speed improvement.
Please merge again, so that I can see what I'll commit before the release.
COSDictionary.getObjectFromPath method. If the first object found is sufficient, you need to add two break operators to the loop.
COSDictionary.getDictionaryString (row 1526). It looks like the toList method is unnecessary.
The change in getDiskCacheFile() I don't want to do, inline assignments are considered messy. (Please don't fix any that we may have 😂)
I understood. I thought about triple allocation of memory for the File object. You can allocate memory before the if statement and when need to change file.
Looks like a method ComplexPropertyContainer.isSameProperty need to move to AbstractField.equals. Maybe, move it partically. And it finally compares by references, so many previous operations inside this method are useless (imho).
You are not forgotten, but I've been busy. Might take a few days.
Thanks. I've been busy too