I hope to find a way to remove headers and footers
I have been using the marker project and find it very good. I'm not sure whether this is a problem with how I'm using it or a detail I have overlooked, but I would like a way to filter headers and footers out of PDFs, because the content in those areas is generally irrelevant logos or boilerplate text. Could a parameter be added to reduce the interference of this useless information with the file conversion results?
Thank you.
Can you please share an example PDF?
Thank you very much for your reply. Here is a sample file. It is a PDF that is publicly available in China and does not involve any confidentiality issues. You will see that the header of the first page contains a logo and the address of the organization that wrote the document. From the second page onward, there are smaller headers with logos. Some files also have footers, mainly containing information such as an introduction to the organization and a disclaimer.
I would like a parameter that skips this information. I see that Surya can analyze the layout and already reports clearly located header and footer regions. Could those regions be used as an exclusion list, so that no recognition or further processing is performed on them?
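To make the request concrete, here is a minimal sketch of the filtering I have in mind. It does not use marker's or Surya's real API: `LayoutBlock`, `drop_headers_and_footers`, and the sample page are hypothetical, and the label names `Page-header` and `Page-footer` are my assumption about what the layout model emits.

```python
from dataclasses import dataclass

# Assumed label names for header/footer regions; check which labels the
# layout model actually emits before relying on these.
EXCLUDED_LABELS = frozenset({"Page-header", "Page-footer"})


@dataclass
class LayoutBlock:
    """Hypothetical stand-in for one detected layout region on a page."""
    label: str
    bbox: tuple  # (x0, y0, x1, y1) in page coordinates
    text: str = ""


def drop_headers_and_footers(blocks, excluded=EXCLUDED_LABELS):
    """Return only the blocks whose label is not in the excluded set."""
    return [block for block in blocks if block.label not in excluded]


# Made-up sample page: one header, one body paragraph, one footer.
page = [
    LayoutBlock("Page-header", (50, 20, 550, 60), "Organization logo and address"),
    LayoutBlock("Text", (50, 100, 550, 400), "Main body of the report."),
    LayoutBlock("Page-footer", (50, 780, 550, 800), "Disclaimer text"),
]

kept = drop_headers_and_footers(page)
print([block.text for block in kept])
```

The point is that the excluded regions would be dropped right after layout analysis, before any text recognition runs on them, so a user-facing parameter would only need to control the set of excluded labels.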
Thank you
Is there any progress? I have the same requirement.
Has there been any progress? I also need this feature.
Same here. Any updates?