docling
Create a backend to transform USPTO patents (XML and TXT) to DoclingDocument
Requested feature
- The Docling library defines a `DeclarativeDocumentBackend` abstract class to transform different document formats to `DoclingDocument` without a recognition pipeline. Implementations include `HTMLDocumentBackend` for HTML pages and `MsWordDocumentBackend` for MS Word documents.
- The United States Patent and Trademark Office (USPTO) is the federal agency for granting U.S. patents and registering trademarks. The USPTO disseminates public patent and trademark data as pre-packaged or user-customized bulk data products through the Bulk Data Storage System.
- Patent applications and grants are available in several formats. In particular, full-text data (no images) is available in XML format, packaged in zip files. Some older grants, however, are in tabular format (grants from January 1976 through December 2001).
This feature consists of providing a document backend implementation that parses the text content of USPTO patent grants and applications into a `DoclingDocument`.
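As a rough illustration of the parsing step such a backend would need, here is a minimal sketch that extracts the title, abstract, and claims from a patent-like XML string using only the Python standard library. It does not use the Docling API: `PatentText` and `parse_patent_xml` are hypothetical names, the sample XML is hand-written, and the element names (`invention-title`, `abstract`, `claims`, `claim`) are assumed to match the USPTO grant schema. Mapping the result to a `DoclingDocument` is left out.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class PatentText:
    """Hypothetical intermediate container; not a Docling type."""
    title: str = ""
    abstract: List[str] = field(default_factory=list)
    claims: List[str] = field(default_factory=list)


def _text(elem: ET.Element) -> str:
    # Join the text of the element and all its descendants,
    # then collapse runs of whitespace into single spaces.
    return " ".join("".join(elem.itertext()).split())


def parse_patent_xml(xml_string: str) -> PatentText:
    # Element names below are assumptions about the USPTO XML layout.
    root = ET.fromstring(xml_string)
    doc = PatentText()
    title = root.find(".//invention-title")
    if title is not None:
        doc.title = _text(title)
    doc.abstract = [_text(p) for p in root.findall(".//abstract/p")]
    doc.claims = [_text(c) for c in root.findall(".//claims/claim")]
    return doc


# Hand-written sample, not real USPTO data.
SAMPLE = """<us-patent-grant>
  <us-bibliographic-data-grant>
    <invention-title id="d1">Widget with <i>improved</i> handle</invention-title>
  </us-bibliographic-data-grant>
  <abstract><p>A widget having a handle.</p></abstract>
  <claims>
    <claim id="CLM-00001"><claim-text>1. A widget comprising a handle.</claim-text></claim>
    <claim id="CLM-00002"><claim-text>2. The widget of claim 1, wherein the handle is curved.</claim-text></claim>
  </claims>
</us-patent-grant>"""

patent = parse_patent_xml(SAMPLE)
print(patent.title)
for claim in patent.claims:
    print(claim)
```

A real backend would additionally need to handle the older tabular text format and map each extracted section to the corresponding `DoclingDocument` items.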
Alternatives
There are no alternatives at this point, since this is a new feature.