ocrd_tool: allow object for path_in_archive of resources
While debugging bertsky/ocrd_detectron2#14 I realized that my assumption that every archive would contain only a single resource was wrong. The detectron2 models consist of a PyTorch NN and a YAML description. Currently, this means the description needs two nearly identical resource entries, and the same archive gets downloaded twice.
With this change (and a corresponding implementation in core), it would be possible to simplify
- description: DocBank via LayoutLM X101-FPN config
  name: DocBank_X101.yaml
  type: archive
  path_in_archive: X101/X101.yaml
  size: 526
  url: https://layoutlm.blob.core.windows.net/docbank/model_zoo/X101.zip
- description: DocBank via LayoutLM X101-FPN config
  name: DocBank_X101.pth
  type: archive
  path_in_archive: X101/model.pth
  size: 835606605
  url: https://layoutlm.blob.core.windows.net/docbank/model_zoo/X101.zip
to
- description: DocBank via LayoutLM X101-FPN config
  name: DocBank_X101.pth
  type: archive
  path_in_archive:
    DocBank_X101.pth: X101/model.pth
    DocBank_X101.yaml: X101/X101.yaml
  size: 783884362
  url: https://layoutlm.blob.core.windows.net/docbank/model_zoo/X101.zip
Also, this way the progress bar would work again, because the size attribute would always refer to the archive itself, not to the file/folder inside the archive.
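To illustrate, here is a rough sketch of how a resource manager might accept both the existing string form and the proposed object form of path_in_archive. This is not the implementation in core: the function names `normalize_path_in_archive` and `extract_resources` are made up for this example, and it assumes a ZIP archive already downloaded into memory whose mapped members are plain files (folders are not handled).

```python
import io
import zipfile
from pathlib import Path


def normalize_path_in_archive(resource):
    """Return a {target_name: member_path} mapping for a resource entry.

    Hypothetical helper: accepts the current string form as well as
    the proposed object form of path_in_archive.
    """
    path_in_archive = resource["path_in_archive"]
    if isinstance(path_in_archive, dict):
        # Proposed form: several target names mapped to archive members.
        return dict(path_in_archive)
    # Current form: one member, stored under the resource's own name.
    return {resource["name"]: path_in_archive}


def extract_resources(archive_bytes, resource, destdir):
    """Write every mapped member of a ZIP archive (as bytes) into destdir."""
    destdir = Path(destdir)
    destdir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for target_name, member in normalize_path_in_archive(resource).items():
            target = destdir / target_name
            # Assumes the member is a regular file, not a folder.
            target.write_bytes(archive.read(member))
            written.append(target)
    return written
```

With the object form, the archive only has to be fetched once, and both DocBank_X101.pth and DocBank_X101.yaml are written from the same download.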