Clone filter

Open kba opened this issue 5 years ago • 3 comments
  • New class OcrdMetsFilter in ocrd_models that represents restrictions on files (include/exclude by fileGrp, mimetype currently)
  • ocrd workspace clone supports
    • --fileGrp-include
    • --fileGrp-exclude
    • --mimetype-include
    • --mimetype-exclude
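As a rough illustration of the include/exclude idea, here is a minimal sketch. It is not the actual `OcrdMetsFilter` code: the class name `MetsFileFilter`, its field names, and the `//` prefix for regex patterns are assumptions made for this example only.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Assumption for this sketch: a pattern starting with "//" is a regex,
# anything else is compared literally.
REGEX_PREFIX = "//"


def _matches(value: str, patterns: List[str]) -> bool:
    """Return True if value equals a literal pattern or fully matches a regex pattern."""
    for pat in patterns:
        if pat.startswith(REGEX_PREFIX):
            if re.fullmatch(pat[len(REGEX_PREFIX):], value):
                return True
        elif pat == value:
            return True
    return False


@dataclass
class MetsFileFilter:
    """Hypothetical stand-in for a filter restricting files by fileGrp and mimetype."""
    file_grp_include: List[str] = field(default_factory=list)
    file_grp_exclude: List[str] = field(default_factory=list)
    mimetype_include: List[str] = field(default_factory=list)
    mimetype_exclude: List[str] = field(default_factory=list)

    def accepts(self, file_grp: str, mimetype: str) -> bool:
        """A file passes if it satisfies every include list and hits no exclude list."""
        checks = (
            (file_grp, self.file_grp_include, self.file_grp_exclude),
            (mimetype, self.mimetype_include, self.mimetype_exclude),
        )
        for value, include, exclude in checks:
            # An empty include list means "no restriction".
            if include and not _matches(value, include):
                return False
            if _matches(value, exclude):
                return False
        return True
```

In this sketch an exclude match always wins over an include match; whether the real implementation resolves conflicts the same way is not stated in this thread.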

Proposed by @bertsky in #506

This started out as a very rushed implementation because we needed the feature right away. The implementation has since been improved.

kba avatar Aug 28 '20 14:08 kba

Codecov Report

Merging #582 into master will increase coverage by 0.66%. The diff coverage is 99.03%.

@@            Coverage Diff             @@
##           master     #582      +/-   ##
==========================================
+ Coverage   84.60%   85.27%   +0.66%     
==========================================
  Files          49       50       +1     
  Lines        2813     2933     +120     
  Branches      550      577      +27     
==========================================
+ Hits         2380     2501     +121     
  Misses        332      332              
+ Partials      101      100       -1     
Impacted Files                               Coverage Δ
ocrd_utils/ocrd_utils/__init__.py            100.00% <ø> (ø)
ocrd_models/ocrd_models/ocrd_mets_filter.py  97.70% <97.70%> (ø)
ocrd/ocrd/cli/workspace.py                   76.13% <100.00%> (-0.34%)
ocrd/ocrd/decorators.py                      95.78% <100.00%> (+4.12%)
ocrd/ocrd/resolver.py                        96.66% <100.00%> (+0.11%)
ocrd_models/ocrd_models/__init__.py          100.00% <100.00%> (ø)
ocrd_models/ocrd_models/ocrd_mets.py         93.14% <100.00%> (+<0.01%)
ocrd_models/ocrd_models/ocrd_xml_base.py     93.33% <100.00%> (+2.02%)
ocrd_utils/ocrd_utils/str.py                 90.81% <100.00%> (+1.28%)
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 8dafbac...9fb27c0.

codecov-commenter avatar Aug 29 '20 12:08 codecov-commenter

Any preferences on the command line interface?

0

  --ID, --id PAT                  ID to include, string/regex/comma-separated
  --not-ID, --not-id PAT          ID to exclude, string/regex/comma-separated
  --mimetype PAT                  mimetype to include, string/regex/comma-separated
  --not-mimetype PAT              mimetype to exclude, string/regex/comma-separated
  --pageId, --pageid PAT          pageId to include, string/comma-separated
  --not-pageId, --not-pageid PAT  pageId to exclude, string/regex/comma-separated
  --fileGrp, --filegrp PAT        fileGrp to include, string/regex/comma-separated
  --not-fileGrp, --not-filegrp PAT
                                  fileGrp to exclude, string/regex/comma-separated

1

  --id PAT            ID to include, string/regex/comma-separated
  --not-id PAT        ID to exclude, string/regex/comma-separated
  --mimetype PAT      mimetype to include, string/regex/comma-separated
  --not-mimetype PAT  mimetype to exclude, string/regex/comma-separated
  --pageid PAT        pageId to include, string/comma-separated
  --not-pageid PAT    pageId to exclude, string/regex/comma-separated
  --filegrp PAT       fileGrp to include, string/regex/comma-separated
  --not-filegrp PAT   fileGrp to exclude, string/regex/comma-separated

2

  --id-include PAT        ID to include, string/regex/comma-separated
  --id-exclude PAT        ID to exclude, string/regex/comma-separated
  --mimetype-include PAT  mimetype to include, string/regex/comma-separated
  --mimetype-exclude PAT  mimetype to exclude, string/regex/comma-separated
  --pageid-include PAT    pageId to include, string/comma-separated
  --pageid-exclude PAT    pageId to exclude, string/regex/comma-separated
  --filegrp-include PAT   fileGrp to include, string/regex/comma-separated
  --filegrp-exclude PAT   fileGrp to exclude, string/regex/comma-separated

3

  --id PAT            ID to include, string/regex/comma-separated
  --not-ID PAT        ID to exclude, string/regex/comma-separated
  --mimetype PAT      mimetype to include, string/regex/comma-separated
  --not-mimetype PAT  mimetype to exclude, string/regex/comma-separated
  --pageid PAT        pageId to include, string/comma-separated
  --not-pageId PAT    pageId to exclude, string/regex/comma-separated
  --filegrp PAT       fileGrp to include, string/regex/comma-separated
  --not-fileGrp PAT   fileGrp to exclude, string/regex/comma-separated

kba avatar Aug 29 '20 14:08 kba

> Any preferences on the command line interface?

I fail to see the difference between 1 and 3. But I would prefer the --not-* scheme over *-exclude/*-include.

What about --not (as a separate option negating the follow-up option), though?

Also, I think it would be better to use the same identifiers as the other workspace CLI commands:

  • -i | --file-id
  • -m | --mimetype
  • -g | --page-id
  • -G | --file-grp
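To make the `--not` idea concrete, here is a minimal sketch using the standard library's `argparse` with custom actions. It only illustrates how a separate negating option could work with the short/long option names listed above; the class names, the `include`/`exclude` dictionaries, and `parse_filters` are hypothetical and not part of the OCR-D CLI.

```python
import argparse


class NegateNext(argparse.Action):
    """`--not`: takes no value and marks the next filter option as an exclusion."""

    def __init__(self, option_strings, dest, **kwargs):
        super().__init__(option_strings, dest, nargs=0, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        namespace.negate_next = True


class FilterOption(argparse.Action):
    """Store comma-separated values under include or exclude, depending on `--not`."""

    def __call__(self, parser, namespace, values, option_string=None):
        target = namespace.exclude if namespace.negate_next else namespace.include
        # The negation only applies to the option directly following `--not`.
        namespace.negate_next = False
        target.setdefault(self.dest, []).extend(values.split(","))


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="filter-sketch")
    parser.add_argument("--not", dest="negate_next", action=NegateNext, default=False)
    parser.add_argument("-i", "--file-id", action=FilterOption)
    parser.add_argument("-m", "--mimetype", action=FilterOption)
    parser.add_argument("-g", "--page-id", action=FilterOption)
    parser.add_argument("-G", "--file-grp", action=FilterOption)
    return parser


def parse_filters(argv):
    """Return (include, exclude) dicts keyed by option destination."""
    # A fresh namespace per call avoids sharing the mutable dicts between parses.
    ns = argparse.Namespace(include={}, exclude={}, negate_next=False)
    build_parser().parse_args(argv, namespace=ns)
    return ns.include, ns.exclude
```

With this sketch, `--not -G OCR-D-IMG -m image/tiff` would exclude the file group and include the mimetype. A trailing `--not` with no following option is silently ignored here, which a real CLI would probably want to reject.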

bertsky avatar Aug 31 '20 09:08 bertsky

The relevant part is filtering by file group, which has now been implemented in #1139 in a simpler way than the more generic approach proposed here.

Since #1139 only targets file groups, and --not-file-grp/--file-grp would conflict with the regular --file-grp option, it uses -Q/--exclude-file-grps and -q/--include-file-grps instead.
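For readers who want to see how such a pair of options might behave, here is a small sketch. Only the option names `-q/--include-file-grps` and `-Q/--exclude-file-grps` come from this thread; the comma-separated value format, the helper names, and the rule that exclusion is applied after inclusion are assumptions of this example, not a description of #1139.

```python
import argparse
from typing import Iterable, List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="clone-sketch")
    # Assumption: both options take a comma-separated list of file group names.
    parser.add_argument("-q", "--include-file-grps", default="",
                        help="file groups to include (comma-separated)")
    parser.add_argument("-Q", "--exclude-file-grps", default="",
                        help="file groups to exclude (comma-separated)")
    return parser


def split_csv(spec: str) -> List[str]:
    """Split a comma-separated string, dropping empty entries and whitespace."""
    return [part.strip() for part in spec.split(",") if part.strip()]


def select_file_grps(available: Iterable[str],
                     include: List[str],
                     exclude: List[str]) -> List[str]:
    """Keep groups named in include (or all, if include is empty), then drop excluded ones."""
    selected = [grp for grp in available if not include or grp in include]
    return [grp for grp in selected if grp not in exclude]
```

Usage would look like `select_file_grps(all_grps, split_csv(args.include_file_grps), split_csv(args.exclude_file_grps))` after parsing.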

kba avatar Nov 23 '23 12:11 kba