
attrs for paragraphs - no longer uses ID

Open jmp111 opened this issue 2 years ago • 0 comments

The config_pmc.json file fails on new HTML files from PMC because the id attribute has been removed from the paragraph (p) tag.

I suggest changing it to the following, where the first part includes Valentina's fix for some articles and the second part is new, to work on new PMC files:

"paragraphs": {
  "data": {},
  "defined-by": [
    {
      "tag": "p",
      "attrs": {"id": "_*[pP\\-|pP|Par]*\\d+"}
    },
    {
      "tag": "p",
      "attrs": {"class": "p"}
    }
  ]
},
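
To check what the two rules above would select, here is a minimal standalone sketch using only the Python standard library. It does not use Auto-CORPus's own matching code: the helper names (`rule_matches`, `ParagraphCollector`, `extract_paragraphs`) and the sample HTML are hypothetical, and it assumes each attribute pattern is applied as a full-match regular expression against each whitespace-separated token of the attribute value. Note that the bracketed part of the id pattern is a regex character class, so it matches any run of the characters p, P, -, |, a and r before the digits.

```python
import json
import re
from html.parser import HTMLParser

# The proposed config fragment, wrapped in braces so it parses as JSON.
CONFIG = json.loads(r'''
{
  "paragraphs": {
    "data": {},
    "defined-by": [
      {"tag": "p", "attrs": {"id": "_*[pP\\-|pP|Par]*\\d+"}},
      {"tag": "p", "attrs": {"class": "p"}}
    ]
  }
}
''')


def rule_matches(rule, tag, attrs):
    """Return True if a start tag satisfies one "defined-by" rule.

    Assumption: every pattern must fully match at least one
    whitespace-separated token of the corresponding attribute value.
    """
    if tag != rule["tag"]:
        return False
    for name, pattern in rule["attrs"].items():
        value = attrs.get(name)
        if value is None:
            return False
        if not any(re.fullmatch(pattern, token) for token in value.split()):
            return False
    return True


class ParagraphCollector(HTMLParser):
    """Collect the text of every <p> element matched by any rule."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.paragraphs = []
        self._buffer = None  # holds text chunks while inside a matched <p>

    def handle_starttag(self, tag, attrs):
        if self._buffer is None and any(
            rule_matches(rule, tag, dict(attrs)) for rule in self.rules
        ):
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            self.paragraphs.append("".join(self._buffer).strip())
            self._buffer = None


def extract_paragraphs(html, config=CONFIG):
    collector = ParagraphCollector(config["paragraphs"]["defined-by"])
    collector.feed(html)
    return collector.paragraphs


# Hypothetical sample: one old-style paragraph (with id), one new-style
# paragraph (class only), and one <p> that neither rule should select.
SAMPLE = '''
<p id="Par1">Old-style paragraph.</p>
<p class="p">New-style paragraph.</p>
<p class="caption">Not a body paragraph.</p>
'''

if __name__ == "__main__":
    for text in extract_paragraphs(SAMPLE):
        print(text)
```

With only the first rule, the class-only paragraph in this sample would be skipped, which is the failure described above; listing both rules keeps older files working while also covering the new PMC markup.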

jmp111, Mar 07 '23 16:03