WikiPlots

'plot' in current.get_text().lower() doesn't match all relevant headers

Status: Open. MarcinCiura opened this issue 8 years ago • 2 comments

In a similar project of mine, I used this regexp:

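import re

# Section titles that typically introduce a plot summary on Wikipedia.
# 'Contents?' is itself a regex fragment matching 'Content' or 'Contents'.
PLOT = [
    'Plot summary', 'Plot', 'Plot introduction',
    'Synopsis', 'Summary', 'Plot synopsis',
    'Overview', 'Story', 'Description', 'Contents?'
]
# Matches wikitext section headings such as "== Plot summary ==" at any level.
HEADING_RE = re.compile(
    r'^ *=+\s*(%s)\s*=+' % '|'.join(PLOT),
    re.IGNORECASE | re.UNICODE | re.MULTILINE)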
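A minimal sketch of how the regexp might be used to pull out the plot text, assuming the input is raw wikitext (the existing check calls `current.get_text()`, which suggests rendered HTML instead). The helper names `plot_section` and `ANY_HEADING_RE` are hypothetical, and the rule that a section ends at the next heading of the same or a higher level is an assumption about what should count as "the plot".

```python
import re

PLOT = [
    'Plot summary', 'Plot', 'Plot introduction',
    'Synopsis', 'Summary', 'Plot synopsis',
    'Overview', 'Story', 'Description', 'Contents?'
]
HEADING_RE = re.compile(
    r'^ *=+\s*(%s)\s*=+' % '|'.join(PLOT),
    re.IGNORECASE | re.UNICODE | re.MULTILINE)
# Hypothetical helper: any wikitext heading line; group 1 holds the '=' run.
ANY_HEADING_RE = re.compile(r'^ *(=+)[^=\n].*?\1 *$', re.MULTILINE)


def plot_section(wikitext):
    """Return the body of the first plot-like section, or None if absent."""
    m = HEADING_RE.search(wikitext)
    if m is None:
        return None
    # Heading level = number of leading '=' signs (2 for "== Plot ==").
    heading = m.group(0).lstrip()
    level = len(heading) - len(heading.lstrip('='))
    body_start = wikitext.find('\n', m.end())
    if body_start == -1:
        return ''  # Heading on the last line: the section is empty.
    # Assumed rule: stop at the next heading of the same or a higher level,
    # so subsections such as "=== Part one ===" stay inside the plot.
    for h in ANY_HEADING_RE.finditer(wikitext, body_start):
        if len(h.group(1)) <= level:
            return wikitext[body_start:h.start()].strip()
    return wikitext[body_start:].strip()


sample = """Intro text.
== Plot summary ==
Alice falls down a rabbit hole.
=== Later chapters ===
She grows and shrinks.
== Reception ==
Reviews were positive.
"""
print(plot_section(sample))
```

Because `^` in `MULTILINE` mode only matches after a newline, starting `finditer` at the newline that ends the plot heading means the heading itself is never re-matched.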

MarcinCiura, Apr 28 '17 16:04

Thanks for the suggestion. This might pick up some things that aren't novels, movies, or video games, though. I'll try it out and see.

markriedl, Apr 28 '17 23:04

Right. That's why I used Wikipedia categories, which may be too big a change for your script. FWIW, here's the breakdown of headers in articles about novels:

| Header | Count |
|---|---|
| Plot summary | 8466 |
| Plot | 5664 |
| Plot introduction | 1696 |
| Synopsis | 1492 |
| Summary | 636 |
| Plot Summary | 314 |
| Plot synopsis | 213 |
| Overview | 212 |
| Story | 124 |
| Description | 97 |
| Contents | 67 |
| Content | 53 |
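To put a number on the gap the issue title describes, the counts from this comment can be run through both checks. This is a sketch assuming `current` in the original check is the heading element, so the test is simply whether "plot" occurs in the heading title; `coverage` and `TITLE_RE` are hypothetical names. Note that the PLOT list was built from these same headings, so its full coverage here holds by construction.

```python
import re

# Heading counts from novel articles, copied from the comment.
HEADING_COUNTS = {
    'Plot summary': 8466, 'Plot': 5664, 'Plot introduction': 1696,
    'Synopsis': 1492, 'Summary': 636, 'Plot Summary': 314,
    'Plot synopsis': 213, 'Overview': 212, 'Story': 124,
    'Description': 97, 'Contents': 67, 'Content': 53,
}

PLOT = [
    'Plot summary', 'Plot', 'Plot introduction',
    'Synopsis', 'Summary', 'Plot synopsis',
    'Overview', 'Story', 'Description', 'Contents?'
]
# Hypothetical: matches a bare heading title (no '=' markup) against PLOT.
TITLE_RE = re.compile('|'.join(PLOT), re.IGNORECASE)


def coverage(matches):
    """Return (matched, total): how many headings the predicate accepts."""
    total = sum(HEADING_COUNTS.values())
    matched = sum(n for title, n in HEADING_COUNTS.items() if matches(title))
    return matched, total


# The current check: keep a heading if 'plot' appears anywhere in it.
substring = coverage(lambda title: 'plot' in title.lower())
# The regexp-based check: keep a heading if its whole title is in PLOT.
regex = coverage(lambda title: TITLE_RE.fullmatch(title) is not None)

for name, (matched, total) in [("'plot' substring", substring), ('PLOT list', regex)]:
    print(f'{name}: {matched} / {total} headings ({100 * matched / total:.1f}%)')
# 'plot' substring: 16353 / 19034 headings (85.9%)
# PLOT list: 19034 / 19034 headings (100.0%)
```

By these counts the substring check misses 2681 headings (Synopsis, Summary, Overview, Story, Description, Contents, Content), roughly one novel article in seven.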

MarcinCiura, Apr 29 '17 10:04