
Option to keep == Section == syntax around titles

Open • sooheon opened this issue 7 years ago • 1 comment

It would be nice to have this option. Knowing that a particular bit of text is a section title, and what level that section is, is useful for some downstream analysis tasks.

Currently:

Foo bar.

Foo bar is blah blah blah....

Desired:

= Foo bar =

Foo bar is blah blah blah....

sooheon, Jul 10 '18 02:07

In v3.0.6, this can be solved by changing the default argument mark_headers=False to mark_headers=True in extract.Extractor.clean_text. Headings in the output then start with #, e.g. "## Section 1".
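
For downstream use, here is a minimal sketch of how the "#"-marked headings could be read back and, if wanted, rewritten in the == Section == style asked for above. It is not part of wikiextractor: the helper names parse_heading and to_wiki_heading are hypothetical, and it assumes heading lines look exactly like "## Section 1" (one or more "#", whitespace, then the title).

```python
import re

# Assumption: a heading line is one or more "#", whitespace, then the title,
# as in the "## Section 1" example above. Other output formats are not handled.
HEADING_RE = re.compile(r"^(#+)\s+(.*?)\s*$")


def parse_heading(line):
    """Return (level, title) for a '#'-marked heading line, else None."""
    match = HEADING_RE.match(line)
    if match is None:
        return None
    return len(match.group(1)), match.group(2)


def to_wiki_heading(line):
    """Rewrite '## Title' as '== Title =='; return other lines unchanged."""
    parsed = parse_heading(line)
    if parsed is None:
        return line
    level, title = parsed
    marks = "=" * level
    return f"{marks} {title} {marks}"


if __name__ == "__main__":
    for text_line in ["## Section 1", "Foo bar is blah blah blah...."]:
        print(to_wiki_heading(text_line))
```

The number of "#" characters is taken as the section level, so the level is kept when converting. A body line that happens to start with "# " would also be treated as a heading, which may need extra handling on real dumps.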

richardwth, Mar 17 '22 08:03