
Is it possible to get the metadata along with Markdown output?

Open: charleshan opened this issue 2 years ago • 1 comment

There is useful information when we output to JSON format, such as the title, author, and date. However, it looks like JSON only offers `raw_text` as the content format, without any formatting.

The current workaround is to extract twice, once as JSON and once as TXT with `include_formatting`, but I think we can do better.
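Until metadata is built into the formatted output, the two results of that workaround can be merged by hand. The sketch below is a hypothetical helper (`add_metadata_header` is not part of trafilatura). It assumes you already have one string from `trafilatura.extract(..., output_format="json")` and another from `trafilatura.extract(..., include_formatting=True)`, and it prepends the `title`, `author` and `date` fields as a YAML-style front-matter block. Values are JSON-quoted so that titles containing colons do not break the header.

```python
import json


def add_metadata_header(json_output, formatted_text, fields=("title", "author", "date")):
    """Hypothetical helper: prepend front matter built from trafilatura's JSON output.

    json_output    -- string returned by extract(html, output_format="json")
    formatted_text -- string returned by extract(html, include_formatting=True)
    """
    metadata = json.loads(json_output)
    lines = ["---"]
    for field in fields:
        value = metadata.get(field)
        if value:  # skip fields that are missing, null or empty
            # json.dumps quotes the value, which keeps "Title: Subtitle" safe in YAML
            lines.append(f"{field}: {json.dumps(value, ensure_ascii=False)}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + formatted_text


# Assumed usage with the two extraction passes of the workaround:
# json_out = trafilatura.extract(html, output_format="json")
# text_out = trafilatura.extract(html, include_formatting=True)
# print(add_metadata_header(json_out, text_out))
```

This still runs the extraction twice, which is the overhead the issue hopes to avoid. It only shows what the combined output could look like.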

charleshan, Jul 08 '23

Good point. The code here could definitely be improved to include further metadata:

https://github.com/adbar/trafilatura/blob/123414cae5f927e743f5eced2cd43b81a65fc43c/trafilatura/xml.py#L41

adbar, Jul 10 '23