python-markdownify
Modernize Code
This PR modernizes the codebase.
I published it as a PyPI package (https://pypi.org/project/html-to-markdown/), but it would be better to contribute it upstream.
Please take a look at the changes. The changes to the license file and README can, of course, be reverted without any issue.
There is one breaking change, though: I removed the converter class and made everything functional. If this is undesirable, this PR should be closed.
With your version, how can you customize the conversion of a particular element?
I would allow users to override existing converters or register new ones by element name. Since the API is typed, this can be done in a fairly semantic fashion. You could also add callbacks or hooks (e.g. a pre-processing hook, a post-processing hook, etc.).
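To make the override/register/hook idea concrete, here is a minimal sketch of a converter map keyed by element name, with optional pre- and post-processing hooks. All names here (`DEFAULT_CONVERTERS`, `create_converters`, `convert_element`) are hypothetical and are not the html-to-markdown API. To keep the sketch self-contained, converters receive only the element's text; a real implementation would also receive the parsed tag.

```python
from __future__ import annotations

from collections.abc import Callable

# Hypothetical signature: a converter receives the already-converted text of
# the element's children and returns Markdown. A real implementation would
# also pass the parsed tag (e.g. a BeautifulSoup Tag).
Converter = Callable[[str], str]
Hook = Callable[[str], str]

# Built-in converters keyed by element name (illustrative subset only).
DEFAULT_CONVERTERS: dict[str, Converter] = {
    "strong": lambda text: f"**{text}**",
    "em": lambda text: f"*{text}*",
    "code": lambda text: f"`{text}`",
}


def create_converters(overrides: dict[str, Converter] | None = None) -> dict[str, Converter]:
    """Return a new map with user converters layered over the defaults.

    An existing key overrides the built-in converter; a new key registers a
    converter for an element that has no default. DEFAULT_CONVERTERS is not mutated.
    """
    return {**DEFAULT_CONVERTERS, **(overrides or {})}


def convert_element(
    name: str,
    text: str,
    converters: dict[str, Converter],
    pre_hook: Hook | None = None,
    post_hook: Hook | None = None,
) -> str:
    """Convert one element, running the optional hooks before and after."""
    if pre_hook is not None:
        text = pre_hook(text)
    converter = converters.get(name)
    # Elements without a registered converter pass their text through unchanged.
    result = converter(text) if converter is not None else text
    if post_hook is not None:
        result = post_hook(result)
    return result


converters = create_converters({
    "strong": lambda text: f"__{text}__",  # override a built-in converter
    "mark": lambda text: f"=={text}==",  # register a new element
})
print(convert_element("strong", "bold", converters))  # __bold__
print(convert_element("mark", " hi ", converters, pre_hook=str.strip))  # ==hi==
```

Returning a fresh dict instead of mutating the defaults keeps each caller's configuration independent of every other caller's.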
You are, of course, welcome to merge my PR into a branch and rework it as you see fit. I am glad to contribute.
Also, we could consider doing something like this if you are interested (this is just an illustration):
```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from typing import Literal
from warnings import simplefilter, warn

from bs4.element import Tag

# create_converters_map is defined elsewhere in the package (not shown here).


def create_legacy_class(
    autolinks: bool,
    bullets: str,
    code_language: str,
    code_language_callback: Callable[[Tag], str] | None,
    default_title: bool,
    heading_style: Literal["atx", "atx_closed", "underlined"],
    keep_inline_images_in: Iterable[str] | None,
    newline_style: str,
    strong_em_symbol: str,
    sub_symbol: str,
    sup_symbol: str,
    wrap: bool,
    wrap_width: int,
) -> type:
    """Create a legacy class for Markdownify.

    Deprecated: Use the new hooks API instead.

    Note: This is a temporary function to help with the transition to the new API.

    Args:
        autolinks: Whether to use the automatic link style (<url>) when a link's text matches its href.
        bullets: The bullet characters to use for unordered lists.
        code_language: The default code language to use.
        code_language_callback: A callback that returns the code language for a given tag.
        default_title: Whether to use the URL as the title for links.
        heading_style: The style of headings.
        keep_inline_images_in: Parent tags in which inline images are kept instead of being replaced by their alt text.
        newline_style: The style of newlines.
        strong_em_symbol: The symbol to use for strong and emphasis text.
        sub_symbol: The symbol to use for subscript text.
        sup_symbol: The symbol to use for superscript text.
        wrap: Whether to wrap text.
        wrap_width: The width to wrap text at.

    Returns:
        A class that can be used to convert HTML to Markdown.
    """
    # Force the deprecation warning to be shown, then switch DeprecationWarning
    # to the "default" action. Note: simplefilter mutates process-wide filters.
    simplefilter("always", DeprecationWarning)
    warn(
        "The Markdownify class is deprecated and will be removed in the next major version (version 2.0). Use the new API instead.",
        category=DeprecationWarning,
        stacklevel=2,
    )
    simplefilter("default", DeprecationWarning)
    # Build the class dynamically: each converter becomes a class attribute,
    # with any leading underscore stripped from its name (Python 3.9+).
    return type(
        "Markdownify",
        (),
        {
            k.removeprefix("_"): v
            for k, v in create_converters_map(
                autolinks=autolinks,
                bullets=bullets,
                code_language=code_language,
                code_language_callback=code_language_callback,
                default_title=default_title,
                heading_style=heading_style,
                keep_inline_images_in=keep_inline_images_in,
                newline_style=newline_style,
                strong_em_symbol=strong_em_symbol,
                sub_symbol=sub_symbol,
                sup_symbol=sup_symbol,
                wrap=wrap,
                wrap_width=wrap_width,
            ).items()
        },
    )
```
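The legacy-class illustration depends on `create_converters_map`, which is not shown, so here is a reduced, runnable sketch of the same `type()`-based pattern. It uses a hypothetical two-entry stand-in for `create_converters_map` with a single option. Two choices differ from the illustration. First, each converter is wrapped in `staticmethod`, because a plain function stored on a class receives the instance as its first argument when called through an instance. Second, the caller scopes the warning filters with `warnings.catch_warnings` instead of the function changing them globally.

```python
from __future__ import annotations

import warnings
from collections.abc import Callable


def create_converters_map(strong_em_symbol: str) -> dict[str, Callable[[str], str]]:
    """Hypothetical stand-in for the package's converter factory."""
    return {
        "_convert_strong": lambda text: f"{strong_em_symbol * 2}{text}{strong_em_symbol * 2}",
        "_convert_em": lambda text: f"{strong_em_symbol}{text}{strong_em_symbol}",
    }


def create_legacy_class(strong_em_symbol: str = "*") -> type:
    """Build a deprecated, class-shaped wrapper around the functional converters."""
    warnings.warn(
        "The Markdownify class is deprecated; use the new API instead.",
        category=DeprecationWarning,
        stacklevel=2,
    )
    return type(
        "Markdownify",
        (),
        {
            # staticmethod keeps the converters callable from instances
            # without Python binding the instance as the first argument.
            name.removeprefix("_"): staticmethod(func)
            for name, func in create_converters_map(strong_em_symbol).items()
        },
    )


# Scope the filter change to this block so global warning settings are untouched.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    Markdownify = create_legacy_class("_")

converter = Markdownify()
print(converter.convert_strong("bold"))  # __bold__
print(caught[0].category.__name__)  # DeprecationWarning
```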