python-markdownify
Handle relative image URLs
Add the ability to process relative image URLs, such as 'path/to/img.png' or '/path/to/img.png'.
Hey, thanks for your contribution! Any reason why the base_url gets cut into host and protocol, instead of using it as-is as prefix? Maybe the user wants to prefix their URLs with a full locator.
@kaichen - could you provide an example use case for this feature? I don't fully understand it from the pull request description.
> could you provide an example use case for this feature? I don't fully understand it from the pull request description.
Some webpages use relative paths for their image URLs. When I download HTML and convert it to Markdown with this library, I need the full image URLs so that the images render correctly.
> Hey, thanks for your contribution! Any reason why the base_url gets cut into host and protocol, instead of using it as-is as prefix? Maybe the user wants to prefix their URLs with a full locator.
I just want to make sure base_url is joined with relative paths correctly.
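To make the joining question concrete, here is a minimal sketch of how the standard library's `urllib.parse.urljoin` resolves the two kinds of relative path mentioned in the description. The helper name `resolve_image_url` and the example.com URLs are assumptions for illustration only; this is not the code from the pull request.

```python
from urllib.parse import urljoin


def resolve_image_url(base_url, src):
    """Resolve an image src against the URL of the page it came from.

    Hypothetical helper, not part of Markdownify.
    """
    return urljoin(base_url, src)


# Hypothetical page URL used only for this example.
page = "https://example.com/blog/post/index.html"

# A path-relative src resolves against the directory of the page.
print(resolve_image_url(page, "path/to/img.png"))
# A root-relative src resolves against the scheme and host only.
print(resolve_image_url(page, "/path/to/img.png"))
# An already absolute src is returned unchanged.
print(resolve_image_url(page, "https://cdn.example.org/img.png"))
```

Under these assumptions, a root-relative path effectively uses only the protocol and host of the base URL, while a path-relative one uses the full locator, so both behaviours discussed here fall out of one call.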
I have mixed feelings about this.
On one hand, I always appreciate a pull request contribution. And on the surface, this provides a nice convenience for this use case.
But on the other hand, Markdownify's job is to render the provided HTML to Markdown, and as the Unix mantra says, "do one thing and do it well." Modifying link content is content modification, not content rendering, which feels more like source preprocessing before Markdownify is called.
Two more random thoughts:
- <a> links should be given similar consideration.
- Another approach is to use a link-formatting function in process_img() and process_a():
      def format_link(link_text): return link_text; # default is to use link text as-is
  then allow the user to override this, either by an option that takes a callback function, or by a subclassed function override.
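As a rough sketch of the callback variant of that idea: the class below is a hypothetical stand-in, not Markdownify's actual converter, and `SketchConverter`, `make_absolute`, and the simplified `process_img()`/`process_a()` signatures are assumptions made only to show the shape of the hook.

```python
from urllib.parse import urljoin


def format_link(link_text):
    """Default hook: use the link text as-is."""
    return link_text


class SketchConverter:
    """Hypothetical stand-in for a converter; not Markdownify's real class."""

    def __init__(self, format_link=format_link):
        # The user may pass their own callback to rewrite link targets.
        self.format_link = format_link

    def process_img(self, src, alt=""):
        # Render an image, passing its target through the hook.
        return f"![{alt}]({self.format_link(src)})"

    def process_a(self, href, text):
        # Render a link, passing its target through the same hook.
        return f"[{text}]({self.format_link(href)})"


def make_absolute(base_url):
    """Build a callback that resolves links against base_url."""
    return lambda link: urljoin(base_url, link)


default = SketchConverter()
absolute = SketchConverter(format_link=make_absolute("https://example.com/docs/"))

print(default.process_img("path/to/img.png", "logo"))
print(absolute.process_img("path/to/img.png", "logo"))
print(absolute.process_a("/about", "About"))
```

With a hook like this, URL rewriting stays in user code and covers images and links alike, while the converter itself only renders.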
Or maybe I am overthinking it, and this is simply a nice convenience that we should implement. :)
@AlexVonB, what are your thoughts on this?