Make MarkdownFields translatable

Open · jeriox opened this issue 2 years ago · 4 comments

Currently, when using wagtail-localize, a MarkdownField cannot be translated easily, because the whole content of the field is put into a single translation segment. For a long page with a Markdown body, this is not feasible. I'd like the MarkdownField to be split into several translation segments (as StreamFields are), so I can translate them separately. I wrote a hacky solution for this some time ago, but it breaks with the current version. I'd be happy if we could find a way to support this properly.

My old code for reference:

```python
import html2text
from django.db.models import TextField
from wagtail_localize.segments import (
    OverridableSegmentValue,
    StringSegmentValue,
    TemplateSegmentValue,
)
from wagtail_localize.segments.extract import quote_path_component
from wagtail_localize.segments.ingest import organise_template_segments
from wagtail_localize.strings import extract_strings, restore_strings

from wagtailmarkdown.utils import render_markdown
from wagtailmarkdown.widgets import MarkdownTextarea


class MarkdownField(TextField):
    def formfield(self, **kwargs):
        defaults = {"widget": MarkdownTextarea}
        defaults.update(kwargs)
        return super().formfield(**defaults)

    def get_translatable_segments(self, value):
        # Render the Markdown to HTML and let wagtail-localize split it into
        # an HTML template plus one string per translatable chunk of text.
        template, strings = extract_strings(render_markdown(value))

        # Collect the unique link targets so they can be overridden per locale.
        hrefs = set()
        for _string, attrs in strings:
            for tag_attrs in attrs.values():
                if "href" in tag_attrs:
                    hrefs.add(tag_attrs["href"])

        return (
            [TemplateSegmentValue("", "html", template, len(strings))]
            + [StringSegmentValue("", string, attrs=attrs) for string, attrs in strings]
            + [OverridableSegmentValue(quote_path_component(href), href) for href in sorted(hrefs)]
        )

    def restore_translated_segments(self, value, field_segments):
        # Rebuild the translated HTML, then convert it back to Markdown.
        # bodywidth=0 stops html2text from hard-wrapping lines at 78 characters.
        _format, template, strings = organise_template_segments(field_segments)
        return html2text.html2text(restore_strings(template, strings), bodywidth=0)
```

jeriox, Jul 05 '22 21:07
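The code above works by rendering the Markdown to HTML and splitting that HTML into a template plus a list of strings, then doing the reverse on the way back. As a rough, dependency-free illustration of that template/strings idea, here is a sketch that treats the inner HTML of each `<p>`, `<h1>` to `<h6>` and `<li>` element as one string. It is not wagtail-localize's `extract_strings()`/`restore_strings()`, which handle many more cases; the function names and the `<segment position="N"/>` placeholder are made up for this example.

```python
import re

# Hypothetical, simplified stand-in for the template/strings idea; NOT the
# wagtail-localize implementation. Each matched block's inner HTML becomes
# one translatable string.
BLOCK_RE = re.compile(r"<(p|h[1-6]|li)>(.*?)</\1>", re.DOTALL)
PLACEHOLDER_RE = re.compile(r'<segment position="(\d+)"/>')


def extract_block_strings(html):
    """Return (template, strings) for the given rendered HTML.

    The template keeps the document structure, with each block's content
    replaced by a numbered placeholder; the strings can then be translated
    one by one.
    """
    strings = []

    def _replace(match):
        tag, inner = match.group(1), match.group(2)
        strings.append(inner)
        return f'<{tag}><segment position="{len(strings) - 1}"/></{tag}>'

    template = BLOCK_RE.sub(_replace, html)
    return template, strings


def restore_block_strings(template, strings):
    """Put (possibly translated) strings back into their placeholders."""
    # A function replacement is inserted literally, so backslashes in the
    # translated text are not treated as regex escapes.
    return PLACEHOLDER_RE.sub(lambda m: strings[int(m.group(1))], template)


if __name__ == "__main__":
    html = "<h1>Title</h1>\n<p>First paragraph.</p>\n<p>Second <em>one</em>.</p>"
    template, strings = extract_block_strings(html)
    print(strings)
    print(restore_block_strings(template, ["Titel", "Erster Absatz.", "Zweiter <em>Absatz</em>."]))
```

A regex is good enough for a sketch but not for real-world HTML: nested blocks such as `<li><p>text</p></li>` end up as a single string, which is one reason to lean on a proper parser like the one wagtail-localize already ships.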

Hey @jeriox,

Thank you for sharing this. I've had a few requests to make this compatible with wagtail-localize, so the code snippet is very handy!

zerolab, Jul 06 '22 08:07

I got it working again with the code above; we will use that for now. It still feels a bit hacky to me, so we'd be happy if there were a better alternative built in :)

jeriox, Aug 29 '22 12:08

This would need a bit of thinking. For example:

> I'd like to have the MarkdownField split up in several translation segments (like with StreamFields), so I can translate them separately.

Where do you draw the line and split things? Is it at every link? Every paragraph? Every heading? Given that raw HTML can be allowed in there too, how should we handle that?

zerolab, Aug 29 '22 13:08
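One possible answer to the splitting question, sketched on the Markdown source rather than on rendered HTML: split at blank lines, keep each ATX heading (`# ...`) as its own unit, and keep fenced code blocks whole and marked as non-translatable. This is only an illustration, not something wagtail-markdown or wagtail-localize provides; `split_markdown_blocks` and `join_markdown_blocks` are hypothetical names. Raw HTML blocks simply become their own blank-line-delimited blocks here, and setext headings, indented code and nested lists get no special treatment.

```python
import re

# Hypothetical sketch: split Markdown source into translation units.
# Simplification: any line starting with three backticks or tildes toggles
# a fence; mismatched fence characters are not checked.
FENCE_RE = re.compile(r"^(```|~~~)")
HEADING_RE = re.compile(r"^#{1,6} ")


def split_markdown_blocks(text):
    """Split Markdown into a list of (kind, source) blocks.

    kind is "heading", "text" (paragraphs, lists, raw HTML) or "code"
    (fenced code, which would normally be left untranslated).
    """
    blocks = []
    current = []
    in_fence = False

    def flush(kind="text"):
        if current:
            blocks.append((kind, "\n".join(current)))
            current.clear()

    for line in text.split("\n"):
        if in_fence:
            # Inside a fence, blank lines do not end the block.
            current.append(line)
            if FENCE_RE.match(line):
                flush("code")
                in_fence = False
        elif FENCE_RE.match(line):
            flush()
            current.append(line)
            in_fence = True
        elif HEADING_RE.match(line):
            flush()
            blocks.append(("heading", line))
        elif not line.strip():
            flush()
        else:
            current.append(line)

    # An unclosed fence at the end is still treated as code.
    flush("code" if in_fence else "text")
    return blocks


def join_markdown_blocks(blocks):
    """Reassemble blocks, separated by exactly one blank line."""
    return "\n\n".join(source for _kind, source in blocks)
```

Because the segments are slices of the original source, reassembly is a plain join and no HTML-to-Markdown round trip is needed. The trade-off is that runs of several blank lines collapse into one, and links inside a paragraph stay embedded in the translatable text instead of becoming separate overridable segments.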

> This would need a bit of thinking. e.g.
>
> > I'd like to have the MarkdownField split up in several translation segments (like with StreamFields), so I can translate them separately.
>
> Where do you draw the line and split things? is it at every link? every paragraph? every heading? given we can allow raw html in there too, how should we handle that?

Currently, my approach works as follows: since a lot of thought has already gone into how StreamFields are split up, I tried to reuse that as much as possible. I render the Markdown to HTML and pass it to the existing `extract_strings()` function, which also ensures that links are handled appropriately. For the other direction, converting the translated HTML back to Markdown with html2text works quite well. I haven't tested it with raw HTML, though. I think splitting at every paragraph and every heading is a good choice, because a paragraph that hasn't changed doesn't need to be re-translated when the page is edited.

jeriox, Aug 29 '22 13:08
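The argument for paragraph- and heading-level segments is reuse: after an edit, only the segments whose source text changed need new translations. A minimal sketch of that bookkeeping, assuming a hypothetical `memory` dict that maps each source segment to its earlier translation (wagtail-localize keeps its own translation storage; this is not its API):

```python
def plan_translation(segments, memory):
    """Reuse earlier translations and list the segments still to translate.

    `segments` is the current list of source strings (e.g. one per paragraph
    or heading); `memory` is a hypothetical dict mapping a source string to
    its earlier translation. Returns (translations, todo): one entry per
    segment (None where nothing can be reused) and the sources that still
    need a translator.
    """
    translations = []
    todo = []
    for source in segments:
        previous = memory.get(source)
        translations.append(previous)
        if previous is None:
            todo.append(source)
    return translations, todo


if __name__ == "__main__":
    memory = {
        "# Welcome": "# Willkommen",
        "This paragraph has not changed.": "Dieser Absatz ist unverändert.",
    }

    # Paragraph/heading segments: only the new paragraph needs work.
    per_block = ["# Welcome", "This paragraph has not changed.", "This one is new."]
    print(plan_translation(per_block, memory)[1])

    # One segment for the whole field: any edit invalidates everything.
    whole_field = "\n\n".join(per_block)
    print(plan_translation([whole_field], memory)[1])
```

With a single segment per field, adding one paragraph turns the entire body into untranslated work, which is the problem described in the original request.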