plone.restapi richttext fields set to `text/x-web-markdown` return HTML

richttext fields set to `text/x-web-markdown` return HTML

Open me-kell opened this issue 1 year ago • 21 comments

The REST API returns the HTML-Version of richttext fields even if the content-type is set to text/x-web-markdown.

I'd expect the stored markdown version (value.raw) to be returned.

We send:

"text": {
  "content-type": "text/x-web-markdown",
  "data": "xyz **bold** normal",
  "encoding": "utf8"
}

We get:

"text": {
  "content-type": "text/x-web-markdown",
  "data": "<p>xyz <strong>bold</strong> normal</p>",
  "encoding": "utf8"
}

To reproduce it:

SITE_URL=https://6-classic.demo.plone.org
USER=manager:****   # <- set manager password here

cat <<EOF | tee config_markup.json
{
  "allowed_types": [
    {"title":"text/html","token": "text/html"},
    {"title": "text/x-web-markdown","token": "text/x-web-markdown"},
    {"title": "text/x-web-textile","token": "text/x-web-textile"}
  ],
  "default_type": {"title":"text/html","token": "text/html"},
  "markdown_extensions":[
    "markdown.extensions.fenced_code",
    "markdown.extensions.footnotes",
    "markdown.extensions.tables"
  ]
}
EOF

curl -i -X PATCH "${SITE_URL}/@controlpanels/markup" \
    -H "Accept: application/json" \
    -H "Content-Type: application/json" \
    --data-binary "@my_document.json" \
    --user ${USER}

cat <<EOF | tee my_document.json
{
  "@type": "Document",
  "title": "MyDocument",
  "text": {
    "content-type": "text/x-web-markdown",
    "data": "xyz **bold** normal",
    "encoding": "utf8"
  }
}
EOF

ID=$(
    curl -s -X POST "${SITE_URL}" \
    -H "Accept: application/json" \
    -H "Content-Type: application/json" \
    --data-binary "@my_document.json" \
    --user ${USER} \
    | jq -r '."@id"' \
)
echo $ID
curl -s -X GET "${ID}" \
    -H "Accept: application/json" \
    -H "Content-Type: application/json" \
    --user ${USER} \
    | jq -r '."text"'

It returns

...
"text": {
  "content-type": "text/x-web-markdown",
  "data": "<p>xyz <strong>bold</strong> normal</p>",
  "encoding": "utf8"
}
...

Oct 25 '22 09:10 me-kell

IMO the bug is either that the content-type is wrong or the data is wrong, because both do not match.

Having a mode to get raw or rendered is kind of feature request then. For editing one want raw, for display maybe rendered.

Oct 25 '22 09:10 jensens

see also https://community.plone.org/t/markdown-or-mdx-from-plone-rest-api

Oct 25 '22 09:10 jensens

The problem seems to be in the inconsistent handling of RichTextValue in the serializer and deserializer.

plone.app.textfield.value.RichTextValue has the following properties:

raw
encoding
raw_encoded
mimeType
outputMimeType
output
output_relative_to

While the serializer classs plone.restapi.serializer.converters.RichtextDXContextConverter only returns

output_relative_to in key data
mimeType in key content_type
encoding in key encoding

The deserializer class plone.restapi.deserializer.dxfields.RichTextFieldDeserializer sets

raw from key data through html_parser.unescape
mimeType from key content_type
output from self.field.output_mime_type
encoding from key encodding

The properties

raw_encoded and outputMimeType are not handled by the serializing/deserializing process
encoding and mimeType (in the key content-type) are passed transparently by the serializing/deserializing process
The serializer returns output_relative_to in the key data while the deserializer puts the key data parsed with html_parser.unescape into raw and sets outputMimeType to self.field.output_mime_type. See table below

`value`	`serializer`	`deserializer`	`value`
`raw`		`html_parser.unescape(data)`	`raw`
`encoding`	`encoding`	`encoding`	`encoding`
`raw_encoded`			`raw_encoded`
`mimeType`	`content_type`	`content_type`	`mimeType`
`outputMimeType`		`self.field.output_mime_type`	`outputMimeType`
`output`			`output`
`output_relative_to`	`data`		`output_relative_to`

classs plone.restapi.serializer.converters.RichtextDXContextConverter

@adapter(IRichTextValue, IDexterityContent)
@implementer(IContextawareJsonCompatible)
class RichtextDXContextConverter:
    def __init__(self, value, context):
        self.value = value
        self.context = context

    def __call__(self):
        value = self.value
        output = value.output_relative_to(self.context)
        return {
            "data": json_compatible(output),
            "content-type": json_compatible(value.mimeType),
            "encoding": json_compatible(value.encoding),
        }

class plone.restapi.deserializer.dxfields.RichTextFieldDeserializer

@implementer(IFieldDeserializer)
@adapter(IRichText, IDexterityContent, IBrowserRequest)
class RichTextFieldDeserializer(DefaultFieldDeserializer):
    def __call__(self, value):
        content_type = self.field.default_mime_type
        encoding = "utf8"
        if isinstance(value, dict):
            content_type = value.get("content-type", content_type)
            encoding = value.get("encoding", encoding)
            data = value.get("data", "")
        elif isinstance(value, TUSUpload):
            content_type = value.metadata().get("content-type", content_type)
            with open(value.filepath, "rb") as f:
                data = f.read().decode("utf8")
        else:
            data = value
        value = RichTextValue(
            raw=html_parser.unescape(data),
            mimeType=content_type,
            outputMimeType=self.field.output_mime_type,
            encoding=encoding,
        )
        self.field.validate(value)
        return value

Nov 20 '22 21:11 me-kell

@me-kell this is something that we never implemented in plone.restapi. Therefore I'd consider this to be a feature request, not a bug. That being said, I'd be more than happy to include this feature in plone.restapi. The content negotiation mechanism allows us to handle multiple formats, not only JSON (which is the only supported option right now). Though, I'd expect this to require some effort to make it work...

Nov 22 '22 04:11 tisto

Well, firstly it's a bug: mismatch between data and content-type.

Next then is a missing feature, to improve the overall situation to handle different types in the field.

Nov 22 '22 08:11 jensens

This is for different reasons a bug.

When we get a value via RESTAPI we would expect that nothing changes when we set the same value again.

This is not the case with objects having a field with RichTextValue. See how to reproduce it below.

This problem concerns not only text/x-web-markdown, but all mime-types (text/x-web-textile, text/x-rst, etc.).

And affects all content types having fields with RichtTextValue (not only RichTextField) like Document, News Item etc and other Dexterity content types.

To reproduce it

Import and define some variables

import requests
from plone import api
from plone.app.textfield.value import RichTextValue

url = 'http://localhost:8081/Plone'
headers = {'Accept': 'application/json', 'Content-Type': 'application/json'}

configure markup via plone.restapi

data = {
    "allowed_types": [{"title": "text/x-web-markdown", "token": "text/x-web-markdown"}],
    "default_type": {"title": "text/x-web-markdown", "token": "text/x-web-markdown"}
}
response = requests.patch(url + '/@controlpanels/markup', headers=headers, json=data, auth=('admin', 'admin'))

create a Document with mimeType="text/x-web-markdown" via plone.api

app._p_jar.sync()
obj = api.content.create(
    container=plone,
    type='Document',
    title='MyDocument3',
    mimeType='text/x-web-markdown',
    text=RichTextValue(
        raw="normal *italics* normal **bold** normal",
        mimeType="text/x-web-markdown",
        outputMimeType="text/x-html-safe",
        encoding='utf-8',
    )
)
transaction.commit()
print(f"[plone.api] old_text.raw={obj.text.raw}")

[plone.api] old_text.raw=normal *italics* normal **bold** normal

Get the created document via plone.restapi.

response = requests.get(url + '/' + obj.id, headers=headers, auth=('admin', 'admin'))
old_text = response.json()['text']
print(f"[plone.restapi] old_text={old_text}")

[plone.restapi] old_text={
    'content-type': 'text/x-web-markdown', 
    'data': '<p>normal <em>italics</em> normal <strong>bold</strong> normal</p>', 
    'encoding': 'utf-8'}

And simply patch the document with the same text value via plone.restapi.

response = requests.patch(
    url + '/' + obj.id,
    headers=headers,
    auth=('admin', 'admin'),
    json={'text': old_text}
)
response = requests.get(url + '/' + obj.id, headers=headers, auth=('admin', 'admin'))
new_text = response.json()['text']
print(f"[plone.restapi] new_text={new_text}")

[plone.restapi] new_text={
    'content-type': 'text/x-web-markdown', 
    'data': '<p>normal <em>italics</em> normal <strong>bold</strong> normal</p>', 
    'encoding': 'utf-8'}

The RESTAPI doesn't see any difference: old_text and new_text are the same.

But it has changed the value of the document:

app._p_jar.sync()
print(f"[plone.api] new_text.raw={obj.text.raw}")

[plone.api] new_text.raw=<p>normal <em>italics</em> normal <strong>bold</strong> normal</p>

Nov 22 '22 10:11 me-kell

this is something that we never implemented in plone.restapi.

@tisto I'm not sure I understand what you are referring to.

Nov 22 '22 11:11 me-kell

Agree, this is indeed more than a bug of wrong content-type, this is a bug on the conceptional API design level.

Nov 22 '22 14:11 jensens

@tisto Are there any plans to fix this bug?

Jul 21 '23 10:07 me-kell

@me-kell I'm willing to review a proposal for how to fix it, and a PR to implement the fix. I won't prioritize fixing it myself, since I'm not working on any projects that use non-HTML rich text fields.

Jul 21 '23 16:07 davisagli

@me-kell what @davisagli and the others are saying is that if you need that fix, you implement it and send a PR with it.

Feb 24 '24 09:02 sneridagh

Sorry. But I would expect that such conceptional bug would have been rejected from the beginning.

IMHO an API should pass the simplest test ("you get back exactly what you put in") which is not the case here from the beginning.

I'd very much like to fix this bug if I had the time (and the know how).

I've described the bug as good as possible so that everyone can understand what, why and where the problem is.

I'd suggest to ask the original developer and/or the maintainer to take this responsibility on code quality.

Feb 24 '24 09:02 me-kell

I'd suggest to ask the original developer and/or the maintainer to take this responsibility on code quality.

@me-kell this comment is disrespectful, insulting, and imflamatory. It implies that the contributors to this package do not care about code quality. I strongly recommend you reconsider your words.

That unsavory business aside, a pull request is always welcome. I don't think anyone opposes adding support as you describe. The reality is that we are all volunteer, unpaid contributors to open source software. There is no single maintainer, and they often come and go. The Plone community members are the maintainers. As in all open source software projects, we add features or fix bugs because it affects our work, and do not fix them when they do not. The reason it does not get implemented to your specifications is because no one else has the need to fix it at this time. When someone does have the time and desire to implement this proposal, only then will it get done. That is how open source software works.

Feb 24 '24 10:02 stevepiercy

@stevepiercy Sorry. No offence or disrespect intended at all.

Feb 24 '24 11:02 me-kell

The reason it does not get implemented to your specifications is because no one else has the need to fix it at this time.

@stevepiercy No offence, but those are not "my specifications". I think it should be obvious that this issue is not an "enhancement" but a "bug". Like it or not. See @jensens comment above:

Agree, this is indeed more than a bug of wrong content-type, this is a bug on the conceptional API design level.

@stevepiercy May I ask the reason why did you remove the "bug" label? Was it a technical one?

Feb 24 '24 11:02 me-kell

The mismatch is a bug, still. To fix it on conceptual level is a feature.

Feb 25 '24 10:02 jensens

From my perspective as a maintainer:

It certainly seems like a reasonable request to have a way to get the richtext data in the raw format it is stored (especially since that is what you need in order to save changes).

I don't really care whether it's considered a bug or a feature but I do care about not introducing a breaking change for existing systems that assume data is converted to the output mimetype. So I don't think we can simply change it to use the raw value.

We could add a new key raw that has the raw data. That would mean a few more bytes being transferred. If someone is concerned about that we could make it configurable in the request (with something like Accept: application/x-json-rawrichtext)

I'd suggest to ask the original developer and/or the maintainer to take this responsibility on code quality.

You are welcome to use the software we made available for free, and even contribute to it if you can find the time. The software is made available under a license that disclaims any warranty or guarantee of quality, and I'm sorry if that's inconvenient for you but I don't feel responsible for fixing this design oversight. I don't intend to be mean about that -- and I'm not offended by you asking -- but sometimes in the past I tried to take care of too much in an open source project and then risked burnout. So now I take care of a few things now and then when I have an interest, but I'll also not feel bad about leaving some of it for others.

Feb 28 '24 07:02 davisagli

@davisagli thank your for your proposal: adding a new key raw is surely a good first step.

The RichTextFieldDeserializer should also not overwrite raw with data a it is the case today:

https://github.com/plone/plone.restapi/blob/cb63e1ac4aff0570bedba0b88bceb31c34325a88/src/plone/restapi/deserializer/dxfields.py#L292

I'm not sure if it will suffice or whether other keys like e.g. outputMimeType would be also necessary.

Feb 28 '24 10:02 me-kell

I don't really care whether it's considered a bug or a feature but I do care about not introducing a breaking change for existing systems that assume data is converted to the output mimetype. So I don't think we can simply change it to use the raw value.

You got me wrong. It's not about one or the other.

We first need to fix the bug to report the right mime-type (the output-type of the text-field, not the input-type).

As @me-kell said, currently it is currently

"text": {
  "content-type": "text/x-web-markdown",
  "data": "<p>xyz <strong>bold</strong> normal</p>",
  "encoding": "utf8"
}

but correct is

"text": {
  "content-type": "text/html",
  "data": "<p>xyz <strong>bold</strong> normal</p>",
  "encoding": "utf8"
}

Code is here:

https://github.com/plone/plone.restapi/blob/cb63e1ac4aff0570bedba0b88bceb31c34325a88/src/plone/restapi/serializer/converters.py#L188

And it should be

"content-type": json_compatible(value.outputMimeType),

Then we can open a new issue about a feature to additional output the raw (input) data and it's mime-type.

Feb 28 '24 10:02 jensens

Actually the RichtTextValue object of the RichText field has two different attributes: raw and output with their corresponding mime type attributes: mimeType (for raw) and outputMimeType (for output).

Currently the RESTAPI returns output (actually output_relative_to) (as data) and mimeType (as conttent_type) which is the mimeType of raw.

And the RESTAPI currently do not allow to transparently write back data (to raw) because it parses data with html_parser.unescape.

Furthermore the RESTAPI allows to read and to change the mimeType (via content_type) but it doesn't allow to read nor to change the outputMimeType.

If the RESTAPI is meant to both read and write RichtText fields it should have read and write keys for both: raw and output (with its corresponding mime types).

If the RESTAPI only offers keys for the output attribute then it is not clear to me why to read the output attribute but write back to the raw attribute. The whole point of RichText and Transformation in Plone would then be obfuscated and unnecessary.

Feb 28 '24 11:02 me-kell

AFAICS following will solve this issue:

change the key content_type to return the outputMimeType (as @jensens proposed).
adding the keys raw_encoded (or raw as @davisagli proposed) and raw_content_type.

Here is the current (wrong) mapping:

`value`	`serializer`	`comment`
`raw`
`encoding`	`encoding`
`raw_encoded`
`mimeType`	`content_type`	this is raw mimeType!
`outputMimeType`
`output`
`output_relative_to`	`data`

and the proposed mapping:

`value`	`serializer`	`comment`
`raw`
`encoding`	`encoding`
`raw_encoded`	`raw_encoded`	new key
`mimeType`	`raw_content_type`	new key
`outputMimeType`	`content_type`	corrected key
`output`
`output_relative_to`	`data`

Following is the code to test the current situation:

from urllib.parse import urlparse

import requests

URL = 'https://classic.demo.plone.org'
AUTH = ('admin', 'admin')
API_PATH_PREFIX = '++api++'


def get_restapi_url(url):
    parsed_url = urlparse(url)
    restapi_url = parsed_url._replace(path=API_PATH_PREFIX + parsed_url.path).geturl()
    return restapi_url


result = requests.patch(
    urlparse(URL)._replace(path='@controlpanels/markup').geturl(),
    headers={'Accept': 'application/json', 'Content-Type': 'application/json'},
    json={
        "allowed_types": [
            {"title": "text/html", "token": "text/html"},
            {"title": "text/x-web-markdown", "token": "text/x-web-markdown"},
            {"title": "text/x-web-textile", "token": "text/x-web-textile"}
        ],
        "default_type": {"title": "text/html", "token": "text/html"},
        "markdown_extensions": [
            "markdown.extensions.fenced_code",
            "markdown.extensions.footnotes",
            "markdown.extensions.tables"
        ]
    },
    auth=AUTH
)

my_text = {"content-type": "text/x-web-markdown", "data": "xyz **bold** normal", "encoding": "utf8"}
print(f"my_text: {my_text}")
# {'content-type': 'text/x-web-markdown', 'data': 'xyz **bold** normal', 'encoding': 'utf8'}

result_post = requests.post(
    urlparse(URL).geturl(),
    headers={'Accept': 'application/json', 'Content-Type': 'application/json'},
    json={
        "@type": "Document",
        "title": "MyDocument",
        "text": my_text
    },
    auth=AUTH
)

print(f"result_post_at_id: {result_post.json()['@id']}")
print(f"result_post_text: {result_post.json()['text']}")
# this is wrong because it returns
#     output (actually output_relative_to) (as data) and
#     mimeType (as conttent_type) which is the mimeType of raw.
# {'content-type': 'text/x-web-markdown', 'data': '<p>xyz <strong>bold</strong> normal</p>', 'encoding': 'utf8'}

result_patch = requests.patch(
    get_restapi_url(result_post.json()['@id']),
    headers={'Accept': 'application/json', 'Content-Type': 'application/json', 'Prefer': 'return=representation'},
    json={'text': {"content-type": "text/x-web-markdown", "data": "xyz **bold** normal again", "encoding": "utf8"}},
    auth=AUTH
)

print(f"result_patch_text: {result_patch.json()['text']}")
# {'content-type': 'text/x-web-markdown', 'data': '<p>xyz <strong>bold</strong> normal again</p>', 'encoding': 'utf8'}

result_get = requests.get(
    get_restapi_url(result_patch.json()['@id']),
    headers={'Accept': 'application/json'},
    auth=AUTH
)
print(f"result_get_text: {result_get.json()['text']}")
# {'content-type': 'text/x-web-markdown', 'data': '<p>xyz <strong>bold</strong> normal again</p>', 'encoding': 'utf8'}

Feb 29 '24 10:02 me-kell

plone.restapi plone.restapi copied to clipboard

richttext fields set to `text/x-web-markdown` return HTML

To reproduce it

plone.restapi
plone.restapi copied to clipboard