plone.restapi
plone.restapi copied to clipboard
richttext fields set to `text/x-web-markdown` return HTML
The REST API returns the HTML-Version of richttext fields even if the content-type
is set to text/x-web-markdown
.
I'd expect the stored markdown version (value.raw
) to be returned.
We send:
"text": {
"content-type": "text/x-web-markdown",
"data": "xyz **bold** normal",
"encoding": "utf8"
}
We get:
"text": {
"content-type": "text/x-web-markdown",
"data": "<p>xyz <strong>bold</strong> normal</p>",
"encoding": "utf8"
}
To reproduce it:
SITE_URL=https://6-classic.demo.plone.org
USER=manager:**** # <- set manager password here
cat <<EOF | tee config_markup.json
{
"allowed_types": [
{"title":"text/html","token": "text/html"},
{"title": "text/x-web-markdown","token": "text/x-web-markdown"},
{"title": "text/x-web-textile","token": "text/x-web-textile"}
],
"default_type": {"title":"text/html","token": "text/html"},
"markdown_extensions":[
"markdown.extensions.fenced_code",
"markdown.extensions.footnotes",
"markdown.extensions.tables"
]
}
EOF
curl -i -X PATCH "${SITE_URL}/@controlpanels/markup" \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
--data-binary "@my_document.json" \
--user ${USER}
cat <<EOF | tee my_document.json
{
"@type": "Document",
"title": "MyDocument",
"text": {
"content-type": "text/x-web-markdown",
"data": "xyz **bold** normal",
"encoding": "utf8"
}
}
EOF
ID=$(
curl -s -X POST "${SITE_URL}" \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
--data-binary "@my_document.json" \
--user ${USER} \
| jq -r '."@id"' \
)
echo $ID
curl -s -X GET "${ID}" \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
--user ${USER} \
| jq -r '."text"'
It returns
...
"text": {
"content-type": "text/x-web-markdown",
"data": "<p>xyz <strong>bold</strong> normal</p>",
"encoding": "utf8"
}
...
IMO the bug is either that the content-type
is wrong or the data
is wrong, because both do not match.
Having a mode to get raw or rendered is kind of feature request then. For editing one want raw, for display maybe rendered.
see also https://community.plone.org/t/markdown-or-mdx-from-plone-rest-api
The problem seems to be in the inconsistent handling of RichTextValue
in the serializer and deserializer.
plone.app.textfield.value.RichTextValue
has the following properties:
-
raw
-
encoding
-
raw_encoded
-
mimeType
-
outputMimeType
-
output
-
output_relative_to
While the serializer classs plone.restapi.serializer.converters.RichtextDXContextConverter
only returns
-
output_relative_to
in keydata
-
mimeType
in keycontent_type
-
encoding
in keyencoding
The deserializer class plone.restapi.deserializer.dxfields.RichTextFieldDeserializer
sets
-
raw
from keydata
throughhtml_parser.unescape
-
mimeType
from keycontent_type
-
output
fromself.field.output_mime_type
-
encoding
from keyencodding
The properties
-
raw_encoded
andoutputMimeType
are not handled by the serializing/deserializing process -
encoding
andmimeType
(in the keycontent-type
) are passed transparently by the serializing/deserializing process - The serializer returns
output_relative_to
in the keydata
while the deserializer puts the keydata
parsed withhtml_parser.unescape
intoraw
and setsoutputMimeType
toself.field.output_mime_type
. See table below
value |
serializer |
deserializer |
value |
---|---|---|---|
raw |
html_parser.unescape(data) |
raw |
|
encoding |
encoding |
encoding |
encoding |
raw_encoded |
raw_encoded |
||
mimeType |
content_type |
content_type |
mimeType |
outputMimeType |
self.field.output_mime_type |
outputMimeType |
|
output |
output |
||
output_relative_to |
data |
output_relative_to |
classs plone.restapi.serializer.converters.RichtextDXContextConverter
@adapter(IRichTextValue, IDexterityContent)
@implementer(IContextawareJsonCompatible)
class RichtextDXContextConverter:
def __init__(self, value, context):
self.value = value
self.context = context
def __call__(self):
value = self.value
output = value.output_relative_to(self.context)
return {
"data": json_compatible(output),
"content-type": json_compatible(value.mimeType),
"encoding": json_compatible(value.encoding),
}
class plone.restapi.deserializer.dxfields.RichTextFieldDeserializer
@implementer(IFieldDeserializer)
@adapter(IRichText, IDexterityContent, IBrowserRequest)
class RichTextFieldDeserializer(DefaultFieldDeserializer):
def __call__(self, value):
content_type = self.field.default_mime_type
encoding = "utf8"
if isinstance(value, dict):
content_type = value.get("content-type", content_type)
encoding = value.get("encoding", encoding)
data = value.get("data", "")
elif isinstance(value, TUSUpload):
content_type = value.metadata().get("content-type", content_type)
with open(value.filepath, "rb") as f:
data = f.read().decode("utf8")
else:
data = value
value = RichTextValue(
raw=html_parser.unescape(data),
mimeType=content_type,
outputMimeType=self.field.output_mime_type,
encoding=encoding,
)
self.field.validate(value)
return value
@me-kell this is something that we never implemented in plone.restapi. Therefore I'd consider this to be a feature request, not a bug. That being said, I'd be more than happy to include this feature in plone.restapi. The content negotiation mechanism allows us to handle multiple formats, not only JSON (which is the only supported option right now). Though, I'd expect this to require some effort to make it work...
Well, firstly it's a bug: mismatch between data and content-type.
Next then is a missing feature, to improve the overall situation to handle different types in the field.
This is for different reasons a bug.
When we get a value via RESTAPI
we would expect that nothing changes when we set the same value again.
This is not the case with objects having a field with RichTextValue
. See how to reproduce it below.
This problem concerns not only text/x-web-markdown
, but all mime-types (text/x-web-textile
, text/x-rst
, etc.).
And affects all content types having fields with RichtTextValue
(not only RichTextField
) like Document
, News Item
etc and other Dexterity content types.
To reproduce it
Import and define some variables
import requests
from plone import api
from plone.app.textfield.value import RichTextValue
url = 'http://localhost:8081/Plone'
headers = {'Accept': 'application/json', 'Content-Type': 'application/json'}
configure markup via plone.restapi
data = {
"allowed_types": [{"title": "text/x-web-markdown", "token": "text/x-web-markdown"}],
"default_type": {"title": "text/x-web-markdown", "token": "text/x-web-markdown"}
}
response = requests.patch(url + '/@controlpanels/markup', headers=headers, json=data, auth=('admin', 'admin'))
create a Document with mimeType="text/x-web-markdown"
via plone.api
app._p_jar.sync()
obj = api.content.create(
container=plone,
type='Document',
title='MyDocument3',
mimeType='text/x-web-markdown',
text=RichTextValue(
raw="normal *italics* normal **bold** normal",
mimeType="text/x-web-markdown",
outputMimeType="text/x-html-safe",
encoding='utf-8',
)
)
transaction.commit()
print(f"[plone.api] old_text.raw={obj.text.raw}")
[plone.api] old_text.raw=normal *italics* normal **bold** normal
Get the created document via plone.restapi
.
response = requests.get(url + '/' + obj.id, headers=headers, auth=('admin', 'admin'))
old_text = response.json()['text']
print(f"[plone.restapi] old_text={old_text}")
[plone.restapi] old_text={
'content-type': 'text/x-web-markdown',
'data': '<p>normal <em>italics</em> normal <strong>bold</strong> normal</p>',
'encoding': 'utf-8'}
And simply patch the document with the same text
value via plone.restapi
.
response = requests.patch(
url + '/' + obj.id,
headers=headers,
auth=('admin', 'admin'),
json={'text': old_text}
)
response = requests.get(url + '/' + obj.id, headers=headers, auth=('admin', 'admin'))
new_text = response.json()['text']
print(f"[plone.restapi] new_text={new_text}")
[plone.restapi] new_text={
'content-type': 'text/x-web-markdown',
'data': '<p>normal <em>italics</em> normal <strong>bold</strong> normal</p>',
'encoding': 'utf-8'}
The RESTAPI
doesn't see any difference: old_text
and new_text
are the same.
But it has changed the value of the document:
app._p_jar.sync()
print(f"[plone.api] new_text.raw={obj.text.raw}")
[plone.api] new_text.raw=<p>normal <em>italics</em> normal <strong>bold</strong> normal</p>
this is something that we never implemented in plone.restapi.
@tisto I'm not sure I understand what you are referring to.
Agree, this is indeed more than a bug of wrong content-type, this is a bug on the conceptional API design level.
@tisto Are there any plans to fix this bug?
@me-kell I'm willing to review a proposal for how to fix it, and a PR to implement the fix. I won't prioritize fixing it myself, since I'm not working on any projects that use non-HTML rich text fields.
@me-kell what @davisagli and the others are saying is that if you need that fix, you implement it and send a PR with it.
Sorry. But I would expect that such conceptional bug would have been rejected from the beginning.
IMHO an API should pass the simplest test ("you get back exactly what you put in") which is not the case here from the beginning.
I'd very much like to fix this bug if I had the time (and the know how).
I've described the bug as good as possible so that everyone can understand what, why and where the problem is.
I'd suggest to ask the original developer and/or the maintainer to take this responsibility on code quality.
I'd suggest to ask the original developer and/or the maintainer to take this responsibility on code quality.
@me-kell this comment is disrespectful, insulting, and imflamatory. It implies that the contributors to this package do not care about code quality. I strongly recommend you reconsider your words.
That unsavory business aside, a pull request is always welcome. I don't think anyone opposes adding support as you describe. The reality is that we are all volunteer, unpaid contributors to open source software. There is no single maintainer, and they often come and go. The Plone community members are the maintainers. As in all open source software projects, we add features or fix bugs because it affects our work, and do not fix them when they do not. The reason it does not get implemented to your specifications is because no one else has the need to fix it at this time. When someone does have the time and desire to implement this proposal, only then will it get done. That is how open source software works.
@stevepiercy Sorry. No offence or disrespect intended at all.
The reason it does not get implemented to your specifications is because no one else has the need to fix it at this time.
@stevepiercy No offence, but those are not "my specifications". I think it should be obvious that this issue is not an "enhancement" but a "bug". Like it or not. See @jensens comment above:
Agree, this is indeed more than a bug of wrong content-type, this is a bug on the conceptional API design level.
@stevepiercy May I ask the reason why did you remove the "bug" label? Was it a technical one?
The mismatch is a bug, still. To fix it on conceptual level is a feature.
From my perspective as a maintainer:
It certainly seems like a reasonable request to have a way to get the richtext data in the raw format it is stored (especially since that is what you need in order to save changes).
I don't really care whether it's considered a bug or a feature but I do care about not introducing a breaking change for existing systems that assume data
is converted to the output mimetype. So I don't think we can simply change it to use the raw value.
We could add a new key raw
that has the raw data. That would mean a few more bytes being transferred. If someone is concerned about that we could make it configurable in the request (with something like Accept: application/x-json-rawrichtext
)
I'd suggest to ask the original developer and/or the maintainer to take this responsibility on code quality.
You are welcome to use the software we made available for free, and even contribute to it if you can find the time. The software is made available under a license that disclaims any warranty or guarantee of quality, and I'm sorry if that's inconvenient for you but I don't feel responsible for fixing this design oversight. I don't intend to be mean about that -- and I'm not offended by you asking -- but sometimes in the past I tried to take care of too much in an open source project and then risked burnout. So now I take care of a few things now and then when I have an interest, but I'll also not feel bad about leaving some of it for others.
@davisagli thank your for your proposal: adding a new key raw
is surely a good first step.
The RichTextFieldDeserializer should also not overwrite raw
with data
a it is the case today:
https://github.com/plone/plone.restapi/blob/cb63e1ac4aff0570bedba0b88bceb31c34325a88/src/plone/restapi/deserializer/dxfields.py#L292
I'm not sure if it will suffice or whether other keys like e.g. outputMimeType
would be also necessary.
I don't really care whether it's considered a bug or a feature but I do care about not introducing a breaking change for existing systems that assume data is converted to the output mimetype. So I don't think we can simply change it to use the raw value.
You got me wrong. It's not about one or the other.
We first need to fix the bug to report the right mime-type (the output-type of the text-field, not the input-type).
As @me-kell said, currently it is currently
"text": {
"content-type": "text/x-web-markdown",
"data": "<p>xyz <strong>bold</strong> normal</p>",
"encoding": "utf8"
}
but correct is
"text": {
"content-type": "text/html",
"data": "<p>xyz <strong>bold</strong> normal</p>",
"encoding": "utf8"
}
Code is here:
https://github.com/plone/plone.restapi/blob/cb63e1ac4aff0570bedba0b88bceb31c34325a88/src/plone/restapi/serializer/converters.py#L188
And it should be
"content-type": json_compatible(value.outputMimeType),
Then we can open a new issue about a feature to additional output the raw (input) data and it's mime-type.
Actually the RichtTextValue
object of the RichText
field has two different attributes: raw
and output
with their corresponding mime type attributes: mimeType
(for raw
) and outputMimeType
(for output
).
Currently the RESTAPI returns output
(actually output_relative_to
) (as data
) and mimeType
(as conttent_type
) which is the mimeType of raw
.
And the RESTAPI currently do not allow to transparently write back data
(to raw
) because it parses data
with html_parser.unescape
.
Furthermore the RESTAPI allows to read and to change the mimeType
(via content_type
) but it doesn't allow to read nor to change the outputMimeType
.
If the RESTAPI is meant to both read and write RichtText fields it should have read and write keys for both: raw
and output
(with its corresponding mime types).
If the RESTAPI only offers keys for the output
attribute then it is not clear to me why to read the output
attribute but write back to the raw
attribute. The whole point of RichText
and Transformation
in Plone would then be obfuscated and unnecessary.
AFAICS following will solve this issue:
- change the key
content_type
to return theoutputMimeType
(as @jensens proposed). - adding the keys
raw_encoded
(orraw
as @davisagli proposed) andraw_content_type
.
Here is the current (wrong) mapping:
value |
serializer |
comment |
---|---|---|
raw |
||
encoding |
encoding |
|
raw_encoded |
||
mimeType |
content_type |
this is raw mimeType! |
outputMimeType |
||
output |
||
output_relative_to |
data |
and the proposed mapping:
value |
serializer |
comment |
---|---|---|
raw |
||
encoding |
encoding |
|
raw_encoded |
raw_encoded |
new key |
mimeType |
raw_content_type |
new key |
outputMimeType |
content_type |
corrected key |
output |
||
output_relative_to |
data |
Following is the code to test the current situation:
from urllib.parse import urlparse
import requests
URL = 'https://classic.demo.plone.org'
AUTH = ('admin', 'admin')
API_PATH_PREFIX = '++api++'
def get_restapi_url(url):
parsed_url = urlparse(url)
restapi_url = parsed_url._replace(path=API_PATH_PREFIX + parsed_url.path).geturl()
return restapi_url
result = requests.patch(
urlparse(URL)._replace(path='@controlpanels/markup').geturl(),
headers={'Accept': 'application/json', 'Content-Type': 'application/json'},
json={
"allowed_types": [
{"title": "text/html", "token": "text/html"},
{"title": "text/x-web-markdown", "token": "text/x-web-markdown"},
{"title": "text/x-web-textile", "token": "text/x-web-textile"}
],
"default_type": {"title": "text/html", "token": "text/html"},
"markdown_extensions": [
"markdown.extensions.fenced_code",
"markdown.extensions.footnotes",
"markdown.extensions.tables"
]
},
auth=AUTH
)
my_text = {"content-type": "text/x-web-markdown", "data": "xyz **bold** normal", "encoding": "utf8"}
print(f"my_text: {my_text}")
# {'content-type': 'text/x-web-markdown', 'data': 'xyz **bold** normal', 'encoding': 'utf8'}
result_post = requests.post(
urlparse(URL).geturl(),
headers={'Accept': 'application/json', 'Content-Type': 'application/json'},
json={
"@type": "Document",
"title": "MyDocument",
"text": my_text
},
auth=AUTH
)
print(f"result_post_at_id: {result_post.json()['@id']}")
print(f"result_post_text: {result_post.json()['text']}")
# this is wrong because it returns
# output (actually output_relative_to) (as data) and
# mimeType (as conttent_type) which is the mimeType of raw.
# {'content-type': 'text/x-web-markdown', 'data': '<p>xyz <strong>bold</strong> normal</p>', 'encoding': 'utf8'}
result_patch = requests.patch(
get_restapi_url(result_post.json()['@id']),
headers={'Accept': 'application/json', 'Content-Type': 'application/json', 'Prefer': 'return=representation'},
json={'text': {"content-type": "text/x-web-markdown", "data": "xyz **bold** normal again", "encoding": "utf8"}},
auth=AUTH
)
print(f"result_patch_text: {result_patch.json()['text']}")
# {'content-type': 'text/x-web-markdown', 'data': '<p>xyz <strong>bold</strong> normal again</p>', 'encoding': 'utf8'}
result_get = requests.get(
get_restapi_url(result_patch.json()['@id']),
headers={'Accept': 'application/json'},
auth=AUTH
)
print(f"result_get_text: {result_get.json()['text']}")
# {'content-type': 'text/x-web-markdown', 'data': '<p>xyz <strong>bold</strong> normal again</p>', 'encoding': 'utf8'}