fb-instant-articles
fb-instant-articles copied to clipboard
<p> tags in <blockquotes> get stripped
Steps required to reproduce the problem
- Use
<blockquote>
in "Text" tab. - Add a bunch of text with multiline carriages
- Close using
</blockquote>
// Like so...
<blockquote>I'm Derek, I got shot on a house raid in Iraq. My getting shot didn't make me a professional on war, international relations, house raids (obviously🤷🏻♀️), or guns. This horrible thing happened to me but I didn't come home and protest the war... Even though I do kinda have these feelings that Iraq was errr... Questionable. 😂 But it ain't my place. I did my part.
So, I empathize with you and your peers because whereas I signed up to be shot at, none of you did. (Not exactly sure you actually got shot AT but that's not the point here).
Now... Guns. Here's my two cents: It’s not a gun problem, not a people problem, it's a culture thing. Thing... Not problem. America loves guns. Accept that just like I had to accept that America loves God. Don't ever be so quick to tell a whoooole lot of people how to live.
</blockquote>
Expected Result
Wordpress wraps lines with <p>
, and InstantArticle uses the pre-defined PassThroughRule
to preserve <p>
I know that there's a pre-defined rule for blockquote p
so not sure why it's getting stripped out. (Also pasted our custom transformation rules at the bottom of ticket).
Actual Result
Source Markup
<blockquote><p>I’m Derek, I got shot on a house raid in Iraq. My getting shot didn’t make me a professional on war, international relations, house raids (obviously🤷🏻♀️), or guns. This horrible thing happened to me but I didn’t come home and protest the war… Even though I do kinda have these feelings that Iraq was errr… Questionable. 😂 But it ain’t my place. I did my part.</p>
<p>So, I empathize with you and your peers because whereas I signed up to be shot at, none of you did. (Not exactly sure you actually got shot AT but that’s not the point here).</p>
<p>Now… Guns. Here’s my two cents: It’s not a gun problem, not a people problem, it’s a culture thing. Thing… Not problem. America loves guns. Accept that just like I had to accept that America loves God. Don’t ever be so quick to tell a whoooole lot of people how to live.</p>
</blockquote>
Transformed Markup
<blockquote>I’m Derek, I got shot on a house raid in Iraq. My getting shot didn’t make me a professional on war, international relations, house raids (obviously🤷🏻♀️), or guns. This horrible thing happened to me but I didn’t come home and protest the war… Even though I do kinda have these feelings that Iraq was errr… Questionable. 😂 But it ain’t my place. I did my part.
So, I empathize with you and your peers because whereas I signed up to be shot at, none of you did. (Not exactly sure you actually got shot AT but that’s not the point here).
Now… Guns. Here’s my two cents: It’s not a gun problem, not a people problem, it’s a culture thing. Thing… Not problem. America loves guns. Accept that just like I had to accept that America loves God. Don’t ever be so quick to tell a whoooole lot of people how to live.
</blockquote>
Version Info
- Plugin version: 4.0.5
- WordPress version: 4.7.3
- PHP version: 5.6.36
Bonus screenshot :<
Bonus Transformation Rules
{
"rules": [{
"class": "CaptionRule",
"selector": "p.wp-caption-text"
},
{
"class": "ImageRule",
"selector": "div.wp-caption",
"properties": {
"image.url": {
"type": "string",
"selector": "img",
"attribute": "src"
},
"image.caption": {
"type": "element",
"selector": "p.wp-caption-text"
}
}
},
{
"class": "PassThroughRule",
"selector": "div.responsive-video"
},
{
"class": "InteractiveRule",
"selector": "div.jwp",
"properties": {
"interactive.iframe": {
"type": "multiple",
"children": [{
"type": "element",
"selector": "div.jwp"
}]
}
}
},
{
"class": "PassThroughRule",
"selector": ".gif_wrapper"
},
{
"class": "InteractiveRule",
"selector": "iframe",
"properties": {
"interactive.url": {
"type": "string",
"selector": "iframe",
"attribute": "src"
},
"interactive.width": {
"type": "int",
"selector": "iframe",
"attribute": "width"
},
"interactive.height": {
"type": "int",
"selector": "iframe",
"attribute": "height"
},
"interactive.iframe": {
"type": "children",
"selector": "iframe"
}
}
},
{
"class": "InteractiveRule",
"selector": "blockquote.twitter-tweet",
"properties": {
"interactive.iframe": {
"type": "multiple",
"children": [{
"type": "element",
"selector": "blockquote"
}, {
"type": "next-sibling-element-of",
"selector": "blockquote"
}]
}
}
}
]
}
Bonus live URL
Referencing to live article https://ijr.com/2018/03/1080748-veteran-iraq-david-hogg-rant/
Hey @stephdau. Thanks for your detailed description. I will take a look to see if I can reproduce the issue. I will keep you posted.
I was able to reproduce the problem and I could not find a set of rules to keep the paragraph.
I was checking the documentation and it is not clear whether rich HTML is allowed inside the blockquote. However, I ran a few tests creating the markup manually and the Instant Article kept the paragraphs inside.
I will go ahead and create an issue in the Instant Articles SDK repo since I believe it has to be addressed at that level. I will keep this one open, though.
Thx for checking this out!
This isn’t something that could be quickly changed in the transformer?
Sent from my iPhone
On Apr 5, 2018, at 9:58 AM, Pablo Estevez [email protected] wrote:
I was able to reproduce the problem and I could not find a set of rules to keep the paragraph.
I was checking the documentation and it is not clear whether rich HTML is allowed inside the blockquote. However, I ran a few tests creating the markup manually and the Instant Article kept the paragraphs inside.
I will go ahead and create an issue in the Instant Articles SDK repo since I believe it has to be addressed at that level. I will keep this one open, though.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.
I'd say it is likely a not too difficult change. However, it will have to be prioritized with the rest of the outstanding tasks.
I'm happy to review a pull request addressing this problem, though.
So from our experience the ban on blockquote>p
is part of the overall nesting intolerance inherent to IA. There's no way to have such nesting because of Rule Context (this doc is VERY hard to find OMG, why is it not linked from the docs of every element type?!?)
as the Transformer progresses, it uses the rules to build a hierarchy of transformed elements, giving context to each subsequent rule. Rules are only permitted to execute within an allowed context defined for the Rule Class is uses.
@pestevez It would be really great if the dev team for this plugin could master these concepts of Rules and Rule Context, as well as regularly deploying that document to people asking about transformations. They are extremely confusing for new people to learn, and when your team isn't clear about them either, you ensure that confusion persists.
The individual definition docs like this one for blockquote are pretty much useless, and don't give you any hint as to what can go inside nor what they can go inside, because they don't give any hints at the Permitted Contexts that their related Rule Class permits.
Instead the best doc to use in any situation where Transformations are relevant is the Available Rule Classes list in the Transformer Rules doc.
In the case of ParagraphRule
the only permitted context is InstantArticle
, which means, the only place you can put a <p>
is in the root of the document, outside any other HTML. They are completely forbidden to inhabit a <blockquote>
, another <p>
, or anything else. This applies to most Rule Classes, including all headings, which also can't be inside a <blockquote>
(a terrible restriction that kills accuracy in blockquotes).
The good news is that despite the Permitted Context system being complicated and totally invisible in most of the docs, using it is simple since pretty much all Rule Classes have only one Permitted Context, the default, and are forbidden everywhere else. So you just need to understand that most of the time your HTML won't work anywhere except where you see it used in the IA example code.
Intended Behavior: Do the opposite of Core WordPress
So the short answer for this particular question is that FB expects us to never have paragraphs inside of <blockquotes>
and this plugin correctly deletes them. INTENDED BEHAVIOR no matter how bad it is.
The horrible thing about this outcome is that it directly contradicts user expectations about how <blockquote>
and <p>
interact inside WP, since WP has always supported and, indeed, CREATES <p>
tags nested inside <blockqoute>
tags.
So your WP editor definitely makes you think it's OK to do this, then IA strips them out without asking and renders your carefully crafted, accurately-representing-their-source blockquotes as confusing single paragraphs.
Can this be fixed? There are two paths.
The doomed option is to do what I think would be best, and get Facebook to change their IA standard to allow paragraphs inside blockquotes. While we're at it, lets get them to allow headings and images in there too, since they are all useful and things that could be quoted, and IMHO seneseless limitations of the Transformations standard. Likelihood of happening: ~0 - Facebook must have known this would come up and made the standard strict anyway. I haven't seen any evidence they are listening to the needs of WP users in this way, so we should probably focus on considering option 2.
The viable solution is to filter all IA output and replace </p>
inside blockquotes with <br>
Why? Because unlike ParagraphRule
, LineBreakRule
(the IA rule for <br>
) has TextContainer
as it's Permitted Context, and this seems to include <blockquote>
as an option.
Sidenote, which of the different HTML/Rules qualify as each type of Context, (e.g. InstantArticle
and TextContainer
) doesn't seem to be documented anywhere! Trial and error definitions are the norm here when it comes to the IA standard. From testing LineBreakRule
works inside any text object, but not in the root of the article, so you can't have a <br>
outside a <p>
.
SO: If you remove each set of <p>
tags inside <blockquote>
, then add a <br>
between each of the previous paragraphs, it will be tolerated by the IA parser, and for the most part has a similar visual effect to the forbidden blockquote>p
.
Is this a good solution? Not at all. It's a PITA and at least for us, there was no clean solution that doesn't rely on REGEX that I know will wreak havoc one day (and indeed, we've already had to polish our REGEX several times based on different situations that have come up, including some devastating performance issues).
But based on the current situation it's the only option.
Working code to enable blockquote>p
Below is our code that works for us, and explains itself in the comments.
I don't know if the plugin devs would consider this a viable thing to include in the plugin, but other users with the audacity to think blockquote>p
is appropriate can consider inserting this code in their functions.php
file or in a custom plugin.
/**
* Filter for Facebook Instant Articles that cleans up blockquotes in the post content.
*
* This is necessary because Facebook Instant Articles limits what HTML markup you can
* use inside a blockquote. We use a regular expression to scan for blockquotes in the
* post content. We reconstruct each blockquote that we found so that it uses HTML
* compatible with the Facebook Instant Articles transformation rules.
*
* @param string $content
*
* @return string
*/
function github_ia_filter_instant_articles_clean_blockquotes($content) {
// Return immediately if content is empty.
if (empty($content)) {
return $content;
}
// RegEx: <blockquote([^>]*?)>
// Description: Checks for the opening blockquote tag and captures all its properties.
//
// RegEx: (.*?)
// Description: Captures the inside of the blockquote.
//
// RegEx: <\/blockquote>
// Description: Checks for the closing blockquote tag.
return preg_replace_callback('/<blockquote([^>]*?)>(.*?)<\/blockquote>/ism', function($blockquote_match) {
// Reconstruct original <blockquote> tag.
$blockquote = "<blockquote{$blockquote_match[1]}>";
// Remove all <p> tags and add a trailing <br /> tag instead.
$blockquote .= preg_replace_callback('/<p[^>]*?>(.*?)<\/p>/ism', function($paragraph_match) {
return $paragraph_match[1] . '<br />';
}, $blockquote_match[2]);
// Close <blockquote> tag
$blockquote .= '</blockquote>';
// Remove trailing <br /> tag
return str_replace('<br /></blockquote>', '</blockquote>', $blockquote);
}, $content);
}
add_filter('instant_articles_content', 'github_ia_filter_instant_articles_clean_blockquotes');
Whoa this is next level developer fanaticism. I like it!
What we’re doing now is exactly like what you said. Just using the “Text” tab, using blockquotes and manually inputing
tags to simulate different paragraphs.
This option shouldn’t be seen as anything more than a “hack.” It works but I’m pretty sure the writers really hate having to write HTML. I don’t blame them.
I guess it is what it is until Facebook allows
tags to have context inside
.I definitely think this bug should be highly prioritized because
are often multi-paragraphs. Sent from my iPhone
On Apr 5, 2018, at 4:56 PM, Jer Clarke [email protected] wrote:
So from our experience the ban on blockquote>p is part of the overall nesting intolerance inherent to IA. There's no way to have such nesting because of Rule Context (this doc is VERY hard to find OMG, why is it not linked from the docs of every element type?!?)
as the Transformer progresses, it uses the rules to build a hierarchy of transformed elements, giving context to each subsequent rule. Rules are only permitted to execute within an allowed context defined for the Rule Class is uses.
@pestevez It would be really great if the dev team for this plugin could master these concepts of Rules and Rule Context, as well as regularly deploying that document to people asking about transformations. They are extremely confusing for new people to learn, and when your team isn't clear about them either, you ensure that confusion persists.
The individual definition docs like this one for blockquote are pretty much useless, and don't give you any hint as to what can go inside nor what they can go inside, because they don't give any hints at the Permitted Contexts that their related Rule Class permits.
Instead the best doc to use in any situation where Transformations are relevant is the Available Rule Classes list in the Transformer Rules doc.
In the case of ParagraphRule the only permitted context is InstantArticle, which means, the only place you can put a
is in the root of the document, outside any other HTML. They are completely forbidden to inhabit a
, another, or anything else. This applies to most Rule Classes, including all headings, which also can't be inside a
(a terrible restriction that kills accuracy in blockquotes).The good news is that despite the Permitted Context system being complicated and totally invisible in most of the docs, using it is simple since pretty much all Rule Classes have only one Permitted Context, the default, and are forbidden everywhere else. So you just need to understand that most of the time your HTML won't work anywhere except where you see it used in the IA example code.
Intended Behavior: Do the opposite of Core WordPress
So the short answer for this particular question is that FB expects us to never have paragraphs inside of
and this plugin correctly deletes them. INTENDED BEHAVIOR no matter how bad it is. The horrible thing about this outcome is that it directly contradicts user expectations about how
andinteract inside WP, since WP has always supported and, indeed, CREATES
tags nested inside
tags. So your WP editor definitely makes you think it's OK to do this, then IA strips them out without asking and renders your carefully crafted, accurately-representing-their-source blockquotes as confusing single paragraphs.
Can this be fixed? There are two paths.
The doomed option is to do what I think would be best, and get Facebook to change their IA standard to allow paragraphs inside blockquotes. While we're at it, lets get them to allow headings and images in there too, since they are all useful and things that could be quoted, and IMHO seneseless limitations of the Transformations standard. Likelihood of happening: ~0 - Facebook must have known this would come up and made the standard strict anyway. I haven't seen any evidence they are listening to the needs of WP users in this way, so we should probably focus on considering option 2.
The viable solution is to filter all IA output and replace
inside blockquotes with
Why? Because unlike ParagraphRule , LineBreakRule (the IA rule for
) has TextContainer as it's Permitted Context, and this seems to includeas an option.Sidenote, which of the different HTML/Rules qualify as each type of Context, (e.g. InstantArticle and TextContainer) doesn't seem to be documented anywhere! Trial and error definitions are the norm here when it comes to the IA standard. From testing LineBreakRule works inside any text object, but not in the root of the article, so you can't have a
outside a.
SO: If you remove each set of
tags inside
, then add a
between each of the previous paragraphs, it will be tolerated by the IA parser, and for the most part has a similar visual effect to the forbidden blockquote>p.Is this a good solution? Not at all. It's a PITA and at least for us, there was no clean solution that doesn't rely on REGEX that I know will wreak havoc one day (and indeed, we've already had to polish our REGEX several times based on different situations that have come up, including some devastating performance issues).
But based on the current situation it's the only option.
Working code to enable blockquote>p
Below is our code that works for us, and explains itself in the comments.
I don't know if the plugin devs would consider this a viable thing to include in the plugin, but other users with the audacity to think blockquote>p is appropriate can consider inserting this code in their functions.php file or in a custom plugin.
/**
- Filter for Facebook Instant Articles that cleans up blockquotes in the post content.
- This is necessary because Facebook Instant Articles limits what HTML markup you can
- use inside a blockquote. We use a regular expression to scan for blockquotes in the
- post content. We reconstruct each blockquote that we found so that it uses HTML
- compatible with the Facebook Instant Articles transformation rules.
- @param string $content
- @return string */ function github_ia_filter_instant_articles_clean_blockquotes($content) { // Return immediately if content is empty. if (empty($content)) { return $content; }
// RegEx: <blockquote([^>]?)> // Description: Checks for the opening blockquote tag and captures all its properties. // // RegEx: (.?) // Description: Captures the inside of the blockquote. // // RegEx: </blockquote> // Description: Checks for the closing blockquote tag. return preg_replace_callback('/<blockquote([^>]?)>(.?)</blockquote>/ism', function($blockquote_match) { // Reconstruct original
tag. $blockquote = "<blockquote{$blockquote_match[1]}>";// Remove all <p> tags and add a trailing <br /> tag instead. $blockquote .= preg_replace_callback('/<p[^>]*?>(.*?)<\/p>/ism', function($paragraph_match) { return $paragraph_match[1] . '<br />'; }, $blockquote_match[2]); // Close <blockquote> tag $blockquote .= '</blockquote>'; // Remove trailing <br /> tag return str_replace('<br /></blockquote>', '</blockquote>', $blockquote);
}, $content); } add_filter('instant_articles_content', 'github_ia_filter_instant_articles_clean_blockquotes'); — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.
Hey guys, here are my two cents:
- I agree we need to improve the documentation at multiple levels
- There are different types of docs:
- The Instant Articles format docs
- This defines the format itself, so the concept of Permitted Context does not have a meaning here
- One example of this type of doc is the definition of the blockquote. It should have defined the valid child elements, though.
- Facebook is in charge of updating this documentation.
- The IA PHP SDK docs
- This one documents an open source implementation of the format using concepts like the Permitted Context.
- It is based on the documentation of the IA format.
- The Facebook team that supports this library will update it, but we are also open to contributions from the community.
- Facebook is in charge of updating the documentation of this library that sits under https://developers.facebook.com/docs/instant-articles/sdk.
- The IA WordPress plugin docs
- Since the plugin is built on top of the IA PHP SDK, here the concept of the Permitted Context still applies.
- The Facebook team that supports this plugin will update it, but we are also open to contributions from the community.
- The Instant Articles format docs
- For this issue:
- It looks like the IA format already supports paragraphs as children of blockquote elements
- Then, we (the Facebook team) need to update the IA format docs so that everyone can benefit from it (not just those using the IA PHP SDK or the WordPress plugin)
- Once the docs are updated, we (anyone from either the Facebook team that supports the IA-related open source projects OR any developer from the community) need to reflect the change in the IA PHP SDK so that everyone using it (for example, those using the IA Drupal plugin) can benefit from it. https://github.com/facebook/facebook-instant-articles-sdk-php/issues/329
- Once the SDK is updated, we (anyone from either the Facebook team that supports the IA-related open source projects OR any developer from the community) need to update the IA WordPress plugin to use the new version of the SDK.
Thanks for the clarification @pestevez
The Instant Articles format docs This defines the format itself, so the concept of Permitted Context does not have a meaning here ... It looks like the IA format already supports paragraphs as children of blockquote elements
Okay that makes some sense I suppose, but obviously you understand why the existing docs are incoherent to users. The "Permitted Context" docs, regardless of whether they are in the SDK or in the "format" docs, are the only place where appropriate inheritance is broached, and thus the only reference for why this natural use of blockquote>p
fails in the plugin.
If blockquote>p
DOES work in IA, but not via the SDK, it is just as or MORE likely that the bug is in IA, and the intended standard is that it shouldn't work. Why else would the SDK have been coded this way?
One example of this type of doc is the definition of the blockquote. It should have defined the valid child elements, though.
This is the essence of the major need here. If there are restrictions on inheritance (obviously this is so, IA is basically made of restrictions) then they should be officially stated as part of the format first and formost, with the SDK implementing them. The SDK's version of "Permitted Context" should just be a redundant re-iteration of the official formats.
We agreed on these next steps and filed the tasks as needed. Thanks for the reports, @jerclarke . Also I'd like to welcome anyone who is interested in sending any PR regarding this.
I’m super bad at PHP... is there a JavaScript way to accomplish this?
On Apr 20, 2018, at 6:12 PM, Everton Rosario [email protected] wrote:
We agreed on these next steps and filed the tasks as needed. Thanks for the reports, @jerclarke . Also I'd like to welcome anyone who is interested in sending any PR regarding this.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.
@everton-rosario Has this been fixed?
This issue has been marked stale because it has been open for 30 days with no activity. If there is no activity within 7 days, it will be closed. This is an automation to keep issues manageable and actionable and is not a comment on the quality of this issue nor on the work done so far. Closed issues are still valuable to the project and are available to be searched.