AMP character transformation problem
Steps required to reproduce the problem
- add article
- open up ?amp_markup=1
Expected Result
a working article
Actual Result
<p class="ia2amp-p">Da oggi la parrocchia di San Paolo di Adrano ha ufficialmente, per la prima volta, il suo oratorio ecclesiale. Un oratorio che non è âun generico luogo di aggregazione â ha scritto il parroco padre Nicola Petralia nella lettera consegnata alla famiglie â ma uno spazio in cui la comunità parrocchiale si fa carico dellâeducazione dei propri figliâ. Questa sera lâinaugurazione dei locali da parte del vescovo, mons. Salvatore Gristina, dopo la celebrazione eucaristica. I giovani della parrocchia, dunque, avranno ampi spazi e strutture anche per le attività ricreative, compreso il campetto, al quale adesso si potrà accedere dallâesterno della chiesa grazie alla scala in ferro realizzata in questi giorni e i cui lavori sono ancora in corso. Soddisfatto ovviamente il parroco don Nicola Petralia</p>
while when using ia_markup=1
<p>Da oggi la parrocchia di San Paolo di Adrano ha ufficialmente, per la prima volta, il suo oratorio ecclesiale. Un oratorio che non è “un generico luogo di aggregazione – ha scritto il parroco padre Nicola Petralia nella lettera consegnata alla famiglie – ma uno spazio in cui la comunità parrocchiale si fa carico dell’educazione dei propri figli”. Questa sera l’inaugurazione dei locali da parte del vescovo, mons. Salvatore Gristina, dopo la celebrazione eucaristica. I giovani della parrocchia, dunque, avranno ampi spazi e strutture anche per le attività ricreative, compreso il campetto, al quale adesso si potrà accedere dall’esterno della chiesa grazie alla scala in ferro realizzata in questi giorni e i cui lavori sono ancora in corso. Soddisfatto ovviamente il parroco don Nicola Petralia</p>
Version Info
- Plugin version: 4.0.5
- WordPress version: 4.8.3
- PHP version: 5.6.30
What is the encoding charset you're using into your WP and on your wordpress database? Is your instant articles charset working well?
These instructions will help you debug where you are getting translated into wrong charset:
On the file class-instant-articles-amp-markup.php these are the important lines:
// Transform the post to an Instant Article.
$adapter = new Instant_Articles_Post( $post );
$article = $adapter->to_instant_article();
$article_html = $article->render();
$amp = AMPArticle::create( $article_html, $properties );
echo $amp->render();
Check if the content here: $article_html has the correct charset and if the content here $amp->render() has a different charset.
- Instant articles charset is working well
- my charset is utf8_unicode_ci
unfortunately i'm not a php developer but this is the only error coming from that file:
PHP Warning: Invalid argument supplied for foreach() in /home4/user/public_html/wp-content/plugins/fb-instant-articles/class-instant-articles-amp-markup.php on line 143
Are you a developer or have any who can help you with? Do you have access to the server where your Wordpress is installed?
I'm not an experienced developer, I only know the basics of C and Java. I have access to the server where Wordpress is installed. I have updated the plugin to the latest version that came out today but nothing changed.
I reported the same issue in #879. In my case, the problem is happening on my production server, but not my development server. This would suggest it's either server-related, there's a plugin clash, or it's the wp-config settings. I'm happy to help with debugging.
To answer the diagnostic questions: Instant Articles works just fine.
From wp-config.php
define('DB_CHARSET', 'utf8mb4'); define('DB_COLLATE', '');
The database itself is utf8_general_ci All of the standard WordPress tables are utf8mb4_unicode_ci
If I use mb_detect_encoding(), both $article_html and $amp->render() return UTF-8.
But if I dump $article_html, the characters display properly. So something is happening within the render or create methods. I appreciate that's not your code. Do you have any suggestions?
I've done some more digging on this.
The problem is introducted by the following line in AMPArticle::create():
$document->loadHTML($instantArticle);
Prior to that line, $instantArticle looks like this:
‘Who said growing old had to be graceful?’
But if I output $document->saveHTML(), I get this:
‘Who said growing old had to be graceful?’
This is obviously outside your code. This StackExchange question looks like it might help, although I couldn't get any of the solutions to work: https://stackoverflow.com/questions/8218230/php-domdocument-loadhtml-not-encoding-utf-8-correctly
I'll create an issue over at https://github.com/facebook/facebook-instant-articles-sdk-extensions-in-php/
The problem can be prevented within the instant articles plugin, by ensuring that no extended characters are sent to the AMPArticle:create() method. For example, like this:
$article_html = $article->render();
$amp = AMPArticle::create( mb_convert_encoding( $article_html, 'HTML-ENTITIES', 'UTF-8' ), $properties );
echo $amp->render();
I don't know if that might cause problems with some characters, but it solves the problem for me.
This is seven months on with no progress. Could you at least add a filter before AMPArticle::create() so that we can bypass the issue by converting to entities?
$article_html = $article->render();
$amp = AMPArticle::create( apply_filters( 'instant_articles_amp_article_html', $article_html ), $properties );
echo $amp->render();
This issue has been marked stale because it has been open for 30 days with no activity. If there is no activity within 7 days, it will be closed. This is an automation to keep issues manageable and actionable and is not a comment on the quality of this issue nor on the work done so far. Closed issues are still valuable to the project and are available to be searched.