fb-instant-articles icon indicating copy to clipboard operation
fb-instant-articles copied to clipboard

AMP character transformation problem

Open absentio opened this issue 8 years ago • 8 comments

Steps required to reproduce the problem

  1. add article
  2. open up ?amp_markup=1

Expected Result

a working article

Actual Result

  <p class="ia2amp-p">Da oggi la parrocchia di San Paolo di Adrano ha ufficialmente, per la prima volta, il suo oratorio ecclesiale. Un oratorio che non  è “un generico luogo di aggregazione – ha scritto il parroco padre Nicola Petralia nella lettera consegnata alla famiglie  – ma uno spazio in cui la comunità parrocchiale si fa carico dell’educazione dei propri figli”. Questa sera l’inaugurazione dei locali da parte del vescovo, mons. Salvatore Gristina, dopo la celebrazione  eucaristica. I giovani della parrocchia, dunque, avranno ampi spazi e strutture anche per le attività ricreative, compreso il campetto, al quale adesso si potrà accedere dall’esterno della chiesa grazie alla scala in ferro realizzata in questi giorni e i cui lavori sono ancora in corso. Soddisfatto ovviamente  il parroco don Nicola Petralia</p>

while when using ia_markup=1

<p>Da oggi la parrocchia di San Paolo di Adrano ha ufficialmente, per la prima volta, il suo oratorio ecclesiale. Un oratorio che non è “un generico luogo di aggregazione – ha scritto il parroco padre Nicola Petralia nella lettera consegnata alla famiglie – ma uno spazio in cui la comunità parrocchiale si fa carico dell’educazione dei propri figli”. Questa sera l’inaugurazione dei locali da parte del vescovo, mons. Salvatore Gristina, dopo la celebrazione eucaristica. I giovani della parrocchia, dunque, avranno ampi spazi e strutture anche per le attività ricreative, compreso il campetto, al quale adesso si potrà accedere dall’esterno della chiesa grazie alla scala in ferro realizzata in questi giorni e i cui lavori sono ancora in corso. Soddisfatto ovviamente il parroco don Nicola Petralia</p>

Version Info

  • Plugin version: 4.0.5
  • WordPress version: 4.8.3
  • PHP version: 5.6.30

absentio avatar Nov 24 '17 10:11 absentio

What is the encoding charset you're using into your WP and on your wordpress database? Is your instant articles charset working well?

These instructions will help you debug where you are getting translated into wrong charset: On the file class-instant-articles-amp-markup.php these are the important lines:

		// Transform the post to an Instant Article.
		$adapter = new Instant_Articles_Post( $post );
		$article = $adapter->to_instant_article();
		$article_html = $article->render();
		$amp = AMPArticle::create( $article_html, $properties );
		echo $amp->render();

Check if the content here: $article_html has the correct charset and if the content here $amp->render() has a different charset.

everton-rosario avatar Nov 24 '17 12:11 everton-rosario

  1. Instant articles charset is working well
  2. my charset is utf8_unicode_ci

unfortunately i'm not a php developer but this is the only error coming from that file: PHP Warning: Invalid argument supplied for foreach() in /home4/user/public_html/wp-content/plugins/fb-instant-articles/class-instant-articles-amp-markup.php on line 143

absentio avatar Nov 24 '17 13:11 absentio

Are you a developer or have any who can help you with? Do you have access to the server where your Wordpress is installed?

everton-rosario avatar Nov 24 '17 17:11 everton-rosario

I'm not an experienced developer, I only know the basics of C and Java. I have access to the server where Wordpress is installed. I have updated the plugin to the latest version that came out today but nothing changed.

absentio avatar Dec 04 '17 17:12 absentio

I reported the same issue in #879. In my case, the problem is happening on my production server, but not my development server. This would suggest it's either server-related, there's a plugin clash, or it's the wp-config settings. I'm happy to help with debugging.

To answer the diagnostic questions: Instant Articles works just fine.

From wp-config.php define('DB_CHARSET', 'utf8mb4'); define('DB_COLLATE', '');

The database itself is utf8_general_ci All of the standard WordPress tables are utf8mb4_unicode_ci

If I use mb_detect_encoding(), both $article_html and $amp->render() return UTF-8.

But if I dump $article_html, the characters display properly. So something is happening within the render or create methods. I appreciate that's not your code. Do you have any suggestions?

markbarnes avatar Feb 15 '18 22:02 markbarnes

I've done some more digging on this.

The problem is introducted by the following line in AMPArticle::create(): $document->loadHTML($instantArticle);

Prior to that line, $instantArticle looks like this: ‘Who said growing old had to be graceful?’

But if I output $document->saveHTML(), I get this: &acirc;&#128;&#152;Who said growing old had to be graceful?&acirc;&#128;&#153;

This is obviously outside your code. This StackExchange question looks like it might help, although I couldn't get any of the solutions to work: https://stackoverflow.com/questions/8218230/php-domdocument-loadhtml-not-encoding-utf-8-correctly

I'll create an issue over at https://github.com/facebook/facebook-instant-articles-sdk-extensions-in-php/

markbarnes avatar Feb 15 '18 23:02 markbarnes

The problem can be prevented within the instant articles plugin, by ensuring that no extended characters are sent to the AMPArticle:create() method. For example, like this:

$article_html = $article->render();
$amp = AMPArticle::create( mb_convert_encoding( $article_html, 'HTML-ENTITIES', 'UTF-8' ), $properties );
echo $amp->render();

I don't know if that might cause problems with some characters, but it solves the problem for me.

markbarnes avatar Feb 16 '18 09:02 markbarnes

This is seven months on with no progress. Could you at least add a filter before AMPArticle::create() so that we can bypass the issue by converting to entities?

$article_html = $article->render();
$amp = AMPArticle::create( apply_filters( 'instant_articles_amp_article_html', $article_html ), $properties );
echo $amp->render();

markbarnes avatar Sep 19 '18 09:09 markbarnes

This issue has been marked stale because it has been open for 30 days with no activity. If there is no activity within 7 days, it will be closed. This is an automation to keep issues manageable and actionable and is not a comment on the quality of this issue nor on the work done so far. Closed issues are still valuable to the project and are available to be searched.

github-actions[bot] avatar Oct 18 '22 00:10 github-actions[bot]