Different output in some languages when there is a semicolon in the header

Open marusak opened this issue 5 years ago • 1 comments

There seems to be some inconsistency about using a semicolon at the end of Plural-Forms in .po headers. Its presence can break the format that po2json produces. Let me explain with examples:

Take this file (tmp.po):

  msgid ""                                                                        
  msgstr ""                                                                       
  "Project-Id-Version: PACKAGE VERSION\n"                                         
  "Language: ko\n"                                                                
  "MIME-Version: 1.0\n"                                                           
  "Content-Type: text/plain; charset=UTF-8\n"                                     
  "Content-Transfer-Encoding: 8bit\n"                                             
  "Plural-Forms: nplurals=1; plural=0\n"                                          
  "X-Generator: Weblate 3.10.1\n"                                                 
                                                                                  
  msgid "Combined usage of $0 CPU core"                                           
  msgid_plural "Combined usage of $0 CPU cores"                                   
  msgstr[0] "$0 CPU 코어의 총 사용량"

When I run ./node_modules/po2json/bin/po2json -p tmp.po tmp, the output file tmp looks like this:

{                                                                               
   "": {                                                                        
      "project-id-version": "PACKAGE VERSION",                                  
      "language": "ko",                                                         
      "mime-version": "1.0",                                                    
      "content-type": "text/plain; charset=UTF-8",                              
      "content-transfer-encoding": "8bit",                                      
      "plural-forms": "nplurals=1; plural=0",                                   
      "x-generator": "Weblate 3.10.1"                                           
   },                                                                           
   "Combined usage of $0 CPU core": [                                           
      "Combined usage of $0 CPU cores",                                         
      "$0 CPU 코어의 총 사용량"                                                 
   ]                                                                            
}  

This is correct. Now let's add a semicolon to the end of the "Plural-Forms: nplurals=1; plural=0\n" line, so that tmp.po looks like this:

  msgid ""                                                                        
  msgstr ""                                                                       
  "Project-Id-Version: PACKAGE VERSION\n"                                         
  "Language: ko\n"                                                                
  "MIME-Version: 1.0\n"                                                           
  "Content-Type: text/plain; charset=UTF-8\n"                                     
  "Content-Transfer-Encoding: 8bit\n"                                             
  "Plural-Forms: nplurals=1; plural=0;\n"                                         
  "X-Generator: Weblate 3.10.1\n"                                                 
                                                                                                                                     
  msgid "Combined usage of $0 CPU core"                                           
  msgid_plural "Combined usage of $0 CPU cores"                                   
  msgstr[0] "$0 CPU 코어의 총 사용량" 

When I run the same command, the output in tmp is:

{                                                                               
   "": {                                                                        
      "project-id-version": "PACKAGE VERSION",                                  
      "language": "ko",                                                         
      "mime-version": "1.0",                                                    
      "content-type": "text/plain; charset=UTF-8",                              
      "content-transfer-encoding": "8bit",                                      
      "plural-forms": "nplurals=1; plural=0;",                                  
      "x-generator": "Weblate 3.10.1"                                           
   },                                                                           
   "Combined usage of $0 CPU core": [                                           
      "Combined usage of $0 CPU cores",                                         
      [                                                                         
         "$0 CPU 코어의 총 사용량"                                              
      ]                                                                         
   ]                                                                            
} 

So the translation for the string is no longer a flat array of strings, but an array containing one string and one nested array.

Interestingly enough, if I have a different file, like this:
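Until the root cause is found, a consumer of the JSON could flatten the nested array after conversion. The following is only a minimal sketch and not part of po2json: flattenTranslations is a hypothetical helper, and it assumes the output has the shape shown above (a header object under the "" key, and an array for every other key).

```javascript
// Hypothetical helper, not a po2json API: flatten one level of nesting
// in every translation entry of an already-converted catalog.
function flattenTranslations(catalog) {
  const result = {};
  for (const [key, value] of Object.entries(catalog)) {
    // The "" key holds the header object; non-array values are left untouched.
    result[key] = Array.isArray(value) ? value.flat() : value;
  }
  return result;
}

// Shape taken from the output shown above (header shortened for brevity).
const broken = {
  "": { "plural-forms": "nplurals=1; plural=0;" },
  "Combined usage of $0 CPU core": [
    "Combined usage of $0 CPU cores",
    ["$0 CPU 코어의 총 사용량"],
  ],
};

const fixed = flattenTranslations(broken);
console.log(JSON.stringify(fixed, null, 3));
```

Entries that are already flat pass through unchanged, so the same helper could be applied to output from files with and without the trailing semicolon.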

  msgid ""                                                                        
  msgstr ""                                                                       
  "Project-Id-Version: PACKAGE VERSION\n"                                         
  "Language: cs\n"                                                                
  "MIME-Version: 1.0\n"                                                           
  "Content-Type: text/plain; charset=UTF-8\n"                                     
  "Content-Transfer-Encoding: 8bit\n"                                             
  "Plural-Forms: nplurals=3; plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2\n"        
  "X-Generator: Weblate 3.10.1\n"                                                 
                                                                                  
  msgid "Combined usage of $0 CPU core"                                           
  msgid_plural "Combined usage of $0 CPU cores"                                   
  msgstr[0] "Kombinované využití $0 jádra procesoru"                              
  msgstr[1] "Kombinované využití $0 jader procesoru"                              
  msgstr[2] "Kombinované využití $0 jader procesoru"

The output is the same whether or not there is a semicolon at the end of the Plural-Forms line. From the docs it seems there should always be a semicolon (1, 2). This is likely a problem in some library that po2json uses, but I was not sure where it really comes from, so I am reporting it here.

(Side note: for years we had a mix of files, some with this semicolon and some without, and it seemed to work just fine. We were using Zanata to generate these files; now that we have migrated to Weblate, it adds this semicolon for some more languages, though still not for all. So maybe this is a known bug or is documented somewhere.)
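To see which files in a mixed set carry the trailing semicolon, the header can be inspected directly. This is a rough sketch under stated assumptions: pluralFormsInfo is a hypothetical helper, and it assumes the Plural-Forms header sits on a single quoted line, as in the examples above (headers wrapped across several lines would not be matched).

```javascript
// Hypothetical helper: report the Plural-Forms value of a .po file and
// whether it ends with a semicolon. Returns null if no header line is found.
function pluralFormsInfo(poText) {
  // Matches a single header line such as: "Plural-Forms: nplurals=1; plural=0;\n"
  // (the \n here is the literal backslash-n escape written inside the .po file).
  const match = poText.match(/^\s*"Plural-Forms:\s*(.*?)\\n"\s*$/m);
  if (!match) return null;
  const value = match[1].trim();
  return { value, trailingSemicolon: value.endsWith(";") };
}

// String.raw keeps the \n escapes exactly as they appear in a .po file.
const withSemicolon = String.raw`msgid ""
msgstr ""
"Language: ko\n"
"Plural-Forms: nplurals=1; plural=0;\n"
`;

const withoutSemicolon = String.raw`msgid ""
msgstr ""
"Language: ko\n"
"Plural-Forms: nplurals=1; plural=0\n"
`;

console.log(pluralFormsInfo(withSemicolon));
console.log(pluralFormsInfo(withoutSemicolon));
```

Running this over every .po file in a project would show which languages got the semicolon from the generating tool.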

marusak, Jan 17 '20 07:01

Most likely related to https://github.com/smhg/gettext-parser.

hthetiot, Dec 15 '21 20:12