utf8.js

Invalid continuation byte

Open romafederico opened this issue 8 years ago • 10 comments

macOS, Webstorm 2017.1, Reactjs

I'm receiving a utf-8 encoded JSON, converting it to a string and then utf8.decode(str).

At some point I'm getting the error Invalid continuation byte. Is there a way in which I can find the byte that is causing this error? This error appears with some of the users of my DB, not all, and I need to compare them.

Thanks

romafederico avatar Jun 16 '17 06:06 romafederico
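On the question of finding the byte that triggers the error: a minimal sketch, assuming the input is the same string that gets passed to utf8.decode. The helper name findInvalidUtf8Offset is hypothetical and not part of utf8.js. It only checks the lead-byte/continuation-byte structure (it does not reject every overlong form or surrogate), which should be enough to locate where a sequence goes wrong.

```javascript
// Hypothetical helper (not part of utf8.js): scan a byte string and return
// the offset of the first character that breaks UTF-8 structure, or -1.
function findInvalidUtf8Offset(byteString) {
  let i = 0;
  while (i < byteString.length) {
    const lead = byteString.charCodeAt(i);
    if (lead > 0xff) return i; // not a byte at all
    let needed; // number of continuation bytes expected after the lead byte
    if (lead < 0x80) needed = 0;
    else if (lead >= 0xc2 && lead <= 0xdf) needed = 1;
    else if (lead >= 0xe0 && lead <= 0xef) needed = 2;
    else if (lead >= 0xf0 && lead <= 0xf4) needed = 3;
    else return i; // stray continuation byte or invalid lead byte
    for (let k = 1; k <= needed; k++) {
      // charCodeAt returns NaN past the end, which fails the range check.
      const c = byteString.charCodeAt(i + k);
      if (!(c >= 0x80 && c <= 0xbf)) return i + k;
    }
    i += needed + 1;
  }
  return -1;
}

// Usage: log the offset and the offending character for one record.
const sample = 'ab\u00C3(';
const offset = findInvalidUtf8Offset(sample);
console.log(offset, JSON.stringify(sample.charAt(offset)));
```

Running this over the records of the affected users and the unaffected ones should show which characters differ. If the sequence is cut off at the end of the string, the returned offset equals the string length.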

I am having the same issue. I'm trying to convert strings like these:

let test1 = 'Ã–stliche'; // must be "Östliche"
let test2 = 'NeuwiesenstraÃŸe'; // must be "Neuwiesenstraße"

console.log(utf8.decode(test1));
console.log(utf8.decode(test2));
Error: Invalid continuation byte
    at Error (native)
    at readContinuationByte (I:\dev\importer\node_modules\utf8\utf8.js:131:9)
    at decodeSymbol (I:\dev\importer\node_modules\utf8\utf8.js:160:12)
    at Object.utf8decode [as decode] (I:\dev\importer\node_modules\utf8\utf8.js:206:33)
    at Object.<anonymous> (I:\dev\importer\import.js:18:18)
    at Module._compile (module.js:556:32)
    at Object.Module._extensions..js (module.js:565:10)
    at Module.load (module.js:473:32)
    at tryModuleLoad (module.js:432:12)
    at Function.Module._load (module.js:424:3)
    at Module.runMain (module.js:590:10)
    at run (bootstrap_node.js:394:7)
    at startup (bootstrap_node.js:149:9)
    at bootstrap_node.js:509:3
// german special characters
let test1 = "Ã„"; // Ä fails
let test2 = "Ã¤"; // ä passes
let test3 = "Ãœ"; // Ü fails
let test4 = "Ã¼"; // ü passes
let test5 = "Ã–"; // Ö fails
let test6 = "Ã¶"; // ö passes
let test7 = "ÃŸ"; // ß fails

// other special characters
let test8 = "Ã\u0081"; // Á passes
let test9 = "Ã¡"; // á passes

All lowercase umlauts pass the test and all uppercase ones fail. "ß", which has no separate upper/lowercase form in German, also fails. I tested some other special characters, and they passed.

PitPanda1 avatar Jul 17 '17 10:07 PitPanda1
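One possible explanation for the pass/fail pattern above, stated as an assumption rather than a confirmed diagnosis: utf8.decode expects a byte string, meaning every character must be in the range U+0000 to U+00FF. If UTF-8 bytes were turned into text through Windows-1252 instead of Latin-1, bytes 0x80 to 0x9F become characters above U+00FF (for example byte 0x84 becomes „, U+201E), and those are no longer bytes. The sketch below uses a hypothetical helper, findNonByteChars, to list such characters.

```javascript
// Hypothetical helper (not part of utf8.js): list every character whose
// code unit does not fit in a single byte.
function findNonByteChars(str) {
  const hits = [];
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code > 0xff) {
      hits.push({
        index: i,
        char: str[i],
        code: 'U+' + code.toString(16).toUpperCase().padStart(4, '0'),
      });
    }
  }
  return hits;
}

// 0xC3 0xA4 read as Latin-1 or Windows-1252: both characters fit in a byte.
console.log(findNonByteChars('\u00C3\u00A4'));
// 0xC3 0x84 read as Windows-1252: the second character is U+201E.
console.log(findNonByteChars('\u00C3\u201E'));
```

An empty result means the string is at least a plausible byte string. A non-empty result points at characters that cannot be decoded as bytes.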

Similar issue with emojis. Does anybody have an idea how to fix it (other than a try/catch cop-out)?

davide-scalzo avatar Jan 18 '18 18:01 davide-scalzo

Similar issue; I'm working around it with a try/catch block.

Error:

Error: Invalid continuation byte
    at readContinuationByte (C:\Ampps\www\b5_revisited\node_modules\utf8\utf8.js:115:9)
    at decodeSymbol (C:\Ampps\www\b5_revisited\node_modules\utf8\utf8.js:156:12)
    at Object.utf8decode [as decode] (C:\Ampps\www\b5_revisited\node_modules\utf8\utf8.js:190:17)
    at try_to_utf8_decode (C:\Ampps\www\b5_revisited\b5_file_parser.js:104:16)
    at process_file (C:\Ampps\www\b5_revisited\b5_file_parser.js:146:13)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)

This is an example of the input:

Wij de werkgroep “KREKEROCK “ organiseren al een paar jaar tijdens de kerstperiode, omdat deze periode zich ui tstekend leent om eens stil te staan bij al het leed in de wereld, het muziekfestival KREKEROCK.
De opbrengst is steeds integraal voor CADAATAN KORTEMARK.
CADAATAN KORTEMARK houdt zich vooral bezig met het verbeteren van de omstandigheden waarin kinderen in bepaalde schooltjes op de Filip ijnen de lessen volgen. De vereniging is vooral actief in het noorden van het eiland CEBU, meer bepaald in enkele barangay’s van SAN REMIGIO.

wouterdialogic avatar Apr 03 '18 14:04 wouterdialogic

Similar issue trying to convert the word "Información". Has anyone fixed this issue? I've been all day trying to solve this but I haven't found the solution :(

AlejaRo avatar Feb 04 '20 20:02 AlejaRo

@AlejaRo

console.log(utf8.encode('Información')); // => InformaciÃ³n
console.log(utf8.decode(utf8.encode('Información'))); // => Información

Please show us a snippet. I've surrounded it with a try/catch and it seems to work so far.

According to the tests, this error is thrown when an invalid sequence is encountered:

  • 3 bytes instead of 4 bytes
  • mix between unicode and hex sequences

https://github.com/mathiasbynens/utf8.js/blob/2ce09544b62f2a274dbcd249473c0986e3660849/tests/tests.js#L245

mboughaba avatar Feb 05 '20 06:02 mboughaba
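To make the encode/decode round trip and the "invalid sequence" cases above concrete without the library, here is a rough sketch using Node's built-in Buffer and TextDecoder. The function names encodeToByteString and decodeByteString are hypothetical stand-ins and are only assumed to behave similarly to utf8.encode and utf8.decode. The error type and message differ from the library's.

```javascript
// Assumed stand-in for utf8.encode: text -> string with one char per UTF-8 byte.
function encodeToByteString(text) {
  return Buffer.from(text, 'utf8').toString('latin1');
}

// Assumed stand-in for utf8.decode: byte string -> text, throwing on
// malformed UTF-8. Note that 'latin1' keeps only the low byte of each
// character, so characters above U+00FF are silently truncated here.
function decodeByteString(byteString) {
  const bytes = Buffer.from(byteString, 'latin1');
  return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
}

const encoded = encodeToByteString('Información');
console.log(decodeByteString(encoded) === 'Información');

// Decoding text that was never encoded fails: 0xF3 (ó) looks like the lead
// byte of a 4-byte sequence, and 'n' is not a continuation byte.
try {
  decodeByteString('Información');
} catch (err) {
  console.log('already-decoded text:', err.name);
}

// Three bytes of a four-byte sequence also fail.
try {
  decodeByteString('\u00F0\u009F\u0098');
} catch (err) {
  console.log('truncated sequence:', err.name);
}
```

The second case is the one most reports in this thread seem to hit: the string passed to decode is already ordinary text, so there is nothing left to decode.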

This code throws the same error:

utf8.decode( 'Simplified Chinese: æˆ‘ä»¬ä¸ºæˆ‘ä»¬åˆ›é€ çš„æ¯æ°ä½œçš„å¥‰çŒ®ç²¾ç¥žå’Œå†³å¿ƒåŠ å‰§æ¯ä¸ªGWT代表的激情。但更比任何其他特质,在我们的机会心脏的决定性特征是GWTç»é”€å•†è¡¥å¿è®¡åˆ’ã€‚æˆ‘ä»¬åˆ›å»ºäº†ä¸€ä¸ªæ¶ˆé™¤äº†ä»»ä½•é™åˆ¶ï¼Œé€Ÿåº¦é¢ ç°¸çš„æˆå‘˜è®¿é—®ä»–ä»¬èµšå–ä½£é‡‘å’Œå¥–é‡‘ä¸–ç•Œä¸Šç¬¬ä¸€ä¸ªè‡ªç”±æµåŠ¨çš„å¯å˜è–ªé…¬è®¡åˆ’ã€‚æˆ‘ä»¬æ¸…æ¥šçš„ç»é”€å•†å‹å¥½çš„è–ªé…¬è®¡åˆ’ï¼Œä½¿GWTä¸šåŠ¡çš„äººæ¥è¯´ï¼Œé‚£é‡Œçš„å¹³å‡å…¼èŒåˆ›ä¸šè€…çœŸæ­£æ‹¥æœ‰ä¸ºè‡ªå·±åˆ›é€ è´¢å¯Œï¼Œå¹¶ä¸Žä»–äººåˆ†äº«çš„æœºä¼šçš„æœºä¼šã€‚æˆ‘ä»¬æ„Ÿåˆ°è‡ªè±ªçš„æ˜¯æˆ‘ä»¬çš„é©å‘½è‡ªç”±æµåŠ¨çš„è–ªé…¬è®¡åˆ’æ¶ˆé™¤äº†ç›´é”€å…¶ä¸­åªæœ‰é¡¶çº§ç»é”€å•†çš„ç²¾è‹±èƒ½å¤Ÿå®žçŽ°è´¢åŠ¡ä¼Ÿå¤§çš„çŽ°çŠ¶ã€‚å…¬å¹³å’Œæ„å›¾æ˜¯æˆ‘ä»¬åšç”Ÿæ„çš„æ–¹å¼èƒŒåŽçš„é©±åŠ¨åŠ›å’ŒåŒºåˆ«ä½¿å¾—GWT公司之间在历史上最好的家庭为基础的和基于互联网的机会。', ),

balwinder4264 avatar Jun 11 '20 13:06 balwinder4264

+1 Having this issue with the letter "ß" in the string

MattChilders92 avatar Jul 07 '20 21:07 MattChilders92

Same here: utf8.decode('账单信息') throws Error: Invalid continuation byte. It should decode to 账单信息. Is the library having issues with code points represented in 3 bytes or more (like Chinese and Korean)?

sebastianDejoy avatar Jul 18 '20 06:07 sebastianDejoy

@romafederico

I'm receiving a utf-8 encoded JSON, converting it to a string and then utf8.decode(str).

When you receive the UTF-8 encoded JSON and store it in a string, the string is re-encoded as UCS-2.

Decoding it as if it were still UTF-8 raises an error, as expected.

paolobertani avatar Aug 23 '22 11:08 paolobertani
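The point above can be sketched as follows, assuming the JSON arrives as raw bytes in a Node Buffer (the variable names and the sample payload are made up for illustration). The bytes are decoded exactly once, and the resulting JavaScript string needs no further utf8.decode call.

```javascript
// Illustrative payload: the raw bytes as they might arrive over the wire.
const wireBytes = Buffer.from('{"street":"Neuwiesenstraße"}', 'utf8');

// Decode exactly once, from UTF-8 bytes to a JavaScript string.
const jsonText = wireBytes.toString('utf8');
const parsed = JSON.parse(jsonText);
console.log(parsed.street);

// In the decoded string, "ß" is a single code unit (U+00DF), not the two
// bytes 0xC3 0x9F. Read as a byte, 0xDF is a lead byte that must be followed
// by a continuation byte, but the next character is "e" (0x65), so a second
// decode pass would reject it.
const pos = parsed.street.indexOf('ß');
console.log(parsed.street.charCodeAt(pos).toString(16));
console.log(parsed.street.charCodeAt(pos + 1).toString(16));
```

Most HTTP clients and JSON helpers already perform this single decode step, which would explain why only records containing certain non-ASCII characters fail.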

@PitPanda1

I am having the same issue. I'm trying to convert strings like that:

let test1 = 'Ã–stliche'; // must be "Östliche"
let test2 = 'NeuwiesenstraÃŸe'; // must be "Neuwiesenstraße"

'Ã–stliche' is not UTF-8

paolobertani avatar Aug 23 '22 11:08 paolobertani
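If the failing strings are UTF-8 bytes that were at some point displayed through Windows-1252 (an assumption that fits the German examples in this thread but is not confirmed), one way to recover the text is to map the Windows-1252 characters back to their byte values first. This is a sketch only. The names CP1252_HIGH, cp1252ToByteString and repairMojibake are hypothetical, and the result should be checked against known-good data.

```javascript
// Windows-1252 code points for bytes 0x80..0x9F (null = byte is undefined).
const CP1252_HIGH = [
  0x20ac, null, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, null, 0x017d, null,
  null, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, null, 0x017e, 0x0178,
];

// Turn Windows-1252 text back into a byte string (one char per byte).
function cp1252ToByteString(text) {
  let out = '';
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code <= 0xff) {
      out += text[i]; // already a byte value
      continue;
    }
    const idx = CP1252_HIGH.indexOf(code);
    if (idx === -1) {
      throw new RangeError('Not a Windows-1252 character at index ' + i);
    }
    out += String.fromCharCode(0x80 + idx);
  }
  return out;
}

// Reinterpret the recovered bytes as UTF-8.
function repairMojibake(text) {
  return Buffer.from(cp1252ToByteString(text), 'latin1').toString('utf8');
}

console.log(repairMojibake('\u00C3\u2013stliche')); // input shown as 'Ã–stliche'
```

The byte string returned by cp1252ToByteString is also the form that utf8.decode expects, so it could be passed there instead of going through Buffer.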