sheetjs icon indicating copy to clipboard operation
sheetjs copied to clipboard

When converting json to text it is converting characters weirdly

Open Godrules500 opened this issue 2 years ago • 12 comments

When converting json data to a text or csv it is converting characters like ’ to ’

I see that changing it to use UTF will supposedly fix it according to other documents, but I only see the use case when reading the file and not writing/creating the document.

In your documentation it says that 'txt' is UTF-16, but "string" is UTF8. How would I go about changing that, or can I?

        var workSheet = XLSX.utils.json_to_sheet(results);
        var workBook = XLSX.utils.book_new();

        XLSX.utils.book_append_sheet(workBook, workSheet);
        var content = XLSX.write(workBook, { bookType: 'txt', type: 'string' });

Godrules500 avatar Mar 21 '22 15:03 Godrules500

XLSX.write(workBook, {bookType: "csv", type: "string"});   // CSV as a JS string
XLSX.utils.sheet_to_csv(workSheet);                        // CSV from a worksheet (JS string)

binary will encode as binary strings.

When using writeFile, CSV will generate the binary output and add the UTF8 BOM. TXT aligns with "UTF-16 Text" in Excel. These choices optimize for the common case of generating files that will be read by Excel.

SheetJSDev avatar Mar 21 '22 17:03 SheetJSDev

Ok, so I've tried sheet_to_txt and write{bookType:'txt'}) and both are rendering the same way. Now I will say, that when I filter and convert one line, it is rendering the ’ character correctly. HOWEVER, when I return all 10,000 lines of data, it is then returning ’. Is there a reason why a small subset of data renders correctly, but the full set of data replaces certain characters?

Godrules500 avatar Mar 21 '22 18:03 Godrules500

Does the same thing happen with CSV (using sheet_to_csv or write({bookType: "csv", type: "string"})?

In either case, can you share a sample? JSON.stringify(results) should be sufficient to test. If you can't share it publicly, email [email protected]

SheetJSDev avatar Mar 21 '22 18:03 SheetJSDev

test.txt

Here is the file. So weirdly I have an array of 10,000 lines. In trying to help get a smaller set of data I found that if I include 978-1232 it didn't work. If I took 979-5000 it worked.

So this file contains lines 978-1232. So in translation, the file on here is 1-255. So if I include line 1 in the sheet_to_txt it messes up the values. If I remove the first line, it works correctly.

Code:

var workSheet = XLSX.utils.json_to_sheet(results);
results = XLSX.utils.sheet_to_txt(workSheet);
return results;

Godrules500 avatar Mar 21 '22 19:03 Godrules500

Please test the following:

results = XLSX.utils.sheet_to_txt(workSheet, {type: "string"});

If this works, the issue can be resolved with a small patch to https://github.com/SheetJS/sheetjs/blob/master/bits/90_utils.js#L153 (feel free to submit a PR):

	if(typeof $cptable == 'undefined' || opts.type == 'string' || !opts.type) return s;

SheetJSDev avatar Mar 21 '22 19:03 SheetJSDev

Sadly it did not work and returned the same results :. Now I have pulled out the file since I'm working in a cloud system called Netsuite, so all I'm importing a file from either node_modules/xlsx/xlsx.js or node_modules/xlsx/dist/xlsx.extendedscript.js.

Godrules500 avatar Mar 21 '22 19:03 Godrules500

Is there a way to read the content of json_to_sheet? That way I could determine if the file is being incorrectly manipulated there?

Godrules500 avatar Mar 21 '22 19:03 Godrules500

The result is a plain JS object, you can directly inspect it. To get a specific cell, you can index with an Excel address:

var A1 = workSheet["A1"];

Please try using node_modules/xlsx/dist/xlsx.full.min.js or node_modules/xlsx/dist/xlsx.core.min.js (and be sure to pull the latest version from npm!). node_modules/xlsx/xlsx.js is designed for use in NodeJS.

Some small test https://jsfiddle.net/sheetjs/vmez8u1g/ shows the effect of the parameters. FF FE 19 20 is a binary encoded version with the BOM while 2019 corresponds to "\u2019" (’)

SheetJSDev avatar Mar 21 '22 19:03 SheetJSDev

Continuing with the weird, in sheet_to_csv I added this return out.slice(2, 265).join("") and it works, but with slice(0 or 1, 265) it didn't work.

Ok, I'll try that and see. When I tried importing it in requirejs it was causing some issues and I couldn't run it.

Godrules500 avatar Mar 21 '22 19:03 Godrules500

For whatever reason I cannot get it to load the xlsx.min.js nor xlsx.core.min.js to load using requirejs inside of Netsuite.

Godrules500 avatar Mar 21 '22 20:03 Godrules500

NetSuite support was verified working in 0.16.1 and there was no change to the RequireJS logic since then. Is there some reported error?

SheetJSDev avatar Mar 21 '22 20:03 SheetJSDev

Ok, I have the xlsx.full.min.js working and I'm still getting the same issue. So I'm not sure if the code is converting it to a different character or if it is the javascript version causing the issue. since the issue seems to be when converting out to a string.

Godrules500 avatar Mar 22 '22 13:03 Godrules500