
json_to_sheet hanging when using large JSON file

Open mcbergsma opened this issue 3 years ago • 4 comments

I've got a fairly large JSON file I'm trying to write to a sheet. The file is ~353MB, with a length (when stringified) of 361,514,182.

It just hangs. I can get it working by chunking the object and writing it to different sheets, but I can't write it all to a single sheet. I've also tried appending the chunks to the bottom of the sheet, but that also fails once it gets to the later chunks.
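Roughly what the chunked approach looks like (a sketch: the chunk size and helper name are arbitrary; the commented-out calls are the SheetJS utilities used for the append):

```javascript
// Split a large array of row objects into fixed-size chunks.
function chunkRows(rows, size) {
  const chunks = [];
  for (let i = 0; i < rows.length; i += size) {
    chunks.push(rows.slice(i, i + size));
  }
  return chunks;
}

// Appending each chunk to a single sheet with SheetJS (not run here):
//   const ws = XLSX.utils.json_to_sheet(chunks[0]);
//   for (const chunk of chunks.slice(1)) {
//     XLSX.utils.sheet_add_json(ws, chunk, { origin: -1, skipHeader: true });
//   }

const chunks = chunkRows(Array.from({ length: 120 }, (_, i) => ({ id: i })), 50);
console.log(chunks.length, chunks[2].length); // 3 20
```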

I've just tried moving from v0.17 to 0.18 to see if that helps, but also no luck.

Running Node 16.15.1, Windows 11. Any ideas or workarounds? It's annoying having to manually open the files and paste the sheets back together.

Perhaps also worth noting that since the move to 0.18, the file size has gone from 37MB to 98MB.

mcbergsma avatar Jul 28 '22 16:07 mcbergsma

You might be running into https://bugs.chromium.org/p/v8/issues/detail?id=6696

Try passing the option dense: true to json_to_sheet or aoa_to_sheet.

SheetJSDev avatar Jul 30 '22 18:07 SheetJSDev

Thanks for the suggestion! I tried let ws = XLSX.utils.json_to_sheet(fullResult, { dense: true }) but it still hangs. I don't even see a reference to dense in the docs! I moved back to v0.17.5 on account of the file size issue; is dense an option for 0.18 only?


mcbergsma avatar Jul 30 '22 19:07 mcbergsma

The worksheet representation has a long history. We compared the hash vs array of arrays back in 2012 and found that IE6 performed much better with the Object. V8 also performed better with the object approach until some major changes under the hood heavily penalized large objects.

After V8 regressed, with no sign of improvement from Google's side, we revisited the array of arrays approach. That's what dense enables. Changing the default representation will require a major version bump.
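For reference, the two worksheet shapes look roughly like this (a sketch following the SheetJS {t, v} cell convention, where "!ref" holds the used range in A1 notation):

```javascript
// Sparse ("hash") representation: one property per cell address.
// V8's property-handling changes heavily penalized objects with
// hundreds of thousands of keys like this one.
const sparse = {
  "!ref": "A1:B1",
  "A1": { t: "s", v: "hello" },
  "B1": { t: "n", v: 42 }
};

// Dense representation (dense: true): an array of row arrays,
// which V8 handles far better at scale.
const dense = [
  [ { t: "s", v: "hello" }, { t: "n", v: 42 } ]
];

console.log(sparse["A1"].v, dense[0][0].v); // hello hello
```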

If you are using compression: true and seeing a jump in file size, it's because we fully switched the ZIP engine and the sliding window hasn't been optimized yet.

Can you give a rough estimate of the number of rows in the object? If you could find a way to share the data, that would also be helpful.

SheetJSDev avatar Jul 30 '22 19:07 SheetJSDev

Ahh IE6, to blame for so many things! ;)

In this particular case I really can't share the data because it's confidential, but I'll try and provide as much detail as possible outside the actual data.

It's a data pull from a number of data sources that are processed and put together as an array of objects. The most recent one, run just this afternoon, is 269,496 rows total: an array of that many objects.

While not every object contains every one of these keys, most do. I've changed the actual names in most cases, but this is the list of names and types:

UniqueID	number
Study Name	string
Modified By	string
Country	string
Language	string
Target	string
Category	string
Parent Category	string
Segment	string
Filter Type	string
Filter	string
Product	string
Description	string (longer)
Image URL	string
File Name	string
File Size	string
Tags	string
KnownTag	string
KeyTag	string
Population	number
Group	string
Status	string
Created	date
productId	number
result1	decimal
result2	decimal
result3	decimal
quad	string
index	number
keyscore	number
keyscore2	number
check	string

To be clear, I saw the jump in size (roughly 3x) with compression: true in v0.18. I dropped back to v0.17 and it went back to the normal size, approx. 36MB.

mcbergsma avatar Aug 03 '22 01:08 mcbergsma

Got the same error from the json_to_sheet function with 421,060 rows and 20 columns. It was stuck for almost an hour with no output. Any help?

MHamzaRajput avatar Oct 02 '22 06:10 MHamzaRajput

Is there any debug mode so that I can see progress and where it is stuck?

MHamzaRajput avatar Oct 02 '22 06:10 MHamzaRajput

Please test with 0.19.0. Both json_to_sheet and aoa_to_sheet accept dense: true.

Without using zip compression: https://jsfiddle.net/Lphbt2qj/

```js
const XLSX = require('xlsx'); // nodejs
const data = Array.from({length:421060}, (_,i) => Array.from({length:20}, (_,j) => j == 0 ? "" + i : i+j));
console.time('write');
const wb = XLSX.utils.book_new();
const ws = XLSX.utils.aoa_to_sheet(data, {dense: true});
XLSX.utils.book_append_sheet(wb, ws, "SheetJS");
XLSX.writeFile(wb, "issue2750.xlsx");
console.timeEnd('write');
```

takes 15 seconds to write a 265MB XLSX file.

Enabling compression: https://jsfiddle.net/et0pbrv8/

```js
const XLSX = require('xlsx'); // nodejs
XLSX.CFB.utils.use_zlib(require('zlib')); // nodejs using native zlib
const data = Array.from({length:421060}, (_,i) => Array.from({length:20}, (_,j) => j == 0 ? "" + i : i+j));
console.time('write');
const wb = XLSX.utils.book_new();
const ws = XLSX.utils.aoa_to_sheet(data, {dense: true});
XLSX.utils.book_append_sheet(wb, ws, "SheetJS");
XLSX.writeFile(wb, "issue2750.xlsx");
console.timeEnd('write');
```

takes 22 seconds in NodeJS to generate a 30MB file, 22 seconds in Chrome to generate a 79MB file.

For exports at this scale, it is highly encouraged to generate a blob URL in a web worker and perform the download dance in the renderer thread. The docs site has an example.

SheetJSDev avatar Oct 24 '22 03:10 SheetJSDev

json_to_sheet(data, { dense: true }) hangs with 766,734 rows on 0.19.1.

Unless there's a workaround or a fix for this issue, are the limitations mentioned somewhere?

Ddevon avatar Dec 13 '22 05:12 Ddevon

https://docs.sheetjs.com/docs/miscellany/errors#aw-snap-or-oops-an-error-has-occurred

That said, the current limits apply on a per-worksheet level. The limit is driven by how the worksheet is written (a large string is constructed).

For the maximum worksheet row count (1,048,576 rows), you can write 10 columns (demo: https://jsfiddle.net/L462gdjy/).

For sheets with a small number of strings repeated many times, bookSST reduces size by de-duplicating text using the shared string table.
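A toy illustration of the shared-string-table idea (the helper name is made up for illustration; the real SST handling is internal to the writer): each distinct string is stored once, and cells reference it by index.

```javascript
// Build a shared string table: distinct strings stored once,
// cells replaced by indices into the table.
function buildSST(values) {
  const table = [];
  const index = new Map();
  const refs = values.map((v) => {
    if (!index.has(v)) {
      index.set(v, table.length);
      table.push(v);
    }
    return index.get(v);
  });
  return { table, refs };
}

const { table, refs } = buildSST(["Active", "Done", "Active", "Active", "Done"]);
console.log(table); // [ 'Active', 'Done' ]
console.log(refs);  // [ 0, 1, 0, 0, 1 ]
```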

SheetJSDev avatar Dec 13 '22 05:12 SheetJSDev