json_to_sheet hanging when using large JSON file
I've got a fairly large JSON file I'm trying to write to a sheet. The file is ~353MB, with a length (when stringified) of 361,514,182.
It just hangs. I can get it working by chunking the object and writing the chunks to different sheets, but I can't write it all to a single sheet. I've also tried appending the chunks to the bottom of one sheet, but that also fails once it gets to the "latter" chunks.
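For reference, the chunked workaround looks roughly like this (variable names, chunk size and the placeholder data are illustrative, not my exact code):
const XLSX = require('xlsx');
const fullResult = Array.from({ length: 269496 }, (_, i) => ({ UniqueID: i, Country: "US" })); // stand-in for the real array of row objects
const CHUNK = 50000; // illustrative chunk size
const wb = XLSX.utils.book_new();
// one sheet per chunk (this works)
for (let i = 0; i < fullResult.length; i += CHUNK) {
  const ws = XLSX.utils.json_to_sheet(fullResult.slice(i, i + CHUNK));
  XLSX.utils.book_append_sheet(wb, ws, "Chunk" + (i / CHUNK + 1));
}
// appending chunks to a single sheet instead (this is the variant that fails on the later chunks):
// const ws = XLSX.utils.json_to_sheet(fullResult.slice(0, CHUNK));
// for (let i = CHUNK; i < fullResult.length; i += CHUNK)
//   XLSX.utils.sheet_add_json(ws, fullResult.slice(i, i + CHUNK), { origin: -1, skipHeader: true });
XLSX.writeFile(wb, "output.xlsx");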
I've just tried moving from v0.17 to 0.18 to see if that helps, but also no luck.
Running Node 16.15.1, Windows 11. Any ideas or workarounds? It's annoying having to manually open the files and paste the sheets back together.
Perhaps also worth noting that since the move to 0.18, the filesize has gone from 37MB to 98MB.
You might be running into https://bugs.chromium.org/p/v8/issues/detail?id=6696
Try passing the option dense: true to json_to_sheet or aoa_to_sheet.
Thanks for the suggestion! I tried let ws = XLSX.utils.json_to_sheet(fullResult, { dense: true }) but it still hangs.
I don't even see a reference to dense in the docs! I moved back to v0.17.5 on account of the file size issue; is dense an option for 0.18 only?
The worksheet representation has a long history. We compared the hash vs array of arrays back in 2012 and found that IE6 performed much better with the Object. V8 also performed better with the object approach until some major changes under the hood heavily penalized large objects.
After V8 regressed, with no sign of improvement from Google's side, we revisited the array of arrays approach. That's what dense enables. Changing the default representation will require a major version bump.
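For a concrete picture of the difference (a rough sketch; the exact dense layout has shifted between releases, so treat it as illustrative):
const XLSX = require('xlsx');
const rows = [{ name: "a", value: 1 }, { name: "b", value: 2 }];
// default (sparse) worksheet: one big hash keyed by A1-style addresses
const sparse = XLSX.utils.json_to_sheet(rows);
console.log(sparse["A2"].v); // "a"
// dense: true stores rows as arrays of cell objects, sidestepping the huge-object penalty
// (in the 0.18/0.19 line the sheet itself is indexed as ws[row][col])
const dense = XLSX.utils.json_to_sheet(rows, { dense: true });
console.log(dense[1][0].v); // "a"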
If you are using compression: true and seeing a jump in file size, it's because we fully switched the ZIP engine and the sliding window hasn't been optimized yet.
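For reference, compression is enabled per write call:
const XLSX = require('xlsx');
const wb = XLSX.utils.book_new();
XLSX.utils.book_append_sheet(wb, XLSX.utils.aoa_to_sheet([["hello", "world"]]), "SheetJS");
// compression: true enables ZIP deflate; without it, entries are stored uncompressed
XLSX.writeFile(wb, "out.xlsx", { compression: true });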
Can you give a rough estimate for the number of rows in the object? If you could find a way to share the data, that would also be helpful.
Ahh IE6, to blame for so many things! ;)
In this particular case I really can't share the data because it's confidential, but I'll try and provide as much detail as possible outside the actual data.
It's a data pull from a number of data sources that are processed and put together as an array of objects. The most recent one, run just this afternoon, is 269496 rows total, i.e. an array of that many objects.
While not every object contains every one of these keys, most do. I've changed the actual names in most cases, but this is a list of name and type (a made-up example row is sketched after the list):
UniqueID number
Study Name string
Modified By string
Country string
Language string
Target string
Category string
Parent Category string
Segment string
Filter Type string
Filter string
Product string
Description string (longer)
Image URL string
File Name string
File Size string
Tags string
KnownTag string
KeyTag string
Population number
Group string
Status string
Created date
productId number
result1 decimal
result2 decimal
result3 decimal
quad string
index number
keyscore number
keyscore2 number
check string
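To make the shape concrete, a single entry looks roughly like this (all values are invented, not real data):
// one hypothetical row from the ~269496-element array
const row = {
  UniqueID: 12345,
  "Study Name": "Example Study",
  Country: "US",
  Language: "en",
  Description: "A longer free-text description of the product",
  Population: 1000,
  Created: new Date("2022-07-01"),
  productId: 987,
  result1: 0.123,
  keyscore: 42,
  check: "ok"
};
// the full array of such objects is what gets passed to json_to_sheet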
To be clear, I saw the jump in size (roughly 3x) with compression: true in v0.18. I dropped back to v0.17 and it went back to normal size, approx 36MB.
Got the same error with the json_to_sheet function on 421060 rows and 20 columns. It was stuck for almost an hour with no output. Any help?
Is there any debug mode so that I can see progress and where it is stuck?
Please test with 0.19.0. Both json_to_sheet and aoa_to_sheet accept dense: true.
Without using zip compression: https://jsfiddle.net/Lphbt2qj/
const XLSX = require('xlsx'); // nodejs
const data = Array.from({length:421060}, (_,i) => Array.from({length:20}, (_,j) => j == 0 ? "" + i : i+j));
console.time('write');
const wb = XLSX.utils.book_new();
const ws = XLSX.utils.aoa_to_sheet(data, {dense: true});
XLSX.utils.book_append_sheet(wb, ws, "SheetJS");
XLSX.writeFile(wb, "issue2750.xlsx");
console.timeEnd('write');
takes 15 seconds to write a 265MB XLSX file.
Enabling compression: https://jsfiddle.net/et0pbrv8/
const XLSX = require('xlsx'); // nodejs
XLSX.CFB.utils.use_zlib(require('zlib')); // nodejs using native zlib
const data = Array.from({length:421060}, (_,i) => Array.from({length:20}, (_,j) => j == 0 ? "" + i : i+j));
console.time('write');
const wb = XLSX.utils.book_new();
const ws = XLSX.utils.aoa_to_sheet(data, {dense: true});
XLSX.utils.book_append_sheet(wb, ws, "SheetJS");
XLSX.writeFile(wb, "issue2750.xlsx");
console.timeEnd('write');
takes 22 seconds in NodeJS to generate a 30MB file, 22 seconds in Chrome to generate a 79MB file.
For exports at this scale, it is highly encouraged to generate a Blob URL in a Web Worker and perform the download dance in the renderer thread. The docs site has an example.
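Roughly, that pattern looks like this (a sketch, not the exact docs example; the CDN URL and message shape are placeholders):
// worker.js -- build the workbook off the main thread
importScripts("https://cdn.sheetjs.com/xlsx-latest/package/dist/xlsx.full.min.js"); // placeholder URL
self.onmessage = (e) => {
  const ws = XLSX.utils.json_to_sheet(e.data.rows, { dense: true });
  const wb = XLSX.utils.book_new();
  XLSX.utils.book_append_sheet(wb, ws, "SheetJS");
  // type: "array" returns an ArrayBuffer that can back a Blob
  const buf = XLSX.write(wb, { bookType: "xlsx", type: "array", compression: true });
  postMessage(buf, [buf]); // transfer, don't copy
};
// main thread -- receive the buffer and perform the download dance
const worker = new Worker("worker.js");
worker.onmessage = (e) => {
  const url = URL.createObjectURL(new Blob([e.data], { type: "application/octet-stream" }));
  const a = document.createElement("a");
  a.href = url; a.download = "export.xlsx";
  a.click();
  URL.revokeObjectURL(url);
};
worker.postMessage({ rows: fullResult }); // fullResult: the large array of row objects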
json_to_sheet(data, { dense: true }) hangs with 766734 rows on 0.19.1.
Unless there's a workaround or a fix for this issue, are the limitations mentioned somewhere?
https://docs.sheetjs.com/docs/miscellany/errors#aw-snap-or-oops-an-error-has-occurred
That said, the current limits apply on a per-worksheet level. The limit is driven by how the worksheet is written (a large string is constructed).
For the maximum worksheet row count (1048576 rows), you can write 10 columns (demo: https://jsfiddle.net/L462gdjy/).
For sheets with a small number of strings repeated many times, bookSST reduces size by de-duplicating text using the shared string table.
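As a sketch (the benefit depends on how repetitive the strings are):
const XLSX = require('xlsx');
// many rows repeating the same few strings, so the shared string table pays off
const data = Array.from({ length: 100000 }, (_, i) => ({ id: i, status: i % 2 ? "PASS" : "FAIL" }));
const wb = XLSX.utils.book_new();
XLSX.utils.book_append_sheet(wb, XLSX.utils.json_to_sheet(data, { dense: true }), "SheetJS");
// bookSST: true de-duplicates cell text through the shared string table
XLSX.writeFile(wb, "sst.xlsx", { bookSST: true, compression: true });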