xlsx-populate
xlsx-populate copied to clipboard
Exported workbook file size increases drastically with styles
Hi,
Thanks for this fantastic module.
I'm using it for exporting a database to xlsx in a node app. The data I work with is about 900 rows and 70 columns. The exported file size is about 400KB if I leave the plain cell style, but about 1,250KB if I apply colors (fill) and borders to them. Is that to be expected? If I open the plain version and apply similar styles directly in LibreOffice Calc, the file size is even reduced to about 350KB.
I'm using v1.19
Thanks
I looked into the code, applying style to m*n
cells will create m*n
styles, that is why the file size is large. This module is not yet optimized for setting a range of styles.
Hi, thanks for your answer, but I'm not sure I understand if I can do something about the problem. In order to set the styles, I'm doing things like
sheet.row(1).style("topBorderColor", 'ffe900');
sheet.column("AA").style("fill", 'ccffd8');
Is there a way I can optimise this? Or will there be a fix?
Thanks
Update: I thought the problem were the colours of the cells, but removing them didn't provide much of an improvement. Removing the border did. The issue is pretty much gone once there is no border.
Unfortunately, the current style implementation isn't very efficient. It creates separate style objects per cell. Excel and OO will combine styles that are the same for multiple cells into single instances.
@dtjohnson Could you provide some links in regards to the specification or a high-level overview I could follow to begin building this optimization? I believe, with large workbooks, this is preventing excel from opening my files after I have applied styles. /shrug
@LesterLyu If I created a Style object manually and provided it to cells usingcell.style()
call, would they use / reference the same style object?
In Cell.js
.case('Style', style => {
this._style = style;
this._styleId = style.id();
return this;
})
I believe I've built an MVP solution for the individual styles per cell problem. I'll explain below and if you'd like me to continue with this approach, or similar, I'd be happy to commit back :)
In my use case, I have alot of large range copies / row copies ( values and styles, not formulas)
I built a file that will take in a given StyleSheet, which should belong to the destination WorkBook and it will take in a styleProperties object ( retrieved from a cell with cell.styles([...])
) and return an instance of a Style object.
If I have seen the request for this style properties object before ( by checking for deep equality ) I will simply return the earlier created style instance ( which has it's associated id etc.. )
Afterwards when I preform the set operation with cell.style(Style)
the application just uses that ID as opposed to creating a new Style instance. ( Since I noticed that every cell has a style object, this should just change that reference? )
A potential major downside to this approach is that once cells styles have been pointed to the same style object reference, I expect that programmatically changing a cells style at the individual level may inadvertently affect all other cells who share that reference. ( This is a similar approach to exceljs seen here: https://github.com/exceljs/exceljs#styles )
Thinking to the future, these are some high level suggestions
- When initially parsing the xml, we can create linked style object references ( this will cut down file size bloat we see off the top )
- From the Cell, Range, and Row api perspective, we can allow a new style object created fresh at the cell level, to override an existing link as opposed to editing one
Here is my example work around solution that only created 12 additional style nodes instead of 950+ which I consider a slight MVP of these possible approaches:
import { isEqual } from "lodash";
const styleNodesByWorkbook = new WeakMap();
export function getStyleNode(styleProperties: any, StyleSheet: any) {
if (styleNodesByWorkbook.get(StyleSheet) === undefined) {
styleNodesByWorkbook.set(StyleSheet, []);
}
const createdStyleNodes = styleNodesByWorkbook.get(StyleSheet);
const matchedNode = createdStyleNodes.find((node: any) => {
return isEqual(styleProperties, node.styleProperties);
});
if (matchedNode) {
return matchedNode.styleNode;
}
// we need to create a new style object to match these
// style properties
const newStyleNode = StyleSheet.createStyle();
// set the properties on this style node;
Object.keys(styleProperties).forEach(property => {
const value = styleProperties[property];
newStyleNode.style(property, value);
});
createdStyleNodes.push({ styleProperties, styleNode: newStyleNode });
return newStyleNode;
}
And an example of how I use it
export function copyRanges(
sourceRange: any,
destRange: any,
destinationWorkbook: any
) {
const StyleSheet = destinationWorkbook._styleSheet;
sourceRange.forEach((cell: any, rowIndex: number, columnIndex: number) => {
const destCell = destRange.cell(rowIndex, columnIndex);
// get style object from current cell
const style = cell.style(["numberFormat", "fill", "fontColor"]);
const styleNode = getStyleNode(style, StyleSheet);
// set the values of the destination cell
destCell.value(cell.value());
// set the styles of the cell
destCell.style(styleNode);
});
EDIT:
To add to my initial analysis, when I took my file and doubled the number of rows & column
from ( 1013 x 38 ) to ( 2026 x 38 ) I was once again not able to open the generated .xlsx
file in excel even though with my new enchantment I only created 12 unique new style nodes.
I'm only guessing, but I suppose I can attribute that to individual style objects being created at the initial xml parse? But I can't be sure...
I believe your approach is not bad but I would keep the style per cell as it is with a little twist: when generating the xml
, I would create an upper level of unique styles where every cell style will point to. This way you can change individual cells and only at building the uniqueness is checked.
Hi @mdgozza and @dresende, I also would really like to see this issue fixed, is there any way I can help? I don't have knowledge of how xlsx works internally (except that it is a bunch of xmls)...
Also, do you know any other library that can handle this? (it does not have to be in Node, could be any other language)
This is a pandora box. What happens behind the scenes is that every time you style a range/cell you create a style that is stored, even it if exists already (with the same styling properties). I tried an approach on saving that would try to find duplicated styles and relink to only just one copy of each but ran out of time to work on this.
@dresende Thanks for the reply!
[...] every time you style a range/cell you create a style that is stored [...]
I am curious - What I did is just a few calls to sheet.usedRange().style()
(12 calls in fact, I can test with less later), with about 150 thousand cells in the range. From what you said, only 12 styles should be created, is 12 styles too much already?
It’s been a long while since I’ve looked at this.
@dresende your approach is definitely better, maybe one day I will look to implement it.
@papb I didn’t think you could apply styles to a range, rather only a cell? I’m only basing this assumption off my previous comment though.
12 styles should be fine, unless of course the code applies the style to each cell in the range. Which would give you 150,000 styles objects. You would have to review that code to be sure, or the output xml.
The best approach is to deduplicate style objects at build time of the xml. Walk the tree and find any matching styles and replace their reference to one style node, making sure to delete the original. I believe this is what OO does.
I am curious - What I did is just a few calls to
sheet.usedRange().style()
(12 calls in fact, I can test with less later), with about 150 thousand cells in the range. From what you said, only 12 styles should be created, is 12 styles too much already?
It's probably each cell individually that gets a style, I don't remember now.
I tried another approach but then I realised you could change any style in the mean time so to really make it scale I couldn't find a match when you apply a style but only when saving/creating the file. It shouldn't be too hard but since I took a substantial time understanding and reverse engineering the output, I had little time then. I since needed to move to another project and this was kept on hold. I really want to help but am out of time right now.
I have read through the thread. I don't have much knowledge of Excel formats, my info is from README.MD
of this project. Bunch of XML
...
If I missed the problem, sorry for the loss of your time.
As I read through I got an idea that might be able to solve this.
{ border: 'thin' } =! { border: 'thin'}
, but JSON.stringify({ border: 'thin'}) == JSON.stringify({ border: 'thin'})
!
Note that I use
JSON.stringify
, but here it is just a method to transform an object to a string.
This will however break if keys were defined in a different order! That could be mitigated if custom parsing would be used. One that orders keys of styles
alphabetically (, or any other defined way).
With a way to determine if 2 styles are equal, we can then check if the style is present and only change the styleID
of the cell to the one that already exists.
- Create
new Map<stringifiedStyle, StyleID>()
for stringified styles ('<style border="thin" fill="fff"' />
will do just fine!) - For every cell create style string (XML as string is fine)
- Check if this stringified style is in set of already added styles
- If yes, change
styleID
of cell to the new one. (cell.styleID = map.get(stringifiedStyles)
) - If no, add it to the map (
map.set(stringifiedStyles, nextStyleID++)
)
- Repeat for every cell
Hi @Akxe, I think your idea makes sense! I suggest using hash-obj instead of JSON.stringify
to make key order irrelevant. Would you be willing to submit a PR?
@papb I would be against it because hash-obj is node only library I would suggest using sort-keys and actually the JSON... :/
I could do the PR, but I am out of time. I will need it in a month, so if that is a reasonable horizon for you, I will do it about then... (hopefully :D)
@papb I would be against it because hash-obj is node only library I would suggest using sort-keys and actually the JSON... :/
Makes sense!!
I could do the PR, but I am out of time. I will need it in a month, so if that is a reasonable horizon for you, I will do it about then... (hopefully :D)
Awesome!
To anyone who's still struggling with this... If you only need a per-column style (which also caused a corrupt XLSX with around 1000 rows and 20 columns of data), there's a fairly simple solution. I created a "default" style which I'm using for each and every column in my XLSX, and extend the individual column styles from it. The code should be something like this:
const workbook = await XlsxPopulate.fromBlankAsync();
// Create a default template style to extend
const defaultStyle = (workbook as any).styleSheet().createStyle();
defaultStyle.style('wrapText', true);
defaultStyle.style('verticalAlignment', 'top');
// Iterate over columns
for(let j=0;j<20;j++ ) {
const column = workbook.sheet(0).column(j + 1);
// Create a new style for the given column
const columnStyle = (workbook as any).styleSheet().createStyle(defaultStyle);
// Adding styles from styleObj which is an object with props from the Style Guide
Object.keys(styleObj.style).forEach((s) => columnStyle.style(s, styleObj.style[s]));
// Set column style using a Style object, which will be used by reference -> no new style created
column.style(columnStyle);
}
This way, the internal styles.xml
definition keeps being a modest 12K file. Also, the final size of the assembled XLSX is reduced more than 50%.
For anyone still trying to work around this, I haven't been able to dedicate time to it, but I found a solution that works for me (20k rows table). It's a particular case but perhaps it works for you.
My tables have styles per column (actually a range, not full excel column), not per cell. But when defining the same style per cell (or range, it's just a shortcut), it will create a new style, even though it's the same as the cell above (same column). So what I do is the following:
Create the initial table headers, and as I'm navigating and creating headers, I create the style for the column and apply in the range of the rows. The difference here is that I create a style and pass it by reference. No need to update module.
let column_style = workbook.styleSheet().createStyle();
column_style.style("bold", true);
// whatever styles you want
sheet.range(row_start, col, row_end, col).style(column_style);
As I mentioned above, this works fine for a spreadsheet with a table with 10-20 columns and 20k rows.
Note that you can't change a cell style individually after this or you will affect all cells in that column. For that you need to make some kind of style reset to that cell and start over.
I analyzed code of @mdgozza and @Sleeper9, and I found more simple way.
const styleCachedMap = new Map();
function getStyleCached(styles) {
const cacheKey = JSON.stringify(styles);
let styleCached = styleCachedMap.get(cacheKey);
if (styleCached) {
return styleCached;
}
styleCached = book.styleSheet().createStyle();
for (const [key, value] of Object.entries(styles)) {
styleCached.style(key, value);
}
styleCachedMap.set(cacheKey, styleCached);
return styleCached;
}
const styleCached = getStyleCached(styles);
const cell = sheet.cell(addr);
// Use this code instead "cell.style(styleCached);" because it raise 'Unknown type: Style' error
cell._style = styleCached;
@dresende
let column_style = workbook.styleSheet().createStyle();
column_style.style("bold", true);
// whatever styles you want
sheet.range(row_start, col, row_end, col).style(column_style);
I tried the above solution and it is working on debug mode but I am getting "Unknown type: Style" error in production build. I am using xlsx-populate with next js. Any idea what could be the problem?
@zainalihashmi I use exactly that. I apply both to cells and cell ranges. I'm using version 1.21.0
@dresende
let column_style = workbook.styleSheet().createStyle(); column_style.style("bold", true); // whatever styles you want sheet.range(row_start, col, row_end, col).style(column_style);
I tried the above solution and it is working on debug mode but I am getting "Unknown type: Style" error in production build. I am using xlsx-populate with next js. Any idea what could be the problem?
Were you able to resolve this issue 'Unknown type: Style' ?
@manjunathr92 I could not resolve the issue with xlsx-populate. I moved to the exceljs and tested it with upto 5 million styled cells and it seems to be working fine.
@dresende
let column_style = workbook.styleSheet().createStyle(); column_style.style("bold", true); // whatever styles you want sheet.range(row_start, col, row_end, col).style(column_style);
I tried the above solution and it is working on debug mode but I am getting "Unknown type: Style" error in production build. I am using xlsx-populate with next js. Any idea what could be the problem?
Use _style instead of style. Refer my edited code: https://github.com/dtjohnson/xlsx-populate/issues/188#issuecomment-1598311616