ClosedXML
ClosedXML copied to clipboard
Pivot table - support groups fields
Do you want to request a feature or report a bug?
- [x] Bug
- [ ] Feature
- [ ] Question
Did you test against the latest CI build?
- [ ] Yes
- [x] No
If you answered No, please test with the latest development build first.
I cloned and ran locally via ClosedXML.Sandbox.
Version of ClosedXML
0.102.0
What is the current behavior?
Exception during load of a vendor provided spreadsheet.
System.ArgumentOutOfRangeException: The column 'xyz' does not appear in the source range. (Parameter 'sourceName')
at ClosedXML.Excel.XLPivotFields.Add(String sourceName, String customName)
at ClosedXML.Excel.XLPivotFields.Add(String sourceName)
at ClosedXML.Excel.XLWorkbook.LoadSpreadsheetDocument(SpreadsheetDocument dSpreadsheet)
at ClosedXML.Excel.XLWorkbook.LoadSheets(Stream stream)
at ClosedXML.Excel.XLWorkbook.Load(Stream stream)
at ClosedXML.Excel.XLWorkbook..ctor(Stream stream, LoadOptions loadOptions)
at ClosedXML.Excel.XLWorkbook..ctor(Stream stream)
...
What is the expected behavior or new feature?
Spreadsheet is loaded.
Is this a regression from the previous version?
Yes. Works in 0.101.0
Reproducibility
Code to reproduce problem: The exception happens during the load of an existing file with a lot of pivot tables and connections. I'm sorry but a code sample isnt really helpful here.
new XLWorkbook("file")
- [ ] I attached a sample spreadsheet. (You can drag files on to this issue)
I dont have enough Excel experience to recreate this issue myself and I cant attach the file from the vendor.
I did spend a bit troubleshooting in the code and noticed a couple things.
At the previous commit, there was a check that the field is available before trying to add it in several places:
https://github.com/ClosedXML/ClosedXML/blob/69e37ad5f9d936c67d259e46582374930a23dd5f/ClosedXML/Excel/XLWorkbook_Load.cs#L714-L720
I assume this is what avoided the exception that is currently being thrown in XLPivotFields.Add.
When i skipped the bad adds in troubleshooting, I also ran into an exception where wss is null in ParsePivotSourceReference:
https://github.com/ClosedXML/ClosedXML/blob/9cded9044050523f87567271d72b382505ec2c2e/ClosedXML/Excel/XLWorkbook_Load.cs#L913-L920
wss can be null with external sources so it needs to return early.
May be related/same as https://github.com/ClosedXML/ClosedXML/issues/783, also having that error with spurious pivotTable and especially apparently data only in xl/pivotCache/pivotCacheDefinitions[0-9].xml. Pivot tables where apparently there, but removed long ago, but cannot find a way to cleanly delete the pivot cache and I tried all the ways - it is probably, because the pivot tables themselve do not exist anymore.
Re-creating or running these macros or so to refresh the pivot cache kinda works and "just" makes the error return a new (non-existent) column/ArgumentOutOfRangeException.
I can confirm this error does not happen with v0.101.0 and is a regression/new in v0.102.0.
Repro workbook
using var wb = new XLWorkbook(@"c:\Temp\Issues\2130\ColumnDeletedFromWorksheetButNotFromPivotTable.xlsx");
ColumnDeletedFromWorksheetButNotFromPivotTable.xlsx
The root of the issue is that the source data in the workbook are stale. Pivot tables have their own copy of data they use to calculate and display pivot table (docs).
The problem is that ClosedXML doesn't keep this data and just uses the original data from worksheet. If you delete a column from the worksheet, data are gone even from pivot table. Basically, the current state is fundamentally built on sand and needs to be fixed.
- I need an abstraction of pivot cache (IXLPivotCache). That was added in 0.102 .
- I need to add shared string table, because pivot cache data references strings through SST. That was already merged to 0.103.
- I need to load data to the the cache from workbook and save them back to the workbook. TBD.
If you click refresh all on pivot table, the exception might go away. I will look if I can add some simple check to ignore pivot fields that are missing from source workbook.
Thanks. That reproducer Excel file works and I can reproduce the issue with that Excel with v0.102.0, but not 0.101. As such downgrading is a feasable workaround.
If you click refresh all on pivot table, the exception might go away.
Yeah and that did not (fully) work for them. The thing is somehow the pivot table in my XLSX file has somehow been deleted completely, while the cache remains (and in the XSLX/ZIP file I can see) and yes, I don't know how that could have happen, as I did not create that Excel file.
I will look if I can add some simple check to ignore pivot fields that are missing from source workbook.
IMHO that would be great. Or a simple LoadOption to ignore all pivot fields/stuff maybe, because one may not actually need it and disabling it could be the simplest solution (for now?).
The thing is somehow the pivot table in my XLSX file has somehow been deleted completely, while the cache remains (and in the XSLX/ZIP file I can see) and yes, I don't know how that could have happen, as I did not create that Excel file.
Note I have reproduced this again with my customer-provided Excel file (which I described above) and even in ClosedXML v0.100.3 up to the current stable one, it does fail with the same error.
That is interesting, because your ColumnDeletedFromWorksheetButNotFromPivotTable.xlsx is indeed different here and works with v0.101.0 as said. That means there has to be a different (older/additional) error, so should I open a new issue about that?
Resolved through #2178, #2182, #2183 and #2186. The original post didn't contain a test file, but pivot cache now loads records and uses pivot cache instead of data from workbook directly.
That is loading only.
Pivot table saving code doesn't work properly for pivot tables with deleted source (or some related reason). Saved files (example) without a valid reference to source data will have a empty pivot table (also shows an error on load because RefreshOnLoad is hardcoded to true in saving code likely because it's buggy and Excel saves us), though it is possible to remove and add fields to recreate the table because data are kept even with deleted source.
But it doesn't crash on load that is the sound of progress 🥇
Unfortunately it's still failing to load my vendor provided spreadsheet. It's a mess inside, believe me.
I don't need any of the data from the pivot tables, so I'm sorry that I'm adding additional work to handle my edge-case.
Should I open a new issue since this continues to not work for this file? And since I'm running into two different issues, let me know if I should open another issue to split them.
- Issue where the cacheSource type is external:
<pivotCacheDefinition ... backgroundQuery="1"
createdVersion="6" refreshedVersion="6" minRefreshableVersion="3" recordCount="0"
supportSubquery="1" supportAdvancedDrill="1">
<cacheSource type="external" connectionId="21" />
<cacheField ...
This one I think I know how to pretty easily fix. I pointed out in the original description how wss can be null in ParsePivotSourceReference in this instance. But with the latest source changes there's an additional check needed when it tries to fallback to find a similar pivot cache. If it'd help, I can submit a PR with these changes.
- Issue with a different definition. The cached records are missing a column that is defined in the definition. Digging into the xml, the column missing from the cache is defined like this:
<cacheField name="Months" numFmtId="0" databaseField="0">
<fieldGroup ...
The presence of the databaseField attribute seems to be the only difference with this field compared to others. This would be fixable by having ReadCacheFields continue when DatabaseField is not null but I have no idea the implications of such a change. I'm not sure how to proceed with this one.
Edit: I tried out continuing based on the presence of DatabaseField and it was a big mistake. It's needed later to load the column names and data. A different approach is needed 😄
@jahav sorry to ping you, but I assume you didnt get a prior notification
The pivot table is using group fields. ClosedXML has a long way before that is supported.
As far as databaseFields:
The fields with databaseField="0" are calculated fields and grouped fields. I know there is an ancient WIP PR in unknown state #885 for calculated fields, I havent even used grouped fields, much less done any work on them.
MS-OI29500 (that is basically a patch over ISO29500 how does Excel actually implement the OOXML and fixes a lot of bugs in the standard) says
2.1.790 Part 1 Section 18.10.1.77, r (PivotCache Record) a. The standard does not specify restriction on the number of child elements. In Office, the number of child elements shall be equal to the number of cacheField elements with the databaseField attribute equal to 1 or true.
Would it make sense in the current implementation to add a DatabaseField property (nullable bool) to XLPivotCacheValuesStats and set it during the reading of cached fields? It could be used to [temporarily] ignore loading of calculated and grouped fields (via checking for non-null) during the processing of column labels and values.
No. It makes no sense to work with advanced features when I don't even have proper representation of basic features.