AzureStorageExplorer
Unable to preview xxx .parquet file
Preflight Checklist
- [X] I have installed the latest version of Storage Explorer.
- [X] I have checked existing resources, including the troubleshooting guide and the release notes.
- [X] I have searched for similar issues.
Storage Explorer Version
1.33.0
Regression From
1.32.1
Architecture
x64
Storage Explorer Build Number
20240301.4
Platform
Windows
OS Version
Windows 11
Bug Description
Since yesterday's update, I have been unable to preview specific parquet files. These parquet files are generated by Azure data factory and read by Synapse/Power BI without issue (see images below). I've tested older parquet files, and they all have the same issue with Azure Storage Explorer. File size is 2.93 MB.
This does not affect all files, only some parquet files.
I can provide sample files privately to Microsoft.
```
{
  "name": "Error",
  "message": "Unable to preview '100/103/2024/03/01/trusted-100-103-daily-2024-03-01-measures-processedOn-2024-03-02.parquet'.",
  "stack": "Error: Unable to preview '100/103/2024/03/01/trusted-100-103-daily-2024-03-01-measures-processedOn-2024-03-02.parquet'.
    at fetchParquetData (C:\Users\xxx\AppData\Local\Programs\Microsoft Azure Storage Explorer\resources\app\node_modules\@storage-explorer\file-preview\dist\src\index.js:102:2408)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async FilePreview.fetchTabularData (C:\Users\xxx\AppData\Local\Programs\Microsoft Azure Storage Explorer\resources\app\node_modules\@storage-explorer\file-preview\dist\src\index.js:102:4366)
    at async Je.executeOperation (C:\Users\xxx\AppData\Local\Programs\Microsoft Azure Storage Explorer\resources\app\out\app\node\NodeProcessHostProxy.js:3:2742)
    at async Bt._handleExecuteRequest (C:\Users\xxx\AppData\Local\Programs\Microsoft Azure Storage Explorer\resources\app\out\app\node\NodeProcessHostProxy.js:7:915)
    at async Bt._handleMessage (C:\Users\xxx\AppData\Local\Programs\Microsoft Azure Storage Explorer\resources\app\out\app\node\NodeProcessHostProxy.js:6:30030)
    at async C:\Users\xxx\AppData\Local\Programs\Microsoft Azure Storage Explorer\resources\app\out\app\node\NodeProcessHostProxy.js:6:27541"
}
```
Steps to Reproduce
1. Launch Azure Storage Explorer.
2. Navigate to a parquet file.
3. Right-click and select Preview.
Actual Experience
Expected Experience
Additional Context
To get Azure Storage Explorer to open my parquet files again, I had to downgrade to version 1.31.2. Note, however, that I was running 1.32.1 before the upgrade.
Working in 1.31.2
File opens in Power BI without issue.
Schema
@lab44Hub would you be able to share a parquet file with us that you can preview in 1.32 but not in 1.33?
@MRayermannMSFT is there a secure way I can transfer this to you? I would prefer not to publish it on here.
@lab44Hub you can email it to the address sehelp, @, microsoft.com (sorry for weird formatting, trying to avoid scrapers getting the address lol). Please include this issue number in the subject line, something like: "GitHub Issue: 7807 - Parquet Preview Error"
@MRayermannMSFT Thanks, I've sent the email with the file.
Seeing the same behavior on my end, with the same logs. I can open the file locally using tad and/or python. We're behind a corporate proxy.
I'm seeing this error when opening the file:
missing option: typeLength (required for FIXED_LEN_BYTE_ARRAY)
Looking at your schema while debugging in our code, all of the DECIMAL fields seem to be backed by FIXED_LEN_BYTE_ARRAY primitives rather than INT64 as indicated above by PowerBI. Do you know for sure the DECIMAL fields use INT64? If that's true, there may be a bug in the Parquet library we're using...
This reproduces for me in both 1.32.0 and 1.32.1, so I'm not sure if this is really a regression.
I can't explain it. I have been running 1.32.1 since its release, and this issue did not occur. I check these files almost daily using Storage Explorer; 1.33.0 was the first version where I encountered this issue. I now have to run 1.31.2 to preview these (daily) parquet files.
I can only assume that perhaps something in the upgrade vs. complete reinstall is different.
missing option: typeLength (required for FIXED_LEN_BYTE_ARRAY)
All I can say on this is that the files are created by Azure Data Factory; before the sink, I convert the columns to either string or decimal, and the sink picks those types up correctly.
@craxal I was able/not able to preview the file @lab44Hub sent in versions:
- 1.30.2: yes
- 1.31.2: yes
- 1.32.1: no
- 1.33.0: no
Can we look into what the regression was between 1.31 and 1.32?
I can also confirm that the file opens in 1.31.2 without issue.
The only significant change I can see between 1.31.2 and 1.32.0 is we patch-updated the Parquet library from 1.3.3 to 1.3.4 (interestingly enough, to solve problems with DECIMAL precision, see #7042, LibertyDSNP/parquetjs#91). The issue no longer occurs if I revert to 1.3.3. Unfortunately, the problem persists even with the current version (1.6.0).
@JasonYeMSFT From what I'm gathering, it looks like the 1.3.4 version included a contribution you made to support DECIMAL precisions higher than 18. Can you provide any insights or point to potential solutions?
I can see why it is broken. The library's typings suggest that the values of the metadata fields are either of some type or "undefined". However, this customer's parquet file uses "undefined" and "null" interchangeably. I will make a PR to the upstream library, but this will remain a long-term pain point as long as the typings stay misleading.
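The failure mode described above can be sketched in Python as an analog: code that only checks for a *missing* option lets an explicit null/None value slip through, which later surfaces as the "missing option: typeLength" error. The option name and the fallback value here are hypothetical, not the library's actual logic.

```python
def type_length_buggy(opts: dict):
    # Mirrors a strict `=== undefined` check in JS: only treats an absent
    # key as missing, so an explicit None passes straight through.
    return opts["typeLength"] if "typeLength" in opts else 16

def type_length_robust(opts: dict):
    # Mirrors a loose `== null` check in JS: treats both an absent key
    # and an explicit None as missing.
    value = opts.get("typeLength")
    return 16 if value is None else value

# Metadata as parsed from a file that uses null where undefined is expected.
from_file = {"typeLength": None}
print(type_length_buggy(from_file), type_length_robust(from_file))  # None 16
```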
I opened a PR in the upstream library https://github.com/LibertyDSNP/parquetjs/pull/122.
Looks like I was a bit mistaken. The contribution mentioned above actually fixes the issue, but the fix was not present until 1.6.1, which was just released. I've updated our library to that version, and I can confirm the file now parses and renders in the preview as before.
@lab44Hub We will have this fixed for 1.34.0. However, if other issues arise that warrant a 1.33.1 hotfix, we will make sure this fix is included in the hotfix.
Awesome response, thanks everyone.