Unable to preview xxx .parquet file

Open damianvandoom opened this issue 1 year ago • 12 comments

Preflight Checklist

Storage Explorer Version

1.33.0

Regression From

1.32.1

Architecture

x64

Storage Explorer Build Number

20240301.4

Platform

Windows

OS Version

Windows 11

Bug Description

Since yesterday's update, I have been unable to preview specific parquet files. These parquet files are generated by Azure Data Factory and read by Synapse/Power BI without issue (see images below). I've tested older parquet files, and they all have the same issue with Azure Storage Explorer. File size is 2.93 MB.

This does not affect all files, only some parquet files.

I can provide sample files privately to Microsoft.

{ "name": "Error", "message": "Unable to preview '100/103/2024/03/01/trusted-100-103-daily-2024-03-01-measures-processedOn-2024-03-02.parquet'.", "stack": "Error: Unable to preview '100/103/2024/03/01/trusted-100-103-daily-2024-03-01-measures-processedOn-2024-03-02.parquet'.\n at fetchParquetData (C:\Users\xxx\AppData\Local\Programs\Microsoft Azure Storage Explorer\resources\app\node_modules\@storage-explorer\file-preview\dist\src\index.js:102:2408)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n at async FilePreview.fetchTabularData (C:\Users\xxx\AppData\Local\Programs\Microsoft Azure Storage Explorer\resources\app\node_modules\@storage-explorer\file-preview\dist\src\index.js:102:4366)\n at async Je.executeOperation (C:\Users\xxx\AppData\Local\Programs\Microsoft Azure Storage Explorer\resources\app\out\app\node\NodeProcessHostProxy.js:3:2742)\n at async Bt._handleExecuteRequest (C:\Users\xxx\AppData\Local\Programs\Microsoft Azure Storage Explorer\resources\app\out\app\node\NodeProcessHostProxy.js:7:915)\n at async Bt._handleMessage (C:\Users\xxx\AppData\Local\Programs\Microsoft Azure Storage Explorer\resources\app\out\app\node\NodeProcessHostProxy.js:6:30030)\n at async C:\Users\xxx\AppData\Local\Programs\Microsoft Azure Storage Explorer\resources\app\out\app\node\NodeProcessHostProxy.js:6:27541" }

Steps to Reproduce

1. Launch Azure Storage Explorer.
2. Navigate to a parquet file.
3. Right-click the file and select Preview.

Actual Experience

image

Expected Experience

image

Additional Context

To get Azure Storage Explorer to open my parquet files again, I had to downgrade to version 1.31.2. I was running 1.32.1 before the upgrade, however.

Working in 1.31.2 image

File opens in Power BI without issue. image

Schema

image

damianvandoom avatar Mar 08 '24 09:03 damianvandoom

@lab44Hub would you be able to share a parquet file with us that you can preview in 1.32 but not in 1.33?

MRayermannMSFT avatar Mar 08 '24 15:03 MRayermannMSFT

@MRayermannMSFT is there a secure way I can transfer this to you? I would prefer not to publish it on here.

damianvandoom avatar Mar 08 '24 15:03 damianvandoom

@lab44Hub you can email it to the address sehelp, @, microsoft.com (sorry for weird formatting, trying to avoid scrapers getting the address lol). Please include this issue number in the subject line, something like: "GitHub Issue: 7807 - Parquet Preview Error"

MRayermannMSFT avatar Mar 08 '24 18:03 MRayermannMSFT

@MRayermannMSFT Thanks, I've sent the email with the file.

damianvandoom avatar Mar 11 '24 10:03 damianvandoom

Seeing the same behavior on my end, with the same logs. I can open the file locally using tad and/or python. We're behind a corporate proxy.

inigohidalgo avatar Mar 11 '24 14:03 inigohidalgo

I'm seeing this error when opening the file:

missing option: typeLength (required for FIXED_LEN_BYTE_ARRAY)

Looking at your schema while debugging in our code, all of the DECIMAL fields seem to be backed by FIXED_LEN_BYTE_ARRAY primitives rather than INT64 as indicated above by PowerBI. Do you know for sure the DECIMAL fields use INT64? If that's true, there may be a bug in the Parquet library we're using...
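To illustrate why a reader needs typeLength for this primitive, here is a minimal TypeScript sketch. It is not the code of the Parquet library used by Storage Explorer; the function names are hypothetical. It assumes only what the Parquet format specifies: FIXED_LEN_BYTE_ARRAY values are stored back to back with no per-value length prefix, and a DECIMAL stored this way is a big-endian two's complement unscaled integer.

```typescript
// Illustrative sketch only; function names are hypothetical and this is
// not the implementation of any particular Parquet library.

// Render an unscaled integer with `scale` digits after the decimal point.
function formatDecimal(unscaled: bigint, scale: number): string {
  const negative = unscaled < BigInt(0);
  const abs = negative ? -unscaled : unscaled;
  const digits = abs.toString().padStart(scale + 1, "0");
  const whole = digits.slice(0, digits.length - scale);
  const frac = scale > 0 ? "." + digits.slice(digits.length - scale) : "";
  return (negative ? "-" : "") + whole + frac;
}

// Split a buffer of fixed-length values and decode each as a DECIMAL.
function decodeFixedLenDecimals(
  data: Uint8Array,
  typeLength: number | null | undefined,
  scale: number
): string[] {
  // Without typeLength there is no way to tell where one value ends
  // and the next begins, so the reader has to give up here.
  if (typeLength == null || typeLength <= 0) {
    throw new Error("missing option: typeLength (required for FIXED_LEN_BYTE_ARRAY)");
  }
  if (data.length % typeLength !== 0) {
    throw new Error("buffer length is not a multiple of typeLength");
  }
  const out: string[] = [];
  for (let off = 0; off < data.length; off += typeLength) {
    // Accumulate the big-endian bytes into an unsigned integer.
    let unscaled = BigInt(0);
    for (let i = 0; i < typeLength; i++) {
      unscaled = (unscaled << BigInt(8)) | BigInt(data[off + i]);
    }
    // A set top bit means the value is negative in two's complement.
    if ((data[off] & 0x80) !== 0) {
      unscaled -= BigInt(1) << BigInt(8 * typeLength);
    }
    out.push(formatDecimal(unscaled, scale));
  }
  return out;
}
```

For example, with typeLength 4 and scale 2, the bytes `00 00 30 39` decode to `123.45`. The same call with typeLength missing fails before any value can be read, which matches the shape of the error above.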

craxal avatar Mar 11 '24 21:03 craxal

This reproduces for me in both 1.32.0 and 1.32.1, so I'm not sure if this is really a regression.

craxal avatar Mar 12 '24 00:03 craxal

> I'm not sure if this is really a regression.

I can't explain it. I have been running 1.32.1 since release, and this issue did not occur. I check these files almost daily using Storage Explorer. 1.33.0 was the first encounter with this issue. I now have to run 1.31.2 to preview this (daily) parquet file.

I can only assume that perhaps something in the upgrade vs. complete reinstall is different.

> missing option: typeLength (required for FIXED_LEN_BYTE_ARRAY)

All I can say on this is that the files are created by Azure Data Factory, and before the sink I convert each column to either string or decimal, which the sink picks up correctly.

image

damianvandoom avatar Mar 12 '24 09:03 damianvandoom

@craxal I was able/not able to preview the file @lab44Hub sent in versions:

  • 1.30.2: yes
  • 1.31.2: yes
  • 1.32.1: no
  • 1.33.0: no

Can we look into what regressed between 1.31 and 1.32?

MRayermannMSFT avatar Mar 12 '24 16:03 MRayermannMSFT

I can also confirm that the file opens in 1.31.2 without issue.

The only significant change I can see between 1.31.2 and 1.32.0 is we patch-updated the Parquet library from 1.3.3 to 1.3.4 (interestingly enough, to solve problems with DECIMAL precision, see #7042, LibertyDSNP/parquetjs#91). The issue no longer occurs if I revert to 1.3.3. Unfortunately, the problem persists even with the current version (1.6.0).

@JasonYeMSFT From what I'm gathering, it looks like the 1.3.4 version included a contribution you made to support DECIMAL precisions higher than 18. Can you provide any insights or point to potential solutions?

craxal avatar Mar 12 '24 17:03 craxal

I can see why it is broken. The library's typing suggests that the values of the metadata fields are either of some type or "undefined". However, this customer's parquet file uses "undefined" and "null" interchangeably. I will make a PR to the upstream library, but this will be a long-term pain for as long as the typings remain misleading.
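A minimal sketch of the kind of mismatch described here, in TypeScript. The interface and function names are hypothetical and are not the library's actual typings or the content of the upstream PR. The sketch only shows how a check written against `T | undefined` can misclassify a `null` that arrives at runtime.

```typescript
// Hypothetical metadata shape for illustration only. The declared type
// admits null here so the example compiles; typings that declare only
// `number | undefined` would hide the null case from the compiler.
interface ColumnMetaLike {
  name: string;
  typeLength?: number | null;
}

// Strict check: only undefined counts as "absent", so a null value is
// treated as if a real typeLength were present.
function hasTypeLengthStrict(meta: ColumnMetaLike): boolean {
  return meta.typeLength !== undefined;
}

// Loose check: `!= null` is false for both null and undefined.
function hasTypeLength(meta: ColumnMetaLike): boolean {
  return meta.typeLength != null;
}

// Normalizing once at the parsing boundary keeps downstream code
// consistent with typings that only mention undefined.
function normalizeOptional<T>(value: T | null | undefined): T | undefined {
  return value == null ? undefined : value;
}
```

With `{ name: "amount", typeLength: null }`, the strict check reports the field as present while the loose check reports it as absent. Normalizing at the boundary is one possible design choice; it keeps `0` and other falsy but valid values intact because it tests only for `null` and `undefined`.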

JasonYeMSFT avatar Mar 12 '24 19:03 JasonYeMSFT

I opened a PR in the upstream library https://github.com/LibertyDSNP/parquetjs/pull/122.

JasonYeMSFT avatar Mar 12 '24 21:03 JasonYeMSFT

Looks like I was a bit mistaken. The contribution mentioned above actually fixes the issue, but the fix was not present until 1.6.1, which was just released. I've updated our library to that version, and I can confirm the file now parses and renders in the preview as before.

@lab44Hub We will have this fixed for 1.34.0. However, if other issues arise that warrant a 1.33.1 hotfix, we will make sure this fix is included in the hotfix.

craxal avatar Mar 14 '24 17:03 craxal

Awesome response, thanks everyone.

damianvandoom avatar Mar 14 '24 17:03 damianvandoom