[BUG] Excel File with Macros Detected as "Potentially" Malicious. Unable to read Excel as a result.

Open nova-jj opened this issue 2 years ago • 1 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

Within an Azure Databricks environment, we use this library to read Excel files stored in a Storage Account. The error occurs whether the file is accessed via the ABFSS or the DBFS protocol, which suggests a file issue rather than a protocol issue. Attempting to read the file with newer versions of the spark-excel library results in the following error, caused by macros in the workbook: java.io.IOException: The file appears to be potentially malicious. "This file embeds more internal file entries than expected."

We have reverted to a previous version that does not raise this error, and we are looking for a way to bypass the macro detection. Our workbook does contain macros, and they are a required part of the workbook.

Expected Behavior

Reading the file into a DataFrame should succeed without this error, or the library should offer an option to override the macro detection and force the read when a file is flagged as "potentially" malicious.

Steps To Reproduce

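The following Python code produces our error:

file_path = "dbfs:/FileStore/our_excel_file.xlsm"
# Read the macro-enabled workbook, treating the first row as column names.
df = spark.read.format("com.crealytics.spark.excel").option("header", "true").load(file_path)
# Convert the Spark DataFrame to a pandas DataFrame.
df = df.toPandas()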

Environment

- Spark version: 3.4.1 via Databricks Runtime 13.3
- Spark-Excel version: 3.5.0_0.20.3
- OS: Windows on the client; the code runs remotely on Databricks clusters
- Cluster environment: Multiple cluster configurations representing dev/stg/prd using the same Databricks Runtime and Spark Versions.

Anything else?

We have reverted our install to the earlier version at Maven coordinates com.crealytics:spark-excel_2.12:0.13.7, which does not produce this issue.

nova-jj, Feb 22 '24 19:02

spark-excel doesn't do anything in that regard. It must be an upstream library that performs this check. Can you try to find out if this comes from POI?

nightscape, Feb 25 '24 20:02
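
One way to examine the "more internal file entries than expected" part of the message independently of Spark: .xlsx and .xlsm files are ZIP archives, so the number of internal entries can be counted with Python's standard zipfile module. The following is a minimal sketch only. The function names and the default limit of 1000 are assumptions made for illustration; the actual threshold enforced by the upstream library is not stated in this thread.

```python
import zipfile

# Hypothetical limit, used only for illustration. The real threshold is
# whatever the upstream library enforces and is not confirmed in this issue.
ASSUMED_MAX_ENTRIES = 1000


def count_zip_entries(path):
    """Return the number of internal entries in an OOXML (.xlsx/.xlsm) file."""
    with zipfile.ZipFile(path) as archive:
        return len(archive.infolist())


def exceeds_entry_limit(path, limit=ASSUMED_MAX_ENTRIES):
    """Return True if the workbook holds more internal entries than `limit`."""
    return count_zip_entries(path) > limit
```

Running `count_zip_entries` on a local copy of the workbook and comparing the result with whatever limit the upstream error reports would show whether the entry count, rather than the macros themselves, is what trips the check.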