spark
spark copied to clipboard
[BUG]: .Net Interactive: UDF use deprecated BinaryFormatter, fail in VS Code
Describe the bug UDFs are serialized using BinaryFormatter, which is deprecated for security reasons.
In environments where BinaryFormatter is already disabled, this causes UDFs to fail with a NotSupportedException.
To Reproduce
Steps to reproduce the behavior:
- Use Visual Studio Code with the .Net Interactive Notebooks extension
- Follow the steps in UDFs in .NET Interactive
- Executing cell #3 (which contains the definition of the UDF) fails with:
Error: System.NotSupportedException: BinaryFormatter serialization and deserialization are disabled within this application. See https://aka.ms/binaryformatter for more information.
at System.Runtime.Serialization.Formatters.Binary.BinaryFormatter.Serialize(Stream serializationStream, Object graph)
at Microsoft.Spark.Utils.CommandSerDe.Serialize(Delegate func, SerializedMode deserializerMode, SerializedMode serializerMode)
at Microsoft.Spark.Sql.Functions.CreateUdf(String name, Delegate execute, PythonEvalType evalType, String returnType)
at Microsoft.Spark.Sql.Functions.CreateUdf[TResult](String name, Delegate execute, PythonEvalType evalType)
at Microsoft.Spark.Sql.Functions.CreateUdf[TResult](String name, Delegate execute)
at Microsoft.Spark.Sql.Functions.Udf[T,TResult](Func`2 udf)
at Submission#11.<<Initialize>>d__0.MoveNext()
--- End of stack trace from previous location ---
at Microsoft.CodeAnalysis.Scripting.ScriptExecutionState.RunSubmissionsAsync[TResult](ImmutableArray`1 precedingExecutors, Func`2 currentExecutor, StrongBox`1 exceptionHolderOpt, Func`2 catchExceptionOpt, CancellationToken cancellationToken)
Expected behavior
- Should be able to use UDFs
- The example in UDFs in .NET Interactive should work
- UDFs should not use
BinaryFormatter
Screenshots NA
Desktop (please complete the following information):
- OS: macOS 10.15.7
- .NET 5 (5.0.100)
- Apache Spark 3.0.1
- .NET for Apache Spark v1.0.0 (netcoreapp3.1, macOS)
- VS Code:
- Version: 1.53.0-insider
- Commit: 96b426ef1da0f0a51ce9ef1931cd1fd2322547e4
- Date: 2020-12-14T05:49:45.285Z
- Electron: 11.0.3
- Chrome: 87.0.4280.67
- Node.js: 12.18.3
- V8: 8.7.220.25-electron.0
- OS: Darwin x64 19.6.0
- .Net Interactive Notebooks v1.0.160901
Additional context NA
The same example also fails in Jupyter, but for a different reason: #796
Thanks @oefe for reporting this. We will move away from BinaryFormatter from upcoming releases.
cc @suhsteve @jonsequitur (FYI)
Not sure what the dotnet plan for replacing BinaryFormatter here is, but I was wanting to use net spark in F# notebooks (where UDFs have never been supported), plus I had a couple of other similar problems that all looked solvable if dotnet had something like pythons cloudpickle. So I wrote one, https://github.com/Ibasa/Pikala. Forked spark to use it and can now run F# UDFs from notebooks.
We will move away from BinaryFormatter from upcoming releases.
@imback82 Is there a solution to this, now is 2023?
@GeorgeS2019 @imback82
Interestingly if you target .net standard we don't see the compiler errors: netstandard2.0;netstandard2.1 https://learn.microsoft.com/en-us/dotnet/standard/net-standard?tabs=net-standard-2-1#select-net-standard-version
I think there would still be runtime errors in a project built for .net 8.
The bigger issues is that binaryformatter is bad practice from a security standpoint (and moreover it won't allow AOT compilation of the worker in .net 8!)
UPDATE 4/16:
As I mentioned, you can get past compiler errors simply by building a project in a way that targets .net standard. This will still be compatible with .net 8. See a discussion about compatibility here: https://github.com/dotnet/runtime/discussions/90829
And you can also get past any runtime errors on .Net 8 simply by using these settings in the project file:
<NoWarn>$(NoWarn);SYSLIB0011</NoWarn>
<EnableUnsafeBinaryFormatterSerialization>true</EnableUnsafeBinaryFormatterSerialization>
... they can be introduced in .net 8 project files - of our drivers, and of the Microsoft.Spark.Worker as well.
I understand that the binary deserialization is a security threat ... but it seems a bit overblown. It is mitigated when the spark cluster is hosted entirely on a private network and the deserialization is only happening between drivers and executors. In any case, these deserialization-related concerns must be weighed against the needs of spark. There are few other environments that do as much serialization/deserialization as spark does!
I see that the "kryo" serialization also has had well-known vulnerabilities that never stopped it from being used in the Spark ecosystem:
Links regarding kryo: https://stackoverflow.com/questions/40261987/when-to-use-kryo-serialization-in-spark https://www.contrastsecurity.com/security-influencers/serialization-must-die-act-1-kryo-serialization
... I think the ultimate answer is to respect the "vulnerabilities" of binary serialization, and to disable that type of serialization by default. In practice users should be allowed to re-enable the functionality for the sake of convenience and performance. It is a measured risk, that should be left to the application developers.
@dbeavon
From ChatGPT4 =>
The BinaryFormatter in .NET has indeed been marked as obsolete due to security concerns. Here are the key points:
-
Obsolete Methods:
- The following methods are now obsolete and produce a compile-time warning with ID SYSLIB0011:
BinaryFormatter.SerializeBinaryFormatter.DeserializeFormatter.Serialize (Stream, Object)Formatter.Deserialize (Stream)IFormatter.Serialize (Stream, Object)IFormatter.Deserialize (Stream)
- In .NET 7, these methods were marked as error.
- The following methods are now obsolete and produce a compile-time warning with ID SYSLIB0011:
-
Recommended Action:
- Stop using BinaryFormatter in your code.
- Instead, consider using JsonSerializer or XmlSerializer.
- If you temporarily need to suppress the BinaryFormatter compile-time warning, you can use #pragma directives around the individual call site or suppress it in the project file¹.
-
Project Types Affected:
- In .NET 7, the methods still functioned properly in most project types (excluding ASP.NET, WASM, and MAUI) even after being marked obsolete⁴.
- However, it's strongly recommended not to continue using BinaryFormatter due to security risks¹.
ChatGPT seems pretty wrong on some points. The thing I wanted to point out is that BinaryFormatter seems to be allowed in .Net 8... if nothing else that is for the sake of compatibility with .NetStandard library dependencies. The approach to getting this working is very similar to .Net 7 (use EnableUnsafeBinaryFormatterSerialization, and disable SYSLIB0011 warnings).
The fact that we are dealing with a JVM-based (linux-based) platform means we can give ourselves more flexibility than a typical .Net solution. I would agree that we need to avoid BinaryFormatter by default, but it should probably allowed based on the discretion of users. In any case, I believe that the other serialization libraries which Spark/Jvm uses would also have the security concerns that apply to BinaryFormatter.