SynapseML
chore: Update isolation-forest to 3.0.1
In preparation for the Scala 2.13 update.
Related Issues/PRs
The Scala 2.13 build requires the new version of isolation-forest: https://github.com/microsoft/SynapseML/pull/1772. I've split the update into its own PR so that CI can verify that the dependency update by itself doesn't break anything. The update contains an API change, which doesn't break anything in my local build, but CI covers more.
What changes are proposed in this pull request?
This PR updates the isolation-forest dependency to the most recent version, which provides a Scala 2.13 build.
How is this patch tested?
- [x] I have run `sbt test:compile` locally to verify that the API change doesn't break Scala code.
Does this PR change any dependencies?
- [x] Yes. Make sure the dependencies are resolved correctly, and list changes here.
`"com.linkedin.isolation-forest" %% "isolation-forest_3.2.0" % "2.0.8"` → `"com.linkedin.isolation-forest" %% "isolation-forest_3.2.0" % "3.0.1"` (see the sketch below)
Does this PR add a new feature? If so, have you added samples on website?
- [x] No. You can skip this section.
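For context, the dependency bump above is a one-line change in the sbt build, roughly like this (a sketch; the real SynapseML build definition is more involved):

```scala
// Sketch of the bump in build.sbt; only the version string changes.
// The artifact name carries the Spark version (3.2.0), and %%
// appends the Scala binary version, so the same line resolves for
// 2.13 once upstream publishes a 2.13 build.
libraryDependencies += "com.linkedin.isolation-forest" %% "isolation-forest_3.2.0" % "3.0.1"
```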
Hey @nightscape 👋! Thank you so much for contributing to our repository 🙌. Someone from the SynapseML Team will be reviewing this pull request soon.
We use semantic commit messages to streamline the release process. Before your pull request can be merged, you should make sure your first commit and PR title start with a semantic prefix. This helps us to create release messages and credit you for your hard work!
Examples of commit messages with semantic prefixes:
- fix: Fix LightGBM crashes with empty partitions
- feat: Make HTTP on Spark back-offs configurable
- docs: Update Spark Serving usage
- build: Add codecov support
- perf: improve LightGBM memory usage
- refactor: make python code generation rely on classes
- style: Remove nulls from CNTKModel
- test: Add test coverage for CNTKModel
To test your commit locally, please follow our guide on building from source. Check out the developer guide for additional guidance on testing your change.
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Codecov Report
Merging #1776 (2fb1a46) into master (8dc4a58) will decrease coverage by 0.03%. The diff coverage is n/a.
@@ Coverage Diff @@
## master #1776 +/- ##
==========================================
- Coverage 86.02% 85.98% -0.04%
==========================================
Files 278 278
Lines 14722 14722
Branches 767 767
==========================================
- Hits 12664 12659 -5
- Misses 2058 2063 +5
| Impacted Files | Coverage Δ |
|---|---|
| ...crosoft/azure/synapse/ml/io/http/HTTPClients.scala | 67.64% <0.00%> (-7.36%) ⬇️ |
@nightscape the multivariate anomaly detection notebook (which uses Isolation Forest) failed on Databricks with the following error:
Py4JJavaError: An error occurred while calling o1228.load.
: org.json4s.MappingException: Did not find value which can be converted into int
at org.json4s.reflect.package$.fail(package.scala:53)
at org.json4s.Extraction$.$anonfun$convert$2(Extraction.scala:881)
at scala.Option.getOrElse(Option.scala:189)
at org.json4s.Extraction$.convert(Extraction.scala:881)
at org.json4s.Extraction$.$anonfun$extract$10(Extraction.scala:456)
at org.json4s.Extraction$.$anonfun$customOrElse$1(Extraction.scala:780)
at scala.PartialFunction.applyOrElse(PartialFunction.scala:127)
at scala.PartialFunction.applyOrElse$(PartialFunction.scala:126)
at scala.PartialFunction$$anon$1.applyOrElse(PartialFunction.scala:257)
at org.json4s.Extraction$.customOrElse(Extraction.scala:780)
at org.json4s.Extraction$.extract(Extraction.scala:454)
at org.json4s.Extraction$.extract(Extraction.scala:56)
at org.json4s.ExtractableJsonAstNode.extract(ExtractableJsonAstNode.scala:22)
at com.linkedin.relevance.isolationforest.IsolationForestModelReadWrite$IsolationForestModelReader.load(IsolationForestModelReadWrite.scala:52)
at com.linkedin.relevance.isolationforest.IsolationForestModelReadWrite$IsolationForestModelReader.load(IsolationForestModelReadWrite.scala:38)
at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$load$5(Pipeline.scala:277)
at org.apache.spark.ml.MLEvents.withLoadInstanceEvent(events.scala:161)
at org.apache.spark.ml.MLEvents.withLoadInstanceEvent$(events.scala:156)
at org.apache.spark.ml.util.Instrumentation.withLoadInstanceEvent(Instrumentation.scala:43)
at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$load$4(Pipeline.scala:277)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$load$3(Pipeline.scala:274)
at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:284)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:284)
at org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:268)
at org.apache.spark.ml.Pipeline$PipelineReader.$anonfun$load$2(Pipeline.scala:215)
at org.apache.spark.ml.MLEvents.withLoadInstanceEvent(events.scala:161)
at org.apache.spark.ml.MLEvents.withLoadInstanceEvent$(events.scala:156)
at org.apache.spark.ml.util.Instrumentation.withLoadInstanceEvent(Instrumentation.scala:43)
at org.apache.spark.ml.Pipeline$PipelineReader.$anonfun$load$1(Pipeline.scala:214)
at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:284)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:284)
at org.apache.spark.ml.Pipeline$PipelineReader.load(Pipeline.scala:214)
at org.apache.spark.ml.Pipeline$PipelineReader.load(Pipeline.scala:209)
at org.apache.spark.ml.util.MLReadable.load(ReadWrite.scala:355)
at org.apache.spark.ml.util.MLReadable.load$(ReadWrite.scala:355)
at org.apache.spark.ml.Pipeline$.load(Pipeline.scala:197)
at org.apache.spark.ml.PipelineSerializer.read(Serializer.scala:129)
at org.apache.spark.ml.PipelineSerializer.read(Serializer.scala:122)
at com.microsoft.azure.synapse.ml.core.serialize.ComplexParam.load(ComplexParam.scala:24)
at org.apache.spark.ml.ComplexParamsReader$.$anonfun$getAndSetComplexParams$2(ComplexParamsSerializer.scala:176)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at org.apache.spark.ml.ComplexParamsReader$.getAndSetComplexParams(ComplexParamsSerializer.scala:172)
at org.apache.spark.ml.ComplexParamsReader.load(ComplexParamsSerializer.scala:155)
at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$load$5(Pipeline.scala:277)
at org.apache.spark.ml.MLEvents.withLoadInstanceEvent(events.scala:161)
at org.apache.spark.ml.MLEvents.withLoadInstanceEvent$(events.scala:156)
at org.apache.spark.ml.util.Instrumentation.withLoadInstanceEvent(Instrumentation.scala:43)
at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$load$4(Pipeline.scala:277)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$load$3(Pipeline.scala:274)
at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:284)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:284)
at org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:268)
at org.apache.spark.ml.PipelineModel$PipelineModelReader.$anonfun$load$7(Pipeline.scala:356)
at org.apache.spark.ml.MLEvents.withLoadInstanceEvent(events.scala:161)
at org.apache.spark.ml.MLEvents.withLoadInstanceEvent$(events.scala:156)
at org.apache.spark.ml.util.Instrumentation.withLoadInstanceEvent(Instrumentation.scala:43)
at org.apache.spark.ml.PipelineModel$PipelineModelReader.$anonfun$load$6(Pipeline.scala:355)
at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:284)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:284)
at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:355)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)
This is likely due to a json4s mismatch between the new isolation-forest and Databricks.
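For reference, the top of the stack is the standard saved-pipeline load path; a minimal sketch of the call that triggers it (`modelPath` is a placeholder for the notebook's actual model location):

```scala
// Minimal sketch, not the notebook's exact code: loading a saved
// PipelineModel invokes each stage's MLReader, including
// IsolationForestModelReadWrite, which parses the stage metadata
// JSON with json4s and fails as above.
import org.apache.spark.ml.PipelineModel

val modelPath = "/path/to/saved/model" // hypothetical placeholder
val model = PipelineModel.load(modelPath)
```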
@nightscape do you have a Databricks account you can use to repro, or do you need help here?
It seems that my personal account is not able to view the logs. I'm getting the following error when I try to open one of the links provided in the CI error list: AADSTS50020: User account '[email protected]' from identity provider 'live.com' does not exist in tenant 'Microsoft' and cannot access the application '2ff814a6-3304-4ab8-85cb-cd0e6f879c1d'(AzureDatabricks) in that tenant. The account needs to be added as an external user in the tenant first. Sign out and sign in again with a different Azure Active Directory user account.
The error seems to be that the loaded JSON does not contain the field newly introduced in isolation-forest. I haven't checked the notebook yet; does it load a model serialized by the old version of isolation-forest?
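That would match the exception: json4s throws exactly this MappingException when a required Int field is absent from the parsed JSON. A minimal sketch of the failure mode, assuming plain json4s and a hypothetical metadata field (this is not the actual isolation-forest reader code):

```scala
import org.json4s.{DefaultFormats, Formats}
import org.json4s.native.JsonMethods.parse

object MissingFieldSketch {
  implicit val formats: Formats = DefaultFormats

  // `newField` stands in for the field newly introduced in
  // isolation-forest 3.x; the real field name may differ.
  case class Metadata(numFeatures: Int, newField: Int)

  def main(args: Array[String]): Unit = {
    // JSON as an older version might have written it: no newField.
    val oldJson = parse("""{"numFeatures": 10}""")
    // Throws org.json4s.MappingException:
    // "Did not find value which can be converted into int"
    oldJson.extract[Metadata]
  }
}
```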
@nightscape not sure where you are trying to log in. The build system is scoped to MSFT users, so that might be what you are encountering. However, if you have an ADB account, it's easy to load the notebook and run it with the updated version. If you don't, we can figure out how to get you a repro. I'm not sure it's loading an old model, though.
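One way to narrow that down is a save/load round trip with the new version (a sketch; `trainingData` is a hypothetical DataFrame providing the default `features` column):

```scala
// If save-then-load with isolation-forest 3.0.1 succeeds, the
// notebook failure points at a model serialized by the old version
// rather than at the new reader itself.
import com.linkedin.relevance.isolationforest.IsolationForest
import org.apache.spark.ml.{Pipeline, PipelineModel}

val pipeline = new Pipeline().setStages(Array(new IsolationForest()))
val model = pipeline.fit(trainingData) // trainingData: hypothetical DataFrame
model.write.overwrite().save("/tmp/if-roundtrip")
val reloaded = PipelineModel.load("/tmp/if-roundtrip")
```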