Docs - Multiple Classes in Jar, Custom Encoder, Package Class, Resubmit Conf, Debug and Absolute name of artifact & function.
- How to add multiple functions from the same jar. The packaged jar has multiple classes. How to write a conf for multiple functions pointing to the respective classes.
- How to write a custom encoder for a case class that is used in MistFn[CaseClass].
// Sample case class
case class CorrelationMatrix(headers: Array[String], values: Array[Array[Double]])
object CorrelationMatrix extends MistFn[CorrelationMatrix] {
...
}
- How to add a class which has a package in class-name. The class has a package like io.hydrosphere. Adding class-name = "io.hydrosphere.CorrelationMatrix$" doesn't work.
- How to re-submit the function after code changes.
After submitting the conf, there are a few more code changes. If we submit the conf again: Error: Artifact key xxx.jar has to be unique. How to overwrite the artifact without manually deleting the data/artifacts/xxx.jar and data/functions/yyy.conf.
- How to debug the Spark job code.
- How to prevent the current user from getting prefixed to the artifact and function names.
Thanks for the questions, they will help us improve our documentation. For a start, I'll try to answer here:
- Multiple functions: If your question was about mist-cli configuration, then you just need to create a conf file that points to the class-name of each function you want to deploy (there is also a Scala sketch of the jar side after this list). For example, for two functions A and B there should be two files:
  a.conf:
    model = Function
    name = a
    data {
      path = my_jar_0.0.1.jar
      class-name = "A$"
      context = default
    }
  b.conf:
    model = Function
    name = b
    data {
      path = my_jar_0.0.1.jar
      class-name = "B$"
      context = default
    }
- Custom encoders: We are going to add encoder derivation for case classes in future releases, so currently there is no other way except to write it manually:
  import mist.api._
  import mist.api.Encoder
  import mist.api.data._

  case class MyResponse(x: Int, y: String)

  object MyResponse {
    implicit val myResponseEncoder = new Encoder[MyResponse] {
      override def apply(rsp: MyResponse): JsLikeData =
        JsLikeMap("x" -> JsLikeNumber(rsp.x), "y" -> JsLikeString(rsp.y))
    }
  }

  object MyFn extends MistFn[MyResponse] {
    ..
  }
- Package: I can't reproduce that problem. Are you sure that the package you specified is correct and exists in the jar?
- Updating artifact: You can use mist-cli apply -f conf --validate true. But keep in mind that this action can affect in-progress functions. Also, there is an issue about artifact refreshing on workers (#437), so if you use the shared context type you need to manually stop the worker to apply changes, or use exclusive.
- Every job has logs - you can use them for debugging. In RC14 we improved them - now mist collects logs from Spark too. There is also a withMistExtras directive to obtain a logger inside the function body (see the sketch after this list).
- Passing an empty -u argument should work: mist-cli apply -f conf -u ''. I think we should reconsider the default behavior of building names in mist-cli. @blvp, do you have any thoughts?
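To make the multiple-functions answer above concrete, the jar side is simply several MistFn objects compiled into one artifact; the trailing $ in class-name refers to the compiled Scala object. A minimal skeleton (the package name com.example and the result types are made up for illustration; the handle bodies are elided):

// compiled into my_jar_0.0.1.jar
// if the objects live in a package, include it in class-name, e.g. class-name = "com.example.A$"
package com.example

import mist.api._

object A extends MistFn[Int] {
  // ... define the function here, e.g. withArgs(...).onSparkContext(...)
}

object B extends MistFn[String] {
  // ... second function, deployed separately via b.conf
}

And a rough sketch of the withMistExtras directive mentioned above, which exposes job metadata and a logger inside the function body. The exact imports, combinator shapes and MistExtras fields below are assumptions based on the RC-era mist-lib API and may differ between releases:

import mist.api._
import org.apache.spark.SparkContext

object LoggedFn extends MistFn[Long] {
  override def handle = {
    withArgs(arg[Int]("n"))
      .withMistExtras
      .onSparkContext((n: Int, extras: MistExtras, sc: SparkContext) => {
        // messages written via extras.logger end up in the job logs collected by mist
        extras.logger.info(s"counting up to $n in job ${extras.jobId}")
        sc.parallelize(1 to n).count()
      })
  }
}

An Encoder for the result type must be in implicit scope, as in the custom-encoder example above; mist-lib provides defaults for primitive types.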
Also, we have a Gitter room for questions.
@dos65, Thank you for the explanation. Really appreciate your time and consideration.
Package class works. The artifact wasn't refreshed when I added the package to the class. On restarting mist-master, it worked.
I was not able to get the artifact update to work. Using mist-1.0.0-RC13.
If I run mist-cli apply -f conf --validate true -u '', I get the error: Artifact key xxx.jar has to be unique.
If I run mist-cli apply -f conf/correlation-matrix.conf --validate true -u '', I get the error: Error: 400 Client Error: Bad Request for url: http://localhost:2004/v2/api/functions?force=False: class java.lang.IllegalStateException: Endpoint correlation-matrix already exists
With respect to debugging, I'm looking for a way to put a breakpoint in the code and debug, similar to this.
The last error with the function update was fixed in a new version of mist-cli; try to update it with the following command: pip install mist-cli --upgrade
After upgrading:
mist-cli apply -f conf/correlation-matrix.conf --validate true -u '' - works.
mist-cli apply -f conf --validate true -u '' - getting the same error message: Artifact key xxx.jar has to be unique
@gowravshekar About debugging - unfortunately, there is a bug with constructing the spark-submit command (#472), so currently it's impossible to pass driver-java-options correctly. If you really need it, you can implement a manual runner and add the following argument to spark-submit: --driver-java-options '-Xdebug -Xrunjdwp:transport=dt_socket,address=15000,server=y,suspend=y'
mist-cli apply -f conf --validate true -u '' - getting the same error message: Artifact key xxx.jar has to be unique
This is normal behavior, because updating the jar in place can break all functions that use it.
If you want to update a jar with validation enabled, you should change the version config value and then change it in the function. The reason behind this is that apply is used for both development and release, and this limitation is part of our vision of the release process.
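As a hedged illustration of that flow, reusing the artifact.conf / function.conf shapes shown elsewhere in this thread (the function name, class-name and the 0.0.2 version below are only examples), a manual version bump could look like:

artifact.conf
model = Artifact
name = test-artifact
version = 0.0.2
data.file-path = "./path/to/artifact.jar"

function.conf
model = Function
name = my-fn
data {
  path = test-artifact_0.0.2.jar
  class-name = "com.example.A$"
  context = default
}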
Some additional notes.
You can use environment variables to manage the artifact version. For example: artifact.conf
model = Artifact
name = test-artifact
version = ${ARTIFACT_VERSION}
data.file-path = "./path/to/artifact.jar"
function.conf
model = Function
data {
...
path = test-artifact_${ARTIFACT_VERSION}.jar
...
}
and then ARTIFACT_VERSION=0.0.1 mist-cli apply -f conf/
Oh, my mistake - --validate false instead of --validate true for an unsafe update.
@gowravshekar About debugging - unfortunately, there is a bug with constructing the spark-submit command (#472), so currently it's impossible to pass driver-java-options correctly. If you really need it, you can implement a manual runner and add the following argument to spark-submit: --driver-java-options '-Xdebug -Xrunjdwp:transport=dt_socket,address=15000,server=y,suspend=y'
Does this bug still exist? Is there a way now to debug the spark job?
@apoorv22 this one is fixed, you can use these options to debug the Spark job. Also, you need to be aware of the following things:
- your context should have the precreated=true and maxParallelJobs=1 settings (a hedged conf sketch follows below). Otherwise, it will be problematic to start several workers and connect a debugger to the desired process.
- breakpoints should suspend the current thread only, not the whole VM. When mist-master loses heartbeats from the worker process, it marks it as failed. For example, by default IntelliJ sets breakpoints that suspend the VM fully.
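A hedged sketch of a mist-cli context conf for that debug setup (the context name debug-ctx is made up, and the key spelling for these settings may differ between releases - e.g. max-parallel-jobs in HOCON files vs maxParallelJobs via the HTTP API - so check it against your mist version):

model = Context
name = debug-ctx
data {
  precreated = true
  max-parallel-jobs = 1
}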
@blvp, is there a way to use an environment variable or config value in data.file-path in artifact.conf?
Something similar to the below: data.file-path = "./path/to/artifact_${ARTIFACT_VERSION}.jar"
Yes, you can use an environment variable here in a similar manner: data.file-path="simple-name"${VERSION}".jar"