
Creating a DataRobot Java CodeGen Template

Open zeryx opened this issue 2 years ago • 2 comments

AML-10 - Datarobot Java CodeGen

We're creating a simple template that uses a publicly available model file.

Checklist

  • [x] My PR title includes a relevant Jira ticket name
  • [x] If I made configuration changes, they are in the config template in the deploy directory
  • [x] I have added unit tests where appropriate
  • [x] I have added integration tests where appropriate
  • [x] Manual tests are required in a cluster

zeryx avatar Nov 12 '21 17:11 zeryx

@aslisabanci I think both approaches can work, and the subprocess approach can be simpler, but it runs into performance issues because it shuts down and restarts the JVM and has to reload the model on each call. Improved performance is something most of our DR customers have asked for, which is why we utilise the py4j system. If there's an easier way to manage the apply function and simplify the IO, that would be awesome.
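
The performance argument comes down to paying the model-loading cost once per process rather than once per call. A minimal sketch of that load-once pattern follows. It assumes a stand-in `SlowModel` class and a hypothetical `get_model` helper, neither of which is part of the template, and it leaves out the py4j wiring entirely.

```python
class SlowModel:
    """Hypothetical stand-in for a model that is expensive to load."""

    loads = 0  # counts how many times the load cost is paid

    def __init__(self):
        SlowModel.loads += 1

    def predict(self, row):
        # Placeholder scoring logic: sum the feature values.
        return sum(row.values())


_model = None  # cached for the lifetime of the process


def get_model():
    """Load the model on first use and reuse it afterwards."""
    global _model
    if _model is None:
        _model = SlowModel()
    return _model


def apply(row):
    """Entry point called once per request; never reloads the model."""
    return get_model().predict(row)
```

Calling `apply` repeatedly constructs `SlowModel` only once, whereas a fresh subprocess per call would repeat both the JVM startup and the model load.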

zeryx avatar Nov 15 '21 19:11 zeryx

> @aslisabanci I think both approaches can work, and the subprocess approach can be simpler, but it runs into performance issues because it shuts down and restarts the JVM and has to reload the model on each call. Improved performance is something most of our DR customers have asked for, which is why we utilise the py4j system. If there's an easier way to manage the apply function and simplify the IO, that would be awesome.

I don't object to the perf aspect, but sometimes the user won't know how to use the "codegen + monitoring" jar through py4j. This jar is invoked with certain parameters, and since we don't know the internals of this Java package's implementation, we won't know how to write the py4j wrapper. Taking my example in this thread, I don't know how I could call this jar package with these parameters using py4j:

java -jar <local path to scoring code jar> csv \
  --input=<local path to input CSV> \
  --output=<local path to output CSV> \
  --enable_mlops \
  --dr_token=<your api token>

So for me, it wasn't a matter of preference, but a necessity to call this jar using subprocess.
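
For reference, the command above could be wrapped in Python roughly as follows. This is only a sketch: the function names `build_scoring_command` and `score_csv` are hypothetical, the flags are exactly the ones shown in the command, and it assumes `java` is on the `PATH`.

```python
import subprocess


def build_scoring_command(jar_path, input_csv, output_csv, dr_token):
    """Build the argument list for the scoring jar's csv mode."""
    return [
        "java", "-jar", jar_path, "csv",
        f"--input={input_csv}",
        f"--output={output_csv}",
        "--enable_mlops",
        f"--dr_token={dr_token}",
    ]


def score_csv(jar_path, input_csv, output_csv, dr_token):
    """Run the jar in a fresh JVM and return the output CSV path.

    Raises subprocess.CalledProcessError if the jar exits non-zero.
    """
    cmd = build_scoring_command(jar_path, input_csv, output_csv, dr_token)
    subprocess.run(cmd, check=True)
    return output_csv
```

Passing the arguments as a list avoids shell quoting issues. Each call to `score_csv` starts a new JVM, which is the performance cost discussed earlier in this thread.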

We can keep the template like this if it fits the majority of our interested users' way of using the codegens, but I wanted us to be aware of the other use cases.

aslisabanci avatar Nov 15 '21 20:11 aslisabanci