clerk icon indicating copy to clipboard operation
clerk copied to clipboard

nippy doesn't trust Java object passed as argument to function

Open mars0i opened this issue 3 years ago • 9 comments

I'm not sure whether this is a nippy issue or a clerk issue. I decided to start here, but please feel free to let me know that should submit the issue at the nippy repo instead.

I'm using an Apache Commons Java class to generate data for a hanami/vega-lite plot. It works fine if I construct the data with a series of defs. But if I wrap up the process in a function, and pass the Java object as an argument, nippy doesn't like that and turns the Java object into an error map.

The example below should work as given, but if you comment out the expression in the the hanami :DATA line and uncomment the expression after that, you should see the error, with the nippy error map printed to stdout. The issue doesn't have to do with hanami--I can get the same effect by trying to print the map that the function is supposed to return--but I'm providing the hanami example for context.

I went through the little bit of docs I found at nippy repo looking for info on how to tell it to trust what I'm giving it. I didn't see a way to configure that. (Is there more documentation somewhere?) I don't know about the dangers that nippy is trying to protect against, but I can easily imagine that they exist. However, in this case nippy is interfering with a very normal Clojure practice.

Thanks!

Add to project.clj (or translate it as you wish for deps.edn, though I haven't tried that):

[org.apache.commons/commons-math3 "3.6.1"]
[aerial.hanami "0.17.0"]

Here's the MWE:

(ns mwe
    (:require 
      [nextjournal.clerk :as clerk]
      [aerial.hanami.common :as hc]
      [aerial.hanami.templates :as ht])
    (:import
      [org.apache.commons.math3.distribution LevyDistribution]))

(defn add-xy-labels
  [xs ys label]
  (map (fn [x y] {"x" x, "y" y, "label" label})
    xs ys))

(def scale 50)
(def dist (LevyDistribution. 0 scale))
(def label (str "scale=" scale))
(def xs (range 1 101))
(def ys (map (fn [x] (.density dist x)) xs)) ; y's are probabilities of x

;; I want to do the above a lot, so why not automate it?
(defn dist-plot-data
  [dist scale num-points]
  ; clerk is passing a nippy error map instead of LevyDistribution:
  (println "\ndist is a" (class dist) "\n" dist)
  (let [xs (range 1 (inc num-points))]
    (add-xy-labels xs (map (fn [x] (.density dist x)) xs)
                   (str "scale=" scale))))

;; No error unless I use the result:
(dist-plot-data dist scale 100)

(clerk/vl 
  (hc/xform ht/line-chart
            :DATA (add-xy-labels xs ys (str "scale=" scale)) ; works
                  ;(dist-plot-data dist scale 100) ; fails
            :COLOR "label"
            :YTITLE "p(x)"
            :TITLE "Lévy distribution"
            :HEIGHT 400))

mars0i avatar Jan 02 '22 18:01 mars0i

A small update: I thought that maybe the problem was that I was passing an external Java class instance to my function dist-plot-data, and that if I wrapped the instance and the call to its .density method in a Clojure function, it might work. i.e. I tried passing that new function that just gets the probability density of a value x to a revised dist-plot-data function. The same problem occurs, but now it's the new get-density function that sees the nippy error map where it should see the LevyDistribution object.

mars0i avatar Jan 03 '22 02:01 mars0i

Removing .cache got rid of the problem, at least for the present. (Maybe this is a normal thing to do, but I'm pretty new with clerk--didn't know.) So I don't know whether there is a real issue here or not.

mars0i avatar Jan 03 '22 03:01 mars0i

Sorry to keep leaving comments. I don't mean to imply that I expect a quick response. I just want to add that though I thought that being able to clear the cache might mean that this would just be an occasional annoyance, it turns out that I'm having to flush the cache and restart my repl multiple times per work session.

mars0i avatar Jan 04 '22 17:01 mars0i

Hi @mars0i, no worries at all. I have a few things I'd like to wrap up and I'll take a closer look at your issue then.

mk avatar Jan 05 '22 10:01 mk

I started looking at this further. Just found the nippy docs, and this page looks relevant: http://ptaoussanis.github.io/nippy/taoensso.nippy.html

I'm not yet sure how to use the info there.

This is what the error maps look like. I should have included that before:

{:nippy/unthawable {:cause :quarantined :class-name "org.apache.commons.math3.random.Well19937c" :content #object["[B" 0x5a24f0bc "[B@5a24f0bc"] :type :serializable}}

I thought I'd pass this on so that the info is available whenever you get to the issue. Or maybe I'll figure out more later.

mars0i avatar Jan 07 '22 04:01 mars0i

OK--I have made progress. The problem seems to be due to the fact that I'm using classes from Java libs that are not listed in taoensso.nippy/*thaw-serializable-allowlist*, because they're not in default-thaw-serializable-allowlist; these are defined in taoensso.nippy/nippy.clj. Arbitrary Java libs aren't allowed to be thawed because of the potential for a security risk--not sure I understand that, but I believe it. (I'm not sure why clerk works for a short time, and then has problems with nippy thawing. No doubt it has to do with when thawing is necessary.)

I'm using some Apache Commons libs and another Java lib (from an author I trust). Those are the ones that are appearing in the nippy error maps.

It looks like I can fix the behavior by requireing nippy, and then executing, for example:

(alter-var-root (var nippy/*thaw-serializable-allowlist*) conj "org.apache.commons.*")

That specifies that all org.apache.commons libs are safe to thaw.

I'll do more experiments, but I think that fixes the problem.

(It seemed as if doing this caused clerk to become very slow; I'll have to do more experimentation, though.)

I'll submit a PR to the nippy repo to add the Apache Commons libs to default-thaw-serializable-allowlist. The other lib I'm using is more obscure--not sure whether it will seem acceptable to add it to nippy's default thaw list.

I think this is an issue that will come up again. (It could come up with any library that uses nippy.) clerk users won't necessarily know that clerk uses nippy (or what nippy is, even), or know what to do if they start getting weird behavior due to nippy's thaw security policy. So maybe it would make sense for the clerk docs to mention that this can happen. Maybe it would even be worth adding a function to clerk that will allow a user perform the alter-var-root on *thaw-serializable-allowlist*. Not sure. That would be slightly more convenient, since it wouldn't require the user adding nippy as an explicit dependency. (I'd be happy to submit a small PR that adds such a function, if that makes sense.) There might be other issues here I don't know about, or maybe there's a much better solution.

mars0i avatar Jan 07 '22 05:01 mars0i

Still investigating this. My solution usually works, but there are some conditions when it doesn't. Trying to pin down what they are.

mars0i avatar Jan 12 '22 20:01 mars0i

Try evaluating those forms when things don't work: https://github.com/nextjournal/clerk/blob/9206957ab8b18d5d68a9ee349971bf60955493ef/src/nextjournal/clerk.clj#L17-L20

The following places should have more pointers:

  • https://ptaoussanis.github.io/nippy/taoensso.nippy.html#var-freeze-serializable-allowlist
  • https://ptaoussanis.github.io/nippy/taoensso.nippy.html#var-allow-and-record-any-serializable-class-unsafe
  • https://github.com/ptaoussanis/nippy/issues/130

mk avatar Jan 12 '22 21:01 mk

@mk Great--thanks. I'll try that.

mars0i avatar Jan 12 '22 21:01 mars0i