
require-r not working inside a function

Open awb99 opened this issue 4 years ago • 31 comments

(defn init-r []
  (println "configuring clojisr ..")
  (require-r '[base :as base :refer [$ <- $<-]]
             '[utils :as u]
             '[stats :as stats]
             '[graphics :as g]
             '[datasets :refer :all])
  (base/options :width 120 :digits 7)
  (base/set-seed 11228899)
  (pdf-off))

(init-r)

When I have all the forms of this function directly in the namespace, everything works fine. But if I wrap the initialization in a function to be more modular, then the R libraries from require-r can no longer be found.

I am not sure whether Clojure's require works the same way or not.

awb99 avatar Jun 01 '20 19:06 awb99

require-r is a regular function and should work. What error do you get?

genmeblog avatar Jun 01 '20 19:06 genmeblog

It is working. But then later below I get errors that the namespaces are not defined. The example is in: https://github.com/pink-gorilla/goldly/blob/master/profiles/demo/src/systems/r_telephone.clj

When I had it in a wrapper function, it would not work.

awb99 avatar Jun 01 '20 21:06 awb99

Maybe it has something to do with the intern function, which is used to create the namespaces and functions?

genmeblog avatar Jun 01 '20 21:06 genmeblog

I have no idea... Sorry. I just know that the code only works when it is not in a function. Also, I think I had to define R variables via let. When I was using the defs macro on its own, my functions were also not seeing the R variables. But that might have been an error in the defs macro. I don't know yet.

(defmacro defs
    [& bindings]
    {:pre [(even? (count bindings))]}
    `(do
       ~@(for [[sym init] (partition 2 bindings)]
           `(def ~sym ~init))))

awb99 avatar Jun 02 '20 03:06 awb99

I will test it soon. require-r should work from a function. The defs macro should also work. Can you give me an example of defs usage?

genmeblog avatar Jun 02 '20 08:06 genmeblog

OK, the main issue with your function is that the base package is not available when the function is compiled. It is only created by require-r, so base/options and base/set-seed cannot be resolved at compile time. This function therefore cannot work; it can't even compile.

Below works on my setup

(defn init-r []
  (println "configuring clojisr ..") 
  (require-r '[base :as base :refer [$ <- $<-]]
             '[utils :as u]
             '[stats :as stats]
             '[graphics :as g]
             '[datasets :refer :all]))

(init-r)

(base/options :width 120 :digits 7)
;; => $width
;;    [1] 120
;;    $digits
;;    [1] 7

(base/set-seed 11228899)
;; => NULL
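
The compile-time problem can be reproduced without R at all. The sketch below is only an illustration in plain Clojure: fake-require-r and the demo.rbase namespace are hypothetical stand-ins for require-r and the namespace it creates, not part of clojisr. It shows that a function body naming a not-yet-created namespace fails to compile, while a lookup deferred to call time with resolve works.

```clojure
;; Start from a clean slate so the sketch is repeatable in one REPL session.
(remove-ns 'demo.rbase)

;; Hypothetical stand-in for require-r: creates a namespace and interns
;; a function into it at RUN time.
(defn fake-require-r []
  (create-ns 'demo.rbase)
  (intern 'demo.rbase 'options (fn [& kvs] (apply hash-map kvs))))

;; Compiling a function that names demo.rbase/options before the namespace
;; exists fails, even though the function is never called.
(def compile-result
  (try
    (eval '(fn [] (demo.rbase/options :width 120)))
    :compiled
    (catch Throwable _ :compile-failed)))

;; Deferring the lookup to call time avoids the compile-time resolution.
(defn init-r-late []
  (fake-require-r)
  ((resolve 'demo.rbase/options) :width 120 :digits 7))
```

Calling (init-r-late) works because resolve runs after the namespace has been created. The price is that a typo in the symbol is only detected when the function is called.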

genmeblog avatar Jun 04 '20 18:06 genmeblog

I have an idea: Example:

;; example.core is a placeholder namespace name
(ns example.core (:require [r.base :as base]))
(base/options {})

What will happen here is that the Clojure analyzer will load the namespace and get the functions in it. So there are really two ways:

  1. Similar to shadow-cljs, which creates ClojureScript namespaces from package.json, you add an r-clojisr.edn file that defines the R dependencies of the project. You could then make an r-module-generator that creates a file for each library, containing all the information that is available in a namespace. You can then use this generated data to create the namespace.

  2. You skip #1 and hook into the function resolver, and at compile time you resolve everything. So, say, (r.base/options {}) resolves, but (r.base/typo-not-existing {}) would resolve as well. At execution time you then link it to the session, and if you cannot establish a binding at that time, you throw an exception.
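
Option 2 can be sketched in plain Clojure as late-bound stubs. This is a rough illustration, not clojisr code: the bindings atom is a hypothetical stand-in for a live R session, and late-bound is an invented helper. Every name "resolves" when the stub is created, and a missing binding is only reported when the stub is called.

```clojure
;; Hypothetical registry standing in for a live R session.
(def bindings (atom {}))

(defn late-bound
  "Returns a Clojure fn that looks up r-name only when it is called."
  [r-name]
  (fn [& args]
    (if-let [f (get @bindings r-name)]
      (apply f args)
      (throw (ex-info (str "No R binding for " r-name) {:r-name r-name})))))

;; Both stubs can be created without any session, typo included.
(def options (late-bound "options"))
(def typo-not-existing (late-bound "typo-not-existing"))

;; Helper that turns the execution-time failure into a message string.
(defn call-or-error [f & args]
  (try
    (apply f args)
    (catch clojure.lang.ExceptionInfo e (.getMessage e))))

;; Simulate the session coming up and providing only `options`.
(swap! bindings assoc "options" (fn [& kvs] (apply hash-map kvs)))
```

With this shape, (options :width 120) succeeds once the binding exists, while (typo-not-existing {}) compiles fine and throws only when executed.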

awb99 avatar Jun 04 '20 19:06 awb99

I am sorry for being a dick here. But how you have the syntax now makes it very difficult to write helper functions / libraries that use R functions.

awb99 avatar Jun 04 '20 19:06 awb99

OK, I'll leave it open. Yes, it's hard to write external helper functions, since the symbols are not available before they are required in the live session.

Maybe something like cljsjs can be helpful here? But someone needs to maintain it. Also I'm not sure about different backends and differences between packages (renjin vs R)

genmeblog avatar Jun 04 '20 19:06 genmeblog

I tested around a little bit. So require is tightly coupled with jars. No way to use require without generating a jar first. I think in the long run this is the best option: you generate a jar file, and in the jar file you create functions/variables corresponding to the module you have in R.

For the short term, I think this might work: https://github.com/pink-junkjard/integrator/blob/master/src/demo/app.clj https://github.com/pink-junkjard/integrator/blob/master/src/integrator/core.clj

So you essentially do not use require at all, and define the ns completely dynamically. You would adapt integrator.core to read some edn file that you generate when you discover an R module.

And then you can write a require-r macro that just redefines the functions you want to have extracted. You might also use potemkin for this: https://github.com/ztellman/potemkin/blob/master/src/potemkin/namespaces.clj

Then all you need is one dynamic variable that is linked to the session, and then essentially this variable is used in the generated functions.
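
A minimal sketch of that last idea, in plain Clojure and with no R involved. The names *session*, r-sin and fake-session are hypothetical, and the fake session simply computes the result in Clojure instead of talking to R.

```clojure
;; The one dynamic variable that links generated functions to a session.
(def ^:dynamic *session* nil)

;; What a generated function could look like: it only consults *session*
;; when it is called, so it compiles without any session present.
(defn r-sin [x]
  (when-not *session*
    (throw (ex-info "No R session bound" {})))
  ((:eval-fn *session*) "sin" x))

;; Hypothetical stand-in for a real session.
(def fake-session
  {:eval-fn (fn [r-name x]
              (case r-name
                "sin" (Math/sin x)))})

;; Bind the session for the duration of the call.
(def result
  (binding [*session* fake-session]
    (r-sin 0.0)))
```

Outside of binding, calling r-sin throws the "No R session bound" exception rather than failing at compile time.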

awb99 avatar Jun 04 '20 21:06 awb99

lein run

starting..
initializating R..
calculating sin of  3.14
done!

awb99 avatar Jun 04 '20 21:06 awb99

defs usage.

(defs a 1
      b 2)

this is identical to

(def a 1)
(def b 2)
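
The equivalence can be checked with macroexpand-1. The macro below is the one posted earlier in this thread; nothing else is assumed.

```clojure
(defmacro defs
  [& bindings]
  {:pre [(even? (count bindings))]}
  `(do
     ~@(for [[sym init] (partition 2 bindings)]
         `(def ~sym ~init))))

;; Inspect the expansion without evaluating it.
(def expansion (macroexpand-1 '(defs a 1 b 2)))

;; Evaluate it for real: defines a and b in the current namespace.
(defs a 1
      b 2)
```

Here expansion is the form (do (def a 1) (def b 2)), so the macro itself behaves like the two separate defs.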

awb99 avatar Jun 04 '20 21:06 awb99

I would NOT do something like cljsjs. cljsjs is a thing of the past. shadow-cljs solved the problem of npm dependencies and externs.

r-deps.edn

{:engine :rserv
 :deps [base math dplyr]} 

Then you call

clojisr .

clojisr is an executable (or a lein plugin, it does not matter). clojisr then generates the file target/r-modules.edn. It does so by starting Rserve and loading/exploring all the modules that were specified. Then it exits. So this is a compile step similar to "cljsbuild once" or to a CSS preprocessor. target/r-modules.edn then contains all the data that was discovered.

Then you adapt integrator.core to read target/r-modules.edn, and this means you have completely normal Clojure namespaces for all the R functions that you want.
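
As a rough sketch of that step: the shape of the edn data, the rgen. namespace prefix, and the r-exec placeholder are all assumptions made for illustration, since the thread does not specify what target/r-modules.edn would contain. The edn is read from a string here instead of a file to keep the example self-contained.

```clojure
(require '[clojure.edn :as edn])

;; Assumed shape of the generated data: package name -> function names.
(def modules-edn "{base [sum mean] stats [median]}")

;; Placeholder for the real call into R; it just records what would be sent.
(defn r-exec [r-name & args]
  {:r-call r-name :args (vec args)})

(defn define-r-namespaces!
  "Creates one namespace per package and interns one fn per R function."
  [modules]
  (doseq [[pkg fns] modules
          :let [ns-sym (symbol (str "rgen." pkg))]]
    (create-ns ns-sym)
    (doseq [f fns]
      (intern ns-sym f (partial r-exec (str pkg "::" f))))))

(define-r-namespaces! (edn/read-string modules-edn))

;; The generated vars are ordinary Clojure vars.
(def sum-call ((resolve 'rgen.base/sum) 1 2))
```

In this sketch sum-call is the map {:r-call "base::sum" :args [1 2]}. Code that wants to write rgen.base/sum literally in a function body still needs define-r-namespaces! to have run before that code is compiled, which is why doing it as a build step or at namespace load time matters.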

You just need to

awb99 avatar Jun 04 '20 21:06 awb99

You can also keep the interpreter approach that you use now, so you have the startup cost at each startup. But you would have the huge advantage that R functions are now completely normal Clojure functions.

docstrings -> done!

awb99 avatar Jun 04 '20 21:06 awb99

Thanks for the idea. I believe that once we decouple the session from robjects, all of this will be possible. What I see is that packages can differ between R versions (R 3.x, R 4.x), so I don't think that shipping dummy namespaces (or edn) in the library ourselves is an option.

I need to rethink it, but generally the options we discuss are:

  • (current) load package symbols from live R session
  • (proposed) load package symbols from edn file (which has to be generated by live session).

The only difference I see is "live R session", right?

genmeblog avatar Jun 04 '20 23:06 genmeblog

You are absolutely right! It definitely does not make sense for the library to provide a fixed module-definition edn. Instead, clojisr generates this on the user's machine, with the R setup supplied by the user.

However, this can be done at BUILD TIME (say via a script that the user can run), or it can be run AT EACH APP START. shadow-cljs does this at build time. Currently you do it at app start.

Irrespective of BUILD TIME vs APP START TIME, the idea with in-ns etc. makes sense. The more clojisr functions behave like normal Clojure functions, the less of an integration problem ...

awb99 avatar Jun 05 '20 05:06 awb99

With my proposal, the syntax would change from:

(defn load-quakes-r [rmin rmax]
  (-> 'quakes
      (r.dplyr/filter `(& (>= mag ~rmin)
                          (<= mag ~rmax)))))

to this:

(defn load-quakes-r [rmin rmax]
  (-> quakes
      (r.dplyr/filter (& (>= mag rmin)
                          (<= mag rmax)))))

quakes is a Clojure symbol created with def, and refers to an RObject. r.dplyr/filter is a Clojure function (that you added via create-ns).

(ns clojisr.executor)
(def ^:dynamic *session* nil)

;; robject?, c->robject and send-to-r are hypothetical helpers
(defn r-fun-exec [r-fun-name & args]
  (let [args-robject (map #(if (robject? %) % (c->robject %)) args)]
    (send-to-r :exec r-fun-name args-robject)))

;; this ns is auto-generated (the user would never see this code, as it is all created programmatically)
(ns r.math)

(defn sum [a b]
  (clojisr.executor/r-fun-exec 'base/sum a b))

awb99 avatar Jun 05 '20 08:06 awb99

It's not as easy as it looks. There is stuff that often operates on the symbolic level (like formulas). Also, if you want to create an R function from Clojure, you have to do it on the symbolic level. There are also functions whose names contain forbidden symbols, the data types are different (named lists are not maps), etc.

So removing symbolic calls removes part of the possible functionality. Don't forget that the actual R call is done by passing a properly formatted string.

I agree that dummy handlers for R functions or values can be generated to enable Clojure compilation without a connection to R, but removing the symbolic call is not possible.

From your example

(defn load-quakes-r [rmin rmax]
  (-> 'quakes
      (r.dplyr/filter `(& (>= mag ~rmin)
                          (<= mag ~rmax)))))

The above code will be converted to a string and evaluated fully on the R side (dplyr/filter expects a symbolic predicate, and this predicate will be evaluated within the context of quakes).

But this:

(defn load-quakes-r [rmin rmax]
  (-> quakes
      (r.dplyr/filter (& (>= mag rmin)
                          (<= mag rmax)))))

will not work. First, mag is unknown to Clojure. Also, Clojure will try to evaluate & first, which will fail.

This is how R works. Plenty of functions treat parameters as symbols and delay execution until needed.
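
To make the "form becomes a string" step concrete, here is a toy translator in plain Clojure. It is not clojisr's actual conversion code; form->r, infix-ops and mag-predicate are invented for this sketch. The point is that a syntax-quoted form is just data, with rmin and rmax spliced in as values, so it can be walked and printed as R source.

```clojure
(require '[clojure.string :as str])

;; Operators that this toy translator prints in infix position.
(def infix-ops #{"&" ">=" "<=" "+" "-" "*"})

(defn form->r
  "Turns a quoted Clojure form into a string of R-like source."
  [form]
  (cond
    ;; Syntax-quote namespace-qualifies symbols; keep only the bare name.
    (symbol? form) (name form)
    (seq? form)
    (let [op   (form->r (first form))
          args (map form->r (rest form))]
      (if (infix-ops op)
        (str "(" (str/join (str " " op " ") args) ")")
        (str op "(" (str/join ", " args) ")")))
    :else (pr-str form)))

;; The predicate stays symbolic; only rmin and rmax are evaluated by Clojure.
(defn mag-predicate [rmin rmax]
  `(& (>= mag ~rmin)
      (<= mag ~rmax)))

(def r-source (form->r (mag-predicate 4 5)))
```

In this sketch r-source is the string "((mag >= 4) & (mag <= 5))". The symbol mag is never looked up in Clojure; it is left for R to resolve against the data frame.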

genmeblog avatar Jun 05 '20 11:06 genmeblog

The forbidden symbol problem: totally agreed! There needs to be some kind of escaping. If I remember correctly, this also happens in cljs -> js compilation: (defn init! [] ...) becomes init_bang (or similar).
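
For comparison, this kind of name munging can be observed on the JVM side with clojure.core/munge, which maps characters that are not valid in host identifiers. This only illustrates the general idea; it says nothing about how clojisr escapes R names.

```clojure
;; clojure.core/munge rewrites characters that are not allowed in
;; JVM identifiers.
(def munged-bang (munge "init!"))
(def munged-dash (munge "set-seed"))
```

Here munged-bang is "init_BANG_" and munged-dash is "set_seed".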

awb99 avatar Jun 05 '20 14:06 awb99

We escape using tick or backtick.

genmeblog avatar Jun 05 '20 14:06 genmeblog

For Rserve, the call (send-to-r :exec r-fun-name args-robject) from my example would generate the following, which is then sent to R:

"
P123 <- function (min max)
    function [mag]
         (Mag >= min) and (mag <= max)
dplyr/filter (quakes P123)
"

The example is a little difficult because it uses an anonymous function as the predicate. I have not yet executed the code I copied; @daslu translated it for me. This is what I actually use, in Clojure only:

(defn p-mag [rmin rmax]
  (fn [{:keys [mag]}]
    (and (>= mag rmin) (<= mag rmax))))
;; p-mag returns the predicate, so it has to be called with the bounds first
(filter (p-mag rmin rmax) quakes-clj)
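
A runnable version of the Clojure-only variant, for reference. The rows in quakes-clj are made-up sample data, not values from the R quakes dataset, and p-mag has to be called with the bounds so that filter receives the predicate it returns.

```clojure
;; Made-up sample rows, only for illustration.
(def quakes-clj
  [{:mag 4.1} {:mag 5.6} {:mag 4.9}])

;; Returns a predicate that checks rmin <= mag <= rmax.
(defn p-mag [rmin rmax]
  (fn [{:keys [mag]}]
    (and (>= mag rmin) (<= mag rmax))))

(def selected (filter (p-mag 4.0 5.0) quakes-clj))
```

With these sample rows, selected contains the rows with :mag 4.1 and :mag 4.9.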

So the difficult part in this example is that it needs to generate and define an anonymous function (the predicate) in R. And the definition of that function will have to be implemented as a macro in Clojure, because the forms inside a function should not each generate one wrapped R function; instead, the forms should become function calls inside the body of the R function definition.

awb99 avatar Jun 05 '20 14:06 awb99

Sorry for my R code, which is most likely not precise. I am not an R expert, so it is most likely not correct. But I hope you get the idea.

awb99 avatar Jun 05 '20 14:06 awb99

I think I'm missing something. How is the last example going to produce the string given above?

genmeblog avatar Jun 05 '20 14:06 genmeblog

dplyr/filter doesn't take an anonymous function; it takes a symbolic predicate: https://dplyr.tidyverse.org/reference/filter.html

Did you test your code and is it working?

genmeblog avatar Jun 05 '20 14:06 genmeblog

I didn't test. I think you are right, and it will not work. As I said above.. I only tested the clj code I pasted above.

awb99 avatar Jun 05 '20 15:06 awb99

We had a discussion with @daslu about whether to use symbols (tick/backtick) or macros to mimic functional behaviour. I agreed that having clojisr parse symbolic forms, instead of providing a set of macros, is more convenient since it happens at runtime. So we want to stay with this method (unless we find something more useful). Let me repeat: R treats code in arguments as symbols too often to cover all the cases with functions.

genmeblog avatar Jun 05 '20 15:06 genmeblog

In goldly I am using sci to compile short cljs functions to js. I think clojisr might want to use a similar compiler architecture that defines a few control functions (like if) in order to generate the string command to R.

awb99 avatar Jun 05 '20 15:06 awb99

To not lose track of the main point here: the main use case of clojisr will be to call R functions. A simple function call, say (+ 1 2), would essentially need to check whether it needs to convert each parameter to R, or use the R variable name if it is already known. Once the R function has been mapped to a Clojure function, the rest becomes easy.

awb99 avatar Jun 05 '20 15:06 awb99

But that is already done: (r+ 1 2) does exactly what you described.

I don't agree that the main use case is only calling functions. It is also accessing vars of different types (like S4 objects, data.frames, matrices), and creating R functions and R data types.

When you import a package, the respective Clojure symbols are created. When used in function position, they act as functions. If you provide primitive parameters, they are converted accordingly. (r.base/mean [1 2 3 4]) produces what is expected: mean(c(1,2,3,4)).

I have a feeling that we are circling around a problem that I probably don't understand.

We agreed that r.base/mean should be created upfront to allow compiling code without the R backend. It's doable, and we plan to implement it after the session management refactor.
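
The conversion of primitive parameters can be pictured with a toy string builder. This is a sketch under assumptions, not clojisr's converter: value->r and r-call-string are invented names, and only a few Clojure types are handled.

```clojure
(require '[clojure.string :as str])

(defn value->r
  "Renders a few primitive Clojure values as R source text."
  [v]
  (cond
    (sequential? v) (str "c(" (str/join "," (map value->r v)) ")")
    (string? v)     (pr-str v)
    (true? v)       "TRUE"
    (false? v)      "FALSE"
    (nil? v)        "NULL"
    :else           (str v)))

(defn r-call-string
  "Builds the source text of an R call from a function name and arguments."
  [r-fn & args]
  (str r-fn "(" (str/join "," (map value->r args)) ")"))

(def mean-call (r-call-string "mean" [1 2 3 4]))
```

In this sketch mean-call is the string "mean(c(1,2,3,4))", matching the call shown above.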

genmeblog avatar Jun 05 '20 15:06 genmeblog