Chris Nuernberger
Here is the deps.edn file with the jvm opts I used:

```clojure
{:paths ["src" "resources"]
 :deps {org.clojure/clojure {:mvn/version "1.10.3"}
        clj-python/libpython-clj {:mvn/version "2.018"}}
 :aliases {:manual-gil {:jvm-opts ["-Dlibpython_clj.manual_gil=true"]}}}
```
That is fascinating. I will follow up on this soon.
Ouch, yes, this would suck to debug and would not be obvious at all. We can work a check for that into the basic setup of the library - the...
Hmm, then does that correspond to how we manipulate columns? Do we need a '+' operator that simply creates a node in a DAG and then performs it on demand?...
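One way to picture the deferred '+' idea is a tiny expression tree that only records what should happen and computes nothing until asked. This is a minimal sketch in plain Clojure; `col`, `lazy+`, and `realize` are hypothetical names invented for illustration and are not part of any dataset library.

```clojure
;; Hypothetical sketch: `col`, `lazy+`, and `realize` are illustrative names,
;; not functions from tech.ml.dataset or any other library.

(defn col
  "Leaf node wrapping concrete column data."
  [data]
  {:op :col :data data})

(defn lazy+
  "Builds a DAG node describing an elementwise addition.
  No arithmetic happens here."
  [a b]
  {:op :+ :args [a b]})

(defn realize
  "Walks the DAG and performs the deferred work on demand."
  [node]
  (case (:op node)
    :col (:data node)
    :+   (apply mapv + (map realize (:args node)))))

;; Building the expression is cheap; it is just nested maps.
(def expr
  (lazy+ (col [1 2 3])
         (lazy+ (col [10 20 30]) (col [100 200 300]))))

(realize expr) ;; => [111 222 333]
```

Because the expression is plain data until `realize` runs, an optimizer could inspect or rewrite it first, for example by fusing both additions into a single pass.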
Hmm, so load enough of the file to know the schema and then stop at that point? Or assume the first file's schema applies to the rest of the files?...
If you know your pipeline a priori, you can do a lot of optimizations by combining steps or reorganizing operations. The Spark Dataframe system has these types of optimizations at...
You would need a more efficient version of `ds/concat`. Specifically, the current one concatenates data at the reader level, incurring a cost at each index access. If, for instance, the definition of...
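The per-access cost can be illustrated with a toy model in plain Clojure. It is only a sketch of the general tradeoff, not how `ds/concat` is implemented; `lazy-concat-reader` and `eager-concat-reader` are hypothetical names, and the readers are modelled as simple index-to-value functions.

```clojure
;; Toy model only; this is NOT the tech.ml.dataset implementation of ds/concat.
;; Both function names below are hypothetical.

(defn lazy-concat-reader
  "Returns an index->value fn over several vectors without copying.
  Every call scans the part boundaries to find the owning part,
  so each access pays for the indirection."
  [parts]
  (let [offsets (vec (reductions + 0 (map count parts)))]
    (fn [idx]
      (loop [i 0]
        (if (< idx (nth offsets (inc i)))
          ((nth parts i) (- idx (nth offsets i)))
          (recur (inc i)))))))

(defn eager-concat-reader
  "Copies all parts into one vector up front.
  Each access afterwards is a direct lookup."
  [parts]
  (let [all (into [] cat parts)]
    (fn [idx] (all idx))))

(def parts [[1 2 3] [4 5] [6 7 8 9]])
(def lazy-rdr (lazy-concat-reader parts))
(def eager-rdr (eager-concat-reader parts))

(lazy-rdr 4)  ;; => 5
(eager-rdr 4) ;; => 5
```

Both readers return the same values. The lazy one avoids the copy but does extra work on every access, while the eager one pays once when it is built.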
If you have the atomic operations right then some of that can be done by the programmer for sure. It isn't necessary to start there.
Or rather, perhaps it is easier to start there? What if your dataset definition is a schema and a list of files and then you have a function that, given...
And potentially only somewhat harder to construct them.