
[Meta] Are we scientists yet?

Open mratsim opened this issue 6 years ago • 32 comments

This is a meta-issue to keep track of discussion around Nim scientific libraries.

Primitive libraries

  • Decimal128: https://github.com/JohnAD/decimal128
  • Fixed-point: https://gitlab.com/lbartoletti/fpn

Multidimensional arrays, Linear-algebra

Multidimensional arrays are the basic building block of scientific computing; they go beyond 2D or 3D vectors and matrices. Notable non-Nim implementations include Fortran, Julia, Matlab and Numpy.

Status: in-progress

Libraries:

Support

Arraymancer supports dense multidimensional arrays of any type, on CPU (integers, floats, complex), Cuda and OpenCL (float only) and uses BLAS, CuBLAS and Clblast under the hood.

Flambeau provides libtorch bindings and reproduces PyTorch functionality.

Manu is a pure Nim matrix library with no external dependencies.

Neo supports dense and sparse float vectors and matrices, on CPU and Cuda (Nvidia GPUs) and also uses BLAS and LAPACK under the hood.

Status: stalled

Libraries:

NimTorch supports most PyTorch features regarding multidimensional arrays, on CPU, Cuda, OpenCL and AMD ROCm provided you compiled PyTorch's Aten backend with the relevant features.

Plotting

Data analysis requires plotting. Notable non-Nim implementations include Python's matplotlib and seaborn, Plot.ly (Python, R, JavaScript), R's ggplot2, Matlab and Facebook's Visdom (a simple interface to Plot.ly).

Note that there are a couple of approaches to plotting: either a charting library, or a high-level grammar library (similar to SQL) that hides the low-level details of a chart.

Status: in-progress

Libraries:

Proof-of-concepts:

Unmaintained:

  • arraymancer-vision has a very simple interface to Facebook's Visdom.

ggplotnim is a pure Nim implementation of the grammar of graphics. gnuplot.nim is a wrapper of gnuplot. Nim-Plotly uses the plot.ly charting library as a backend. Both MetaPlot and Monocle use the Vega visualization grammar.

Image processing library

Computer vision is a thriving area of research. Vision scientists need algorithms that work on images represented as multidimensional arrays (unlike, say, Photoshop), preferably multithreaded and GPU-accelerated.

Notable non-Nim libraries include OpenCV, Matlab, Python scikit-image, scipy.ndimage and mahotas.

Status: in-progress

Libraries:

Unmaintained:

Nim-opencv provides rough low-level bindings of OpenCV functions.

Dataframe and columnar/tabular data processing

Dataframes are essential to process structured data (say Name, Age, number of products bought, last time of visit). They allow very efficient data manipulation, including easily creating new columns, joining dataframes, converting between types.

Notable non-Nim packages include Python's Pandas and R's data.table. When data does not fit in RAM, dataframe packages interface with SQL or HDF5 datastores, or even Spark for very large-scale processing.

Status: in-progress

Libraries:

  • NimData provides dataframe facilities to Nim

Random library

Lots of scientific algorithms rely on stochastic processes or random distributions. At the very least, a pseudo-random generator that samples from a normal (Gaussian) distribution is needed.

Notable non-Nim libraries include SciPy.

Status: in-progress

Libraries:

  • Alea by @andreaferretti allows sampling from non-uniform distributions (Gaussian, Bernoulli, Poisson ...)
  • The standard library and Nim-random by @oprypin only allow uniform sampling.
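When only a uniform generator is available, the classic Box-Muller transform is one way to get the Gaussian samples called for above: it maps two independent uniform samples to two independent standard normal samples. The sketch below is written in Python purely for illustration; it is not Alea's or any Nim library's API, and the names `box_muller` and `normal_samples` are hypothetical.

```python
import math
import random
from typing import List, Tuple


def box_muller(rng: random.Random) -> Tuple[float, float]:
    """Map two uniform samples to two independent standard normal samples."""
    # u1 must be strictly positive because we take its logarithm:
    # rng.random() is in [0, 1), so 1 - rng.random() is in (0, 1].
    u1 = 1.0 - rng.random()
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def normal_samples(n: int, mu: float = 0.0, sigma: float = 1.0,
                   seed: int = 0) -> List[float]:
    """Draw n samples from N(mu, sigma^2) using only a uniform generator."""
    rng = random.Random(seed)  # seeded so results are reproducible
    samples: List[float] = []
    while len(samples) < n:
        z0, z1 = box_muller(rng)
        # Each transform yields two standard normals; scale and shift both.
        samples.append(mu + sigma * z0)
        if len(samples) < n:
            samples.append(mu + sigma * z1)
    return samples


if __name__ == "__main__":
    import statistics

    xs = normal_samples(100_000, mu=5.0, sigma=2.0, seed=42)
    # Sample mean and standard deviation should be close to 5.0 and 2.0.
    print(statistics.fmean(xs), statistics.stdev(xs))
```

Using both outputs of each transform halves the number of uniform draws; a library would typically also offer faster variants (e.g. the polar or ziggurat methods), which this sketch does not attempt.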

Statistics library

Notable language: R

Status: standard lib statistics module

Machine learning

Machine learning is about teaching a computer to learn and generalize patterns from data.

Notable non-Nim libraries include Python's scikit-learn and R's caret. State-of-the-art C++ library to wrap: XGBoost.

Status: in-progress

Deep learning & neural networks

Deep learning is machine learning with neural networks, and it is arguably eating the world (or at least Reddit, Hacker News and sponsors). In comparison to most traditional machine learning tools, neural networks can also learn very well from unstructured data (images, sounds, text...).

Notable non-Nim libraries include Facebook's Torch, Google's TensorFlow, and Apache and Amazon's MXNet.

Status: in-progress

Libraries:

Proof-of-concept:

  • Neurotic was a proof of concept for building simple neural networks on Neo/linalg

Non-linear optimization

Status: in-progress

Libraries:

  • MPFIT (Non-Linear Least squares fitting)
  • NLOPT, wrapper for the nlopt library

Linear programming

Status: in-progress

Libraries:

  • nim-isl, wrapper for the ISL parametric integer linear programming library

Computational Physics

Status: in-progress

Libraries:

Geometry

Computational geometry also requires tuned algorithms for geometry primitives, polygons and polyhedra, triangulations, distances, shape analysis...

Notable non-Nim library: CGAL

Status: no library

Scientific serialization format

There are many formats specific to science, or even to individual scientific domains.

Libraries:

  • nim-hdf5, wrapper for the HDF5 data format

Geospatial library

Scientists often need to deal with geospatial coordinates (latitude, longitude), maps and distances. This includes efficient data structures like k-d trees or R-trees to compute distances between points, and distance formulas like the Haversine formula to compute distances on a sphere.
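To make the Haversine formula concrete: for two points with latitudes φ1, φ2 and longitudes λ1, λ2 on a sphere of radius R, a = sin²(Δφ/2) + cos φ1 · cos φ2 · sin²(Δλ/2) and the distance is d = 2R · asin(√a). The Python sketch below is an illustration only, not the API of any library listed here; it assumes a perfectly spherical Earth with the commonly used mean radius of 6371 km, so results can differ slightly from ellipsoidal (geodesic) distances.

```python
import math

# Assumption: a spherical Earth with the commonly used mean radius in km.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float,
                 radius: float = EARTH_RADIUS_KM) -> float:
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp a to 1.0 to guard against tiny floating-point overshoot
    # before taking asin of its square root.
    c = 2.0 * math.asin(math.sqrt(min(1.0, a)))
    return radius * c


if __name__ == "__main__":
    # Paris to London: roughly 344 km as the crow flies.
    print(haversine_km(48.8566, 2.3522, 51.5074, -0.1278))
```

For many pairwise queries, this formula would typically be combined with a spatial index such as a k-d tree or R-tree so that only nearby candidates are compared.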

Notable non-Nim libraries include Python's scipy.spatial, Geopy, Shapely

Status: in-progress (see the R-tree forum thread).

Proof-of-concepts:

  • GDAL wrapper (Geospatial Data Abstraction Library)

Scientific language bindings

Python:

Unmaintained

mratsim, Nov 17 '17

Placeholder.

To avoid polluting this meta-thread with specific discussion on certain topics (say what I want in the random library), this will link to the discussion topics:

Multidimensional arrays, Linear-algebra

#14, #17, #25, #50, #59

Plotting

#17, #51, #70

Geospatial

#13, #69

Image processing

#69

Dataframes, columnar/tabular data processing

#20, #47, #33

Random

#40

Statistics

#16

Machine learning

#48

Deep learning

No issue open

Computational Geometry

#53

mratsim, Nov 17 '17

For sampling from other distributions, there is Alea. I have to clean it up (some examples fail with the latest concept changes in devel), but I hope to make these work again soon.

andreaferretti, Nov 17 '17

This almost makes me want to buy arewescientistsyet.org à la http://www.arewewebyet.org/. Perhaps you'd be interested in creating something like this? :)

dom96, Nov 17 '17

I would also add in differential equation solvers as well as Markov chain Monte Carlo samplers...
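As a taste of what a differential equation solver involves, here is a minimal Python sketch of the classical fourth-order Runge-Kutta (RK4) method with a fixed step size. It is an illustration only: the function name `rk4` and its signature are hypothetical, and production solvers (such as the Sundials suite mentioned later in this thread) add adaptive step sizes and error control.

```python
import math
from typing import Callable, List, Sequence

# A right-hand side f(t, y) returning dy/dt for a state vector y.
Rhs = Callable[[float, Sequence[float]], List[float]]


def rk4(f: Rhs, t0: float, y0: Sequence[float], t1: float,
        steps: int) -> List[float]:
    """Integrate dy/dt = f(t, y) from t0 to t1 with fixed-step classical RK4."""
    h = (t1 - t0) / steps
    y = list(y0)
    for i in range(steps):
        t = t0 + i * h  # recompute t each step to avoid accumulated rounding
        k1 = f(t, y)
        k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
        k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
        k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
        # Advance using the weighted average of the four slope estimates.
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
    return y


if __name__ == "__main__":
    def oscillator(t: float, state: Sequence[float]) -> List[float]:
        # y'' = -y written as the first-order system [y, v]' = [v, -y].
        y, v = state
        return [v, -y]

    # The exact solution is y(t) = cos(t), so y(pi) should be very close to -1.
    print(rk4(oscillator, 0.0, [1.0, 0.0], math.pi, 1000)[0])
```

Higher-order equations are handled by rewriting them as first-order systems, as the oscillator example does.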

sdwfrost, Nov 27 '17

Over the last 2 months I've been working on high level bindings to the HDF5 library:

https://github.com/Vindaar/nimhdf5

It's still very much work in progress (also due to my limited knowledge of Nim and the more low level parts of HDF5). As a raw wrapper it should be fully functional, with the downside of the (imo not very intuitive) C API. But the high level bindings are improving slowly. There's an example (examples/h5_create_dataset_hl.nim) showing the available features.

Vindaar, Jan 30 '18

Plotting

Status: no libraries

  • https://github.com/dvolk/gnuplot.nim
  • https://github.com/stisa/graph
  • https://github.com/sdwfrost/nim-plotly-example

narimiran, Mar 26 '18

By far the most important category is missing from this list, I feel: first-class two-way Python bindings.

The ability of Python to easily (relatively, for the time) interface with the then-dominant languages was pivotal in its adoption in scientific computing.

I'd use a ton of Nim from Python right away if there were a clean, boilerplate-free method of sending ndarrays back and forth between the two. Last time I checked there was not, and as much as I like Nim, I don't see it replacing my entire Python ecosystem any day soon.

In particular, I would much rather use Nim than Cython or Numba or any such half-baked language. Boost.Python has the bindings figured out pretty well, but then again I can rarely justify having to deal with C++.

But a system of bindings with the convenience of Boost.Python, without the C++, would massively expand the usability of Nim for scientific programmers like me (and I think it's not just me).

Also, starting a project in Nim would be a much better proposition if I had the reassurance that I could always pop up a matplotlib debug figure without any hassle.

EelcoHoogendoorn, May 02 '18

@EelcoHoogendoorn there are a few projects.

  • nim-pymod is not maintained and a little cumbersome in that it requires its own scripts to build, but it allows sending ndarrays back and forth
  • nimpy looks more actively maintained, but I am not sure whether it supports Numpy types
  • python3 seems to be another one, but I am not sure of its status

None of these projects is fully mature at this point, but this is definitely something doable

andreaferretti, May 02 '18

Of course it is doable; both Python and nim are Turing complete. But without having the time to put in the work to make these into feature complete mature solutions myself, it is what is stopping me from using nim at present.

The good news is that this should be a lot less work than reinventing matplotlib.


EelcoHoogendoorn, May 02 '18

I think most active nim users are aware of this by now, but there's a functioning plotting library here: https://github.com/brentp/nim-plotly

Since it serializes to JSON and uses plotly.js to plot (but it works with the C backend), it is limited in the number of points it can handle, but when using WebGL it can plot ~200K points in my browser and still be tolerably responsive.

brentp, May 17 '18

Hi brentp;

That's looking pretty cool indeed! Note that I am not trying to take a jab at plotting in Nim specifically, but trying to make a point about the relative size of the Python and Nim ecosystems in general; plotting is just an example.

I think it'd be foolish to expect Nim to be able to compete with Python anytime soon on that front; making sure we have first-class two-way interop between the two sounds like it might happen a decade sooner at least.

EelcoHoogendoorn, May 17 '18

And finally we can do non-linear least square fitting in Nim :)

https://github.com/Vindaar/nim-mpfit

Vindaar, Jun 20 '18

Finally spent some time to make the interface for my NLopt wrapper nicer and create a PR for nimble for it. So if non-linear least square fitting isn't for you, maybe general nonlinear optimization is. ;)

https://github.com/Vindaar/nimnlopt

Vindaar, Jul 02 '18

For some precision engineering/scientific applications, the ability to use arbitrary precision floating point arithmetic would be useful. Does an MPFR wrapper a la Julia's built-in support for BigFloat belong on this list?

abudden, Aug 13 '18

@abudden Certainly.

Araq, Aug 13 '18

It seems that there is still no computer algebra system module like https://www.sympy.org/. I also made a post: https://forum.nim-lang.org/t/4165

retsyo, Aug 31 '18

A decent stats package would be a huge boon for my work, even if it started with just the t-test and ANOVA.
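As an illustration of the kind of building block such a package would start from, here is a minimal Python sketch of Welch's two-sample t statistic (the unequal-variance t-test) with its Welch-Satterthwaite degrees of freedom: t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2) and df = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1)]. The name `welch_t` is hypothetical. Turning t and df into a p-value needs the Student t CDF, which Python's standard library does not provide; SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the full test and could serve as a cross-check.

```python
import math
import statistics
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    # statistics.variance is the sample variance (divides by n - 1);
    # dividing again by n gives the squared standard error of each mean.
    v1 = statistics.variance(a) / n1
    v2 = statistics.variance(b) / n2
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
    print(f"t = {t:.4f}, df = {df:.4f}")  # t = -1.8974, df = 5.8824
```

Welch's variant is shown rather than the pooled-variance Student t-test because it does not assume equal variances in the two groups.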

brentp, Sep 11 '18

https://github.com/fragcolor-xyz/nimtorch

Full pytorch for nim, for you.

sinkingsugar, Jan 11 '19

Do we want a category for natural language processing? Examples of Python libraries are nltk, gensim, spacy, and scikit-learn.

ihendley, Mar 21 '19

Also, how about mathematical optimization - like scipy.optimize for example, and how about signal processing - like scipy.signal?

ihendley, Mar 21 '19

@ihendley I think so, yes.

Araq, Mar 22 '19

Simulation

What about simulation? Something like simulink, modelica or Modia (in Julia).

It would be nice to have something similar to Modia in particular, given Nim's metaprogramming capabilities.

One area where I believe Nim could shine is exporting FMU models (following the FMI standard). I don't see Python doing that. And even for Julia it is a struggle, because it needs to export the runtime for compiled code, which is big and not straightforward (here you can see how the libraries take over 100 MB for a simple example when compiled ahead of time).

Relevant links

  • FMI Code Generator
  • FMU SDK
  • Sundials (SUite of Nonlinear and DIfferential/ALgebraic Equation Solvers), to embed the solver in the FMU. Bindings for it would be useful even on their own.
  • SimulatorToFMU

mantielero, Jun 29 '19

It's been a while since I updated the original post but it's done :)

mratsim, Jun 30 '19

Having a (nearly?) fully functional Jupyter kernel would be quite useful for my work and, I suspect, for many people.

brentp, Oct 25 '19


@brentp: There is (or was) jupyternim: https://github.com/stisa/jupyternim. I'm not sure whether it's abandoned and/or still compiles (last activity Oct 2018); I have never used it. Its downside, of course, is that it was written without hot code reloading in mind. However, I think it'd provide a nice basis for an updated implementation that uses HCR for the relevant parts and jupyternim's socket communication.

I once started playing around with HCR, but wasn't very successful even implementing a trivial repl, https://github.com/vindaar/brokenrepl. Posting it here if anyone wants to give it a try.

Vindaar, Oct 25 '19

Yes, I saw that and inim from @stisa. Now that there are ggplots and dataframes, the notebook would be a boon.

brentp, Oct 25 '19

(my) jupyternim and inim are the same code, there was a naming conflict with https://github.com/AndreiRegiani/INim so I renamed it. I agree it's due an update, but I have been pretty busy this year.
Last time I looked, HCR was limited to the JS target. Looking at https://nim-lang.org/docs/hcr.html, there has been a lot of progress, so I may look into adopting it when I get some free time, if nobody starts working on it first.

stisa, Oct 25 '19

I've just published a pure Nim k-d tree implementation here.

jblindsay, Apr 11 '20

@mratsim, @brentp, @HugoGranstrom and I chatted recently about trying to unify the science-related code a little more. While we haven't decided anything specific yet, we talked about creating an organization to hold related repositories in the future:

https://github.com/SciNim

I only invited a few people who, off the top of my head, use Nim for science-related stuff. If you want to join, feel free to message me or just join the Gitter channel here:

https://gitter.im/SciNim/community

and say hi.

Vindaar, Apr 24 '20

Over Easter I played around with creating a website based on Hugo for this purpose. I am happy to provide it to you.

I have uploaded it here: https://mantielero.github.io/nim4science/

Feel free to use it.

mantielero, Apr 24 '20