BlockSci
Dockerfile for Reproducibility (and Binder)
A Dockerfile to build and host these notebooks would be helpful.
There are cookiecutter templates for creating Git repos with a .gitignore, a Makefile, and so on for these types of projects:
- http://cookiecutter.readthedocs.io/en/latest/readme.html#reproducible-science
- http://cookiecutter.readthedocs.io/en/latest/readme.html#data-science
There are also Docker containers which make it easy to launch a complete, consistent software environment:
- https://github.com/Kaggle/docker-python
- https://github.com/Kaggle/docker-python/blob/master/Dockerfile
- https://github.com/jupyter/docker-stacks
Data Visualization w/ Jupyter Notebooks:
- qgrid is one way to review DataFrames with a GUI widget in a Jupyter Notebook: https://github.com/quantopian/qgrid
Python & Jupyter resources:
- https://github.com/quobit/awesome-python-in-education/#data-science
- https://github.com/quobit/awesome-python-in-education/#jupyter
- binder makes reproducibility with Git, Docker, and JupyterHub really easy:
- Src: https://github.com/jupyterhub/binderhub
- Docs: https://binderhub.readthedocs.io/en/latest/
- JupyterHub makes hosting Jupyter Notebook instances (with e.g. GitHub Auth) within Docker containers managed by Kubernetes very easy.
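As a rough illustration of the Binder point above: once a repository contains something Binder can build from (a Dockerfile or environment files), the public mybinder.org service launches it from a URL built out of the GitHub owner, repository, and ref. The helper name `binder_url` below is made up for this sketch, and the owner and repo in the usage note are placeholders.

```shell
# Sketch: build a mybinder.org launch URL for a GitHub repository.
# binder_url is a hypothetical helper name, not part of any Binder tooling.
# Arguments: owner, repo, ref (branch, tag, or commit; defaults to HEAD).
binder_url() {
    owner="$1"
    repo="$2"
    ref="${3:-HEAD}"
    echo "https://mybinder.org/v2/gh/${owner}/${repo}/${ref}"
}
```

Calling `binder_url some-user some-repo` prints a link that can go in a README badge. Whether a given Dockerfile actually builds on Binder is a separate question.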
Either my Docker machine has very little RAM, so this fails to compile, or I've done something wrong. Nevertheless, here's a Dockerfile I quickly scripted earlier today:
FROM ubuntu:bionic
LABEL maintainer="Haaroon Yousaf (h.yousaf [at] ucl.ac.uk)"
RUN apt-get update && apt-get install -y software-properties-common python3-software-properties
RUN add-apt-repository ppa:ubuntu-toolchain-r/test -y && apt-get update
# "c++17" is a language standard, not an apt package, so it is not listed here; g++-7 supports C++17.
RUN apt-get install -y cmake libtool autoconf libboost-filesystem-dev libboost-iostreams-dev \
libboost-serialization-dev libboost-thread-dev libboost-test-dev libssl-dev libjsoncpp-dev \
libcurl4-openssl-dev libjsonrpccpp-dev libsnappy-dev zlib1g-dev libbz2-dev \
liblz4-dev libzstd-dev libjemalloc-dev libsparsehash-dev python3-dev python3-pip git gcc-7 \
clang-5.0 g++-7
RUN pip3 install matplotlib numpy pandas jupyter jupyter-core
WORKDIR /root/
RUN git clone https://github.com/citp/BlockSci.git
WORKDIR /root/BlockSci
RUN mkdir -p /root/BlockSci/release
RUN mkdir -p /root/data
WORKDIR /root/BlockSci/release
RUN CC=gcc-7 CXX=g++-7 cmake -DCMAKE_BUILD_TYPE=Release ..
RUN make && make install
WORKDIR /root/BlockSci/
RUN CC=gcc-7 CXX=g++-7 pip3 install -e blockscipy
WORKDIR /root/BlockSci/Notebooks
EXPOSE 8888
VOLUME ["/root/data"]
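# Jupyter refuses to start as root without --allow-root, and it must listen on
# 0.0.0.0 to be reachable through the exposed port.
CMD jupyter notebook --ip=0.0.0.0 --port=8888 --no-browser --allow-root && bash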
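For anyone who wants to try the Dockerfile above, here is a minimal sketch of building and running it. It assumes the Dockerfile sits in the current directory. The image tag `blocksci-notebook`, the host directory `$HOME/blocksci-data`, and the function names are placeholders of mine, not from the thread.

```shell
# Sketch only: IMAGE_TAG and HOST_DATA_DIR are hypothetical names.
IMAGE_TAG="${IMAGE_TAG:-blocksci-notebook}"
HOST_DATA_DIR="${HOST_DATA_DIR:-$HOME/blocksci-data}"

# Build the image from the Dockerfile in the current directory.
build_image() {
    docker build -t "$IMAGE_TAG" .
}

# Start the notebook server, publishing port 8888 and mounting the data volume.
# The explicit jupyter command overrides the image's CMD: --allow-root is needed
# because the container runs as root, and --ip=0.0.0.0 makes the server
# reachable through the published port.
run_notebook() {
    mkdir -p "$HOST_DATA_DIR"
    docker run --rm -it \
        -p 8888:8888 \
        -v "$HOST_DATA_DIR:/root/data" \
        "$IMAGE_TAG" \
        jupyter notebook --ip=0.0.0.0 --port=8888 --no-browser --allow-root
}
```

Running `build_image && run_notebook` should print a tokenized URL that can be opened on the host at port 8888. Anything written to /root/data inside the container ends up in the host directory.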
Here are my Dockerfile and instructions: GitHub. The Dockerfile covers all the little problems I came across along the way. It might change down the line. Hope it helps somebody. And if for any reason you build it on Windows: Don't. Forget. To. Assign. Resources. Also watch out for file permissions. Or do it on Linux right away :)
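On the resources point: one way to see what Docker actually has available before starting a long build is to ask the daemon. The sketch below uses `docker info` with a format template. The helper names are made up for illustration, and the thread does not say how much RAM the build needs, so no threshold is checked.

```shell
# Convert a byte count to whole GiB (integer division).
bytes_to_gib() {
    echo $(( $1 / 1024 / 1024 / 1024 ))
}

# Report the CPUs and memory the Docker daemon can use.
# With Docker Desktop this reflects the VM's resource settings, not the host's.
docker_resources() {
    cpus=$(docker info --format '{{.NCPU}}')
    mem_bytes=$(docker info --format '{{.MemTotal}}')
    echo "CPUs: $cpus, memory: $(bytes_to_gib "$mem_bytes") GiB"
}
```

If `docker_resources` reports only a couple of GiB, raising the limit in the Docker settings before rebuilding is the first thing to try.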