Forceatlas2_layout: slow?
Description
Hi! I'm really interested in using Datashader for large-scale data visualization. I've been working through the Networks section of the user guide and wanted to try it on my own data: a graph of 1,184,684 nodes and 1,210,193 edges. I've reshaped the original data into two separate DataFrames, nodes and edges, where each row of the edges DataFrame gives the source and target as indexes into the nodes DataFrame.
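For readers following along, here is a minimal sketch of that reshaping step with pandas. It assumes the raw data is an edge list whose endpoint labels sit in two columns, hypothetically named `from` and `to`. `pd.factorize` gives each distinct label an integer code, and that code becomes the node `id` the edges refer to.

```python
import pandas as pd


def to_nodes_edges(raw, src_col="from", dst_col="to"):
    """Split a labelled edge list into nodes and edges DataFrames.

    `src_col` and `dst_col` are hypothetical column names for the raw data.
    """
    # Stack sources then targets so every label is coded consistently.
    labels = pd.concat([raw[src_col], raw[dst_col]], ignore_index=True)
    codes, uniques = pd.factorize(labels)  # codes run from 0 to n_nodes - 1
    nodes = pd.DataFrame({"id": range(len(uniques)), "label": uniques})
    m = len(raw)
    # The first m codes belong to the sources, the next m to the targets.
    edges = pd.DataFrame({"source": codes[:m], "target": codes[m:]})
    return nodes, edges
```

The resulting `id`, `source` and `target` columns line up with the column names passed to `forceatlas2_layout` in the call below.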
Circular layout worked fine and produced a result within a few seconds. However...
```
from datashader.layout import forceatlas2_layout

force_directed = forceatlas2_layout(nodes, edges, id='id', source='source', target='target')
```
...has been running for about an hour, with the process at about 280% CPU, and has yet to complete. I understand the mechanics of the force-atlas layout are more complex than the circular one, but I wondered whether this much processing time is to be expected, and whether there is a way to speed it up.
Thanks for all your efforts on this package. It's a great project.
Your environment
- MacOS Mojave 10.14.3
- Model Name: Mac Pro
- Model Identifier: MacPro6,1
- Processor Name: Quad-Core Intel Xeon E5
- Processor Speed: 3.7 GHz
- Number of Processors: 1
- Total Number of Cores: 4
- L2 Cache (per Core): 256 KB
- L3 Cache: 10 MB
- Memory: 12 GB
Conda Info
```
    active env location : /Users/James/anaconda3/envs/community_mapper
            shell level : 2
       user config file : /Users/James/.condarc
 populated config files : /Users/James/.condarc
          conda version : 4.6.8
    conda-build version : 3.17.6
         python version : 3.7.1.final.0
       base environment : /Users/James/anaconda3 (writable)
           channel URLs : https://conda.anaconda.org/conda-forge/osx-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://repo.anaconda.com/pkgs/main/osx-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/free/osx-64
                          https://repo.anaconda.com/pkgs/free/noarch
                          https://repo.anaconda.com/pkgs/r/osx-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /Users/James/anaconda3/pkgs
                          /Users/James/.conda/pkgs
       envs directories : /Users/James/anaconda3/envs
                          /Users/James/.conda/envs
               platform : osx-64
             user-agent : conda/4.6.8 requests/2.21.0 CPython/3.7.1 Darwin/18.2.0 OSX/10.14.3
                UID:GID : 501:20
             netrc file : None
           offline mode : False
```
Conda list
```
# packages in environment at /Users/James/anaconda3/envs/community_mapper:
#
# Name Version Build Channel
appnope 0.1.0 py36_1000 conda-forge
asn1crypto 0.24.0 py36_1003 conda-forge
attrs 19.1.0 py_0 conda-forge
backcall 0.1.0 py_0 conda-forge
blas 1.0 mkl anaconda
bleach 3.1.0 py_0 conda-forge
blinker 1.4 py_1 conda-forge
bokeh 1.0.4 py36_1000 conda-forge
bzip2 1.0.6 h1de35cc_1002 conda-forge
ca-certificates 2019.3.9 hecc5488_0 conda-forge
certifi 2019.3.9 py36_0 conda-forge
cffi 1.12.2 py36h2d6ddff_1 conda-forge
chardet 3.0.4 py36_1003 conda-forge
click 7.0 py_0 conda-forge
cloudpickle 0.8.0 py_0 conda-forge
colorcet 1.0.0 py_0 conda-forge
cryptography 2.6.1 py36hc2b1221_0 conda-forge
cycler 0.10.0 py_1 conda-forge
cytoolz 0.9.0.1 py36h1de35cc_1001 conda-forge
dask 1.1.4 py_0 conda-forge
dask-core 1.1.4 py_0 conda-forge
datashader 0.6.9 py_0 conda-forge
datashape 0.5.4 py_1 conda-forge
decorator 4.4.0 py_0 conda-forge
defusedxml 0.5.0 py_1 conda-forge
distributed 1.26.0 py36_1 conda-forge
entrypoints 0.3 py36_1000 conda-forge
freetype 2.10.0 h24853df_0 conda-forge
heapdict 1.0.0 py36_1000 conda-forge
idna 2.8 py36_1000 conda-forge
imageio 2.5.0 py36_0 conda-forge
intel-openmp 2019.3 199 anaconda
ipykernel 5.1.0 py36h24bf2e0_1002 conda-forge
ipython 7.3.0 py36h24bf2e0_0 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
jedi 0.13.3 py36_0 conda-forge
jinja2 2.10 py_1 conda-forge
jpeg 9c h1de35cc_1001 conda-forge
jsonschema 3.0.1 py36_0 conda-forge
jupyter_client 5.2.4 py_3 conda-forge
jupyter_core 4.4.0 py_0 conda-forge
jupyterlab 0.35.4 py36_0 conda-forge
jupyterlab_server 0.2.0 py_0 conda-forge
kiwisolver 1.0.1 py36h04f5b5a_1002 conda-forge
libcxx 4.0.1 h579ed51_0
libcxxabi 4.0.1 hebd6815_0 conda-forge
libffi 3.2.1 h6de7cb9_1006 conda-forge
libgfortran 3.0.1 h93005f0_2 anaconda
libpng 1.6.36 ha441bb4_1000 conda-forge
libsodium 1.0.16 h1de35cc_1001 conda-forge
libtiff 4.0.10 h79f4b77_1001 conda-forge
llvmlite 0.26.0 py36h3fea490_1000 conda-forge
locket 0.2.0 py_2 conda-forge
markupsafe 1.1.1 py36h1de35cc_0 conda-forge
matplotlib-base 3.0.3 py36hf043ca5_0 conda-forge
mistune 0.8.4 py36h1de35cc_1000 conda-forge
mkl 2019.3 199 anaconda
mkl_fft 1.0.10 py36h5e564d8_0 anaconda
mkl_random 1.0.2 py36h27c97d8_0 anaconda
msgpack-python 0.6.1 py36h04f5b5a_0 conda-forge
multipledispatch 0.6.0 py_0 conda-forge
nbconvert 5.4.1 py_2 conda-forge
nbformat 4.4.0 py_1 conda-forge
ncurses 6.1 h0a44026_1002 conda-forge
networkx 2.2 py_1 conda-forge
notebook 5.7.6 py36_0 conda-forge
numba 0.41.0 py36h1702cab_1000 conda-forge
numpy 1.16.2 py36hacdab7b_0 anaconda
numpy-base 1.16.2 py36h6575580_0 anaconda
oauthlib 3.0.1 py_0 conda-forge
olefile 0.46 py_0 conda-forge
openssl 1.1.1b h01d97ff_2 conda-forge
packaging 19.0 py_0 conda-forge
pandas 0.24.2 py36h0a44026_0 anaconda
pandoc 2.7.1 0 conda-forge
pandocfilters 1.4.2 py_1 conda-forge
param 1.8.2 py_0 conda-forge
parso 0.3.4 py_0 conda-forge
partd 0.3.9 py_0 conda-forge
pexpect 4.6.0 py36_1000 conda-forge
pickleshare 0.7.5 py36_1000 conda-forge
pillow 5.4.1 py36hbddbef0_1000 conda-forge
pip 19.0.3 py36_0 conda-forge
prometheus_client 0.6.0 py_0 conda-forge
prompt_toolkit 2.0.9 py_0 conda-forge
psutil 5.6.1 py36h1de35cc_0 conda-forge
ptyprocess 0.6.0 py36_1000 conda-forge
pycparser 2.19 py36_1 conda-forge
pyct 0.4.6 py_0 conda-forge
pyct-core 0.4.6 py_0 conda-forge
pygments 2.3.1 py_0 conda-forge
pyjwt 1.7.1 py_0 conda-forge
pymongo 3.7.2 py36h0a44026_0 conda-forge
pyopenssl 19.0.0 py36_0 conda-forge
pyparsing 2.3.1 py_0 conda-forge
pyrsistent 0.14.11 py36h1de35cc_0 conda-forge
pysocks 1.6.8 py36_1002 conda-forge
python 3.6.7 h8dc6b48_1004 conda-forge
python-dateutil 2.8.0 py_0 conda-forge
pytz 2018.9 py36_0 anaconda
pywavelets 1.0.2 py36h917ab60_0 conda-forge
pyyaml 5.1 py36h1de35cc_0 conda-forge
pyzmq 18.0.1 py36h4cc6ddd_0 conda-forge
readline 7.0 hcfe32e1_1001 conda-forge
requests 2.21.0 py36_1000 conda-forge
requests-oauthlib 1.2.0 py_0 conda-forge
scikit-image 0.14.2 py36h0a44026_1 conda-forge
scipy 1.2.1 py36h1410ff5_0
send2trash 1.5.0 py_0 conda-forge
setuptools 40.8.0 py36_0 conda-forge
six 1.12.0 py36_1000 conda-forge
sortedcontainers 2.1.0 py_0 conda-forge
sqlite 3.26.0 h1765d9f_1001 conda-forge
tblib 1.3.2 py_1 conda-forge
terminado 0.8.1 py36_1001 conda-forge
testpath 0.4.2 py_1001 conda-forge
tk 8.6.9 ha441bb4_1000 conda-forge
toolz 0.9.0 py_1 conda-forge
tornado 6.0.1 py36h1de35cc_0 conda-forge
traitlets 4.3.2 py36_1000 conda-forge
tweepy 3.6.0 py36_0 conda-forge
urllib3 1.24.1 py36_1000 conda-forge
wcwidth 0.1.7 py_1 conda-forge
webencodings 0.5.1 py_1 conda-forge
wheel 0.33.1 py36_0 conda-forge
xarray 0.12.0 py_0 conda-forge
xz 5.2.4 h1de35cc_1001 conda-forge
yaml 0.1.7 h1de35cc_1001 conda-forge
zeromq 4.2.5 h0a44026_1006 conda-forge
zict 0.1.4 py_0 conda-forge
zlib 1.2.11 h1de35cc_1004 conda-forge
```
More complex is an understatement! The force-directed algorithm is very compute-intensive. It's probably possible to speed it up, but for now I'd run it on a subset of your data first and see how the runtime scales with problem size.
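Following that advice, one way to measure scaling is to time the layout on growing subsets and fit a straight line to log(time) against log(n). This is a sketch that assumes node ids run from 0 to n-1 (as in a factorized edge list); `induced_subset` and `scaling_exponent` are hypothetical helpers, and `layout_fn` is whatever layout call you want to time.

```python
import time

import numpy as np
import pandas as pd


def induced_subset(nodes, edges, n):
    """Keep nodes with id < n and only the edges whose ends are both kept."""
    sub_nodes = nodes[nodes["id"] < n]
    mask = (edges["source"] < n) & (edges["target"] < n)
    return sub_nodes, edges[mask]


def scaling_exponent(layout_fn, nodes, edges, sizes):
    """Time layout_fn on growing subsets; return (timings, log-log slope)."""
    timings = []
    for n in sizes:
        sub_nodes, sub_edges = induced_subset(nodes, edges, n)
        start = time.perf_counter()
        layout_fn(sub_nodes, sub_edges)
        timings.append(time.perf_counter() - start)
    # Slope of log(time) vs log(n): about 1 means linear, about 2 quadratic.
    slope = np.polyfit(np.log(sizes), np.log(timings), 1)[0]
    return timings, slope


# Example (assumes datashader is installed and nodes/edges as described):
# timings, slope = scaling_exponent(
#     lambda n, e: forceatlas2_layout(n, e, id='id', source='source', target='target'),
#     nodes, edges, [1000, 2000, 4000, 8000])
```

Taking the first n ids keeps only edges with both ends inside the subset, so on a sparse graph the subsets have far fewer edges per node than the full graph; a breadth-first sample would preserve local structure better. If the fitted slope comes out near 2, as a naive all-pairs force computation would give (an assumption about the implementation, not a measurement), going from a 10,000-node subset to all 1,184,684 nodes multiplies the work per iteration by roughly (1,184,684 / 10,000)² ≈ 14,000.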
OK, thanks. As long as this is expected, that's fine. I've moved my script to our university's computing cluster to speed things up. Does the force-directed function benefit from multiple cores?
Many thanks for your quick response.
The force-directed code could probably be updated fairly easily to use Numba's parallel for loops to support multiple cores; see https://github.com/pyviz/datashader/blob/master/datashader/layout.py . I don't think Numba offered that support when the code was first written. Dask could also be used to distribute the code across cluster nodes, but I haven't looked into the details of the algorithm closely enough to know how difficult that would be. PRs welcome! :-)
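To illustrate what Numba's parallel loops look like, here is a hypothetical all-pairs repulsion kernel in the spirit of ForceAtlas2, where each pair of nodes repels with magnitude k_r·(deg_i+1)·(deg_j+1)/d. It is not datashader's actual code; the function name and arguments are made up for the example. The outer loop uses `prange`, and the import falls back to plain Python if Numba isn't installed so the sketch still runs.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # no Numba: run the same code as ordinary Python
    def njit(*args, **kwargs):
        def wrap(fn):
            return fn
        return wrap
    prange = range


@njit(parallel=True)
def repulsion(pos, degree, k_r):
    """Sum the repulsive displacement on every node from every other node.

    pos is an (n, 2) float array, degree an (n,) integer array.
    """
    n = pos.shape[0]
    disp = np.zeros_like(pos)
    # Iteration i writes only to disp[i], so iterations are independent
    # and prange can hand them to different cores without data races.
    for i in prange(n):
        for j in range(n):
            if i == j:
                continue
            dx = pos[i, 0] - pos[j, 0]
            dy = pos[i, 1] - pos[j, 1]
            d2 = dx * dx + dy * dy + 1e-9  # small epsilon avoids division by zero
            # (magnitude / d) * (unit vector) == factor * (dx, dy) / d^2
            f = k_r * (degree[i] + 1) * (degree[j] + 1) / d2
            disp[i, 0] += f * dx
            disp[i, 1] += f * dy
    return disp
```

Parallelizing the outer loop spreads the O(n²) pair evaluations across cores, but it does not reduce their number, so on a four-core machine the best case is roughly a fourfold speedup.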
If anyone is interested, I got a good speedup by using HoloViews together with an independent implementation of ForceAtlas2. It has a networkx-style interface, so it can be slotted straight into the HoloViews Graph.from_networkx method wherever you would normally pass a networkx layout function.
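```
import holoviews as hv

# G is a networkx graph; forceatlas2 is the layout object from that implementation
hv.Graph.from_networkx(G, forceatlas2.forceatlas2_networkx_layout).opts(tools=['hover'])
```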
You can get the implementation here
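The networkx-style interface mentioned above is simple: a layout function takes a graph and returns a dict mapping each node to its (x, y) position, which is what from_networkx expects in place of a networkx layout function. As a sketch, the hypothetical helper below wraps any array-based layout (one that takes a node count and an (m, 2) integer edge array and returns an (n, 2) coordinate array) into that form; `circular_array_layout` is a toy stand-in for a real force-directed routine.

```python
import numpy as np


def make_networkx_layout(array_layout):
    """Wrap an array-based layout as a networkx-style layout function.

    array_layout(n_nodes, edge_array, **kwargs) -> (n_nodes, 2) ndarray is a
    hypothetical signature; the returned function maps G to {node: (x, y)}.
    """
    def layout(G, **kwargs):
        node_list = list(G.nodes())
        index = {node: i for i, node in enumerate(node_list)}
        # Translate node labels into integer positions for the array code.
        edge_array = np.array(
            [(index[u], index[v]) for u, v in G.edges()], dtype=np.int64
        ).reshape(-1, 2)
        coords = array_layout(len(node_list), edge_array, **kwargs)
        return {node: (float(coords[i, 0]), float(coords[i, 1]))
                for i, node in enumerate(node_list)}
    return layout


def circular_array_layout(n_nodes, edge_array):
    """Toy array layout: nodes evenly spaced on the unit circle (edges ignored)."""
    theta = 2 * np.pi * np.arange(n_nodes) / n_nodes
    return np.column_stack([np.cos(theta), np.sin(theta)])
```

The wrapper only relies on G.nodes() and G.edges(), which networkx graphs provide, so with HoloViews imported as hv the result could be passed as hv.Graph.from_networkx(G, make_networkx_layout(circular_array_layout)).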