Get access to our MD data files.

mdshare

This is a downloader for molecular dynamics (MD) data from a public FTP server at FU Berlin. See here for a full list of available datasets and terms of use.

Example

The following code downloads a file containing a featurized set of three alanine dipeptide MD trajectories (unless the file already exists locally) and stores its contents, three numpy.ndarray objects (each with shape=[250000, 2] and dtype=numpy.float32), in the list trajs:

import mdshare
import numpy as np

local_filename = mdshare.fetch('alanine-dipeptide-3x250ns-backbone-dihedrals.npz')
with np.load(local_filename) as fh:
    trajs = [fh[key] for key in sorted(fh.keys())]

By default, the mdshare.fetch() function will look in and download to the current directory (function parameter working_directory='.'). If you instead set this parameter to None ...

local_filename = mdshare.fetch(
    'alanine-dipeptide-3x250ns-backbone-dihedrals.npz',
    working_directory=None)

... the file will be downloaded to a temporary directory. In both cases, the function will return the path to the downloaded file.

Should the requested file already be present in the working_directory, the download is skipped.
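
The skip-if-present behaviour can be illustrated with a small standalone sketch. This is not mdshare's implementation: fetch_if_missing and fake_download are hypothetical names introduced only for illustration, and the real network transfer is replaced by a stand-in that writes a placeholder file.

```python
import os
import tempfile


def fetch_if_missing(filename, working_directory, download):
    """Return the local path of `filename`, downloading only when needed.

    `download` is a hypothetical callable(filename, target_path) standing in
    for the real network transfer; it is not part of mdshare.
    """
    if working_directory is None:
        # Mirror the documented behaviour: fall back to a temporary directory.
        working_directory = tempfile.mkdtemp()
    os.makedirs(working_directory, exist_ok=True)
    target = os.path.join(working_directory, filename)
    if not os.path.exists(target):
        # Only transfer the file when no local copy exists yet.
        download(filename, target)
    return target


calls = []  # records every simulated download


def fake_download(filename, target_path):
    """Stand-in for a download: write a placeholder file and log the call."""
    calls.append(filename)
    with open(target_path, "wb") as fh:
        fh.write(b"placeholder")


workdir = tempfile.mkdtemp()
first = fetch_if_missing("example.npz", workdir, fake_download)
second = fetch_if_missing("example.npz", workdir, fake_download)
print(first == second, len(calls))  # prints: True 1
```

The second call returns the same path without invoking the downloader again, which is the behaviour described above for a file that is already present in working_directory.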

Using mdshare.catalogue() to view the names and file sizes of the available trajectories ...

mdshare.catalogue()

... produces the output:

Repository: http://ftp.imp.fu-berlin.de/pub/cmb-data/
Files:
alanine-dipeptide-0-250ns-nowater.xtc                  42.9 MB
alanine-dipeptide-1-250ns-nowater.xtc                  42.9 MB
alanine-dipeptide-2-250ns-nowater.xtc                  42.9 MB
alanine-dipeptide-3x250ns-backbone-dihedrals.npz        6.0 MB
alanine-dipeptide-3x250ns-heavy-atom-distances.npz    135.0 MB
[...]
Containers:
mdshare-test.tar.gz                                   193.0 bytes
pyemma-tutorial-livecoms.tar.gz                       123.9 MB

Using mdshare.search(filename_pattern) to select a group of files by name pattern ...

pentapeptide_xtcs = mdshare.search('penta*xtc')
print(pentapeptide_xtcs)

... produces the output:

['pentapeptide-00-500ns-impl-solv.xtc',
 'pentapeptide-01-500ns-impl-solv.xtc',
 'pentapeptide-02-500ns-impl-solv.xtc',
...
 'pentapeptide-22-500ns-impl-solv.xtc',
 'pentapeptide-23-500ns-impl-solv.xtc',
 'pentapeptide-24-500ns-impl-solv.xtc']
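
The pattern 'penta*xtc' looks like a Unix shell-style wildcard. Assuming that is how the pattern is interpreted (the text above does not confirm how mdshare matches names internally), the selection can be reproduced offline with the standard library. The helper match_names is hypothetical, and the file list is a small sample of the names shown above.

```python
from fnmatch import fnmatchcase

# A few file names taken from the catalogue and search output shown above.
available = [
    "alanine-dipeptide-0-250ns-nowater.xtc",
    "alanine-dipeptide-3x250ns-backbone-dihedrals.npz",
    "pentapeptide-00-500ns-impl-solv.xtc",
    "pentapeptide-01-500ns-impl-solv.xtc",
    "pyemma-tutorial-livecoms.tar.gz",
]


def match_names(pattern, names):
    """Hypothetical helper: sorted names matching a shell-style wildcard."""
    return sorted(name for name in names if fnmatchcase(name, pattern))


matches = match_names("penta*xtc", available)
print(matches)
# prints: ['pentapeptide-00-500ns-impl-solv.xtc', 'pentapeptide-01-500ns-impl-solv.xtc']
```

Here * matches any run of characters, so the pattern keeps every name that starts with penta and ends with xtc.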