Data workflows for the numer.ai machine learning competition

numerflow

Tasks

Currently implemented:

  • fetch and extract the datasets
  • train and predict
  • automatic upload

Task Documentation

FetchAndExtractData

Fetches the dataset zipfile and extracts the contents to output-path.

Parameters

  • output-path: where the datasets should be saved eventually (defaults to ./data/)
  • dataset-path: URI of the remote dataset
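
As a rough illustration of the fetch-and-extract step, the following standard-library sketch downloads a zip archive and unpacks it. The function name fetch_and_extract and its return value are assumptions made for this example; the actual task in this project is a Luigi task and may be implemented differently.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def fetch_and_extract(dataset_uri, output_path="./data/"):
    """Download the zip archive at dataset_uri and unpack it into output_path.

    Hypothetical helper: mirrors the dataset-path and output-path parameters.
    Returns the sorted list of member names found in the archive.
    """
    out_dir = Path(output_path)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Stream the archive to a temporary file so it never has to fit in memory.
    with tempfile.NamedTemporaryFile(suffix=".zip") as tmp:
        with urllib.request.urlopen(dataset_uri) as response:
            shutil.copyfileobj(response, tmp)
        tmp.flush()
        tmp.seek(0)
        with zipfile.ZipFile(tmp) as archive:
            archive.extractall(out_dir)
            return sorted(archive.namelist())
```

Because urllib also accepts file:// URIs, the same helper can be tried against a local archive before pointing it at a remote dataset.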

TrainAndPredict

Trains a Bernoulli Naïve Bayes classifier and predicts the targets. The output file is saved at output-path with a custom, timestamped file name.

Parameters

  • output-path: where the datasets should be saved eventually (defaults to ./data/)
  • dataset-path: URI of the remote dataset
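
To make the classifier concrete, here is a from-scratch sketch of Bernoulli Naïve Bayes in plain Python. It assumes the features have already been binarized to 0/1, and the function names fit_bernoulli_nb and predict_proba are made up for this example. The project itself most likely relies on a library implementation, which this sketch does not attempt to reproduce.

```python
import math


def fit_bernoulli_nb(rows, labels, alpha=1.0):
    """Estimate per-class log priors and smoothed P(feature = 1 | class)."""
    model = {}
    n_features = len(rows[0])
    for cls in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == cls]
        log_prior = math.log(len(members) / len(rows))
        # Laplace smoothing (alpha > 0) keeps every probability inside (0, 1).
        p_one = [
            (sum(r[j] for r in members) + alpha) / (len(members) + 2 * alpha)
            for j in range(n_features)
        ]
        model[cls] = (log_prior, p_one)
    return model


def predict_proba(model, row):
    """Return {class: posterior probability} for one binary feature row."""
    log_scores = {}
    for cls, (log_prior, p_one) in model.items():
        score = log_prior
        for value, p in zip(row, p_one):
            # Bernoulli likelihood: p if the feature is on, 1 - p if it is off.
            score += math.log(p if value else 1.0 - p)
        log_scores[cls] = score
    # Normalise in log space to avoid underflow with many features.
    top = max(log_scores.values())
    weights = {cls: math.exp(s - top) for cls, s in log_scores.items()}
    total = sum(weights.values())
    return {cls: w / total for cls, w in weights.items()}
```

Unlike some other Naïve Bayes variants, the Bernoulli model also counts a feature being off as evidence, which is why the 1 - p branch appears in the likelihood.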

UploadPredictions

Uploads the predictions if not already uploaded.

Parameters

  • output-path: where the datasets should be saved eventually (defaults to ./data/)
  • dataset-path: URI of the remote dataset
  • usermail: user email
  • userpass: user password
  • filepath: path to the file to be uploaded
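
One way to get the "upload only if not already uploaded" behaviour is a marker file next to the prediction file. The sketch below assumes that convention. The .uploaded suffix, the upload_once name, and the upload callable are all hypothetical, and the project may track completed uploads in another way.

```python
from pathlib import Path


def upload_once(filepath, upload):
    """Call upload(filepath) unless a marker file shows it was already sent.

    Returns True if an upload happened, False if it was skipped.
    """
    prediction = Path(filepath)
    # Hypothetical convention: "<name>.uploaded" marks a finished upload.
    marker = prediction.with_name(prediction.name + ".uploaded")
    if marker.exists():
        return False
    upload(prediction)
    # Only reached if upload() did not raise, so failed uploads are retried.
    marker.touch()
    return True
```

Here upload stands in for whatever function actually sends the file, so the skip logic can be exercised without network access or credentials.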

Usage

Prepare the project:

pip install -r requirements.txt --ignore-installed

If you have not already done so, create an API key with at least the following permissions:

  • Upload submissions.
  • View historical submission info.
  • View user info (e.g. balance, withdrawal history).

To run the complete pipeline:

env PYTHONPATH='.' luigi --local-scheduler --module workflow Workflow --secret="YOURSECRET" --public-id="YOURPUBLICID"