titanoboa-tasklets
ready-made workflow steps for titanoboa
Titanoboa Step Functions
This repository contains sample ready-made steps for titanoboa (see the titanoboa GitHub repository):
AWS
- AWS EC2
- AWS S3
- AWS SES
- AWS SNS
- AWS SQS
🧬 Bioinformatics 🔬
- 🧬 K-mer Count
Http Client
JDBC Client
Kafka Producer & Consumer
PDF Generation
SFTP Client
Smtp Client
SSH Client
AWS EC2 
Provides functions to list, start and stop EC2 instances. Primarily uses the amazonica library. Refer to the library's documentation for detailed information on the supported properties.
Installation
- Add the following Maven coordinates to titanoboa's external dependencies file:
- Require namespace:
io.titanoboa.tasklet.aws.ec2
Usage
List EC2 Instances
:workload-fn
io.titanoboa.tasklet.aws.ec2/list-instances
Sample Step Definition
{:type :aws-ec2-list,
:supertype :tasklet,
:description "Lists all EC2 instances for all reservations.\nReturns :ec2-instances key with list of instances as a value:\n{:ec2-instances [{instance1 map} {instance2 map} ...]}",
:properties {:credentials {:access-key "", :secret-key "", :endpoint "eu-central-1"}},
:workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.aws.ec2/list-instances", :type "clojure"}}
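Downstream steps can read the `:ec2-instances` key from the job properties. The following is a minimal sketch of picking out running instances. It assumes each instance map follows amazonica's kebab-case rendering of the AWS instance object (`:instance-id`, `:state {:name ...}`); the helper name `running-instance-ids` and the sample data are hypothetical.

```clojure
;; Hypothetical helper; assumes amazonica-style instance maps.
(defn running-instance-ids
  "Returns a vector of ids of instances whose state name is \"running\"."
  [properties]
  (->> (:ec2-instances properties)
       (filter #(= "running" (get-in % [:state :name])))
       (mapv :instance-id)))

;; Made-up sample of what the step might add to the job properties.
(def sample-ec2-properties
  {:ec2-instances [{:instance-id "i-0a123a454b678aeb6" :state {:name "running"}}
                   {:instance-id "i-0b000000000000000" :state {:name "stopped"}}]})

(running-instance-ids sample-ec2-properties)
```

A vector like this could then be passed as `:instance-ids` to a later step.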
Start EC2 Instances
:workload-fn
io.titanoboa.tasklet.aws.ec2/start-instances
Sample Step Definition
{:type :aws-ec2-start,
:supertype :tasklet,
:description "Starts an EC2 instance.\nReturns :starting-instances key with status value map.",
:workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.aws.ec2/start-instances", :type "clojure"}
:properties {:credentials {:access-key "", :secret-key "", :endpoint "eu-central-1"}, :instance-ids ["i-0a123a454b678aeb6"]}
}
Stop EC2 Instances
:workload-fn
io.titanoboa.tasklet.aws.ec2/stop-instances
Sample Step Definition
{:type :aws-ec2-stop,
:supertype :tasklet,
:description "Stops an EC2 instance.\nReturns :stopping-instances key with status value map.",
:properties {:credentials {:access-key "", :secret-key "", :endpoint "eu-central-1"}, :instance-ids ["i-0a123a454b678aeb6"]},
:workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.aws.ec2/stop-instances", :type "clojure"}
}
AWS S3 
Provides functions to read, download and upload S3 objects. Primarily uses the amazonica library. Refer to the library's documentation for detailed information on the supported properties.
Installation
- Add the following Maven coordinates to titanoboa's external dependencies file:
- Require namespace:
io.titanoboa.tasklet.aws.s3
Usage
Read S3 Object
:workload-fn
io.titanoboa.tasklet.aws.s3/read
Sample Step Definition
{:type :aws-s3-read,
:supertype :tasklet,
:description "Reads textual content of a s3 file and returns it as a job property :s3-object",
:workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.aws.s3/read", :type "clojure"}
:properties {:key "index.html", :credentials {:access-key "", :secret-key "", :endpoint "eu-central-1"}, :bucket ""}}
Download S3 Object
:workload-fn
io.titanoboa.tasklet.aws.s3/download
Sample Step Definition
{:type :aws-s3-download,
:supertype :tasklet,
:description "Downloads a file from s3 bucket to job directory under the specified name.",
:properties {:key "index.html", :credentials {:access-key "", :secret-key "", :endpoint "eu-central-1"}, :save-as "path/to/file", :bucket "bucket-name"},
:workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.aws.s3/download", :type "clojure"}}
Upload S3 Object
:workload-fn
io.titanoboa.tasklet.aws.s3/upload
Sample Step Definition
{:type :aws-s3-upload,
:supertype :tasklet,
:description "Uploads specified file from job directory into the given s3 bucket.",
:properties {:key "index.bkp", :credentials {:access-key "", :secret-key "", :endpoint "eu-central-1"}, :file-path "index.html", :bucket ""},
:workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.aws.s3/upload", :type "clojure"}}
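The S3 samples above all repeat the same `:credentials` map. When step definitions are assembled programmatically, one option is to merge a shared map in. This is a sketch in plain Clojure: `with-credentials` and `aws-credentials` are hypothetical names, and `:workload-fn` is left out for brevity.

```clojure
;; Shared placeholder credentials (values intentionally left blank).
(def aws-credentials
  {:access-key "" :secret-key "" :endpoint "eu-central-1"})

(defn with-credentials
  "Returns the step definition with creds set under [:properties :credentials]."
  [step creds]
  (assoc-in step [:properties :credentials] creds))

;; Step definition without credentials, completed by the helper.
(def read-step
  (with-credentials
    {:type :aws-s3-read
     :supertype :tasklet
     :properties {:key "index.html" :bucket ""}}
    aws-credentials))
```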
AWS SES 
Provides functions to send email via AWS SES. Primarily uses the amazonica library. Refer to the library's documentation for detailed information on the supported properties.
Installation
- Add the following Maven coordinates to titanoboa's external dependencies file:
- Require namespace:
io.titanoboa.tasklet.aws.ses
Usage
:workload-fn
io.titanoboa.tasklet.aws.ses/send-email
Sample Step Definition
{:type :aws-ses,
:supertype :tasklet,
:description "Sends an email via SES.\nReturns :message-id key with message id value.\n",
:properties {:credentials {:access-key "", :secret-key "", :endpoint "eu-west-1"}, :from "[email protected]",
:message {:body {:html "testing 1-2-3-4", :text "testing 1-2-3-4"}, :subject "greetings from titanoboa"}, :to ["[email protected]"]},
:workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.aws.ses/send-email", :type "clojure"}}
AWS SNS 
Provides functions to send notifications via AWS SNS. Primarily uses the amazonica library. Refer to the library's documentation for detailed information on the supported properties.
Installation
- Add the following Maven coordinates to titanoboa's external dependencies file:
- Require namespace:
io.titanoboa.tasklet.aws.sns
Usage
:workload-fn
io.titanoboa.tasklet.aws.sns/publish
Sample Step Definition
{:type :aws-sns,
:supertype :tasklet,
:description "Publishes a message into an SNS topic.",
:workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.aws.sns/publish",
:type "clojure"},
:properties {:topic-arn "arn:aws:sns:us-east-1:676820690883:my-topic",
:subject "test",
:message "",
:message-attributes {"attr" "value"}}}
AWS SQS 
Provides functions to send messages via AWS SQS. Primarily uses the amazonica library. Refer to the library's documentation for detailed information on the supported properties.
Installation
- Add the following Maven coordinates to titanoboa's external dependencies file:
- Require namespace:
io.titanoboa.tasklet.aws.sqs
Usage
:workload-fn
io.titanoboa.tasklet.aws.sqs/send-message
Sample Step Definition
{:type :aws-sqs,
:supertype :tasklet,
:description "Sends a text message to a queue.",
:workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.aws.sqs/send-message",
:type "clojure"},
:properties {:credentials {:access-key "",
:secret-key "",
:endpoint "eu-central-1"},
:message-attributes {},
:message-body "",
:queue-url ""}}
JDBC Client 
Performs a JDBC query and returns the corresponding data. Note that the code of the JDBC tasklet is part of the standard Titanoboa distribution and is not in this repository.
Installation
- Add whatever JDBC driver you need to titanoboa's ./lib folder.
- Require namespace titanoboa.tasklet.jdbc in titanoboa's external dependencies file. You may also need to require titanoboa.system.jdbc (see the third point below).
- Do not forget to also define and configure a corresponding JDBC system for DB connection pooling in your server configuration (in this example there is a connection pool system :test-db that uses titanoboa.system.jdbc/jdbc-pool).
Usage
:workload-fn
titanoboa.tasklet.jdbc/query
Sample Step Definition
{:type :jdbc
:supertype :tasklet
:workload-fn #titanoboa.exp/Expression {:value "titanoboa.tasklet.jdbc/query"}
:properties {:response-property-name :db-data
:data-source-ks [:test-db :system :pool]
:query {:select [:o.ordernumber :o.TotalAmount :c.FirstName :c.LastName :c.City :c.Country],
:from [[:customers :c]]
:left-join [[:orders :o] [:= :c.id :o.customerid]]
:order-by [[:o.totalamount :desc :nulls-last]]
:limit 50}}}
Expected step properties are as follows:
- :query - either a query string or a map in honeysql format
- :data-source-ks - key sequence pointing to the JDBC data source object among the running systems; when used with titanoboa.system.jdbc/jdbc-pool the format is [:<jdbc pool system> :system :pool], so e.g. if the JDBC system is :test-db then it is [:test-db :system :pool]
- :response-property-name - self-explanatory
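The `:data-source-ks` vector reads like a path into a nested map of running systems. The sketch below illustrates that kind of lookup with `get-in` on a mock map. The shape of `running-systems` is an assumption made purely for illustration and is not titanoboa's actual internal state.

```clojure
;; Mock stand-in for the running systems; the real structure may differ.
(def running-systems
  {:test-db {:system {:pool :mock-connection-pool}}})

;; Same path as in the sample step definition.
(def data-source-ks [:test-db :system :pool])

;; Walks the path key by key and returns the (mock) data source.
(def data-source (get-in running-systems data-source-ks))
```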
Http Client 
Makes an HTTP(S) call and returns the (parsed) response. Primarily uses the clj-http library. Refer to the library's documentation for detailed information on all supported properties.
Installation
- Add the following Maven coordinates to titanoboa's external dependencies file:
- Require namespace:
io.titanoboa.tasklet.httpclient
Usage
:workload-fn
io.titanoboa.tasklet.httpclient/request
Sample Step Definition
{:type :http-client
:supertype :tasklet
:workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.httpclient/request" :type "clojure"}
:properties {:url "https://jsonplaceholder.typicode.com/posts/1"
:request-method :get
:as :json
:proxy-host "127.0.0.1"
:proxy-port 8118
:response-property-name :rest-response
:body-only? false
:connection-pool {:timeout 5 :threads 4 :insecure? false :default-per-route 10}}}
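With `:body-only? false`, the value stored under `:rest-response` is presumably the whole response rather than just the body. The sketch below assumes it is a clj-http style response map with `:status`, `:headers` and `:body` keys; `response-body` and the sample data are hypothetical.

```clojure
;; Hypothetical helper; assumes a clj-http style response map.
(defn response-body
  "Returns the parsed body when the status is 2xx, otherwise nil."
  [properties]
  (let [{:keys [status body]} (:rest-response properties)]
    (when (and status (<= 200 status 299))
      body)))

;; Made-up sample of job properties after the step has run.
(def sample-http-properties
  {:rest-response {:status 200
                   :headers {"Content-Type" "application/json"}
                   :body {:id 1 :title "sample title"}}})

(:title (response-body sample-http-properties))
```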
Smtp Client 
Sends email via SMTP. Primarily uses the postal library. Refer to the library's documentation for detailed information on all supported properties.
Installation
- Add the following Maven coordinates to titanoboa's external dependencies file:
- Require namespace:
io.titanoboa.tasklet.smtp
Usage
:workload-fn
titanoboa.tasklet.smtp/send
Sample Step Definition
{:type :smtp
:supertype :tasklet
:workload-fn #titanoboa.exp/Expression{:value "titanoboa.tasklet.smtp/send"}
:properties {:connection {:host "localhost"
:port 25
:user ""
:pass ""
:ssl false
:tls false}
:email {:from "[email protected]"
:to "[email protected]"
:cc ["[email protected]", "[email protected]", "[email protected]"]
:bcc "[email protected]"
:subject "Cat!"
:date #titanoboa.exp/Expression{:value "(java.util.Date.)"}
:message-id ""
:user-agent ""
:body [{:type "text/plain"
:content "Hey folks,\n\nCheck out these pictures of my cat!"}
{:type :inline
:content #titanoboa.exp/Expression{:value "(File. \"/tmp/lester-flying-photoshop\")"}
:content-type "image/jpeg"
:file-name "lester-flying.jpeg"}
{:type :attachment
:content #titanoboa.exp/Expression{:value "(File. \"/tmp/lester-upside-down.jpeg\")"}}]}}}
SSH and SFTP 
SSH and SFTP client. Primarily uses the clj-ssh library. Refer to the library's documentation for detailed information on all supported properties.
Installation
- Add the following Maven coordinates to titanoboa's external dependencies file:
- Require namespace:
io.titanoboa.tasklet.ssh
Usage
SSH
:workload-fn
io.titanoboa.tasklet.ssh/ssh
Sample Step Definition
{:type :ssh,
:supertype :tasklet,
:description "SSH Client",
:properties {:ssh-agent-settings {:use-system-ssh-agent false},
:identities {:private-key-path "/path/to/key.pem"},
:ssh-cmd-map {:in "echo hello"},
:host "xxx.eu-central-1.compute.amazonaws.com",
:session-options {:username "ec2-user", :strict-host-key-checking "no", :preferred-authentications "publickey"}},
:workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.ssh/ssh", :type "clojure"}}
SFTP
:workload-fn
io.titanoboa.tasklet.ssh/sftp
Sample Step Definition
{:type :sftp,
:supertype :tasklet,
:description "SFTP Client",
:properties {:ssh-agent-settings {:use-system-ssh-agent false},
:identities {:private-key-path "/path/to/key.pem"},
:sftp-cmds-vec [[:ls "/home/ec2-user/"]],
:host "xxx.eu-central-1.compute.amazonaws.com",
:session-options {:username "ec2-user",
:strict-host-key-checking "no",
:preferred-authentications "publickey"}},
:workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.ssh/sftp", :type "clojure"}}
PDF 
Generates a pdf file based on job properties. Primarily uses clj-pdf library. Refer to the library's documentation for detailed information on the generation process and all supported properties.
Installation
- Add the following Maven coordinates to titanoboa's external dependencies file:
- Require namespace:
io.titanoboa.tasklet.pdf
Usage
:workload-fn
io.titanoboa.tasklet.pdf/generate-pdf
Sample Properties
{:pdf-sections [[:list {:roman true}
[:chunk {:style :bold} "a bold item"]
"another item"
"yet another item"]
[:phrase "some text"]
[:phrase "some more text"]
[:paragraph "yet more text"]]
:file-name "example.pdf"
:pdf-metadata {:bottom-margin 10, :creator "Jane Doe", :doc-header ["inspired by" "William Shakespeare"], :right-margin 50, :left-margin 10, :footer "page", :header "page header", :size "a4", :title "Test doc", :author "John Doe", :top-margin 20, :subject "Some subject"}}
Sample Step Definition
{:type :pdf-generation
:supertype :tasklet
:properties
{:pdf-sections [[:list {:roman true}
[:chunk {:style :bold} "a bold item"]
"another item"
"yet another item"]
[:phrase "some text"]
[:phrase "some more text"]
[:paragraph "yet more text"]]
:file-name "example.pdf"
:pdf-metadata {:bottom-margin 10, :creator "Jane Doe", :doc-header ["inspired by" "William Shakespeare"], :right-margin 50, :left-margin 10, :footer "page", :header "page header", :size "a4", :title "Test doc", :author "John Doe", :top-margin 20, :subject "Some subject"}}
:workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.pdf/generate-pdf", :type "clojure"}}
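Since `:pdf-sections` is ordinary Clojure data, it can be assembled from other job data before the step runs. This is a minimal sketch: `items->pdf-list` is a hypothetical helper, and the element shapes simply mirror the sample above.

```clojure
;; Hypothetical helper mirroring the [:list {:roman true} ...] element above.
(defn items->pdf-list
  "Builds a :list element with roman numbering from a seq of strings."
  [items]
  (into [:list {:roman true}] items))

;; Properties map in the same shape as the sample properties.
(def pdf-properties
  {:pdf-sections [(items->pdf-list ["another item" "yet another item"])
                  [:paragraph "yet more text"]]
   :file-name "example.pdf"})
```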
Kafka Producer & Consumer 
A simple Kafka producer and consumer. Primarily uses the dvlopt/kafka library. Refer to the library's documentation for detailed information on all supported properties.
Installation
- Add the following Maven coordinates to titanoboa's external dependencies file:
- Require namespace:
io.titanoboa.tasklet.kafka
Usage
Producer
:workload-fn
io.titanoboa.tasklet.kafka/produce
Sample Step Definition
{:type :kafka-produce,
:supertype :tasklet,
:workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.kafka/produce",
:type "clojure"},
:properties {:kafka-producer-config {:dvlopt.kafka/nodes [["localhost"
9092]],
:dvlopt.kafka/serializer.key :long,
:dvlopt.kafka/serializer.value :string,
:dvlopt.kafka.out/configuration {"client.id" "my-producer",
"transactional.id" "some transaction id"}},
:records [{:topic "test-topic",
:key 123,
:value "Hello World!"}]}}
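The `:records` property is a vector of maps with `:topic`, `:key` and `:value`. When the records come from other job data, they can be built with a small helper. This is a sketch only: `->records` is a hypothetical name, and the keys are longs to match the `:long` key serializer in the sample.

```clojure
;; Hypothetical helper producing maps in the shape used by :records above.
(defn ->records
  "Turns a map of key -> value into a vector of record maps for one topic."
  [topic key->value]
  (mapv (fn [[k v]] {:topic topic :key k :value v}) key->value))

;; A sorted map keeps the resulting record order predictable.
(->records "test-topic" (sorted-map 123 "Hello World!" 124 "Hello again!"))
```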
Consumer
:workload-fn
io.titanoboa.tasklet.kafka/consume
Sample Step Definition
{:type :kafka-consume,
:supertype :tasklet,
:workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.kafka/consume",
:type "clojure"},
:properties {:kafka-topics ["test-topic"],
:poll-options {:dvlopt.kafka/timeout [1
:seconds]},
:kafka-consumer-config {:dvlopt.kafka/nodes [["localhost"
9092]],
:dvlopt.kafka/deserializer.key :long,
:dvlopt.kafka/deserializer.value :string,
:dvlopt.kafka.in/configuration {"auto.offset.reset" "earliest",
"enable.auto.commit" false,
"max.poll.records" "50",
"group.id" "my-group"}}}}
🧬 K-mer count
A few simple functions to help with K-mer counting and analysis of FASTQ data files. Also contains functions for splitter (map) and aggregator (reduce) types of steps to help with parallel processing.
Note that some thought needs to be put into which underlying file system will be used (e.g. HDFS, EFS etc.) and whether the file will be physically split prior to the counting.
Installation
- Add the following Maven coordinates to titanoboa's external dependencies file:
- Require namespace:
io.titanoboa.tasklet.kmer
Usage
K-Mer count
:workload-fn
io.titanoboa.tasklet.kmer/kmer-count
Sample Job Properties
{:create-folder? false,
:fastq-file "/path/to/fastq/file",
:start 0,
:end 12,
:k 3,
:top-n 10}
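To illustrate what `:k` and `:top-n` control, here is a minimal K-mer counting sketch in plain Clojure. It is not the tasklet's implementation: it works on an in-memory sequence string rather than a FASTQ file, it does not model `:start` and `:end`, and the function names are hypothetical.

```clojure
(defn count-kmers
  "Counts every substring of length k in the nucleotide string s."
  [k s]
  (frequencies (map #(apply str %) (partition k 1 s))))

(defn top-kmers
  "Returns the n most frequent [kmer count] pairs, ties broken alphabetically."
  [n counts]
  (->> counts
       (sort-by (fn [[kmer cnt]] [(- cnt) kmer]))
       (take n)
       vec))

;; The 3-mers of "ACGTACGA" are ACG, CGT, GTA, TAC, ACG and CGA.
(top-kmers 2 (count-kmers 3 "ACGTACGA"))
```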
Map/Reduce Steps
Map :workload-fn
io.titanoboa.tasklet.kmer/split-fastq
Reduce :workload-fn
io.titanoboa.tasklet.kmer/reduce-kmers
Sample Job Properties
{:fastq-file "/path/to/fastq/file",
:k 3,
:split-to 12}
Sample Map/Reduce Workflow Definition
{:first-step "splitter",
:name "kmer-map-reduce",
:revision 4,
:type nil,
:properties {:fastq-file "/mnt/efs/sars2/reclojure.fastq",
:k 3,
:split-to 12,
:top-n 10},
:steps [{:id "splitter",
:type :map,
:supertype :map,
:next [["*" "aggregator"]],
:workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.kmer/split-fastq",
:type "clojure"},
:properties {:jobdef-name "k-mer-count",
:sys-key :core,
:standalone-system? false},
:revision 1}
{:id "aggregator",
:type :reduce,
:supertype :reduce,
:workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.kmer/reduce-kmers",
:type "clojure"},
:next [],
:properties {:map-step-id "splitter", :commit-interval 100},
:revision 1}]}
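The splitter/aggregator pattern can be pictured with a small in-memory sketch: reads are divided into chunks, each chunk is counted independently (map), and the partial counts are summed (reduce). This is an illustration under simplifying assumptions and not the code behind `split-fastq` or `reduce-kmers`; it splits a vector of read strings rather than a file, and all names are hypothetical.

```clojure
(defn count-kmers
  "Counts every substring of length k in a single read."
  [k read]
  (frequencies (map #(apply str %) (partition k 1 read))))

(defn split-reads
  "Splits reads into at most n chunks of roughly equal size."
  [n reads]
  (let [size (max 1 (long (Math/ceil (/ (count reads) (double n)))))]
    (partition-all size reads)))

(defn count-chunk
  "Sums K-mer counts over all reads in one chunk."
  [k reads]
  (apply merge-with + {} (map #(count-kmers k %) reads)))

(defn map-reduce-kmers
  "Counts K-mers chunk by chunk, then merges the partial counts."
  [k n reads]
  (->> (split-reads n reads)
       (map #(count-chunk k %))    ; map phase: one partial count per chunk
       (apply merge-with + {})))   ; reduce phase: sum the partial counts

(map-reduce-kmers 3 2 ["ACGT" "CGTA" "ACG"])
```

Because K-mers are counted within each read, splitting between reads does not lose any K-mers at chunk boundaries.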