phaster_scripts icon indicating copy to clipboard operation
phaster_scripts copied to clipboard

Small utility script to submit genomes to PHASTER for phage detection

PHASTER scripts

Small utility scripts to query the PHASTER API endpoint, to identify and annotate prophage sequences within bacterial genomes and plasmids.

Run the script without arguments or with -h/--help to see a list of available options. The script creates a small local database in a tab separated file called phaster_jobs.tsv, where it stores date and time of most recent API query attempt, along with the last observed job status for each submitted file. The name of the database file can be modified with the -d/--database argument.

WARNING: The "database" this script uses is a simple TSV file and the script cannot operate with several running instances against the same database file. Avoid running this script in parallel.

Submit a job with a single complete genome sequence

It is very simple to submit a single complete genome sequence to PHASTER using the script:

$ ./phaster.py --fasta path/to/genome.fasta

This will submit the sequence file to the online API and store the submission in the database file. The information in the database file is required to keep track of submission job IDs, so results can be downloaded when the submitted job is finished.

Submit a job with a draft genome assembly (several contigs)

The PHASTER API needs to know if the input file contains multiple sequences (contigs), so the script will include this information if you use the -c/--contigs argument:

$ ./phaster.py --contigs --fasta path/to/genome.fasta

Query previously submitted job(s)

Running with the -g/--get-status argument will automatically query the status of all previously jobs listed in the database:

$ ./phaster.py --get-status