
`sourmash` submission

Open bluegenes opened this issue 1 year ago • 32 comments

Submitting Author: Tessa Pierce-Ward (@bluegenes)
All current maintainers: @ctb, @luizirber, @bluegenes
Package Name: sourmash
One-Line Description of Package: sourmash is a command line tool and Python library for sketching collections of DNA, RNA, and amino acid k-mers for biological sequence search, comparison, and analysis.
Repository Link: https://github.com/sourmash-bio/sourmash
Version submitted: 4.8
Editor: Emily Jane McTavish (@snacktavish)
Reviewer 1: Lili Andersson-Li (@LilyAnderssonLee)
Reviewer 2: Elais Player (@elais)
Archive: TBD
Version accepted: TBD
Date accepted (month/day/year): TBD


Code of Conduct & Commitment to Maintain Package

Description

  • Include a brief paragraph describing what your package does:

Large collections of genomes, transcriptomes, and raw sequencing data sets are readily available in biology. With the scale of data now available, the field needs lightweight computational methods for searching and summarizing the content of both public and private collections. sourmash implements FracMinHash sketching, a lightweight technique that supports accurate estimation of overlap and containment between two sequencing data sets. sourmash provides a flexible set of programmatic functionality for sequence search and comparison, together with a robust and well-tested command-line interface.
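To make the sketching idea concrete, here is a minimal pure-Python illustration of FracMinHash-style sketching and containment estimation. It is a toy sketch under stated assumptions, not sourmash's implementation: the function names, the BLAKE2b hash, the `scaled=10` value, and the random test sequences are all chosen for illustration, and reverse-complement handling is omitted.

```python
import hashlib
import random

MAX_HASH = 2**64  # size of the 64-bit hash space (assumption for this toy example)


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (stand-in hash, not the one sourmash uses)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def frac_minhash(seq, k=21, scaled=10):
    """Keep hashes below MAX_HASH / scaled: roughly 1/scaled of the distinct k-mers."""
    threshold = MAX_HASH // scaled
    return {h for h in map(hash_kmer, kmers(seq, k)) if h < threshold}


def containment(a, b):
    """Estimated fraction of a's k-mers that also occur in b."""
    return len(a & b) / len(a) if a else 0.0


def jaccard(a, b):
    """Estimated Jaccard similarity of the two underlying k-mer sets."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


# Hypothetical data: a small "genome" embedded in a much larger "metagenome".
rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(5_000))
other = "".join(rng.choice("ACGT") for _ in range(45_000))
metagenome = other[:20_000] + genome + other[20_000:]

g_sketch = frac_minhash(genome)
m_sketch = frac_minhash(metagenome)

print("sketch sizes:", len(g_sketch), len(m_sketch))
print("containment of genome in metagenome:", containment(g_sketch, m_sketch))
print("jaccard:", jaccard(g_sketch, m_sketch))
```

Because the same hash threshold is applied to both data sets, every retained genome hash is also retained for the metagenome, so the containment estimate here is exactly 1.0 even though the Jaccard similarity is only about 0.1.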

Scope

  • Please indicate which category or categories. Check out our package scope page to learn more about our scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):

    • [ ] Data retrieval
    • [ ] Data extraction
    • [x] Data processing/munging
    • [ ] Data deposition
    • [ ] Data validation and testing
    • [ ] Data visualization[^1]
    • [ ] Workflow automation
    • [ ] Citation management and bibliometrics
    • [ ] Scientific software wrappers
    • [ ] Database interoperability

Domain Specific & Community Partnerships

  • [ ] Geospatial
  • [ ] Education
  • [ ] Pangeo

Community Partnerships

If your package is associated with an existing community please check below:

[^1]: Please fill out a pre-submission inquiry before submitting a data visualization package.

  • For all submissions, explain how and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):

    • Who is the target audience and what are scientific applications of this package?

The target audience for sourmash is biologists looking to compare biological sequencing datasets of any kind. sourmash's FracMinHash sketching produces a small, irreversible representation of each dataset, allowing users to search and compare without compromising private data. We provide a user-friendly CLI with mature functionality as well as a Python API and an experimental Rust API for computational biologists with advanced use cases.

  • Are there other Python packages that accomplish the same thing? If so, how does yours differ?

There are a number of sketching tools for genomic data (e.g., Mash, kmindex, Dashing), each of which implements a slightly different sketching technique. sourmash's FracMinHash enables accurate comparisons of sets of very different sizes (unlike the standard MinHash implemented in, e.g., Mash). This enables analysis of genomes contained within metagenomes (Irber et al., 2022), which in turn supports taxonomic profiling (Portik et al., 2022) and content-based search of publicly available metagenomes. sourmash now additionally supports estimation of Average Nucleotide Identity from FracMinHash (Rahman Hera et al., 2023). The sourmash team has focused on maintaining a robust user interface and continually improving functionality and user experience, with roughly monthly software releases.
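The difference between containment and Jaccard similarity for sets of very different sizes can be written out as follows. This is a sketch in assumed notation rather than a quotation from the cited papers: $A$ and $B$ are k-mer sets, $h$ is a hash function with maximum value $H$, $s$ is the scaling factor, and $k$ is the k-mer length. The last line assumes a simple model of independent point mutations; the published estimator may include further corrections.

```latex
\begin{align*}
\mathrm{FRAC}_s(A) &= \{\, h(x) : x \in A,\ h(x) \le H/s \,\} \\
\hat{C}(A, B) &= \frac{|\mathrm{FRAC}_s(A) \cap \mathrm{FRAC}_s(B)|}{|\mathrm{FRAC}_s(A)|}
  \approx \frac{|A \cap B|}{|A|} \\
J(A, B) &= \frac{|A \cap B|}{|A \cup B|} \\
\mathrm{ANI} &\approx \hat{C}(A, B)^{1/k}
\end{align*}
```

For example, if a genome with $|A| = 5 \times 10^{6}$ k-mers is fully present in a metagenome with $|B| = 5 \times 10^{8}$ k-mers, then the containment of $A$ in $B$ is 1 while $J(A, B) = 0.01$, so the Jaccard similarity alone would hide the fact that the genome is entirely present.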

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:

N/A

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

  • [x] does not violate the Terms of Service of any service it interacts with.
  • [x] uses an OSI approved license.
  • [x] contains a README with instructions for installing the development version.

Note: these instructions are in the 'developer notes', not the main README.

  • [x] includes documentation with examples for all functions.
  • [x] contains a tutorial with examples of its essential functions and uses.
  • [x] has a test suite.
  • [x] has continuous integration setup, such as GitHub Actions, CircleCI, and/or others.

Publication Options

JOSS Checks
  • [x] The package has an obvious research application according to JOSS's definition in their submission requirements. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
  • [x] The package is not a "minor utility" as defined by JOSS's submission requirements: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
  • [x] The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
  • [x] The package is deposited in a long-term repository with the DOI: 10.21105/joss.00027

Note: JOSS accepts our review as theirs. You will NOT need to go through another full review. JOSS will only review your paper.md file. Be sure to link to this pyOpenSci issue when a JOSS issue is opened for your package. Also be sure to tell the JOSS editor that this is a pyOpenSci reviewed package once you reach this step.

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PRs rather than submitting a denser text-based review. It will also allow you to demonstrate addressing the issue via PR links.

  • [x] Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Confirm each of the following by checking the box.

  • [x] I have read the author guide.
  • [x] I expect to maintain this package for at least 2 years and can help find a replacement for the maintainer (team) if needed.

Please fill out our survey

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

The editor template can be found here.

The review template can be found here.

bluegenes, Aug 14 '23 21:08