

hgvs - manipulate biological sequence variants according to Human Genome Variation Society recommendations
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Important: biocommons packages require Python 3.6+. `More <https://groups.google.com/forum/#!topic/hgvs-discuss/iLUzjzoD-28>`__

The hgvs package provides a Python library to parse, format, validate, normalize, and map sequence variants according to `Variation Nomenclature`_ (aka Human Genome Variation Society) recommendations.

Specifically, the hgvs package focuses on the subset of the HGVS recommendations that precisely describe sequence-level variation relevant to the application of high-throughput sequencing to clinical diagnostics. The package does not attempt to cover the full scope of HGVS recommendations. Please refer to `issues <https://github.com/biocommons/hgvs/issues>`_ for limitations.

+--------------------+----------------------------------------------+
| Information        | | |rtd| |changelog| |getting_help|           |
|                    | | |github_license| |binder|                  |
+--------------------+----------------------------------------------+
| Latest Release     | |github_tag| |pypi_rel| |hit|                |
+--------------------+----------------------------------------------+
| Development        | | |status_rel| |coveralls|                   |
| (main branch)      | | |issues| |github_open_pr| |github_contrib| |
|                    | | |github_stars| |github_forks|              |
+--------------------+----------------------------------------------+

Features
@@@@@@@@

  • Parsing is based on formal grammar.
  • An easy-to-use object model that represents most variant types (SNVs, indels, dups, inversions, etc.) and concepts (intronic offsets, uncertain positions, intervals)
  • A variant normalizer that rewrites variants in canonical forms and substitutes reference sequences (if reference and transcript sequences differ)
  • Formatters that generate HGVS strings from internal representations
  • Tools to map variants between genome, transcript, and protein sequences
  • Reliable handling of regions with genome-transcript discrepancies
  • Pluggable data providers support alternative sources of transcript mapping data
  • Extensive automated tests, including those for all variant types and "problematic" transcripts
  • Easily installed using remote data sources. Installation with local data sources is straightforward and completely obviates network access

Important Notes
@@@@@@@@@@@@@@@

  • You are encouraged to `browse issues <https://github.com/biocommons/hgvs/issues>`_. All known issues are listed there. Please report any issues you find.
  • Use a pip package specification to stay within minor releases. For example, ``hgvs>=1.5,<1.6``. hgvs uses `Semantic Versioning <http://semver.org/>`__.
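
As a rough illustration of what a specification such as ``hgvs>=1.5,<1.6`` admits, the sketch below compares plain numeric release strings. The function names are hypothetical, and the comparison deliberately ignores pre-releases, post-releases, and the other details that pip handles; it is not how pip itself evaluates specifiers.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Turn a plain release string such as '1.5.4' into (1, 5, 4)."""
    return tuple(int(part) for part in version.split("."))


def within_minor_series(version: str, lower: str = "1.5", upper: str = "1.6") -> bool:
    """Return True if lower <= version < upper, comparing numeric parts only."""

    def padded(v: str, width: int = 3) -> Tuple[int, ...]:
        # Pad short versions with zeros so that '1.5' compares equal to '1.5.0'.
        parts = parse_release(v)
        return parts + (0,) * (width - len(parts))

    return padded(lower) <= padded(version) < padded(upper)


if __name__ == "__main__":
    for candidate in ("1.4.9", "1.5", "1.5.4", "1.6.0"):
        print(candidate, within_minor_series(candidate))
```

Patch releases in the 1.5 series pass the check, while 1.4.x and 1.6.x do not.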

Examples
@@@@@@@@

Installation
############

By default, hgvs uses remote data sources, which makes installation easy.

::

  $ mkvirtualenv hgvs-test
  (hgvs-test)$ pip install --upgrade setuptools
  (hgvs-test)$ pip install hgvs
  (hgvs-test)$ python

See `Installation instructions <http://hgvs.readthedocs.org/en/stable/installation.html>`__ for details, including instructions for installing `Universal Transcript Archive (UTA) <https://github.com/biocommons/uta/>`__ and `SeqRepo <https://github.com/biocommons/biocommons.seqrepo/>`__ locally.

Configuration
#############

hgvs will use publicly available data sources unless directed otherwise through environment variables, like so::

  # N.B. These are examples. The correct values will depend on your installation
  $ export UTA_DB_URL=postgresql://anonymous:anonymous@localhost:5432/uta/uta_20180821
  $ export HGVS_SEQREPO_DIR=/usr/local/share/seqrepo/latest

Alternatively, if you cannot embed the PostgreSQL password in ``UTA_DB_URL`` (for example, because the password is a generated auth token), set ``UTA_DB_URL`` to ``postgresql://<user>@<host>/<db>/<schema>`` and provide the password in ``PGPASSWORD``. For example::

  $ export UTA_DB_URL=postgresql://anonymous@localhost:5432/uta/uta_20180821 PGPASSWORD=anonymous

See the installation instructions for details.
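
To check what a given ``UTA_DB_URL`` value contains before handing it to hgvs, a standard-library sketch like the following can help. The helper name ``describe_uta_url`` is hypothetical and not part of hgvs; it only assumes the ``postgresql://<user>[:<password>]@<host>[:<port>]/<db>/<schema>`` layout shown above.

```python
import os
from urllib.parse import urlsplit


def describe_uta_url(url: str) -> dict:
    """Split a UTA_DB_URL-style string into its components (illustration only)."""
    parts = urlsplit(url)
    # The path carries two pieces: the database name, then the schema.
    database, _, schema = parts.path.lstrip("/").partition("/")
    return {
        "user": parts.username,
        "has_password": parts.password is not None,
        "host": parts.hostname,
        "port": parts.port,
        "database": database,
        "schema": schema,
    }


if __name__ == "__main__":
    # Fall back to the example value from the text if the variable is unset.
    example = "postgresql://anonymous@localhost:5432/uta/uta_20180821"
    print(describe_uta_url(os.environ.get("UTA_DB_URL", example)))
```

When ``has_password`` is false, the password has to come from elsewhere, such as ``PGPASSWORD``.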

Parsing and Formatting
######################

hgvs parses HGVS variants (as strings) into an object model, and can format object models back into HGVS strings.

.. code-block:: python

  >>> import hgvs.parser

  # start with these variants as strings
  >>> hgvs_g = 'NC_000007.13:g.36561662C>T'
  >>> hgvs_c = 'NM_001637.3:c.1582G>A'

  # parse the genomic variant into a Python structure
  >>> hp = hgvs.parser.Parser()
  >>> var_g = hp.parse_hgvs_variant(hgvs_g)
  >>> var_g
  SequenceVariant(ac=NC_000007.13, type=g, posedit=36561662C>T, gene=None)

  # SequenceVariants are composed of structured objects, e.g.,
  >>> var_g.posedit.pos.start
  SimplePosition(base=36561662, uncertain=False)

  # format by stringification
  >>> str(var_g)
  'NC_000007.13:g.36561662C>T'
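
To make the anatomy of such strings concrete, here is a minimal standard-library sketch that splits a simple substitution into accession, coordinate type, position, and edit. It is not the hgvs parser, which is grammar-based and covers far more syntax; the names ``SimpleSubstitution`` and ``split_simple_substitution`` are hypothetical, and the pattern intentionally rejects anything beyond ``<accession>:<type>.<position><ref>><alt>``.

```python
import re
from typing import NamedTuple


class SimpleSubstitution(NamedTuple):
    ac: str    # sequence accession, e.g. NC_000007.13
    type: str  # coordinate type: g, c, n, or m
    pos: int   # 1-based position
    ref: str   # reference base
    alt: str   # alternate base


# Matches only plain single-base substitutions; offsets, ranges, and
# other edit types are outside the scope of this sketch.
_SUB_RE = re.compile(
    r"^(?P<ac>[^:]+):(?P<type>[gcnm])\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def split_simple_substitution(text: str) -> SimpleSubstitution:
    """Split a simple substitution string into its parts, or raise ValueError."""
    match = _SUB_RE.match(text)
    if match is None:
        raise ValueError(f"not a simple substitution: {text!r}")
    return SimpleSubstitution(
        match["ac"], match["type"], int(match["pos"]), match["ref"], match["alt"]
    )


if __name__ == "__main__":
    print(split_simple_substitution("NC_000007.13:g.36561662C>T"))
```

An intronic variant such as ``NM_001637.3:c.1582+5G>A`` is rejected by this pattern, which is one reason a formal grammar is the better tool for real input.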

Projecting ("Mapping") variants between aligned genome and transcript sequences
################################################################################

hgvs provides tools to project variants between genome, transcript, and protein sequences. Non-coding and intronic variants are supported. Alignment data come from the `Universal Transcript Archive (UTA) <https://github.com/biocommons/uta/>`__.

.. code-block:: python

  >>> import hgvs.dataproviders.uta
  >>> import hgvs.assemblymapper

  # initialize the mapper for GRCh37 with splign-based alignments
  >>> hdp = hgvs.dataproviders.uta.connect()
  >>> am = hgvs.assemblymapper.AssemblyMapper(hdp,
  ...          assembly_name='GRCh37', alt_aln_method='splign',
  ...          replace_reference=True)

  # identify transcripts that overlap this genomic variant
  >>> transcripts = am.relevant_transcripts(var_g)
  >>> sorted(transcripts)
  ['NM_001177506.1', 'NM_001177507.1', 'NM_001637.3']

  # map genomic variant to one of these transcripts
  >>> var_c = am.g_to_c(var_g, 'NM_001637.3')
  >>> var_c
  SequenceVariant(ac=NM_001637.3, type=c, posedit=1582G>A, gene=None)
  >>> str(var_c)
  'NM_001637.3:c.1582G>A'

  # CDS coordinates use BaseOffsetPosition to support intronic offsets
  >>> var_c.posedit.pos.start
  BaseOffsetPosition(base=1582, offset=0, datum=Datum.CDS_START, uncertain=False)
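
The base-plus-offset idea behind CDS coordinates can be sketched without the library. The two functions below are hypothetical helpers, not hgvs API. The first formats a position the way it appears in a ``c.`` string, and the second converts a ``c.`` position relative to the CDS start into a 0-based transcript index, given an assumed 0-based index for the first base of the start codon. Positions relative to the CDS end (``c.*N``) are not handled.

```python
def format_base_offset(base: int, offset: int = 0) -> str:
    """Format an exon-anchored base and an intronic offset, e.g. 1582+5."""
    if offset == 0:
        return str(base)
    # The '+d' format spec always emits the sign, giving '+5' or '-2'.
    return f"{base}{offset:+d}"


def cds_to_transcript_index(c_pos: int, cds_start_i: int) -> int:
    """Convert a c. position (relative to the CDS start) to a 0-based transcript index.

    cds_start_i is the 0-based transcript index of the first base of the
    start codon. HGVS c. numbering has no position 0: c.1 is the first
    coding base and c.-1 is the base immediately 5' of it.
    """
    if c_pos == 0:
        raise ValueError("HGVS c. numbering has no position 0")
    if c_pos > 0:
        return cds_start_i + c_pos - 1
    return cds_start_i + c_pos


if __name__ == "__main__":
    print(format_base_offset(1582), format_base_offset(1582, 5), format_base_offset(1583, -2))
    print(cds_to_transcript_index(1, 100), cds_to_transcript_index(-1, 100))
```

The missing position 0 is why the positive and negative branches differ by one.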

Translating coding variants to protein sequences
################################################

Coding variants may be translated to their protein consequences. hgvs uses the same pairing of transcript and protein accessions as NCBI and Ensembl.

.. code-block:: python

  # translate var_c to its protein consequence
  # The object structure of protein variants is nearly identical to
  # that of nucleic acid variants and is converted to a string form
  # by stringification. Per HGVS recommendations, inferred consequences
  # must have parentheses to indicate uncertainty.
  >>> var_p = am.c_to_p(var_c)
  >>> var_p
  SequenceVariant(ac=NP_001628.1, type=p, posedit=(Gly528Arg), gene=None)
  >>> str(var_p)
  'NP_001628.1:p.(Gly528Arg)'

  # setting uncertain to False removes the parentheses on the
  # stringified form
  >>> var_p.posedit.uncertain = False
  >>> str(var_p)
  'NP_001628.1:p.Gly528Arg'

  # formatting can be customized, e.g., use 1 letter amino acids to
  # format a specific variant
  # (configuration may also be set globally)
  >>> var_p.format(conf={"p_3_letter": False})
  'NP_001628.1:p.G528R'
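
The relationship between the three-letter and one-letter forms is a plain table lookup. The sketch below illustrates it on the edit portion of a protein variant. It is independent of how hgvs implements ``p_3_letter``, the helper name is hypothetical, and the table covers only the 20 standard amino acids plus ``Ter``.

```python
import re

# Standard three-letter to one-letter amino acid codes, plus Ter for stop.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}

_AA3_RE = re.compile("|".join(THREE_TO_ONE))


def to_one_letter(p_edit: str) -> str:
    """Replace every three-letter amino acid code in p_edit with its one-letter code."""
    return _AA3_RE.sub(lambda m: THREE_TO_ONE[m.group(0)], p_edit)


if __name__ == "__main__":
    print(to_one_letter("Gly528Arg"))
```

Apply it only to the part after ``p.``, since accession strings are not guaranteed to be free of matching letter triples.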

Normalizing variants
####################

Some variants have multiple representations due to intrinsic biological ambiguity (e.g., inserting a G in a poly-G run) or due to misunderstanding HGVS recommendations. Normalization rewrites certain variants into a single representation.

.. code-block:: python

  # rewrite ins as dup (depends on sequence context)
  >>> import hgvs.normalizer
  >>> hn = hgvs.normalizer.Normalizer(hdp)
  >>> hn.normalize(hp.parse_hgvs_variant('NM_001166478.1:c.35_36insT'))
  SequenceVariant(ac=NM_001166478.1, type=c, posedit=35dup, gene=None)

  # during mapping, variants are normalized (by default)
  >>> c1 = hp.parse_hgvs_variant('NM_001166478.1:c.31del')
  >>> c1
  SequenceVariant(ac=NM_001166478.1, type=c, posedit=31del, gene=None)
  >>> c1n = hn.normalize(c1)
  >>> c1n
  SequenceVariant(ac=NM_001166478.1, type=c, posedit=35del, gene=None)
  >>> g = am.c_to_g(c1)
  >>> g
  SequenceVariant(ac=NC_000006.11, type=g, posedit=49917127del, gene=None)
  >>> c2 = am.g_to_c(g, c1.ac)
  >>> c2
  SequenceVariant(ac=NM_001166478.1, type=c, posedit=35del, gene=None)
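
The ``31del`` to ``35del`` rewrite above reflects shifting a deletion to its 3'-most equivalent position within a repeated run. The toy sketch below shows that idea on a plain string. It is a simplified illustration with hypothetical function names and a made-up sequence, not the hgvs normalizer, which works against real reference sequences and handles many more cases.

```python
from typing import Tuple


def shift_deletion_3prime(seq: str, start: int, end: int) -> Tuple[int, int]:
    """Shift the deleted interval seq[start:end] (0-based, half-open) as far 3' as possible.

    Moving the interval one base to the right leaves the resulting sequence
    unchanged exactly when seq[start] == seq[end].
    """
    if not 0 <= start < end <= len(seq):
        raise ValueError("deleted interval must lie within the sequence")
    while end < len(seq) and seq[start] == seq[end]:
        start += 1
        end += 1
    return start, end


def insertion_is_duplication(seq: str, index: int, inserted: str) -> bool:
    """Return True if inserting `inserted` before seq[index] repeats the bases just 5' of it."""
    return index >= len(inserted) and seq[index - len(inserted):index] == inserted


if __name__ == "__main__":
    toy = "ACGTTTTTA"  # made-up sequence with a run of five T bases
    print(shift_deletion_3prime(toy, 3, 4))
    print(insertion_is_duplication(toy, 8, "T"))
```

Deleting any single T in the run yields the same sequence, so every such deletion shifts to the last T of the run.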

There are `more examples in the documentation <http://hgvs.readthedocs.org/en/stable/examples.html>`_.

Citing hgvs (the package) @@@@@@@@@@@@@@@@@@@@@@@@@

| hgvs: A Python package for manipulating sequence variants using HGVS nomenclature: 2018 Update.
| Wang M, Callenberg KM, Dalgleish R, Fedtsov A, Fox N, Freeman PJ, Jacobs KB, Kaleta P, McMurry AJ, Prlić A, Rajaraman V, Hart RK
| Human Mutation. 2018 `PubMed <https://www.ncbi.nlm.nih.gov/pubmed/30129167>`__ | `Open Access PDF <https://doi.org/10.1002/humu.23615>`__

| A Python Package for Parsing, Validating, Mapping, and Formatting Sequence Variants Using HGVS Nomenclature.
| Hart RK, Rico R, Hare E, Garcia J, Westbrook J, Fusaro VA.
| Bioinformatics. 2014 Sep 30. `PubMed <http://www.ncbi.nlm.nih.gov/pubmed/25273102>`__ | `Open Access PDF <http://bioinformatics.oxfordjournals.org/content/31/2/268.full.pdf>`__

Contributing
@@@@@@@@@@@@

The hgvs package is intended to be a community project. Please see `Contributing <http://hgvs.readthedocs.org/en/stable/contributing.html>`__ to get started in submitting source code, tests, or documentation. Thanks for getting involved!

See Also
@@@@@@@@

Other packages that manipulate HGVS variants:

  • `pyhgvs <https://github.com/counsyl/hgvs>`__
  • `Mutalyzer <https://mutalyzer.nl/>`__

.. _docs: http://hgvs.readthedocs.org/
.. _Variation Nomenclature: http://varnomen.hgvs.org/

.. |getting_help| image:: https://img.shields.io/badge/!-help%20me-red.svg
   :target: https://hgvs.readthedocs.io/en/stable/getting_help.html

.. |rtd| image:: https://img.shields.io/badge/docs-readthedocs-green.svg
   :target: http://hgvs.readthedocs.io/

.. |changelog| image:: https://img.shields.io/badge/docs-changelog-green.svg
   :target: https://hgvs.readthedocs.io/en/stable/changelog/

.. |github_license| image:: https://img.shields.io/github/license/biocommons/hgvs.svg
   :alt: GitHub license
   :target: https://github.com/biocommons/hgvs/blob/main/LICENSE

.. |group| image:: https://img.shields.io/badge/group-hgvs%20discuss-green.svg
   :alt: Mailing list
   :target: https://groups.google.com/forum/#!forum/hgvs-discuss

.. |chat| image:: https://img.shields.io/badge/chat-gitter-green.svg
   :alt: Join the chat at https://gitter.im/biocommons/hgvs
   :target: https://gitter.im/biocommons/hgvs?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge

.. |github_tag| image:: https://img.shields.io/github/tag/biocommons/hgvs.svg
   :alt: GitHub tag
   :target: https://github.com/biocommons/hgvs

.. |pypi_rel| image:: https://img.shields.io/pypi/v/hgvs.svg
   :target: https://pypi.org/project/hgvs/

.. |status_rel| image:: https://img.shields.io/travis/biocommons/hgvs/main.svg
   :target: https://travis-ci.org/biocommons/hgvs?branch=main

.. |coveralls| image:: https://img.shields.io/coveralls/github/biocommons/hgvs.svg
   :target: https://coveralls.io/github/biocommons/hgvs

.. |issues| image:: https://img.shields.io/github/issues-raw/biocommons/hgvs.svg
   :alt: issues
   :target: https://github.com/biocommons/hgvs/issues

.. |github_open_pr| image:: https://img.shields.io/github/issues-pr/biocommons/hgvs.svg
   :alt: GitHub Open Pull Requests
   :target: https://github.com/biocommons/hgvs/pull/

.. |github_stars| image:: https://img.shields.io/github/stars/biocommons/hgvs.svg?style=social&label=Stars
   :alt: GitHub stars
   :target: https://github.com/biocommons/hgvs/stargazers

.. |github_forks| image:: https://img.shields.io/github/forks/biocommons/hgvs.svg?style=social&label=Forks
   :alt: GitHub forks
   :target: https://github.com/biocommons/hgvs/network

.. |github_contrib| image:: https://img.shields.io/github/contributors/biocommons/hgvs.svg
   :alt: GitHub contributors
   :target: https://github.com/biocommons/hgvs/graphs/contributors/

.. |install_status| image:: https://travis-ci.org/reece/hgvs-integration-test.png?branch=main
   :target: https://travis-ci.org/reece/hgvs-integration-test

.. |binder| image:: https://mybinder.org/badge_logo.svg
   :target: https://mybinder.org/v2/gh/biocommons/hgvs/main?filepath=examples

.. |hit| image:: https://travis-ci.org/biocommons/hgvs-installation-test.svg?branch=main
   :alt: nightly test of ability to pip install, import, and parse a variant
   :target: https://travis-ci.org/biocommons/hgvs-installation-test