Results 31 issues of Yao Yao

Related Issue: https://github.com/biothings/myvariant.info/issues/103 The fix for https://github.com/biothings/myvariant.info/issues/103 is to count the documents in the post-index steps and write the stats into the ES mapping metadata as well as into the build docs in...

This is the very first version; it was later migrated to [biothings.api - biothings/utils/jsondiff.py](https://github.com/biothings/biothings.api/blob/master/biothings/utils/jsondiff.py) (with further modifications). P.S. See the [Migration Commit](https://github.com/biothings/biothings.api/commit/faca40dd7456da0c66928eba130ed48905b00128). It is not used within MyVariant.

When running with `es.biothings.io:9200`, `test_others.py` will fail on

```txt
query?q=_exists_:gnomad_genome&size=0&assembly=hg38
```

The root cause is an ES search timeout. The original configuration is in `config_web.py`:

```python
# ES_HOST = 'es.biothings.io:9200'
ES_ARGS...
```

It looks like the structure of their website has changed, and the current dumper fails to detect the latest download URL. The errors are shown below: ```python Traceback (most recent call...

There are some long HGVS IDs such as: ```text # rs796753453 chr22:g.49000999_49001000TGA[2]TGGTGA[2]TGA[2]TGG[3]TGATGGTGATGATGGTGGTGATGATGGTGATGGTGATGATGGTGGTGGTGATGGTGATGGTGGTGACGATGGTGATGGTGAGGATGATGGTGGTGGTGATGGTGAAGGTAATGGTGGTGGTGATGGTGATGGTGATAATGTGGCGATGGTGATGGTAATGATGGTGGTGATAGTGATGATGATGGTGATGGTGGTGGTGGTGATGGTGATGTTAATGATGGTGGTGATAGTGATGACGGTGATGGTGGCGGTGGCGGTGATGGCGATGGTGATGGTAATAATGGTGGTGATGGTGACGGTGATGATGGTGATGATGATGATGAGGGTGATGGTGATGATGTTGGTGATGGTGATGGTGGTGATGATAATGATGGTGGTGATGGTGGTGGTGATGGTGATGATGGTGGTGATGGTGTTGATGGTGATCTTGGTAATGGTGATGATGGTGGTGGTGATGGCAATGGTGATGATGGTGGTGATGGTGTTGATGGTGATCATG[1] ``` that can escape from the regex in [`trim_delseq_from_hgvs()`](https://github.com/biothings/myvariant.info/blob/master/src/utils/hgvs.py#L219) function, thus not trimmed by [`MyVariantBasicStorage`](https://github.com/biothings/myvariant.info/blob/master/src/hub/dataload/storage.py#L7). Such no long...

There are a few VCF normalization tools/algorithms that can be used in our [`_normalized_vcf`](https://github.com/biothings/myvariant.info/blob/master/src/utils/hgvs.py#L51) function, e.g.:

- [bcftools](https://janis.readthedocs.io/en/latest/tools/bioinformatics/bcftools/bcftoolsnorm.html)
- [GATK](https://gatk.broadinstitute.org/hc/en-us/articles/360035531692-VCF-Variant-Call-Format)
- [SMaSH](http://smash.cs.berkeley.edu/normalize.html)
- [vt](https://academic.oup.com/bioinformatics/article/31/13/2202/196142)
- [Ref Python Implementation](https://github.com/quinlan-lab/vcftidy/blob/master/vcftidy.py#L382)

Our current...
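For orientation, here is a minimal sketch of the trim-and-left-align procedure that tools in this family commonly describe. It is not the current `_normalized_vcf` code. The function name `normalize`, the in-memory `reference` string, and the biallelic-only input (with `ref != alt`) are assumptions made for illustration.

```python
def normalize(pos, ref, alt, reference):
    """Left-align and trim a biallelic variant.

    pos is 1-based; reference is the chromosome sequence as a plain string
    (an assumption for this sketch; real tools read an indexed FASTA).
    """
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base from both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both one base to the left.
        if not ref or not alt:
            if pos <= 1:
                raise ValueError("cannot extend past the start of the reference")
            pos -= 1
            base = reference[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Trim shared leading bases while both alleles keep at least one base.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# A 2-base deletion inside a CA repeat shifts to its leftmost position.
print(normalize(6, "CAC", "C", "GGGCACACACAGGG"))
```

Multi-allelic records and reference lookups across very long sequences are deliberately left out of this sketch.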

There are 5 _**variant types**_ (`vartype`s in our code) in the dbSNP datasource, a.k.a. _**SNP classes**_, as listed in [Searching dbSNP in Entrez](https://www.ncbi.nlm.nih.gov/snp/docs/entrez_help/):

> "del", "delins", "ins", "mnv", and "snv"....

## Symptom

E.g. http://myvariant.info/v1/variant/chrY:g.100005T%3EG?fields=dbsnp returns

```python
{
    "_id": "chrY:g.100005T>G",
    "_version": 1,
    "dbsnp": {
        ...
        "chrom": "X",
        ...
    }
}
```

where the `chrom` field is inconsistent with the chromosome in...
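A quick way to spot such documents might look like the sketch below. It assumes the comparison is against the chromosome prefix of `_id`, and that IDs follow the `chr<name>:` pattern seen in the example. The helper name `chrom_mismatch` is hypothetical.

```python
import re

# Assumed shape of a chromosome-prefixed _id, e.g. "chrY:g.100005T>G".
_ID_CHROM = re.compile(r"^chr([^:]+):")


def chrom_mismatch(doc):
    """Return (id_chrom, dbsnp_chrom) when they disagree, else None."""
    match = _ID_CHROM.match(doc.get("_id", ""))
    dbsnp_chrom = doc.get("dbsnp", {}).get("chrom")
    if match is None or dbsnp_chrom is None:
        # Nothing to compare: unexpected _id format or no dbsnp.chrom field.
        return None
    id_chrom = match.group(1)
    if id_chrom != str(dbsnp_chrom):
        return id_chrom, str(dbsnp_chrom)
    return None


doc = {"_id": "chrY:g.100005T>G", "_version": 1, "dbsnp": {"chrom": "X"}}
print(chrom_mismatch(doc))
```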

Datasources in myvariant.info may contain large files to download (e.g. `dbsnp` release 155 is 380 GB). For various reasons (such as FTP connection issues), a download may be incomplete, leading to errors...
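One possible safeguard is to verify each file after download. The sketch below assumes the data provider publishes an expected size and/or MD5 checksum, which the text above does not state. The helper name `is_complete` is hypothetical and not part of any existing dumper.

```python
import hashlib
import os


def is_complete(path, expected_size=None, expected_md5=None, chunk_size=1 << 20):
    """Return True if the file exists and matches the given size and/or MD5."""
    if not os.path.isfile(path):
        return False
    if expected_size is not None and os.path.getsize(path) != expected_size:
        return False
    if expected_md5 is not None:
        digest = hashlib.md5()
        # Read in chunks so very large files never have to fit in memory.
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected_md5.lower():
            return False
    return True
```

Checking the size first is cheap and catches most truncated transfers; the checksum pass is only worth its cost when the provider actually supplies one.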

enhancement

The ideal implementation would involve a message queue:

1. Any uploader extending `SnpeffPostUpdateUploader` will simply put the new IDs into the message queue in the `post_update_data` step, then return.
2....
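The producer side of step 1 can be sketched with Python's in-process `queue.Queue` standing in for a real message queue. The consumer (`drain`) is purely an assumption, since the remaining steps are not shown here, and the standalone `post_update_data` function only mimics the uploader method of the same name.

```python
import queue

# In-process stand-in for a real message queue (an assumption for this sketch).
new_ids = queue.Queue()


def post_update_data(uploaded_ids):
    """Producer side: enqueue the new IDs and return immediately."""
    for _id in uploaded_ids:
        new_ids.put(_id)


def drain(batch_size=2):
    """Hypothetical consumer: pull everything queued so far, in batches."""
    batches, batch = [], []
    while True:
        try:
            batch.append(new_ids.get_nowait())
        except queue.Empty:
            break
        if len(batch) == batch_size:
            batches.append(batch)
            batch = []
    if batch:
        # Keep the final, partially filled batch.
        batches.append(batch)
    return batches
```

With this shape, the uploader never waits on downstream processing; whatever consumes the queue can run on its own schedule.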

enhancement