GISAIDR

How can I download the data from EpiFlu database?

Open virologist opened this issue 2 years ago • 8 comments

Hi, @Wytamma

How can I download the data (e.g., H1N1) from the GISAID EpiFlu database using GISAIDR?

Best, Yang

virologist, Sep 19 '22 15:09

Hi @virologist! Unfortunately GISAIDR doesn't support EpiFlu at this time. But I will add this to the version 2.0 milestone.

Wytamma, Sep 20 '22 08:09

Thanks for the great package! Seconding this request.

abuendia, Oct 06 '23 17:10

@Wytamma, if you could explain the main obstacles to supporting EpiFlu, I might be able to help...

dmontecino, Oct 23 '23 15:10

Hey @dmontecino,

Thanks for the offer! Unfortunately EpiFlu uses a completely different interface to the other Epi platforms so you’d need to create all new methods just for EpiFlu.

I think supporting EpiFlu would be a good excuse to refactor GISAIDR. I will create a version 2 branch soon and start working on this update. If you’re still keen to help out I’d appreciate it! Always happy to accept PRs.

-W

Wytamma, Oct 26 '23 13:10

Does EpiFlu have public-facing API docs? I got a question about this today, and without a GISAID login I wasn't able to find any information about the EpiFlu interface (aside from some YouTube videos demonstrating the web platform).

Related question: does implementing EpiFlu require reverse-engineering a public interface, or are there real docs available? In particular, I was wondering if "vanilla" wget and/or httr2 might provide a complementary avenue to bulk access...

helmingstay, Nov 02 '23 20:11

Hey @helmingstay,

Unfortunately there are no API docs for GISAID :/ or any public API (this is why I made GISAIDR).

Yes, implementing EpiFlu would require making the same HTTP requests the frontend makes to the backend. Another option would be to use a web driver like Selenium. However, the web interface is restricted based on user access level and limited in the amount of data you can return. For example, you used to be able to change the number of rows returned using query params to get more data, but GISAID removed this option (limiting queries to 50 rows at a time). You are also unable to parallelise downloads, as GISAID uses a stateful API, meaning that you can only have one state per access token at a time.

There are many, many issues with GISAID that make it difficult to use and interface with. It would be fantastic if they just rebuilt the site with a modern and open API, but looking at GISAID's track record I doubt that will happen any time soon.
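
To make the 50-row limit and the one-state-per-token constraint concrete, here is a rough sketch of sequential paging in R. It is not GISAIDR code: `fetch_page()` is a hypothetical stand-in that fakes a backend holding 120 records, and `fetch_all()` is just an illustrative helper. The only point is that pages are requested strictly one after another, never in parallel.

```r
# Hypothetical stand-in for one request to a stateful backend. It returns up to
# `page_size` rows starting at `start_index`. This is NOT a GISAIDR function;
# the 120-record total is made up for the example.
fetch_page <- function(start_index, page_size, total = 120L) {
  if (start_index >= total) {
    return(data.frame(id = integer(0)))
  }
  end_index <- min(start_index + page_size, total)
  data.frame(id = seq(start_index + 1L, end_index))
}

# Request pages one at a time (no parallel requests), stopping when a page
# comes back shorter than `page_size` or `max_rows` has been reached.
fetch_all <- function(fetch, page_size = 50L, max_rows = Inf) {
  pages <- list()
  start_index <- 0L
  repeat {
    page <- fetch(start_index, page_size)
    pages[[length(pages) + 1L]] <- page
    start_index <- start_index + nrow(page)
    if (nrow(page) < page_size || start_index >= max_rows) break
  }
  do.call(rbind, pages)
}

all_rows <- fetch_all(fetch_page)
nrow(all_rows)
```

With a real backend, each call would have to finish before the next one starts, because every request changes the single server-side state tied to the access token.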

Wytamma, Nov 03 '23 02:11

Hi All,

I'm working on EpiFlu in this branch -> https://github.com/Wytamma/GISAIDR/tree/EpiFlu.

It's currently limited to querying. I will hopefully add sequence download soon, but you can do things like this:

> credentials <- login(username = username, password = password, database="EpiFlu")
> df <- epiflu_query(credentials, type = "A", h = 3, n = 2, from = "2023-12-01")
> df
        id selected                                               edit                    virus_name     accession_id subtype passage_details_history PB2 PB1  PA    HA    NP    NA    MP  NS  HE  P3
1 18629904    FALSE /epi3/app_entities/entities/help/pencil_noedit.png           A/Cardiff/5116/2023 EPI_ISL_18629904    H3N2                         --- --- --- 1,762 1,572 1,453 1,027 895 --- ---
2 18619994    FALSE /epi3/app_entities/entities/help/pencil_noedit.png      A/Kostroma/CRIE/193/2023 EPI_ISL_18619994    H3N2                         --- --- --- 1,718   ---   ---   --- --- --- ---
3 18619990    FALSE /epi3/app_entities/entities/help/pencil_noedit.png        A/Moscow/CRIE/176/2023 EPI_ISL_18619990    H3N2                         --- --- --- 1,718   ---   ---   --- --- --- ---
4 18619987    FALSE /epi3/app_entities/entities/help/pencil_noedit.png       A/Lipetsk/CRIE/172/2023 EPI_ISL_18619987    H3N2                         --- --- --- 1,718   ---   ---   --- --- --- ---
5 18619986    FALSE /epi3/app_entities/entities/help/pencil_noedit.png        A/Ryazan/CRIE/171/2023 EPI_ISL_18619986    H3N2                         --- --- --- 1,718   ---   ---   --- --- --- ---
6 18619985    FALSE /epi3/app_entities/entities/help/pencil_noedit.png A/Moscow oblast/CRIE/170/2023 EPI_ISL_18619985    H3N2                         --- --- --- 1,718   ---   ---   --- --- --- ---
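
The segment columns in that output come back as strings such as "1,762" or "---". A small post-processing sketch follows, assuming those values are segment lengths and that "---" means the segment is missing. The data frame is a hand-typed subset of the rows above, and `parse_segment_length()` is a hypothetical helper, not part of GISAIDR.

```r
# Hand-typed subset of the query result shown above (two rows, two segments).
df <- data.frame(
  accession_id = c("EPI_ISL_18629904", "EPI_ISL_18619994"),
  subtype = c("H3N2", "H3N2"),
  HA = c("1,762", "1,718"),
  NP = c("1,572", "---"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip thousands separators and map the "---"
# placeholder to NA (assumed to mean "segment not available").
parse_segment_length <- function(x) {
  x <- gsub(",", "", x, fixed = TRUE)
  x[x == "---"] <- NA_character_
  as.integer(x)
}

segment_cols <- c("HA", "NP")
df[segment_cols] <- lapply(df[segment_cols], parse_segment_length)

# Keep only records that have an NP segment.
with_np <- df[!is.na(df$NP), ]
```

After this step the segment columns are integers, so filtering on them works with ordinary comparisons.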

I would be grateful for any feedback or PRs to add additional features / tests / docs.

Wytamma, Dec 15 '23 06:12