
idmapping sleep/ wait

Open Dan-Burns opened this issue 3 years ago • 10 comments

Hello,

I'd like to use IdMappingClient, but I only avoid the following error when I get lucky: "UniProt has not yet processed the results, consider using time.sleep() to wait until they are complete."

I have reduced the size of my query lists quite a bit, but at some point in my loop I still get the error. I have added time.sleep() before calling request.each_result(), but I don't know how long to sleep to ensure I don't run into the error.

I see there is a "waiting" module but I'm not sure how to incorporate it into this so I can wait indefinitely for the results to be returned.

Thank you for this package.

Dan

Dan-Burns avatar Oct 20 '22 02:10 Dan-Burns

Okay, so what I would do is submit the request, then run a loop that checks the status of the job until it finishes, e.g.:

import time

from unipressed import IdMappingClient

request = IdMappingClient.submit(
    source="UniProtKB_AC-ID", dest="Gene_Name", ids={"A1L190", "A0JP26", "A0PK11"}
)
# Poll until UniProt reports that the job has reached a terminal state
while True:
    status = request.get_status()
    if status in {"FINISHED", "ERROR"}:
        break
    else:
        time.sleep(5)  # wait 5 seconds before checking again
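
A slightly more defensive variant of the loop above adds an overall timeout, so that a job that never finishes cannot block the script forever. This is only a sketch: `wait_for_job`, `poll_interval`, and `timeout` are hypothetical names and not part of unipressed. The only assumption about the job object is that it exposes `get_status()` and reports the same status strings as the loop above.

```python
import time


def wait_for_job(job, poll_interval=5.0, timeout=600.0):
    """Poll job.get_status() until a terminal status is seen or `timeout` elapses.

    `job` is assumed to be any object with a get_status() method that returns
    a string, such as the request object returned by IdMappingClient.submit().
    """
    deadline = time.monotonic() + timeout
    while True:
        status = job.get_status()
        if status in {"FINISHED", "ERROR"}:
            return status
        if time.monotonic() >= deadline:
            # Give up instead of looping forever on a stuck job
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        time.sleep(poll_interval)
```

Returning the final status lets the caller decide what to do with an "ERROR" job before asking for results.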

multimeric avatar Oct 20 '22 02:10 multimeric

Thank you - I didn't realize there was a get_status() method - should've looked harder.

I implemented that, but I still got the same error, although I might have looped through more request submissions before it happened this time.

Thank you, Dan

Dan-Burns avatar Oct 21 '22 12:10 Dan-Burns

Can you please post a reproducible example?

multimeric avatar Oct 21 '22 12:10 multimeric

The attached .zip contains a JSON file holding a dictionary whose keys are integers and whose values are sets of UniProt IDs that I'm trying to get GI numbers for. The loop refers to this dictionary as "id_lists", and "chunk" is the dictionary key. I loop through the dictionary and submit each subset of UniProt IDs to the ID mapping client using the included function get_gi_numbers():

uniprot_to_gi = {}
for chunk, id_list in id_lists.items():
    uniprot_to_gi[chunk] = get_gi_numbers(id_list, delay=5)

uniprot_ids.zip

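import time

from unipressed import IdMappingClient


def get_gi_numbers(uniprot_ids, delay=5):
    # Submit one mapping job for this chunk of UniProt IDs
    request = IdMappingClient.submit(
        source="UniProtKB_AC-ID", dest="GI_number", ids=uniprot_ids
    )

    # Poll every `delay` seconds until the job reaches a terminal state
    while True:
        status = request.get_status()
        if status in {"FINISHED", "ERROR"}:
            break
        time.sleep(delay)

    return list(request.each_result())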

Using this, I still get: IdMappingError: UniProt has not yet processed the results, consider using time.sleep() to wait until they are complete.

Thank you, Dan

Dan-Burns avatar Oct 21 '22 14:10 Dan-Burns

I can't easily reproduce this. The only way I could see this happening is if UniProt is actually returning an invalid result, which tricks my code into thinking the job hasn't finished. If you could narrow down the IDs (or possibly the single ID) that cause this by catching the error unipressed throws, that would be great.
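
One way to narrow a failing chunk down is to catch the error and keep halving the list until only the offending IDs remain. The sketch below is an assumption-laden illustration, not part of unipressed: `find_failing_ids` and its `query` parameter are hypothetical, and `query` stands for any callable that raises on failure (for example a wrapper around get_gi_numbers(), with `error_type` set to the IdMappingError seen in this thread). It only gives a meaningful answer if the failure is tied to specific IDs; an intermittent failure will not bisect reliably.

```python
def find_failing_ids(ids, query, error_type=Exception):
    """Return the IDs for which query([...]) raises `error_type`.

    `query` is a hypothetical callable that takes a list of IDs and raises
    `error_type` when the lookup fails. The list is halved recursively until
    each failure is isolated to a single ID.
    """
    ids = list(ids)
    if not ids:
        return []
    try:
        query(ids)
    except error_type:
        if len(ids) == 1:
            return ids  # this single ID fails on its own
        mid = len(ids) // 2
        return (
            find_failing_ids(ids[:mid], query, error_type)
            + find_failing_ids(ids[mid:], query, error_type)
        )
    return []  # the whole group succeeded
```

Each level of halving submits new jobs, so against a real service it may be worth adding a pause between calls.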

multimeric avatar Oct 21 '22 14:10 multimeric

I see, I was wondering if it might be a bad id.

I'm not sure if that is the case since I can make it through one set of ids on one attempt but on another attempt, it will fail on that same set of ids.

I will look into it.

Dan-Burns avatar Oct 21 '22 15:10 Dan-Burns

It doesn't seem like a single bad ID would make it fail. I just tried it, and UniProt simply ignores invalid IDs and otherwise behaves reasonably.

multimeric avatar Oct 22 '22 01:10 multimeric

My guess is you're hitting an intermittent issue with the UniProt API itself, so you would get this same issue with any client library (not just unipressed). However, I would like to be able to smooth over that glitch in unipressed, which is why I want to catch it.

multimeric avatar Oct 22 '22 01:10 multimeric

I ran into the same "unstable return/connection/timeout" behavior with this package, which I put down to the lack of exception handling. All three calls (submit, get_status, and each_result) can fail individually, and it is not easy to catch every possible exception. In the end I came up with the solution below, which does not support pagination. I hope this example helps.

from retry import retry
from unipressed import IdMappingClient
from unipressed.id_mapping.core import IdMappingError
from unipressed.id_mapping.core import IdMappingJob


@retry(IdMappingError, delay=2, tries=5)
def submit_query(gene_ids: str) -> IdMappingJob:
    """
    Query UniProt DB with a string of Gene IDs
    Args:
        gene_ids: A string of NCBI Gene IDs separated by comma
    Returns:
        IdMappingJob object
    """
    try:
        job_request = IdMappingClient.submit(
            source="GeneID", dest="UniProtKB", ids={gene_ids}
        )
        return job_request
    except Exception as exc:
        # Re-raise any failure as IdMappingError so that @retry tries again
        raise IdMappingError from exc


@retry(ValueError, delay=2, tries=5)
def check_status(job_request: IdMappingJob) -> str:
    """
    Obtain job status
    Args:
        job_request: an IdMappingJob object
    Returns:
        FINISHED or FAILED
    """
    try:
        job_status = job_request.get_status()
        if job_status == "FINISHED":
            return job_status
        elif job_status == "RUNNING":
            raise ValueError()
    except Exception:
        # Caveat: this handler also catches the ValueError raised above, so a
        # RUNNING job is reported as FAILED and @retry never re-polls
        return "FAILED"


@retry(IdMappingError, delay=2, tries=25)
def get_results(job_request: IdMappingJob) -> list:
    """
    Retrieves individual results
    Args:
        job_request: an IdMappingJob object
    Returns:
        A list of ID mapping results in the format [{'from': '1', 'to': 'P04217'}, {'from': '1', 'to': 'V9HWD8'}]
    """
    try:
        returned = list(job_request.each_result())
        return returned
    except Exception as exc:
        # Re-raise any failure as IdMappingError so that @retry tries again
        raise IdMappingError from exc


def get_uniprot_ids_from_gene_ids(gene_ids: str) -> list[dict[str, str]]:
    """
    Maps NCBI Gene IDs to UniProt IDs. One NCBI Gene ID can map to one or many UniProt IDs.
    Args:
        gene_ids: A string of NCBI Gene IDs separated by comma
    Returns:
    A list of dictionaries, each dictionary consists of {'from': 'NCBI Gene ID', 'to': 'UniProt ID'}
    """
    job_request = returned = None
    results_parsed = None
    job_request = submit_query(gene_ids)
    if job_request is not None:
        jstatus = check_status(job_request)
        if jstatus != "FAILED":
            returned = get_results(job_request)
            if returned is not None:
                results_parsed = []
                for result in returned:
                    results_parsed.append(result)
    return results_parsed

yoonkihoon avatar Mar 31 '23 23:03 yoonkihoon

Hi @yoonkihoon. If there really is an intermittent issue with the UniProt API, then I think your @retry solution is a good one. Feel free to submit it as a PR.
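
For anyone who would rather not depend on the third-party `retry` package, the same idea can be sketched with the standard library alone. The helper below is hypothetical (`retry_call` and its parameters are illustrative names, not part of unipressed or of the `retry` package): it calls a function, retries on the given exception types, and multiplies the wait by `backoff` after each failed attempt.

```python
import time


def retry_call(func, *args, exceptions=(Exception,), tries=5, delay=2.0,
               backoff=2.0, **kwargs):
    """Call func(*args, **kwargs), retrying when one of `exceptions` is raised.

    Waits `delay` seconds after the first failure and multiplies the wait by
    `backoff` after each further failure. The last failure is re-raised.
    """
    wait = delay
    for attempt in range(1, tries + 1):
        try:
            return func(*args, **kwargs)
        except exceptions:
            if attempt == tries:
                raise  # out of attempts: surface the original error
            time.sleep(wait)
            wait *= backoff
```

With unipressed this could wrap a call such as `list(job.each_result())` inside a small function, passing the IdMappingError from this thread as the exception type to retry on.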

multimeric avatar Apr 01 '23 06:04 multimeric