astroquery

Gaia.load_data: increase the maximum number of source IDs

Open jkrick opened this issue 2 years ago • 4 comments

I would like to use Gaia.load_data to grab epoch photometry for ~500,000 sources. I can do this successfully for a small number of sources, but I run into a limit when I scale up; the error message is below. Is it possible to increase the allowed maximum number of source IDs? I'd vote for 500,000 as the limit simply because it is my use case, but I could see reasons to go higher. I don't know what the true limitations are. I'd also be open to other suggestions for how I should access the epoch photometry for these sources.

Background: I can get the 500,000 source IDs using a table upload and a JOIN inside a query passed to Gaia.launch_job_async. This is lightning fast and fantastic.

Here is my actual code:

from astroquery.gaia import Gaia

## Some Definitions
retrieval_type = 'EPOCH_PHOTOMETRY'  # DataLink product to retrieve
data_structure = 'INDIVIDUAL'
data_release   = 'Gaia DR3'

## Get the files
## (ids is the list of Gaia source IDs obtained from the earlier query)
datalink = Gaia.load_data(ids=ids,
                          data_release=data_release,
                          retrieval_type=retrieval_type,
                          data_structure=data_structure,
                          verbose=False,
                          output_file=None,
                          overwrite_output_file=True)
  

And the error message I am getting (on testing with fewer sources):

Maximum number of Source Ids reached (max: 5000, found: 22577)

Cannot process request: 'https://gea.esac.esa.int/data-server/data' (req: Reqid: anonymous1695744305998, retrieval access: DIRECT, retrieval type: EPOCH_PHOTOMETRY, compression: null), for user: UwsJobOwner{id='anonymous', name='null', mail='null', authUsername='null', authGroups=[], pseudo='anonymous', session='ABF99E52685E5059A90EECD530692465', ip='54.82.184.137', roles=0, parameters=Owner parameters: 6},  due to: Maximum number of Source Ids reached (max: 5000, found: 22577)

jkrick, Sep 26 '23 16:09

cc @esdc-esac-esa-int

bsipocz, Sep 26 '23 16:09

Full traceback:

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
Input In [26], in <cell line: 2>()
      1 gaiastarttime = time.time()
----> 2 df_lc_gaia = Gaia_get_lightcurve(coords_list,  labels_list , verbose = 1)
      4 #add the resulting dataframe to all other archives
      5 df_lc.append(df_lc_gaia)

File ~/fornax-demo-notebooks/light_curves/code/gaia_functions.py:51, in Gaia_get_lightcurve(coords_list, labels_list, verbose)
     45 print(gaia_table.columns)   
     47 ## Extract Light curves ===============
     48 # For each of the objects, request the EPOCH_PHOTOMETRY from the Gaia DataLink Service
     49 
     50 ## Run search
---> 51 prod_tab = Gaia_retrieve_EPOCH_PHOTOMETRY(ids=list(gaia_table["source_id"]) , verbose=verbose)
     53 ## Create light curves =================
     54 gaia_epoch_phot = Gaia_mk_lightcurves(prod_tab , verbose=verbose)

File ~/fornax-demo-notebooks/light_curves/code/gaia_functions.py:152, in Gaia_retrieve_EPOCH_PHOTOMETRY(ids, verbose)
    149 data_release   = 'Gaia DR3'     # Options are: 'Gaia DR3' (default), 'Gaia DR2'
    151 ## Get the files
--> 152 datalink = Gaia.load_data(ids=ids,
    153                           data_release = data_release,
    154                           retrieval_type=retrieval_type,
    155                           data_structure = data_structure, verbose = False, output_file = None , overwrite_output_file=True)
    156 dl_keys  = list(datalink.keys())
    158 if verbose > 2:

File /srv/conda/envs/notebook/lib/python3.9/site-packages/astroquery/gaia/core.py:288, in GaiaClass.load_data(self, ids, data_release, data_structure, retrieval_type, valid_data, band, avoid_datatype_check, format, output_file, overwrite_output_file, verbose)
    286     files = Gaia.__get_data_files(output_file=output_file, path=path)
    287 except Exception as err:
--> 288     raise err
    289 finally:
    290     if not output_file_specified:

File /srv/conda/envs/notebook/lib/python3.9/site-packages/astroquery/gaia/core.py:283, in GaiaClass.load_data(self, ids, data_release, data_structure, retrieval_type, valid_data, band, avoid_datatype_check, format, output_file, overwrite_output_file, verbose)
    280         log.error("Creation of the directory %s failed" % path)
    282 try:
--> 283     self.__gaiadata.load_data(params_dict=params_dict,
    284                               output_file=output_file,
    285                               verbose=verbose)
    286     files = Gaia.__get_data_files(output_file=output_file, path=path)
    287 except Exception as err:

File /srv/conda/envs/notebook/lib/python3.9/site-packages/astroquery/utils/tap/core.py:838, in TapPlus.load_data(self, params_dict, output_file, verbose)
    836 if verbose:
    837     print(response.status, response.reason)
--> 838 connHandler.check_launch_response_status(response,
    839                                          verbose,
    840                                          200)
    841 if verbose:
    842     print("Reading...")

File /srv/conda/envs/notebook/lib/python3.9/site-packages/astroquery/utils/tap/conn/tapconn.py:683, in TapConn.check_launch_response_status(self, response, debug, expected_response_status, raise_exception)
    681     errMsg = taputils.get_http_response_error(response)
    682     print(response.status, errMsg)
--> 683     raise requests.exceptions.HTTPError(errMsg)
    684 else:
    685     return isError

HTTPError: Error 500:
Maximum number of Source Ids reached (max: 5000, found: 22546)

Cannot process request: 'https://gea.esac.esa.int/data-server/data' (req: Reqid: anonymous1695752277686, retrieval access: DIRECT, retrieval type: EPOCH_PHOTOMETRY, compression: null), for user: UwsJobOwner{id='anonymous', name='null', mail='null', authUsername='null', authGroups=[], pseudo='anonymous', session='83924F676D97140F434C7324B3EB7EAF', ip='54.175.57.69', roles=0, parameters=Owner parameters: 6},  due to: Maximum number of Source Ids reached (max: 5000, found: 22546)

jkrick, Sep 26 '23 18:09

You can follow the tutorial at https://www.cosmos.esa.int/web/gaia-users/archive/datalink-products#datalink_jntb_get_above_lim to download DataLink products for more than 5000 sources.

"dataLink server threshold: It is not possible to download products for more than 5000 sources in one single call." This limit cannot be changed by the users.

cosmoJFH, Sep 27 '23 14:09
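
One way to stay under the 5000-source threshold quoted above is to split the ID list into chunks and make one request per chunk. The sketch below is a minimal illustration under stated assumptions, not the tutorial's code: `chunked` and `load_in_chunks` are hypothetical helper names, and `load_fn` stands for whatever callable performs the actual download (for example a thin wrapper around `Gaia.load_data`).

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")

# DataLink threshold reported by the server in the error message.
MAX_IDS_PER_CALL = 5000


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def load_in_chunks(ids: Sequence[T],
                   load_fn: Callable[[List[T]], R],
                   chunk_size: int = MAX_IDS_PER_CALL) -> List[R]:
    """Call `load_fn` once per chunk and return the per-chunk results in order."""
    return [load_fn(list(chunk)) for chunk in chunked(ids, chunk_size)]


# Example usage (hypothetical wiring; requires astroquery and network access):
#
# from astroquery.gaia import Gaia
# results = load_in_chunks(
#     ids,
#     lambda chunk: Gaia.load_data(ids=chunk,
#                                  data_release='Gaia DR3',
#                                  retrieval_type='EPOCH_PHOTOMETRY',
#                                  data_structure='INDIVIDUAL'),
# )
```

The helper returns one result per chunk rather than merging them, because how the returned products should be combined depends on the chosen `data_structure`. For ~500,000 sources this means about 100 sequential requests, so it may be worth saving each chunk's result before requesting the next.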

Thanks @cosmoJFH for the very quick response. I've changed this issue to a documentation one, so that we mention this DataLink threshold of 5000 somewhere in our documentation too (and maybe cross-link to the tutorial).

bsipocz, Sep 27 '23 16:09