Gaia.load_data: increase the maximum number of source IDs
I would like to use Gaia.load_data to grab epoch photometry for ~500,000 sources. I can successfully do this for a small number of sources, but I run into a limit when I scale up. The error message is below. Is it possible to increase the allowed maximum number of source IDs? I'd vote for 500,000 as the limit just because it is my use case, but I could see reasons to go higher. I don't know what the true limitations are. I'd also be open to other suggestions for how I should access the epoch photometry for these sources.
Background: I can get the 500,000 source IDs using a table upload and a JOIN inside a query passed to Gaia.launch_job_async. This is lightning fast and fantastic.
Here is my actual code:
from astroquery.gaia import Gaia

## Some Definitions
retrieval_type = 'EPOCH_PHOTOMETRY'
data_structure = 'INDIVIDUAL'
data_release = 'Gaia DR3'

## Get the files (ids is the list of Gaia source IDs from the query above)
datalink = Gaia.load_data(ids=ids,
                          data_release=data_release,
                          retrieval_type=retrieval_type,
                          data_structure=data_structure,
                          verbose=False,
                          output_file=None,
                          overwrite_output_file=True)
And the error message I am getting (on testing with fewer sources):
Maximum number of Source Ids reached (max: 5000, found: 22577)
Cannot process request: 'https://gea.esac.esa.int/data-server/data' (req: Reqid: anonymous1695744305998, retrieval access: DIRECT, retrieval type: EPOCH_PHOTOMETRY, compression: null), for user: UwsJobOwner{id='anonymous', name='null', mail='null', authUsername='null', authGroups=[], pseudo='anonymous', session='ABF99E52685E5059A90EECD530692465', ip='54.82.184.137', roles=0, parameters=Owner parameters: 6}, due to: Maximum number of Source Ids reached (max: 5000, found: 22577)
cc @esdc-esac-esa-int
full traceback:
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
Input In [26], in <cell line: 2>()
1 gaiastarttime = time.time()
----> 2 df_lc_gaia = Gaia_get_lightcurve(coords_list, labels_list , verbose = 1)
4 #add the resulting dataframe to all other archives
5 df_lc.append(df_lc_gaia)
File ~/fornax-demo-notebooks/light_curves/code/gaia_functions.py:51, in Gaia_get_lightcurve(coords_list, labels_list, verbose)
45 print(gaia_table.columns)
47 ## Extract Light curves ===============
48 # For each of the objects, request the EPOCH_PHOTOMETRY from the Gaia DataLink Service
49
50 ## Run search
---> 51 prod_tab = Gaia_retrieve_EPOCH_PHOTOMETRY(ids=list(gaia_table["source_id"]) , verbose=verbose)
53 ## Create light curves =================
54 gaia_epoch_phot = Gaia_mk_lightcurves(prod_tab , verbose=verbose)
File ~/fornax-demo-notebooks/light_curves/code/gaia_functions.py:152, in Gaia_retrieve_EPOCH_PHOTOMETRY(ids, verbose)
149 data_release = 'Gaia DR3' # Options are: 'Gaia DR3' (default), 'Gaia DR2'
151 ## Get the files
--> 152 datalink = Gaia.load_data(ids=ids,
153 data_release = data_release,
154 retrieval_type=retrieval_type,
155 data_structure = data_structure, verbose = False, output_file = None , overwrite_output_file=True)
156 dl_keys = list(datalink.keys())
158 if verbose > 2:
File /srv/conda/envs/notebook/lib/python3.9/site-packages/astroquery/gaia/core.py:288, in GaiaClass.load_data(self, ids, data_release, data_structure, retrieval_type, valid_data, band, avoid_datatype_check, format, output_file, overwrite_output_file, verbose)
286 files = Gaia.__get_data_files(output_file=output_file, path=path)
287 except Exception as err:
--> 288 raise err
289 finally:
290 if not output_file_specified:
File /srv/conda/envs/notebook/lib/python3.9/site-packages/astroquery/gaia/core.py:283, in GaiaClass.load_data(self, ids, data_release, data_structure, retrieval_type, valid_data, band, avoid_datatype_check, format, output_file, overwrite_output_file, verbose)
280 log.error("Creation of the directory %s failed" % path)
282 try:
--> 283 self.__gaiadata.load_data(params_dict=params_dict,
284 output_file=output_file,
285 verbose=verbose)
286 files = Gaia.__get_data_files(output_file=output_file, path=path)
287 except Exception as err:
File /srv/conda/envs/notebook/lib/python3.9/site-packages/astroquery/utils/tap/core.py:838, in TapPlus.load_data(self, params_dict, output_file, verbose)
836 if verbose:
837 print(response.status, response.reason)
--> 838 connHandler.check_launch_response_status(response,
839 verbose,
840 200)
841 if verbose:
842 print("Reading...")
File /srv/conda/envs/notebook/lib/python3.9/site-packages/astroquery/utils/tap/conn/tapconn.py:683, in TapConn.check_launch_response_status(self, response, debug, expected_response_status, raise_exception)
681 errMsg = taputils.get_http_response_error(response)
682 print(response.status, errMsg)
--> 683 raise requests.exceptions.HTTPError(errMsg)
684 else:
685 return isError
HTTPError: Error 500:
Maximum number of Source Ids reached (max: 5000, found: 22546)
Cannot process request: 'https://gea.esac.esa.int/data-server/data' (req: Reqid: anonymous1695752277686, retrieval access: DIRECT, retrieval type: EPOCH_PHOTOMETRY, compression: null), for user: UwsJobOwner{id='anonymous', name='null', mail='null', authUsername='null', authGroups=[], pseudo='anonymous', session='83924F676D97140F434C7324B3EB7EAF', ip='54.175.57.69', roles=0, parameters=Owner parameters: 6}, due to: Maximum number of Source Ids reached (max: 5000, found: 22546)
You can follow the tutorial at https://www.cosmos.esa.int/web/gaia-users/archive/datalink-products#datalink_jntb_get_above_lim to download DataLink products for more than 5000 sources.
"dataLink server threshold: It is not possible to download products for more than 5000 sources in one single call." This limit cannot be changed by the users.
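As a rough illustration of working within that threshold, here is a minimal sketch that splits a long list of source IDs into batches of at most 5000 and makes one call per batch. The helper names `chunked` and `load_in_chunks` are hypothetical (they are not part of astroquery), and this is not necessarily how the linked tutorial does it. The loader is passed in as an argument so the batching logic can be checked without contacting the archive.

```python
from typing import Any, Callable, Iterator, List, Sequence

# Server-side DataLink limit quoted in this thread.
MAX_IDS_PER_CALL = 5000


def chunked(ids: Sequence[int], size: int = MAX_IDS_PER_CALL) -> Iterator[List[int]]:
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def load_in_chunks(ids: Sequence[int],
                   loader: Callable[..., Any],
                   size: int = MAX_IDS_PER_CALL,
                   **kwargs: Any) -> List[Any]:
    """Call `loader(ids=chunk, **kwargs)` once per chunk.

    Returns one result per chunk, in order. Results are deliberately not
    merged, because whether the returned keys are unique across calls
    depends on the requested data structure.
    """
    return [loader(ids=chunk, **kwargs) for chunk in chunked(ids, size)]


# Intended use (hypothetical; needs astroquery and network access):
#
#   from astroquery.gaia import Gaia
#   results = load_in_chunks(ids, Gaia.load_data,
#                            data_release='Gaia DR3',
#                            retrieval_type='EPOCH_PHOTOMETRY',
#                            data_structure='INDIVIDUAL')
```

With the 22,577 IDs from the error message above, this would issue five requests: four of 5000 IDs and one of 2577.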
Thanks @cosmoJFH for the very quick response. I've changed this issue to a documentation one: we should mention this DataLink threshold of 5000 somewhere in our documentation, too (and maybe cross-link to the tutorial).