astroquery Simbad multi-object search behaviour

Currently, if you call Simbad.query_objects (or Simbad.query_region) with many objects, the returned table can have a different number of rows from the input, which makes comparing input and output very difficult. This is because Simbad can return multiple rows for a single object (Centaurus, for example) or no rows at all for an unrecognised object. The returned object IDs aren't necessarily the input IDs either, so you can't use them to search the returned table. You can query one object at a time, but for a few thousand objects that is quite slow.

Would it be possible to change this behaviour, and likewise for multiple-coordinate searches? I'm not sure of the best way to handle this. Returning the original search names/coordinates, and/or returning the same number of rows in the same order as the input, could work. Even just a row label that identifies results as belonging to the same search group would be fine. A search by multiple coordinates (or multiple region criteria) would benefit greatly from the second or third option, as at the moment it is very difficult to tell which object belongs to which region search. The same behaviour for Ned would be even better :)

For example:

rac=['144.696458 -60.09181', '203.426453 -65.99033']
Simbad.query_region(SkyCoord(rac, unit=u.deg),'1d')

Would return:

MAIN_ID         RA            DEC           OTYPE  GROUP
                h m s         d m s
object          unicode13     unicode13     object int
--------------- ------------- ------------- ------ -----
IC 2501         9 38 47.146   -60 5 30.52   PN     1
TYC 9003-1531-1 13 33 42.8988 -65 59 11.376 Star   2
NGC 5189        13 33 32.86   -65 58 27.1   PN     2
TYC 9003-1874-1 13 33 27.265  -65 58 27.9   Star   2
TYC 9003-654-1  13 33 25.988  -66 0 14.359  Star   2
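
Until something like this is built in, a similar GROUP label can be produced client-side by running one search per position and tagging every returned row with the index of the search that produced it. The sketch below is only an illustration under assumptions: tag_groups and query_one are hypothetical names, and query_one stands for any callable that runs a single search and returns its rows as dicts (for example a thin wrapper around Simbad.query_region). A stand-in with a few rows from the table above is used so that the snippet runs without network access.

```python
def tag_groups(targets, query_one):
    """Run one search per target and label every returned row with its group.

    targets   -- list of search inputs (names or coordinate strings)
    query_one -- hypothetical callable: takes one target and returns a list
                 of dict rows (an empty list if nothing was found)
    """
    tagged = []
    for group, target in enumerate(targets, start=1):
        for row in query_one(target):
            labelled = dict(row)        # copy so the caller's rows are untouched
            labelled["GROUP"] = group   # 1-based index of the originating search
            labelled["QUERY"] = target  # the original search input
            tagged.append(labelled)
    return tagged


# Stand-in for a real query: canned rows keyed by the search coordinates.
_fake_results = {
    "144.696458 -60.09181": [{"MAIN_ID": "IC 2501", "OTYPE": "PN"}],
    "203.426453 -65.99033": [
        {"MAIN_ID": "TYC 9003-1531-1", "OTYPE": "Star"},
        {"MAIN_ID": "NGC 5189", "OTYPE": "PN"},
    ],
}

rows = tag_groups(list(_fake_results), lambda t: _fake_results.get(t, []))
for r in rows:
    print(r["GROUP"], r["MAIN_ID"], r["QUERY"])
```

This is the slow one-query-per-target route, so it is a workaround rather than a substitute for the requested behaviour.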

Multi-object example:

from astroquery.eso import Eso
from astroquery.simbad import Simbad

# Log in to the ESO archive
eso = Eso()
login = input('ESO Login: ')
eso.login(login)

# Configure Simbad
Sim = Simbad()
Sim.add_votable_fields('otype')
Sim.ROW_LIMIT = 1e6
Sim.TIMEOUT = 500

# Query
eso.ROW_LIMIT = 1e12
table = eso.query_instrument('muse', night_flag=0, column_filters={'dp_type': 'OBJECT'})
# Remove duplicate names
oname = list(set(table['Object']))
onar = Sim.query_objects(oname)
# The two lengths differ, which is the problem described above
print(len(onar['MAIN_ID']), len(oname))

Aug 08 '17 21:08 JLeftley

Somehow this slipped by me, but yes, this should be possible and maybe even straightforward. A PR implementing it would be welcome; otherwise, maybe we can tackle this at the next hack session.

Sep 08 '17 21:09 keflavich

I have no solution for this to post yet but I'll put it here if I manage to make one before the next hack session :)

Sep 10 '17 19:09 JLeftley

I want to query many names in Simbad, to figure out which ones resolve and which don't.

This is what I tried: https://gist.github.com/cdeil/ad1ffdd724878f4d72d25a117d92d5a5

It doesn't give me what I want. The issues I have are:

1. The number of rows in table plus the number of entries in table.errors doesn't match the number of names I queried!?
2. The result table doesn't contain the name I queried, i.e. I can't easily figure out which row corresponds to which query name.

Is there a way to do this currently with query_objects? Or do I have to run one query_object per object? Will SIMBAD block me if I run ~ 100 queries, possibly a few times?

Feb 19 '18 11:02 cdeil

@cdeil query_objects sends the list of names to SIMBAD in a single form. I believe what happened is:

Several entries (7) resulted in no match, but were recognized as valid names (I'm uncertain about this)
Several entries (another 7) somehow errored - perhaps they did not parse properly? I'm again not sure why
Both of the above categories are simply excluded from the results.

This is a good question for the CDS folks. I suggest e-mailing them directly to see if there's a way to get a table returned with blanks for missing fields or something similar.

Feb 20 '18 01:02 keflavich

Hi, regarding question 2: if you are using SIMBAD scripts, there is a way to get the names you gave, %OBJECT; for votable fields, you can use TYPED_ID. SIMBAD blocks you if you send more than 6 queries in the same second, and a single query can take a list of up to 10000 names.
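
To stay within those two limits from a client, the name list can be split into batches and the calls spaced out. The following is a minimal sketch, not part of astroquery: batches, run_throttled and query_many are hypothetical names, and query_many stands for whatever callable sends one list of names (for example Simbad.query_objects). The pause is set for 5 calls per second to leave a margin below the 6-per-second threshold.

```python
import time

MAX_NAMES_PER_QUERY = 10000  # list-size limit quoted above
MAX_QUERIES_PER_SECOND = 5   # kept below the 6-per-second threshold quoted above


def batches(names, size=MAX_NAMES_PER_QUERY):
    """Yield consecutive slices of `names`, each at most `size` long."""
    for start in range(0, len(names), size):
        yield names[start:start + size]


def run_throttled(names, query_many, size=MAX_NAMES_PER_QUERY,
                  per_second=MAX_QUERIES_PER_SECOND, sleep=time.sleep):
    """Call `query_many` once per batch, pausing between consecutive calls.

    query_many -- hypothetical callable that takes a list of names
    """
    results = []
    pause = 1.0 / per_second
    for i, batch in enumerate(batches(names, size)):
        if i:
            sleep(pause)  # wait before every call after the first
        results.append(query_many(batch))
    return results


# Dry run: the stand-in "query" just counts names, and sleeping is disabled.
demo = run_throttled([f"obj{i}" for i in range(25)], len,
                     size=10, sleep=lambda seconds: None)
print(demo)
```

With about 100 names, as in the question above, everything fits in a single batch, so only one request is sent.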

Feb 20 '18 08:02 aoberto

The errors generated by this list of names are all here:

::error:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

[4] Identifier not found in the database : GAL 292.2-00.5
[5] 'PWN G292.15-0.54': No known catalog could be found
[10] Identifier not found in the database : GAL 292.2-00.5
[11] Identifier not found in the database : GAL 318.2+00.1
[13] Identifier not found in the database : GAL 292.2-00.5
[14] 'PWN G292.15-0.54': No known catalog could be found
[22] 'AX J150436-5824': this identifier has an incorrect format for catalog: AX : ASCA satellite, X-ray
[25] Identifier not found in the database : GAL 327.15-01.04
[26] Identifier not found in the database : GAL 327.1-01.1
[30] 'PWN G18.5-0.4': No known catalog could be found
[33] Identifier not found in the database : GAL 018.6-00.2
[52] Identifier not found in the database : GAL 030.8-00.2
[61] Identifier not found in the database : GAL 033.2-00.6
[68] Identifier not found in the database : GAL 042.8+00.6
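
An error block in this shape can be split into numbered entries with a short regular expression, which makes it easier to count and inspect the failures. This is a rough sketch only: parse_errors is a hypothetical helper, and it assumes each entry starts with a number in square brackets and that no message contains such a bracketed number itself. What the bracketed number refers to is not stated here, so the code simply returns it as given.

```python
import re

# One "[n] message" entry; the message runs up to the next "[n]" or the end.
_ENTRY = re.compile(r"\[(\d+)\]\s*(.*?)(?=\s*\[\d+\]|\s*$)", re.S)


def parse_errors(text):
    """Return a list of (number, message) tuples from an error block."""
    return [(int(n), msg.strip()) for n, msg in _ENTRY.findall(text)]


sample = (
    "[4] Identifier not found in the database : GAL 292.2-00.5 "
    "[5] 'PWN G292.15-0.54': No known catalog could be found "
    "[22] 'AX J150436-5824': this identifier has an incorrect format "
    "for catalog: AX : ASCA satellite, X-ray"
)
errors = parse_errors(sample)
for number, message in errors:
    print(number, message)
```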

Feb 20 '18 08:02 aoberto

OK, so we should probably add the %OBJECT column in the query_objects query in astroquery.
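
Once each returned row carries the name that was typed, lining the results up with the input list is straightforward. Here is a minimal sketch of that step, assuming the rows are available as dicts with the queried name under a TYPED_ID key; align_to_inputs is a hypothetical helper and the rows below are placeholder data, not real SIMBAD output.

```python
from collections import defaultdict


def align_to_inputs(names, rows, key="TYPED_ID"):
    """Return one (name, matches) pair per input name, in input order.

    rows -- list of dict rows that carry the queried name under `key`
    Names with no rows get an empty list, so the output always has
    exactly len(names) entries.
    """
    by_name = defaultdict(list)
    for row in rows:
        by_name[row[key]].append(row)
    return [(name, by_name.get(name, [])) for name in names]


# Placeholder data: two inputs resolve (one of them twice), one does not.
names = ["name A", "name B", "name C"]
rows = [
    {"TYPED_ID": "name C", "MAIN_ID": "main id 3"},
    {"TYPED_ID": "name A", "MAIN_ID": "main id 1a"},
    {"TYPED_ID": "name A", "MAIN_ID": "main id 1b"},
]
aligned = align_to_inputs(names, rows)
for name, matches in aligned:
    print(name, [m["MAIN_ID"] for m in matches])
```

Unresolved names show up as empty lists and multi-row matches stay grouped under the name that produced them, which covers both cases raised at the top of this thread.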

Feb 20 '18 15:02 keflavich

Well, my memory is awful. This PR: https://github.com/astropy/astroquery/pull/496 addresses the issue of keeping the input name in the output results.

Feb 21 '18 21:02 keflavich