Using astroquery

If you're like me, you've spent a lot of time downloading a table from VizieR, and then trying to insert it into whatever database or table you are currently using. It's annoying, particularly because VizieR either won't output information in a machine-readable format or won't put an empty line into a massive list of results when there's no match in the catalog.
Astroquery is better. It's a python package affiliated with astropy that can grab data from all your favorite catalog servers (Vizier, SIMBAD, IRSA, NED, etc) directly, and use it; combine that with Astropy's ability (or Python's capacity) to print out tables to files, and it's a lot easier to write a script to do it all for you.

Astroquery is in the Python Package Index, so you can install and update it with
pip install astroquery

You also need the 'requests' module to do the web-interfacing (and Astropy v3); it's available on the Python Package Index and comes preloaded in Anaconda. Using it is simple (this part is almost entirely taken from their pages on using VizieR and IRSA):

from astroquery.vizier import Vizier

result = Vizier.find_catalogs() takes a string, and then returns a list of catalogs that match the search string, just like typing them in the search box of the website.

result = Vizier.get_catalogs() takes a string (or list of strings), and will return a list of Astropy Tables that are the catalogs in question.

result = Vizier.query_object() takes a string, assuming it's a star (by a name SIMBAD would recognize, of course) and returns matching entries in all the tables it can find, in the form of a list of Astropy Tables

Querying based on coordinates requires the astropy.coordinates and astropy.units modules:
import astropy.coordinates as coords
import astropy.units as units

result = Vizier.query_region(coords.ICRS(ra=value, dec=value), radius= 2*units.seconds, catalog=(), columns=())
where catalog is either a string (or list of strings) that match the names of catalogs in the database, and columns (optional) are the names of columns within those catalogs. You'll need to play around on the website in order to figure out what they're called. For instance, ALLWISE is 'II/328'; VizieR catalogs can contain multiple VizieR-tables, but ALLWISE is just the one:

An example:
In [1]: import astropy.units as units
In [2]: import astropy.coordinates as coords
In [3]: from astroquery.vizier import Vizier
In [4]: result = Vizier.query_region(coords.ICRS(ra=092.220149, dec=-27.899542, unit = (units.deg,units.deg)), radius=2.0*units.arcmin, catalog=('II/328'))
In [5]: print result
TableList with 1 tables:
'0:II/328/allwise' with 28 column(s) and 50 row(s)


VizieR is (as on the website) limited by default to 50 rows of output, and the default column list. The solution is apparently to set Vizier.ROW_LIMIT to what you really want (or -1 for unlimited). UPDATE: Another important thing to (re)set is Vizier.TIMEOUT (or Irsa.TIMEOUT, or Simbad.TIMEOUT, etc etc), which defaults to 60 seconds. If you're querying on a large amount of data, you don't want it to crash six hours in...

In [6]: print result[0]
AllWISE RAJ2000 DEJ2000 Im W1mag e_W1mag W2mag e_W2mag ... var pmRA e_pmRA pmDE e_pmDE qph d2M _2M
------------------- ----------- ----------- --- ------ ------- ------ ------- ... ---- ---- ------ ----- ------ ---- ------ ---
J060859.12-275457.9 92.2463402 -27.9161025 Im 17.214 0.118 17.562 nan ... nnnn 100 1499 -682 1626 BUUU nan 2M
J060855.87-275544.6 92.2328228 -27.9290731 Im 17.598 0.164 17.004 0.330 ... nnnn 34 2185 2730 2454 BBUU nan 2M
J060858.24-275509.1 92.2426940 -27.9191946 Im 15.524 0.040 15.937 0.133 ... 0nnn -424 323 -410 356 ABUU 0.336 2M
J060855.27-275533.2 92.2302970 -27.9259147 Im 16.388 0.065 16.397 0.210 ... 0nnn 598 679 -611 744 ABUU nan 2M
J060847.53-275524.2 92.1980470 -27.9234112 Im 16.742 0.085 16.644 0.251 ... nnnn 367 1044 710 1148 ABCU nan 2M
J060851.23-275546.0 92.2134676 -27.9294489 Im 16.068 0.052 15.726 0.108 ... 1nnn -387 483 -467 529 AABU nan 2M
J060852.13-275554.0 92.2172369 -27.9316751 Im 17.751 0.170 16.880 0.310 ... nnnn 1953 2496 -488 2769 BBUU nan 2M
J060852.68-275431.0 92.2195394 -27.9086153 Im 15.552 0.040 15.582 0.103 ... 1nnn 538 321 -385 349 AAUU nan 2M
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
J060859.58-275345.8 92.2482578 -27.8960749 Im 16.863 0.085 17.198 0.375 ... nnnn -382 1097 -1064 1193 ACUU nan 2M
J060854.95-275332.9 92.2289671 -27.8924726 Im 16.926 0.094 16.933 nan ... nnnn -720 1184 -2086 1330 AUUU nan 2M
J060854.28-275318.6 92.2262044 -27.8885007 Im 17.025 0.101 16.342 0.182 ... nnnn 752 1284 -406 1445 ABUU nan 2M
J060853.42-275203.8 92.2225860 -27.8677390 Im 13.193 0.024 13.274 0.028 ... 11nn 24 69 -108 74 AAUC 0.034 2M
J060850.71-275237.1 92.2113004 -27.8769963 Im 15.153 0.034 14.956 0.062 ... 00nn -383 239 -219 263 AABU nan 2M
J060854.79-275246.6 92.2283235 -27.8796250 Im 15.449 0.038 15.247 0.081 ... 00nn 4 286 -138 316 AAUU nan 2M
J060851.88-275214.8 92.2161937 -27.8707796 Im 17.031 0.100 16.235 0.191 ... nnnn 844 1291 133 1422 ABUU nan 2M
In [7]: print result[0][0]['W1mag']
17.214


Note that the output is NOT sorted, so your closest match is NOT the best one, as it would be on the website! The actual W1mag of 2M0608 is 11.975: Change the radius to 2 arcsec to see:

In [8]: result = Vizier.query_region(coords.ICRS(ra=092.220149, dec=-27.899542, unit = (units.deg,units.deg)), radius=2.0*units.arcsec, catalog=('II/328'))
In [9]: print result
TableList with 1 tables:
'0:II/328/allwise' with 28 column(s) and 1 row(s)
In [10]: print result[0][0]['W1mag']
11.975

UPDATE: To get everything sorted by separation as on the website, you need to create a new Vizier query object as in their example:
vsearch = Vizier(columns=["+_r", "*"])
Where "_r" is the radial separation, the + before it specifies sorting in numerical order, and "*" is shorthand for all the other default columns (this is a way to change the columns queried, as well). So, for the query above to be sorted properly, you would then run:
result = vsearch.query_region(coords.ICRS(ra=092.220149, dec=-27.899542, unit = (units.deg,units.deg)), radius=2.0*units.arcmin, catalog=('II/328'))

Querying IRSA is similar. Differences include:

  • There's an Irsa.list_catalogs() function that lists all the catalogs IRSA currently serves, but doesn't seem to have a function that will download an entire catalog.
  • The identical IRSA catalog will have a different catalog name
  • IRSA catalogs will also have different column names.
  • IRSA catalogs do not have sub-tables, so your result won't either.

In [10]: from astroquery.irsa import Irsa

Irsa.list_catalogs() prints all the catalogs IRSA currently serves, if you want to know what they're called, but doesn't have any of the catalog download options.

In [11]: result = Irsa.query_region(coords.ICRS(ra=092.220149, dec=-27.899542, unit = (units.deg,units.deg)), radius=2.0*units.arcsec, catalog=('wise_allwise_p3as_psd'))
In [12]: print result
designation ra dec clon clat sigra sigdec sigradec ... w3nm w3m w4nm w4m dist angle id
------------------- ------- ------- ------------ ------------- ------ ------ -------- ... ---- --- ---- --- -------- --------- ---
J060852.84-275358.2 92.220 -27.900 06h08m52.84s -27d53m58.30s 0.0399 0.0369 -0.0034 ... 6 20 1 20 0.074213 51.270071 0

In [13]: print result[0]['w1mpro']
11.975

There are other modules for SIMBAD, UKIDSS, SDSS, and so on; and plenty of other options to search with conditions, and basically all the functionality of web queries, but I think these are good basics.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>