Documentation using Sphinx

Sphinx is a free, online tool that helps generate documentation for primarily Python projects.  We use it for HTML format outputs, but it also supports LaTeX, ePub, Texinfo, manual pages, and plain text formats.  The markup language of Sphinx is reStructuredText (reST) (for documentation see  Of Sphinx's many features, the one most useful thus far is its auto-documentation of Python files, the steps of which this post details below.

Sphinx auto-documentation steps:

  1. Install Sphinx at the command line using: easy_install -U Sphinx
  2. Read through the Sphinx documentation at
  3. Run Sphinx's Getting Started function Sphinx-quickstart to quickly set up directories and configurations. NOTE: Since I have already completed this setup, it DOES NOT need to be run again for the BDNYC documentation.
    1. Access the documentation directories in Dropbox through the shared folder BDNYCdb/Documentation
    2. The Documentation folder contains the .rst and .py files that Sphinx uses to build documentation.  Place any files for which documentation is to be made within that directory since Sphinx automatically looks there.
      Screen Shot 2014-08-07 at 2.46.31 PMBDdb.html within the Documentation directories
  4. The index.rst file is the master file that contains the table of contents (known as a TOCtree) for the entire documentation.  Within  the TOCtree, Sphinx knows that all the files are .rst, so there is no need to specify the file extension or path.  The index also serves as a welcome page for users, so it may include an introduction or other information that would be useful for outside reference.  Within the TOCtree (and all of reST) indentation and notation is especially important.  Below is an example of a functioning TOCtree.  .. toctree::
    :maxdepth: 2
    # blank line

    The "maxdepth" function specifies the depth of nesting allowed.  In order to keep the index clean and aesthetically simple, a maximum of one nested page is reasonable. In other files, however, highly nested pages are not only common, but well-supported by reST.
    Screen Shot 2014-08-07 at 2.52.07 PM
    Screenshot of index.rst
  5. For each item in the TOCtree of the index file, there must be a corresponding .rst file in the main directory for Sphinx to nest within the index HTML file.  In the above example, a Database_instructions.rst file might contain in plain English how to interact with the database, while the BDdb.rst file contains the auto-documentation of
  6. The .rst file within the TOCtree of the index must be formatted like below to correctly call for the Python files. [name of module]
    ================    # make page header (see
    .. automodule:: [name of Python file w/o extension]
    :members:    # tells Sphinx to pull from the file
    # blank line

    Screen Shot 2014-08-07 at 2.57.41 PM

    Screenshot of BDdb.rst

  7. The referenced Python files for auto-documentation can either be in located in the main directory, or their location can be specified.  Since we will be constantly pushing and pulling from Github, it is best for those doing documentation to change the file path to their local copy of,, etc, before making documentation.  This can be achieved as follows:
  8. Open, which is the configuration file for Sphinx.  It was produced during the "quickstart" function to allow easy modifications of aesthetic features (i.e., theme, font), and facilitate the HTML building process.
    1. Change the file path in line 21 to the applicable local copy.
    2. Comment out the file path of previous users in order to make it easier for others to update documentation.
  9. In order to make the HTML files, move Terminal to the directory of the documentation, i.e., '.../Dropbox/BDNYCdb/Documentation' and run make html.
    1. The make htmlfunction comes build-in through running "quickstart" and uses the makefile in the main directory.  It can take other functions as well, including make clean, which deletes the entire contents of the "build" directory.
    2. Sphinx should put the generated HTML files in the /_build/html directory.
  10. To make the HTML files live on, use the application Cyberduck, which can be downloaded at
    1. Click "Open Connection" and put in the server, username, and password.
    2. Drag and drop any .rst and HTML files that were modified into the appropriate directory and the documentation should go live immediately!


References: -- Sphinx documentation homepage -- Outline of Sphinx documentation -- Using reStructuredText in Sphinx -- Auto-documentation documentation -- Download Cyberduck

Rotational and Vibrational Energy of Diatomic Molecules

Molecules can store energy in 4 different ways: translation (whole molecule moves in each of 3 Cartesian coordinates), rotation (whole molecule rotates—linear molecules can rotate around 2 axes, nonlinear around all 3 axes), vibration (bonds stretch and contract—3N-5 kinds of vibrations for linear molecules, 3N-6 for nonlinear), and electronic (ground state vs. excited). The energy absorbed in electronic transitions is usually in the UV-visible range, that for vibrations is in the IR range, for rotations in the microwave range, and the transition energy for translations is basically infinitesimal. This is because electronic transitions require the most energy, then vibrations, rotations and finally translations require the least. With high enough resolution, rotational transitions can be seen in vibrational spectra, with the rotational transitions giving structure to the vibrational absorption peaks.

Figure 1: An energy diagram showing the relative energies for rotational, vibrational, and electronic transitions and how they overlap.3

The energy of rotational and vibrational levels can be calculated separately and summed. Vibrations can be modeled using the simple harmonic oscillator—the potential energy of which comes from Hooke’s law. Rotational energy of a molecule can be calculated by assuming the molecule acts as a rigid rotor and solving the corresponding Schrodinger Equation. Both of these types of energy are quantized, which accounts for the vibrational (v) and rotational (J) quantum levels in the equation.

This equation will give energy levels in Joules. Vibrational spectroscopists generally use units of wavenumbers, or cm-1, though, which can be found by dividing the entire equation by hc.



  • v: vibration quantum number (0,1,2,…)
  • J: rotation quantum number (0,1,2,…)
  • k: spring constant/strength of bond
  • μ: reduced mass
  • I: moment of inertia, where is bond length
  • B: rotational constant

I began looking into this to explain why multiple carbon monoxide absorption peaks appeared where they did in my near-IR spectra at 2.29 and 2.32 microns or 4367 and 4310 cm-1. I was able to calculate the correct 2.29 μm peak by subtracting the energy where v=0 and J=8 from the energy where v=2 and J=10. This means that these peaks are due to the first vibrational overtone of CO, i.e. Δv=2 instead of 1. The peak at 2.32 and any other peaks around the same area can be found by going from different rotational energy states, 6 to 8, 4 to 6, etc. ΔJ for CO is usually ±2 because it is a linear molecule.

  1. McQuarrie, Donald A. Quantum Chemistry. 2nd ed. Sausalito: University Science Books, 2008.
  2. Atkins, P.; de Paula, J. Physical Chemistry. 8th ed. New York: W. H. Freeman and Company, 2006.
  3. “Electronic Spectra of Organic Molecules”. Organic Chemistry. Accessed July 24, 2014.

On solving the Rigid Rotor Schrodinger Equation:ödinger_Equation

Special thanks to Professor Andrew Crowther.


Adding Spectra to the BDNYC Data Archive

Once your BDNYC development environment is all setup, adding spectra to the database is relatively simple and all done through Terminal.

Open Terminal and then launch iPython from the command line with ipython --pylab.

Next, import the BDdb module and load the actual database with:

import BDdb
db = BDdb.get_db('/Users/your_computer/Dropbox/BDNYCdb/BDNYC.db')

Now you're all set to start adding spectra. There are two types of files that we typically use for spectra: ascii files and FITS files.

Adding ASCII Files

Ascii files are just text files with two or three columns of data, which should always be the wavelength, flux and (if three columns) the uncertainty values at each point. Adding this data is super simple:

db.add_ascii('/path/to/the/ascii/file', source_id)

So how do you find the source_id for your object? Just use the identify() function. For example, say my object is '2MASS J00361617+1821104'. I would search by the first four digits of the object name, in this case:


This may return a number of different sources, all with '0036' appearing somewhere in their name, so I would then visually inspect to find the source I'm looking for. In this case, the source_id is 86.

The file path and the source_id are the only required information to provide to add this type of data though the more columns you can complete the better! The values of the optional columns can just be entered in the function as arguments, e.g. telescope_id=6. The other columns include:

Argument Description
wavelength_units Units of the wavelength axis, e.g. 'um' for microns, 'A' for angstroms
flux_units Units of the flux axis, e.g. 'erg/s/cm2/A', 'W/m2/A', 'Jy', etc.
wavelength_order For high resolution data only, e.g. 58, 59, 60, etc.
publication_id The id from the PUBLICATIONS table that corresponds to the paper where the data was first published.
obs_date The dat of the observations as YYYY-MM-DD
instrument_id The id from the INSTRUMENTS table that corresponds to the instrument used to record the spectrum.
telescope_id The id from the TELESCOPES table that corresponds to the telescope used to record the spectrum.
airmass The airmass at the time the spectrum was recorded, which is any value between 0 and 2.
comment Comments about the spectrum that a future user of the data might need to know!

So as an example, if I want to add a NIR spectrum to the database taken with the SpeX Prism instrument on the NASA IRTF, my code should look like:

db.add_ascii('/Users/Joe/Desktop/NIR_spectra/2m0036_nir.txt', 86, telescope_id=7, instrument_id=6, airmass=1.0263, obs_date='2013-01-17)

Note: the order of the optional arguments does not matter!

How do you know the telescope_id, instrument_id, or publication_id? You can browse these tables in Terminal by running the table name as a method on the database instance or by searching the table with some keyword.

For example db.instruments() will show you every instrument_id and name. But db.instruments('SpeX') will return only those records that have 'spex' in the name.

The naming convention for publications is the first four letters of the first author's last name and then the two digit year. So I could try db.publications('Kirk01') to find Kirkpatrick et al. 2001 or just db.publications('Kirk') to see all Kirkpatrick first author publications.

Otherwise, just launch the SQLite Database Browser and load the appropriate table.

All the other values should be in the ascii filename (e.g. 2m0036_2013may02_58.txt might be the object name, obs_date and wavelength_order) or in the ascii file itself above the data columns. Sometimes you might just know the correct information and can add it where possible. Otherwise you can always just leave these blank.

Add FITS Files

Adding spectra in FITS files works exactly the same way but with a little more automation.

Just like adding an ascii file above, you simply do:

db.add_fits('/path/to/the/fits/file.fits', source_id)

The automation comes in since many FITS files have a metadata "header" with common keys like 'OBS_DATE', 'TELESCOP', 'INSTRUME', etc., which BDdb automatically tries to pull out and insert into our new spectrum record.

However, you can still fill in blanks and/or override the automation by simply adding in the optional arguments as we did above with db.add_ascii().

OOPS! Don't panic!

If anything gets entered incorrectly, remember that you can always just open up the SQLite Database Browser, navigate to the record you just added in the SPECTRA table, and double-click the cells you need to edit.

Or you can just select the row of your erroneous record and click the Delete Record button, and then try adding the spectrum again.

Use of the DISTINCT function in an SQLite query

When using an sqlite query to pull from our database, you may run into the issue where you only want to see the desired field entry once, and the resulting list or dictionary is giving it to you multiple times.

An Example

Say you want to get the source_id from the spectra table where the spectral_type is an L dwarf and the spectra regime is NIR. Meaning, you want to know which L dwarfs we have NIR spectra for. Well, the output will give you a list of all of the entries we have spectra for, even if some entries overlap on source_id. Each source_id will show up in the list per the number of separate NIR spectra we have for that object. If  you only want to see the source_id once, we can use a function tag called DISTINCT. It follows like:

SELECT DISTINCT sp.source_id FROM spectra AS sp JOIN spectral_types AS spt ON spt.source_id=sp.source_id WHERE spt.spectral_type BETWEEN 10 AND 19 AND sp.regime='NIR'

(See the joining tables post if confused on how to do so.)

*NOTE: Here I used another function BETWEEN instead of writing out spt.spectral_type>=10 AND spt.spectral_type<20

Using astroquery

If you're like me, you've spent a lot of time downloading a table from VizieR, and then trying to insert it into whatever database or table you are currently using. It's annoying, particularly because VizieR either won't output information in a machine-readable format or won't put an empty line into a massive list of results when there's no match in the catalog.
Astroquery is better. It's a python package affiliated with astropy that can grab data from all your favorite catalog servers (Vizier, SIMBAD, IRSA, NED, etc) directly, and use it; combine that with Astropy's ability (or Python's capacity) to print out tables to files, and it's a lot easier to write a script to do it all for you.

Astroquery is in the Python Package Index, so you can install and update it with
pip install astroquery

You also need the 'requests' module to do the web-interfacing (and Astropy v3); it's available on the Python Package Index and comes preloaded in Anaconda. Using it is simple (this part is almost entirely taken from their pages on using VizieR and IRSA):
Continue reading

Uncertainty Propagation

All observations have associated uncertainties which must be propagated through your analysis to a an uncertainty on a final result. The principle of uncertainty propagation is fairly simple:

\sigma_x^2 = \sigma_u^2\left(\frac{\partial x}{\partial u}\right)^2 + \sigma_v^2\left(\frac{\partial x}{\partial v}\right)^2+\dots +2\sigma_{uv}^2\left(\frac{\partial x}{\partial u}\right)\left(\frac{\partial x}{\partial v}\right)+\dots

The first two terms on the right are the averages of the squares of the deviations in x produced by the uncertainties in observables u and v respectively. The third term on the right is the average of the cross terms, which cancel out if u and v are uncorrelated. Thus in most situations, a reasonable approximation is:

\sigma_x^2 = \sum\limits_u\sigma_u^2\left(\frac{\partial x}{\partial u}\right)^2


A typical situation is the sum of two observables each with a multiplicative factor:


\left(\frac{\partial x}{\partial u}\right)=a,\hspace{10pt}\left(\frac{\partial x}{\partial v}\right)=b

\sigma_x = \sqrt{\sigma_u^2a^2 + \sigma_v^2b^2}

which is the oft used "summing in quadrature."

Perhaps you have the product of two observables:


\left(\frac{\partial x}{\partial u}\right)=av,\hspace{10pt}\left(\frac{\partial x}{\partial v}\right)=au

\sigma_x = \sqrt{\sigma_u^2(av)^2 + \sigma_v^2(au)^2}=a\sqrt{\sigma_u^2v^2 + \sigma_v^2u^2}

Finally, perhaps you have the inverse of one observable times the exponential of another:


\left(\frac{\partial x}{\partial u}\right)=\left(-\frac{a}{u^2}\right)e^{bv},\hspace{10pt}\left(\frac{\partial x}{\partial v}\right)=\frac{a}{u}be^{bv}

\sigma_x = \sqrt{\sigma_u^2\left(\frac{a}{u^2}e^{bv}\right)^2 + \sigma_v^2\left(\frac{a}{u}be^{bv}\right)^2}=\frac{a}{u}e^{bv}\sqrt{\frac{\sigma_u^2}{u^2}+\sigma_v^2b^2}


Setting Up Your BDNYC Astro Python Environment

Here's a step-by-step tutorial on how to set up your Python development environment.

Distribution and Compiler

The first step is to install Anaconda. Click here, select the appropriate version for your machine and operating system, download the .dmg file, open it and double-click the installer.

Make sure that when the installer asks where to put the distribution, you choose your actual hard drive and not some sub-directory on your machine. This will make things much easier later on.

What's nice about this distribution is that it includes common plotting, computing, and astronomy packages such as numpy and scipy, astropy, matplotlib, and many others.

Next you'll need a C compiler. If you're using MacOSX, I recommend using Xcode which you can download free here. (This step may take a while so go get a beverage.) Once the download is complete, run the installer.

Required Packages

For future installation of packages, I recommend using Pip. To install this, at your Terminal command line type sudo easy_install pip. Then whenever you want to install a package you might need, you just open Terminal and do pip install package_name.

Not every package is this easy (though most are). If you can't get something through Pip just download, unzip and put the folder of your new module with the rest of your packages in the directory /anaconda/pkgs/.

Then in Terminal, navigate to that directory with cd /anaconda/pkgs/package_dir_name and do python install.

Development Tools

MacOSX comes with the text editing application TextEdit but it is not good for editing code. I strongly recommend using TextMate though it is not free so you should ask your advisor to buy a license for you! Otherwise, some folks find the free TextWrangler to be pretty good.

Next, you'll want to get access to the BDNYC database. Detailed instructions are here on how to setup Dropbox and Github on your machine in order to interact with the database.

Launching Python

Now to use Python, in Terminal just type ipython --pylab. I recommend always launching Python this way to have the plotting library preloaded.


BDNYC Database Setup

Read this post first if your development environment needs setting up. Then...

Setting Up the Database

  1. Get someone who has access to the database to share the BDNYCdb Dropbox folder with you.
  2. Go to the BDNYC Github page and click on the Clone in Desktop button in the lower right. Then navigate to the desired directory and click Clone.
  3. Open a Terminal window and type open ~/.bash_profile (or open ~/.tcshrc or where ever your PYTHONPATH variable is) to open your profile in a text editor. Then append the path of your BDNYCdb directory to your PYTHONPATH with a colon (:) and save it.

    In .bash_profile it looks like (for example):


  4. From Terminal, reload your profile with source ~/.bash_profile or the like.

Now the SQLite database is in the shared Dropbox directory as BDNYC.db and the module to access it is in your directory cloned from Github. This way, the module can be version controlled and the database is updated automatically.

Accessing the Database

Now you can load the entire database into a Python variable simply by launching the Python interpreter and pointing the get_db() function to the database file by doing:

In [1]: import BDdb
In [2]: db = BDdb.get_db('/path/to/your/Dropbox/BDNYCdb/BDNYC.db')

Voila! To see an inventory of all database sources, just do:

In [3]: db.inventory()

You can see details on a specific source by entering the id from the inventory list. This will also plot all available spectra for that source for visual inspection.

In [4]: db.inventory(ID=767)

Now that you have the database at your fingertips, you’ll want to get some info out of it. To do this, you can use SQL queries.

Here is a detailed post about how to write an SQL query.

Further documentation for sqlite3 can be found here. Most commands involve wrapping SQL language inside python functions. The main (really only) one we will use to fetch data from the database is the execute() command:

In [5]: data = db.query.execute( "SQL_query_goes_here" ).fetchall()

Example Queries

Some SQL query examples to put in the command above (wrapped in quotes of course):

  1. SELECT shortname, ra, dec FROM sources WHERE (222<ra AND ra<232) AND (5<dec AND dec<15)
  2. SELECT band, magnitude, magnitude_unc FROM photometry WHERE source_id=58
  3. SELECT source_id, band, magnitude FROM photometry WHERE band='z' AND magnitude<15
  4. SELECT wavelength, flux, unc FROM spectra WHERE observation_id=75”

As you hopefully gathered:

  1. Returns the shortname, ra and dec of all objects in a 10 square degree patch of sky centered at RA = 227, DEC = 10
  2. Returns all the photometry and uncertainties available for object 58
  3. Returns all objects and z magnitudes with z less than 15
  4. Returns the wavelength, flux and uncertainty arrays for all spectra of object 75

The above examples are for querying individual tables only. We can query from multiple tables at the same time with the JOIN command like so:

  1. SELECT,, p.magnitude, p.magnitude_unc FROM telescopes as t JOIN photometry AS p ON WHERE p.source_id=58
  2. SELECT p1.magnitude-p2.magnitude FROM photometry AS p1 JOIN photometry AS p2 ON p1.source_id=p2.source_id WHERE'J' AND'H'
  3. SELECT src.designation, src.unum, spt.spectral_type FROM sources AS src JOIN spectral_types AS spt ON WHERE spt.spectral_type>=10 AND spt.spectral_type<20 AND spt.regime='optical'
  4. SELECT s.unum, p.parallax, p.parallax_unc, p.publication_id FROM sources as s JOIN parallaxes AS p ON

As you may have gathered:

  1. Returns the survey, band and magnitude for all photometry of source 58
  2. Returns the J-H color for every object
  3. Returns the designation, U-number and optical spectral type for all L dwarfs
  4. Returns the parallax measurements and publications for all sources

Alternative Output

As shown above, the result of a SQL query is typically a list of tuples where we can use the indices to print the values. For example, this source's g-band magnitude:

In [9]: data = db.query.execute("SELECT band,magnitude FROM photometry WHERE source_id=58").fetchall()
In [10]: data
Out[10]: [('u', 25.70623),('g', 25.54734),('r', 23.514),('i', 21.20863),('z', 18.0104)]
In [11]: data[1][1]
Out[11]: 25.54734

However we can also query the database a little bit differently so that the fields and records are returned as a dictionary. Instead of db.query.execute() we can do db.dict.execute() like this:

In [12]: data = db.dict.execute("SELECT * FROM photometry WHERE source_id=58").fetchall()
In [13]: data
Out[13]: [<sqlite3.Row at 0x107798450>, <sqlite3.Row at 0x107798410>, <sqlite3.Row at 0x107798430>, <sqlite3.Row at 0x1077983d0>, <sqlite3.Row at 0x1077982f0>]
In [14]: data[1]['magnitude']
Out[14]: 25.54734

Database Schema and Browsing

In order to write the SQL queries above you of course need to know what the names of the tables and fields in the database are. One way to do this is:

In [15]: db.query.execute("SELECT sql FROM sqlite_master").fetchall()

This will print a list of each table, the possible fields, and the data type (e.g. TEXT, INTEGER, ARRAY) for that field.

SQL browserEven easier is to use the SQLite Database Browser pictured at left which lets you expand and collapse each table.

It even allows you to manually create/edit/destroy records with a very nice GUI.

IMPORTANT: Keep in mind that if you change a database record, you immediately change it for everyone since we share the same database file on Dropbox. Be careful!

Always check and double-check that you are entering the correct data for the correct source before you save any changes with the SQLite Database Browser.

SQL Queries

An SQL database is comprised of a bunch of tables (kind of like a spreadsheet) that have fields (column names) and records (rows of data). For example, our database might have a table called students that looks like this:

id first last grade GPA
1 Al Einstein 6 2.7
2 Annie Cannon 6 3.8
3 Artie Eddington 8 3.2
4 Carlie Herschel 8 3.2

So in our students table, the fields are [id, first, last, grade, GPA], and there are a total of four records, each with a required yet arbitrary id in the first column.

To pull these records out, we tell SQL to SELECT values for the following fields FROM a certain table. In SQL this looks like:

In [1]: db.execute("SELECT id, first, last, grade, GPA FROM students").fetchall()
Out[1]: [(1,'Al','Einstein',6,2.7),(2,'Annie','Cannon',6,3.8),(3,'Artie','Eddington',8,3.2),(4,'Carlie','Herschel',8,3.2)]

Or equivalently, we can just use a wildcard "*" if we want to return all fields with the SQL query "SELECT * FROM students".

We can modify our SQL query to change the order of fields or only return certain ones as well. For example:

In [2]: db.execute("SELECT last, first, GPA FROM students").fetchall()
Out[1]: [('Einstein','Al',2.7),('Cannon','Annie',3.8),('Eddington','Artie',3.2),('Herschel','Carlie',3.2)]

Now that we know how to get records from tables, we can restrict which records it returns with the WHERE statement:

In [3]: db.execute("SELECT last, first, GPA FROM students WHERE GPA>3.1").fetchall()
Out[3]: [('Cannon','Annie',3.8),('Eddington','Artie',3.2),('Herschel','Carlie',3.2)]

Notice the first student had a GPA less than 3.1 so he was omitted from the result.

Now let's say we have a second table called quizzes which is a table of every quiz grade for all students that looks like this:

id student_id quiz_number score
1 1 3 89
2 2 3 96
3 3 3 94
4 4 3 92
5 1 4 78
6 3 4 88
7 4 4 91

Now if we want to see only Al's grades, we have to JOIN the tables ON some condition. In this case, we want to tell SQL that the student_id (not the id) in the quizzes table should match the id in the students table (since only those grades are Al's). This looks like:

In [4]: db.execute("SELECT quizzes.quiz_number, quizzes.score FROM quizzes JOIN students ON WHERE students.last='Einstein'").fetchall()
Out[4]: [(3,89),(4,78)]

So associates each quiz with a student from the students table and students.last='Einstein' specifies that we only want the grades from the student with last name Einstein.

Similarly, we can see who scored 90 or greater on which quiz with:

In [5]: db.execute("SELECT students.last, quizzes.quiz_number, quizzes.score FROM quizzes JOIN students ON WHERE quizzes.score>=90").fetchall()
Out[5]: [('Cannon',3,96),('Eddington',3,94),('Herschel',3,92),('Herschel',4,91)]

That's it! We can JOIN as many tables as we want with as many restrictions we need to pull out data in the desired form.

This is powerful, but the queries can become lengthy. A slight shortcut is to use the AS statement to assign a table to a variable (e.g. students => s, quizzes => q) like such:

In [6]: db.execute("SELECT s.last, q.quiz_number, q.score FROM quizzes AS q JOIN students AS s ON WHERE q.score>=90").fetchall()
Out[6]: [('Cannon',3,96),('Eddington',3,94),('Herschel',3,92),('Herschel',4,91)]