An SQL database is comprised of a bunch of **tables** (kind of like a spreadsheet) that have **fields** (column names) and **records** (rows of data). For example, our database might have a table called *students* that looks like this:

id | first | last | grade | GPA |
---|---|---|---|---|

1 | Al | Einstein | 6 | 2.7 |

2 | Annie | Cannon | 6 | 3.8 |

3 | Artie | Eddington | 8 | 3.2 |

4 | Carlie | Herschel | 8 | 3.2 |

So in our *students* table, the fields are [*id*, *first*, *last*, *grade*, *GPA*], and there are a total of four records, each with a required yet arbitrary *id* in the first column.

To pull these records out, we tell SQL to **SELECT** values for the following fields **FROM** a certain table. In SQL this looks like:

`In [1]: db.execute("`

**SELECT** id, first, last, grade, GPA **FROM** students").fetchall()

Out[1]: [(1,'Al','Einstein',6,2.7),(2,'Annie','Cannon',6,3.8),(3,'Artie','Eddington',8,3.2),(4,'Carlie','Herschel',8,3.2)]

Or equivalently, we can just use a wildcard "*" if we want to return all fields with the SQL query `"SELECT * FROM students"`

.

We can modify our SQL query to change the order of fields or only return certain ones as well. For example:

`In [2]: db.execute("SELECT last, first, GPA FROM students").fetchall()`

Out[1]: [('Einstein','Al',2.7),('Cannon','Annie',3.8),('Eddington','Artie',3.2),('Herschel','Carlie',3.2)]

Now that we know how to get records from tables, we can restrict which records it returns with the **WHERE** statement:

`In [3]: db.execute("SELECT last, first, GPA FROM students `

**WHERE GPA>3.1**").fetchall()

Out[3]: [('Cannon','Annie',3.8),('Eddington','Artie',3.2),('Herschel','Carlie',3.2)]

Notice the first student had a GPA less than 3.1 so he was omitted from the result.

Now let's say we have a second table called *quizzes* which is a table of every quiz grade for all students that looks like this:

id | student_id | quiz_number | score |
---|---|---|---|

1 | 1 | 3 | 89 |

2 | 2 | 3 | 96 |

3 | 3 | 3 | 94 |

4 | 4 | 3 | 92 |

5 | 1 | 4 | 78 |

6 | 3 | 4 | 88 |

7 | 4 | 4 | 91 |

Now if we want to see only Al's grades, we have to **JOIN** the tables **ON** some condition. In this case, we want to tell SQL that the **student_id** (not the id) in the *quizzes* table should match the **id** in the *students* table (since only those grades are Al's). This looks like:

`In [4]: db.execute("SELECT quizzes.quiz_number, quizzes.score FROM quizzes `

**JOIN students ON students.id=quizzes.student_id** WHERE students.last='Einstein'").fetchall()

Out[4]: [(3,89),(4,78)]

So `students.id=quizzes.student_id`

associates each quiz with a student from the *students* table and `students.last='Einstein'`

specifies that we only want the grades from the student with last name Einstein.

Similarly, we can see who scored 90 or greater on which quiz with:

`In [5]: db.execute("SELECT students.last, quizzes.quiz_number, quizzes.score FROM quizzes JOIN students ON students.id=quizzes.student_id WHERE quizzes.score>=90").fetchall()`

Out[5]: [('Cannon',3,96),('Eddington',3,94),('Herschel',3,92),('Herschel',4,91)]

That's it! We can JOIN as many tables as we want with as many restrictions we need to pull out data in the desired form.

This is powerful, but the queries can become lengthy. A slight shortcut is to use the **AS** statement to assign a table to a variable (e.g. *students* => *s*, *quizzes* => *q*) like such:

`In [6]: db.execute("SELECT s.last, q.quiz_number, q.score FROM `

**quizzes AS q** JOIN **students AS s** ON s.id=q.student_id WHERE q.score>=90").fetchall()

Out[6]: [('Cannon',3,96),('Eddington',3,94),('Herschel',3,92),('Herschel',4,91)]

Pingback: BDNYC Database Documentation | BDNYC