• SQL Joins: A Guide That Makes It Stupid – Simple


    Share This Post:

    Businesses across the globe have evolved over the past decade. With companies striving to Digitally Transform their value chains and become more Lean and Agile, Analysts and Program/Project/Product Managers of today need to become data-driven problem solvers. To solve problems using data, one needs to access, analyze and report on the business data. However, with the amount of data collected in current times, Excel spreadsheets are neither feasible nor sustainable for this purpose.

    This brings me to a solution that lets us access and manipulate large data sets. Yes, I am talking about the most commonly used data querying language – SQL (Structured Query Language).

    What is SQL?

    Per Mode Analytics, SQL is a programming language that is semantically easy to understand and learn and is used to access large amounts of data directly from the data source.

    “SQL is great for performing the types of aggregations that you might normally do in an Excel pivot table—sums, counts, minimums and maximums, etc.—but over much larger datasets and on multiple tables at the same time.”

    MODE ANALYTICS

    SQL becomes truly powerful when we join multiple data sources using certain Join conditions to answer our questions. So what are the different ways we can join data tables using SQL?

    Types of Joins in SQL

    Outer Joins

    1. Left Join

    We have seen numerous resources that explain a Left (Outer) Join with a Venn Diagram that looks somewhat like this:

    However, I believe that the above Venn diagram doesn’t exactly give clarity on this type of join. One might not be able to completely grasp the final outcome of such a join.

    In a Left Join, as shown above, in the Venn Diagram, all the records from Table A are present in the result set. Any record with no matching data in Table B has”Null” values in place of Table B values.

    Let’s consider the below example wherein there are 2 tables: EMPLOYEE and ADDRESS. Let’s say that the employee John Doe doesn’t have an address listed in the Address table. In addition to that, the employee Jill Mcfee has 2 separate addresses in the ADDRESS table. So when we “Left Join” the EMPLOYEE table with the ADDRESS table, the result table will look as demonstrated below.

    SELECT e.First_Name, e.Last_Name, a.City, a.State
    FROM EMPLOYEE e
    LEFT JOIN ADDRESS a
    ON a.Employee_ID = e.Employee_ID;

    NOTE: The result table above has a total of 4 rows instead of 3 because the employee record of Jill Mcfee was joined to 2 addresses on the ADDRESS table, resulting in 2 records.

    2. Right Join

    This type of join is depicted as below in Venn Diagram format:

    All the records from Table B are present in the result set. Any record with no matching data in Table A has “Null” values in place of Table A values.

    To give better clarity on the final result of the join function, I’ll use the example from Left Join. The below illustration shows how a Right Join works.

    SELECT e.First_Name, e.Last_Name, a.City, a.State
    FROM EMPLOYEE e
    RIGHT JOIN ADDRESS a
    ON a.Employee_ID = e.Employee_ID;

    3. Full Outer Join

    A Full Outer Join includes all records from both Left and Right Tables, with the missing values represented as Null values.

    We can better illustrate the Full Outer Join by using the example from the Left Join section instead of the Venn Diagram above. The illustration below shows a Full Outer Join between EMPLOYEE and ADDRESS tables.

    SELECT e.First_Name, e.Last_Name, a.City, a.State
    FROM EMPLOYEE e
    FULL OUTER JOIN ADDRESS a
    ON a.Employee_ID = e.Employee_ID;

    SELECT e.First_Name, e.Last_Name, a.City, a.State
    FROM EMPLOYEE e
    FULL JOIN ADDRESS a
    ON a.Employee_ID = e.Employee_ID;

    In this example, employee John Doe doesn’t have any address in the ADDRESS table, and therefore, has Null values in City and State columns in the result table. Similarly, there is no employee in the EMPLOYEE table for Address ID = 104 ad Employee ID = 4. Therefore, the values for the First Name and Last Name columns will be Null values in the result table for this record.

    Inner Join

    Inner Join represents the Intersection of 2 Tables. In other words, the result set contains only those records that have matching values in both tables.

    In our example, the inner join between tables EMPLOYEE and ADDRESS is illustrated below.

    We can build the query for an inner join in 2 ways:

    SELECT e.First_Name, e.Last_Name, a.City, a.State
    FROM EMPLOYEE e
    INNER JOIN ADDRESS a
    ON a.Employee_ID = e.Employee_ID;

    SELECT e.First_Name, e.Last_Name, a.City, a.State
    FROM EMPLOYEE e
    JOIN ADDRESS a
    ON a.Employee_ID = e.Employee_ID;

    A Concluding Note…

    This article talks about inner and outer joins, and then types of outer joins. While in this article, I discussed only those scenarios where we join 2 tables, we often need to join more than 2 tables. That’s when we truly start seeing the complexity increase.

    I such cases, the order of the joins becomes important.

    For instance, Table A LEFT JOIN Table B LEFT JOIN Table C is not the same as Table C LEFT JOIN Table A LEFT JOIN Table B

    More to come on the importance of order of joins. Stay tuned…