-
Learn How To Write SQL Queries With Examples: #6
Data are becoming the new raw material of business.
Craig Mundie (President, Mundie & Associates | Former Senior Advisor to the CEO, Microsoft)
Question Source: LeetCode
Solution Language: MySQL
This Q&A series will cover data questions from LeetCode and present my solutions to them. Please feel free to comment with your suggestions if you feel that these problems may be solved in a more optimized manner.
Question (LeetCode Question #1270, Level: Medium)
Table: Employees
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| employee_id   | int     |
| employee_name | varchar |
| manager_id    | int     |
+---------------+---------+
employee_id is the primary key for this table.
Each row of this table indicates that the employee with ID employee_id and name employee_name reports his/her work to his/her direct manager with manager_id.
The head of the company is the employee with employee_id = 1.
Write an SQL query to find the employee_id of all employees that directly or indirectly report their work to the head of the company.
The indirect relation between managers will not exceed 3 managers as the company is small.
Return the result table in any order without duplicates.
The query result format is in the following example:
Employees table:
+-------------+---------------+------------+
| employee_id | employee_name | manager_id |
+-------------+---------------+------------+
| 1           | Boss          | 1          |
| 3           | Alice         | 3          |
| 2           | Bob           | 1          |
| 4           | Daniel        | 2          |
| 7           | Luis          | 4          |
| 8           | Jhon          | 3          |
| 9           | Angela        | 8          |
| 77          | Robert        | 1          |
+-------------+---------------+------------+
Result table:
+-------------+
| employee_id |
+-------------+
| 2           |
| 77          |
| 4           |
| 7           |
+-------------+
The head of the company is the employee with employee_id 1.
The employees with employee_id 2 and 77 report their work directly to the head of the company.
The employee with employee_id 4 reports his work indirectly to the head of the company: 4 --> 2 --> 1.
The employee with employee_id 7 reports his work indirectly to the head of the company: 7 --> 4 --> 2 --> 1.
The employees with employee_id 3, 8 and 9 don't report their work to the head of the company directly or indirectly.
Solution
This solution uses the Set operation “UNION ALL” instead of “UNION” because we do not anticipate any duplicates in the final result set. Here is my reasoning:
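- Because employee_id is the primary key, every employee has exactly one manager and can therefore appear at only one level of the hierarchy. The only way that there could be duplicates in the final result set is through a row that reports to itself (employee_id = manager_id).
- In the sample data, the head of the company (employee_id = 1) is such a row. Anywhere else, a self-reference would indicate a data quality issue.
- To deal with this situation, I have added a condition to every CTE (in the WHERE clause of the first, in the JOIN condition of the others) to verify that, for any record, manager_id is not equal to employee_id.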
This approach is good for performance as well because we are filtering out any instances that could have led to duplicates, and then using “UNION ALL”, which gives us a higher-performing query as compared to one using the “UNION” set operation.
-- Direct Reports of Head of the company
WITH dr AS
  (SELECT employee_id
   FROM Employees
   WHERE manager_id = 1 AND employee_id <> 1),
-- Indirect Reports of Head of the company (Levels 1 to 3)
ir1 AS
  (SELECT e.employee_id
   FROM Employees e JOIN dr
   ON e.manager_id = dr.employee_id AND e.manager_id <> e.employee_id),
ir2 AS
  (SELECT e.employee_id
   FROM Employees e JOIN ir1
   ON e.manager_id = ir1.employee_id AND e.manager_id <> e.employee_id),
ir3 AS
  (SELECT e.employee_id
   FROM Employees e JOIN ir2
   ON e.manager_id = ir2.employee_id AND e.manager_id <> e.employee_id)
SELECT employee_id FROM dr
UNION ALL
SELECT employee_id FROM ir1
UNION ALL
SELECT employee_id FROM ir2
UNION ALL
SELECT employee_id FROM ir3
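To check the reasoning about duplicates, the query can be run against the sample rows. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained); the helper name reports_to_head is hypothetical.

```python
import sqlite3

# Sample rows from the question: (employee_id, employee_name, manager_id).
ROWS = [
    (1, "Boss", 1), (3, "Alice", 3), (2, "Bob", 1), (4, "Daniel", 2),
    (7, "Luis", 4), (8, "Jhon", 3), (9, "Angela", 8), (77, "Robert", 1),
]

# Same CTE chain as the solution: direct reports, then three indirect levels.
QUERY = """
WITH dr AS (
    SELECT employee_id FROM Employees
    WHERE manager_id = 1 AND employee_id <> 1),
ir1 AS (
    SELECT e.employee_id FROM Employees e
    JOIN dr ON e.manager_id = dr.employee_id
           AND e.manager_id <> e.employee_id),
ir2 AS (
    SELECT e.employee_id FROM Employees e
    JOIN ir1 ON e.manager_id = ir1.employee_id
            AND e.manager_id <> e.employee_id),
ir3 AS (
    SELECT e.employee_id FROM Employees e
    JOIN ir2 ON e.manager_id = ir2.employee_id
            AND e.manager_id <> e.employee_id)
SELECT employee_id FROM dr
UNION ALL SELECT employee_id FROM ir1
UNION ALL SELECT employee_id FROM ir2
UNION ALL SELECT employee_id FROM ir3
"""


def reports_to_head(rows):
    """Load the rows into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Employees ("
        "employee_id INTEGER PRIMARY KEY, employee_name TEXT, manager_id INTEGER)"
    )
    conn.executemany("INSERT INTO Employees VALUES (?, ?, ?)", rows)
    ids = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return ids


result = reports_to_head(ROWS)
print(sorted(result))
```

On the sample data, UNION ALL returns each qualifying employee once, because every employee sits at exactly one level below the head.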
-
Learn How To Write SQL Queries With Examples: #5
Information is the oil of the 21st century, and analytics is the combustion engine.
Peter Sondergaard (Former EVP, Research & Advisory – Gartner)
Question Source: LeetCode
Solution Language: MySQL
Question (LeetCode Question #1412, Level: Hard)
Table: Student
+---------------------+---------+
| Column Name         | Type    |
+---------------------+---------+
| student_id          | int     |
| student_name        | varchar |
+---------------------+---------+
student_id is the primary key for this table.
student_name is the name of the student.
Table: Exam
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| exam_id       | int     |
| student_id    | int     |
| score         | int     |
+---------------+---------+
(exam_id, student_id) is the primary key for this table.
Student with student_id got score points in exam with id exam_id.
A “quiet” student is one who took at least one exam and scored neither the high score nor the low score.
Write an SQL query to report the students (student_id, student_name) being “quiet” in ALL exams.
Don’t return the student who has never taken any exam. Return the result table ordered by student_id.
The query result format is in the following example.
Student table:
+-------------+---------------+
| student_id  | student_name  |
+-------------+---------------+
| 1           | Daniel        |
| 2           | Jade          |
| 3           | Stella        |
| 4           | Jonathan      |
| 5           | Will          |
+-------------+---------------+
Exam table:
+------------+--------------+-----------+
| exam_id    | student_id   | score     |
+------------+--------------+-----------+
| 10         | 1            | 70        |
| 10         | 2            | 80        |
| 10         | 3            | 90        |
| 20         | 1            | 80        |
| 30         | 1            | 70        |
| 30         | 3            | 80        |
| 30         | 4            | 90        |
| 40         | 1            | 60        |
| 40         | 2            | 70        |
| 40         | 4            | 80        |
+------------+--------------+-----------+
Result table:
+-------------+---------------+
| student_id  | student_name  |
+-------------+---------------+
| 2           | Jade          |
+-------------+---------------+
For exam 1: Students 1 and 3 hold the lowest and highest scores respectively.
For exam 2: Student 1 holds both the highest and the lowest score.
For exams 3 and 4: Students 1 and 4 hold the lowest and highest scores respectively.
Students 2 and 5 have never got the highest or lowest score in any of the exams.
Since student 5 has not taken any exam, he is excluded from the result.
So, we only return the information of Student 2.
Solution
WITH cte AS
  (SELECT student_id, score, exam_id,
     (CASE WHEN score < MAX(score) OVER (PARTITION BY exam_id)
            AND score > MIN(score) OVER (PARTITION BY exam_id)
           THEN 'middle'
           ELSE 'highlow' END) AS category
   FROM Exam
   ORDER BY student_id),
cte1 AS
  (SELECT student_id
   FROM cte
   GROUP BY student_id
   HAVING SUM(CASE WHEN category = 'highlow' THEN 1 ELSE 0 END) = 0)
SELECT cte1.student_id, s.student_name
FROM cte1 JOIN Student s
ON cte1.student_id = s.student_id
ORDER BY cte1.student_id
Alternate Approaches…
WITH cte AS
  (SELECT student_id, score, exam_id,
     MAX(score) OVER (PARTITION BY exam_id) AS maxscore,
     MIN(score) OVER (PARTITION BY exam_id) AS minscore
   FROM Exam),
cte1 AS
  (SELECT student_id
   FROM cte
   WHERE score = maxscore OR score = minscore)
SELECT DISTINCT Exam.student_id, Student.student_name
FROM Exam JOIN Student
ON Exam.student_id = Student.student_id
WHERE Exam.student_id NOT IN (SELECT student_id FROM cte1)
ORDER BY Exam.student_id
WITH cte AS
  (SELECT student_id,
     RANK() OVER (PARTITION BY exam_id ORDER BY score DESC) AS gethighest,
     RANK() OVER (PARTITION BY exam_id ORDER BY score ASC) AS getlowest
   FROM Exam),
cte1 AS
  (SELECT DISTINCT student_id,
     SUM(CASE WHEN gethighest = 1 OR getlowest = 1 THEN 1 ELSE 0 END)
       OVER (PARTITION BY student_id ORDER BY student_id) AS numofhighlow
   FROM cte)
SELECT cte1.student_id, student_name
FROM cte1 JOIN Student
ON cte1.student_id = Student.student_id
WHERE cte1.numofhighlow = 0
-- The question asks for the result ordered by student_id
ORDER BY cte1.student_id
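As a sanity check, the first solution above can be run against the sample tables. This is a minimal sketch, assuming Python's built-in sqlite3 module (with a bundled SQLite recent enough to support window functions) as a stand-in for MySQL; the helper name quiet_students is hypothetical.

```python
import sqlite3

# Sample data from the question.
STUDENTS = [(1, "Daniel"), (2, "Jade"), (3, "Stella"), (4, "Jonathan"), (5, "Will")]
EXAMS = [  # (exam_id, student_id, score)
    (10, 1, 70), (10, 2, 80), (10, 3, 90),
    (20, 1, 80),
    (30, 1, 70), (30, 3, 80), (30, 4, 90),
    (40, 1, 60), (40, 2, 70), (40, 4, 80),
]

# Label each exam row as 'middle' or 'highlow', then keep students
# who have no 'highlow' row at all.
QUIET_QUERY = """
WITH cte AS (
    SELECT student_id,
           CASE WHEN score < MAX(score) OVER (PARTITION BY exam_id)
                 AND score > MIN(score) OVER (PARTITION BY exam_id)
                THEN 'middle' ELSE 'highlow' END AS category
    FROM Exam),
cte1 AS (
    SELECT student_id FROM cte
    GROUP BY student_id
    HAVING SUM(CASE WHEN category = 'highlow' THEN 1 ELSE 0 END) = 0)
SELECT cte1.student_id, s.student_name
FROM cte1 JOIN Student s ON cte1.student_id = s.student_id
ORDER BY cte1.student_id
"""


def quiet_students(students, exams):
    """Load both tables into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Student (student_id INTEGER PRIMARY KEY, student_name TEXT)"
    )
    conn.execute(
        "CREATE TABLE Exam (exam_id INTEGER, student_id INTEGER, score INTEGER, "
        "PRIMARY KEY (exam_id, student_id))"
    )
    conn.executemany("INSERT INTO Student VALUES (?, ?)", students)
    conn.executemany("INSERT INTO Exam VALUES (?, ?, ?)", exams)
    rows = conn.execute(QUIET_QUERY).fetchall()
    conn.close()
    return rows


quiet = quiet_students(STUDENTS, EXAMS)
print(quiet)
```

Student 5 never appears in Exam, so the inner join on the exam-derived CTE excludes him without any extra condition.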
-
Learn How To Write SQL Queries With Examples: #4
If somebody tortures the data enough (open or not), it will confess anything.
Paolo Magrassi (Former vice president, research director, Gartner)
Question Source: LeetCode
Solution Language: MySQL
Question (LeetCode Question #262, Level: Hard)
Table: Trips
+-------------+----------+
| Column Name | Type     |
+-------------+----------+
| Id          | int      |
| Client_Id   | int      |
| Driver_Id   | int      |
| City_Id     | int      |
| Status      | enum     |
| Request_at  | date     |
+-------------+----------+
Id is the primary key for this table.
The table holds all taxi trips. Each trip has a unique Id, while Client_Id and Driver_Id are foreign keys to the Users_Id at the Users table.
Status is an ENUM type of (‘completed’, ‘cancelled_by_driver’, ‘cancelled_by_client’).
Table: Users
+-------------+----------+
| Column Name | Type     |
+-------------+----------+
| Users_Id    | int      |
| Banned      | enum     |
| Role        | enum     |
+-------------+----------+
Users_Id is the primary key for this table.
The table holds all users. Each user has a unique Users_Id, and Role is an ENUM type of (‘client’, ‘driver’, ‘partner’).
Banned is an ENUM type of (‘Yes’, ‘No’).
Write a SQL query to find the cancellation rate of requests with unbanned users (both client and driver must not be banned) each day between "2013-10-01" and "2013-10-03".
The cancellation rate is computed by dividing the number of canceled (by client or driver) requests with unbanned users by the total number of requests with unbanned users on that day.
Return the result table in any order. Round Cancellation Rate to two decimal points.
The query result format is in the following example:
Trips table:
+----+-----------+-----------+---------+---------------------+------------+
| Id | Client_Id | Driver_Id | City_Id | Status              | Request_at |
+----+-----------+-----------+---------+---------------------+------------+
| 1  | 1         | 10        | 1       | completed           | 2013-10-01 |
| 2  | 2         | 11        | 1       | cancelled_by_driver | 2013-10-01 |
| 3  | 3         | 12        | 6       | completed           | 2013-10-01 |
| 4  | 4         | 13        | 6       | cancelled_by_client | 2013-10-01 |
| 5  | 1         | 10        | 1       | completed           | 2013-10-02 |
| 6  | 2         | 11        | 6       | completed           | 2013-10-02 |
| 7  | 3         | 12        | 6       | completed           | 2013-10-02 |
| 8  | 2         | 12        | 12      | completed           | 2013-10-03 |
| 9  | 3         | 10        | 12      | completed           | 2013-10-03 |
| 10 | 4         | 13        | 12      | cancelled_by_driver | 2013-10-03 |
+----+-----------+-----------+---------+---------------------+------------+
Users table:
+----------+--------+--------+
| Users_Id | Banned | Role   |
+----------+--------+--------+
| 1        | No     | client |
| 2        | Yes    | client |
| 3        | No     | client |
| 4        | No     | client |
| 10       | No     | driver |
| 11       | No     | driver |
| 12       | No     | driver |
| 13       | No     | driver |
+----------+--------+--------+
Result table:
+------------+-------------------+
| Day        | Cancellation Rate |
+------------+-------------------+
| 2013-10-01 | 0.33              |
| 2013-10-02 | 0.00              |
| 2013-10-03 | 0.50              |
+------------+-------------------+
On 2013-10-01:
- There were 4 requests in total, 2 of which were canceled.
- However, the request with Id=2 was made by a banned client (User_Id=2), so it is ignored in the calculation.
- Hence there are 3 unbanned requests in total, 1 of which was canceled.
- The Cancellation Rate is (1 / 3) = 0.33
On 2013-10-02:
- There were 3 requests in total, 0 of which were canceled.
- The request with Id=6 was made by a banned client, so it is ignored.
- Hence there are 2 unbanned requests in total, 0 of which were canceled.
- The Cancellation Rate is (0 / 2) = 0.00
On 2013-10-03:
- There were 3 requests in total, 1 of which was canceled.
- The request with Id=8 was made by a banned client, so it is ignored.
- Hence there are 2 unbanned requests in total, 1 of which was canceled.
- The Cancellation Rate is (1 / 2) = 0.50
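The arithmetic in this walkthrough can be reproduced outside the database. The sketch below is plain Python rather than SQL, under the assumption that the sample rows are hard-coded (City_Id is omitted because it plays no part in the calculation); the function name cancellation_rates is hypothetical.

```python
from collections import defaultdict

# Sample trips: (Id, Client_Id, Driver_Id, Status, Request_at).
TRIPS = [
    (1, 1, 10, "completed", "2013-10-01"),
    (2, 2, 11, "cancelled_by_driver", "2013-10-01"),
    (3, 3, 12, "completed", "2013-10-01"),
    (4, 4, 13, "cancelled_by_client", "2013-10-01"),
    (5, 1, 10, "completed", "2013-10-02"),
    (6, 2, 11, "completed", "2013-10-02"),
    (7, 3, 12, "completed", "2013-10-02"),
    (8, 2, 12, "completed", "2013-10-03"),
    (9, 3, 10, "completed", "2013-10-03"),
    (10, 4, 13, "cancelled_by_driver", "2013-10-03"),
]
# Users whose Banned value is 'Yes' in the sample Users table.
BANNED = {2}


def cancellation_rates(trips, banned, start="2013-10-01", end="2013-10-03"):
    """Return {day: rate}, ignoring trips that involve a banned user."""
    totals = defaultdict(int)
    canceled = defaultdict(int)
    for _trip_id, client, driver, status, day in trips:
        if client in banned or driver in banned:
            continue  # both client and driver must be unbanned
        if not (start <= day <= end):
            continue  # ISO dates compare correctly as strings
        totals[day] += 1
        if status != "completed":
            canceled[day] += 1
    return {day: round(canceled[day] / totals[day], 2) for day in sorted(totals)}


rates = cancellation_rates(TRIPS, BANNED)
for day, rate in rates.items():
    print(day, f"{rate:.2f}")
```

A day on which every request involves a banned user has no entry at all, which matches what a GROUP BY over the filtered rows would return.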
Solution
Approach With Joins:
WITH temp AS
  (SELECT DISTINCT t.Request_at AS Day,
     COUNT(CASE WHEN t.Status <> 'completed' THEN t.Id ELSE NULL END) OVER
       (PARTITION BY t.Request_at) AS canceled,
     COUNT(t.Id) OVER (PARTITION BY t.Request_at) AS total
   FROM Trips t
   JOIN Users uc ON t.Client_Id = uc.Users_Id
   JOIN Users ud ON t.Driver_Id = ud.Users_Id
   WHERE uc.Banned = 'No' AND ud.Banned = 'No'
     AND t.Request_at BETWEEN CAST('2013-10-01' AS DATE)
                          AND CAST('2013-10-03' AS DATE))
SELECT Day,
  CAST(canceled/total AS DECIMAL(65,2)) AS `Cancellation Rate`
FROM temp
Alternate Approach (Without Joins)
SELECT Request_at AS Day,
  CAST(COUNT(IF(Status != 'completed', true, null)) / COUNT(Id) AS DECIMAL(65,2))
    AS `Cancellation Rate`
FROM Trips
WHERE Request_at BETWEEN '2013-10-01' AND '2013-10-03'
  AND Client_Id IN (SELECT Users_Id FROM Users WHERE Banned = 'No')
  AND Driver_Id IN (SELECT Users_Id FROM Users WHERE Banned = 'No')
GROUP BY Request_at;
-
Learn How To Write SQL Queries With Examples: #3
Without big data analytics, companies are blind and deaf, wandering out onto the web like deer on a freeway.
Geoffrey Moore, management consultant and author of Crossing the Chasm
Question Source: LeetCode
Solution Language: MySQL
Question (LeetCode Question #185, Level: Hard)
Table: Employee
+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| Id           | int     |
| Name         | varchar |
| Salary       | int     |
| DepartmentId | int     |
+--------------+---------+
Id is the primary key for this table.
Each row contains the ID, name, salary, and department of one employee.
Table: Department
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| Id          | int     |
| Name        | varchar |
+-------------+---------+
Id is the primary key for this table.
Each row contains the ID and the name of one department.
A company’s executives are interested in seeing who earns the most money in each of the company’s departments. A high earner in a department is an employee who has a salary in the top three unique salaries for that department.
Write an SQL query to find the employees who are high earners in each of the departments.
Return the result table in any order.
The query result format is in the following example:
Employee table:
+----+-------+--------+--------------+
| Id | Name  | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1  | Joe   | 85000  | 1            |
| 2  | Henry | 80000  | 2            |
| 3  | Sam   | 60000  | 2            |
| 4  | Max   | 90000  | 1            |
| 5  | Janet | 69000  | 1            |
| 6  | Randy | 85000  | 1            |
| 7  | Will  | 70000  | 1            |
+----+-------+--------+--------------+
Department table:
+----+-------+
| Id | Name  |
+----+-------+
| 1  | IT    |
| 2  | Sales |
+----+-------+
Result table:
+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT         | Max      | 90000  |
| IT         | Joe      | 85000  |
| IT         | Randy    | 85000  |
| IT         | Will     | 70000  |
| Sales      | Henry    | 80000  |
| Sales      | Sam      | 60000  |
+------------+----------+--------+
In the IT department:
- Max earns the highest unique salary
- Both Randy and Joe earn the second-highest unique salary
- Will earns the third-highest unique salary
In the Sales department:
- Henry earns the highest salary
- Sam earns the second-highest salary
- There is no third-highest salary as there are only two employees
Solution
WITH temp AS
(SELECT Name AS Employee,
Salary,
DepartmentId,
DENSE_RANK() OVER (PARTITION BY DepartmentId ORDER BY Salary DESC) AS rnk
FROM Employee)
SELECT d.Name AS Department,
temp.Employee,
temp.Salary
FROM temp JOIN Department d
ON temp.DepartmentId = d.Id
WHERE temp.rnk <= 3
Difference between Rank() and Dense_Rank() Window Functions
In the above solution, I have used the Dense_Rank() function instead of the Rank() function. This is because the question describes a high earner in a department as “an employee who has a salary in the top three unique salaries for that department”.
To better illustrate the difference, check out the rank allocated to each employee in both cases (Note: I have displayed rank for all employees instead of just the top 3):
Result table:
+----+----------+--------+--------------+--------+--------------+
| Id | Employee | Salary | DepartmentId | Rank() | Dense_Rank() |
+----+----------+--------+--------------+--------+--------------+
| 4  | Max      | 90000  | 1            | 1      | 1            |
| 1  | Joe      | 85000  | 1            | 2      | 2            |
| 6  | Randy    | 85000  | 1            | 2      | 2            |
| 7  | Will     | 70000  | 1            | 4      | 3            |
| 5  | Janet    | 69000  | 1            | 5      | 4            |
| 2  | Henry    | 80000  | 2            | 1      | 1            |
| 3  | Sam      | 60000  | 2            | 2      | 2            |
+----+----------+--------+--------------+--------+--------------+
As you can see, Rank() skips the next rank after a tie:
Joe and Randy both rank 2 in Department 1, but Will ranks 4 instead of 3 (rank 3 is skipped). We need Will to rank 3, so that we can filter by the condition rank <= 3 to get the employees whose salaries are in the top 3 distinct salaries in each department.
To achieve this, we use the Dense_Rank() function, which assigns consecutive ranks with no gaps. As you can see above, Will ranks 3 instead of 4 in the Dense_Rank() column.
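The two rank columns in the table above can be reproduced with a short script. This is a minimal sketch, assuming Python's built-in sqlite3 module (with a bundled SQLite recent enough to support window functions) in place of MySQL; the names rank_columns, top3_by_rank, and top3_by_dense_rank are hypothetical.

```python
import sqlite3

# Sample Employee rows: (Id, Name, Salary, DepartmentId).
EMPLOYEES = [
    (1, "Joe", 85000, 1), (2, "Henry", 80000, 2), (3, "Sam", 60000, 2),
    (4, "Max", 90000, 1), (5, "Janet", 69000, 1), (6, "Randy", 85000, 1),
    (7, "Will", 70000, 1),
]

RANK_QUERY = """
SELECT Name,
       RANK()       OVER (PARTITION BY DepartmentId ORDER BY Salary DESC),
       DENSE_RANK() OVER (PARTITION BY DepartmentId ORDER BY Salary DESC)
FROM Employee
"""


def rank_columns(employees):
    """Return {name: (rank, dense_rank)} within each department."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Employee ("
        "Id INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER, DepartmentId INTEGER)"
    )
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", employees)
    ranks = {name: (rnk, dense) for name, rnk, dense in conn.execute(RANK_QUERY)}
    conn.close()
    return ranks


ranks = rank_columns(EMPLOYEES)
# Filtering on rank <= 3 with each function gives different sets of employees.
top3_by_rank = sorted(n for n, (rnk, _) in ranks.items() if rnk <= 3)
top3_by_dense_rank = sorted(n for n, (_, dense) in ranks.items() if dense <= 3)
print(top3_by_rank)
print(top3_by_dense_rank)
```

Only the Dense_Rank() filter keeps Will, which is the behavior the question's "top three unique salaries" wording requires.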
-
Learn How To Write SQL Queries With Examples: #2
Data Is A Precious Thing And Will Last Longer Than The Systems Themselves.
Sir Tim Berners-Lee (The inventor of the World Wide Web)
Question Source: LeetCode
Solution Language: MySQL
Question (LeetCode Question #177, Level: Medium)
Write a SQL query to get the nth highest salary from the Employee table.
+----+--------+
| Id | Salary |
+----+--------+
| 1  | 100    |
| 2  | 200    |
| 3  | 300    |
+----+--------+
For example, given the above Employee table, the nth highest salary where n = 2 is 200. If there is no nth highest salary, then the query should return null.
+------------------------+
| getNthHighestSalary(2) |
+------------------------+
| 200                    |
+------------------------+
Solution
CREATE FUNCTION getNthHighestSalary(N INT) RETURNS INT
BEGIN
SET N = N-1;
RETURN (
Select
CASE when count(distinct Salary) <= N then null
else
(select distinct Salary
from Employee
order by Salary desc limit 1 offset N)
end
from Employee
);
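END
NOTE:
- In MySQL, LIMIT and OFFSET accept only integer constants or, inside a stored routine, integer-valued parameters and local variables. They do not accept expressions such as N - 1, which is why N is decremented with SET before the query runs.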
- LIMIT 1 OFFSET N can also be written as LIMIT N,1
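The same "compute the offset first, then pass it to LIMIT ... OFFSET" idea can be sketched from application code. The example below assumes Python's built-in sqlite3 module rather than a MySQL stored function, so it is an analogue of the solution and not the solution itself; nth_highest_salary is a hypothetical helper name.

```python
import sqlite3

# Sample Employee rows: (Id, Salary).
SALARIES = [(1, 100), (2, 200), (3, 300)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)", SALARIES)


def nth_highest_salary(connection, n):
    """Return the nth highest distinct salary, or None if it does not exist."""
    offset = n - 1  # computed up front, mirroring SET N = N - 1
    row = connection.execute(
        "SELECT DISTINCT Salary FROM Employee "
        "ORDER BY Salary DESC LIMIT 1 OFFSET ?",
        (offset,),
    ).fetchone()
    # No row at that offset means there is no nth highest salary.
    return row[0] if row else None


second = nth_highest_salary(conn, 2)
fourth = nth_highest_salary(conn, 4)
print(second, fourth)
```

Here the missing-row case is handled in Python with None, whereas the stored function handles it in SQL with the CASE expression on COUNT(DISTINCT Salary).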
Question (LeetCode Question #184, Level: Medium)
The Employee table holds all employees. Every employee has an Id, a salary, and there is also a column for the department Id.
+----+-------+--------+--------------+
| Id | Name  | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1  | Joe   | 70000  | 1            |
| 2  | Jim   | 90000  | 1            |
| 3  | Henry | 80000  | 2            |
| 4  | Sam   | 60000  | 2            |
| 5  | Max   | 90000  | 1            |
+----+-------+--------+--------------+
The Department table holds all departments of the company.
+----+----------+
| Id | Name     |
+----+----------+
| 1  | IT       |
| 2  | Sales    |
+----+----------+
Write a SQL query to find employees who have the highest salary in each of the departments. For the above tables, your SQL query should return the following rows (order of rows does not matter).
+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT         | Max      | 90000  |
| IT         | Jim      | 90000  |
| Sales      | Henry    | 80000  |
+------------+----------+--------+
Explanation:
Max and Jim both have the highest salary in the IT department and Henry has the highest salary in the Sales department.
Solution
Approach 1:
WITH temp AS
(SELECT d.Id, d.Name, MAX(e.Salary) AS Salary
FROM Employee e JOIN Department d
ON e.DepartmentId = d.Id
GROUP BY 1,2
)
SELECT temp.Name AS Department, Employee.Name AS Employee, Employee.Salary
FROM Employee JOIN temp
ON Employee.DepartmentId = temp.Id
WHERE Employee.Salary = temp.Salary;
Approach 2:
WITH temp AS (
SELECT DepartmentId, Name, Salary,
RANK() OVER (PARTITION BY DepartmentId ORDER BY Salary DESC) AS rnk
FROM Employee)
SELECT d.Name AS Department, temp.Name AS Employee, temp.Salary AS Salary
FROM temp
JOIN Department d
ON d.Id = temp.DepartmentId AND rnk = 1
Approach 3:
SELECT Department, Employee, Salary
FROM (SELECT
d.Name AS Department,
e.Name AS Employee,
Salary,
RANK() OVER (PARTITION BY e.DepartmentId ORDER BY e.Salary DESC) AS rnk
FROM Employee e JOIN Department d
ON e.DepartmentId = d.Id) AS temp
WHERE rnk = 1
NOTE:
- Approach 2 and 3 use “Window Functions” to solve this problem.
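- Window functions compute a value for each row over a set of related rows (the window) without collapsing those rows, which often removes the need for a self-join or a separate aggregate subquery.
- To learn more about the window functions available in MySQL 8.0, see the window functions section of the MySQL 8.0 Reference Manual.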
- Another source for learning about Window functions is Mode Analytics.
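To see that the aggregate-and-join approach and the window-function approach agree, both can be run on the sample tables. This is a minimal sketch, assuming Python's built-in sqlite3 module (with window-function support) as a stand-in for MySQL; the CTE and subquery names dept_max and ranked, and the helper run_both, are my own and not from the solutions above.

```python
import sqlite3

# Sample rows from the question.
EMPLOYEES = [  # (Id, Name, Salary, DepartmentId)
    (1, "Joe", 70000, 1), (2, "Jim", 90000, 1), (3, "Henry", 80000, 2),
    (4, "Sam", 60000, 2), (5, "Max", 90000, 1),
]
DEPARTMENTS = [(1, "IT"), (2, "Sales")]

# Approach 1 style: find each department's maximum, then join back.
GROUP_BY_QUERY = """
WITH dept_max AS (
    SELECT d.Id, d.Name, MAX(e.Salary) AS Salary
    FROM Employee e JOIN Department d ON e.DepartmentId = d.Id
    GROUP BY d.Id, d.Name)
SELECT dept_max.Name, Employee.Name, Employee.Salary
FROM Employee JOIN dept_max ON Employee.DepartmentId = dept_max.Id
WHERE Employee.Salary = dept_max.Salary
"""

# Approach 3 style: rank salaries inside each department, keep rank 1.
WINDOW_QUERY = """
SELECT Department, Employee, Salary
FROM (SELECT d.Name AS Department, e.Name AS Employee, e.Salary AS Salary,
             RANK() OVER (PARTITION BY e.DepartmentId
                          ORDER BY e.Salary DESC) AS rnk
      FROM Employee e JOIN Department d ON e.DepartmentId = d.Id) AS ranked
WHERE rnk = 1
"""


def run_both(employees, departments):
    """Return the sorted result of each query on the same data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Employee ("
        "Id INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER, DepartmentId INTEGER)"
    )
    conn.execute("CREATE TABLE Department (Id INTEGER PRIMARY KEY, Name TEXT)")
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", employees)
    conn.executemany("INSERT INTO Department VALUES (?, ?)", departments)
    by_group = sorted(conn.execute(GROUP_BY_QUERY).fetchall())
    by_window = sorted(conn.execute(WINDOW_QUERY).fetchall())
    conn.close()
    return by_group, by_window


by_group, by_window = run_both(EMPLOYEES, DEPARTMENTS)
print(by_group)
print(by_window)
```

Both queries keep ties, so Jim and Max are returned together for the IT department.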
-
Learn How To Write SQL Queries With Examples: #1
The goal is to turn data into information, and information into insight.
Carly Fiorina (Former CEO of Hewlett-Packard)
Question Source: LeetCode
Solution Language: MySQL
Question (LeetCode Question #176, Level: Easy)
Write a SQL query to get the second highest salary from the Employee table.
+----+--------+
| Id | Salary |
+----+--------+
| 1  | 100    |
| 2  | 200    |
| 3  | 300    |
+----+--------+
For example, given the above Employee table, the query should return 200 as the second highest salary. If there is no second-highest salary, then the query should return null.
+---------------------+
| SecondHighestSalary |
+---------------------+
| 200                 |
+---------------------+
Solution
Select Max(Salary) as SecondHighestSalary
From Employee
Where Salary not in (Select Max(Salary) From Employee)
Question (LeetCode Question #181, Level: Easy)
Write a SQL query that finds out employees who earn more than their managers.
The Employee table holds all employees including their managers. Every employee has an Id, and there is also a column for the manager Id.
+----+-------+--------+-----------+
| Id | Name  | Salary | ManagerId |
+----+-------+--------+-----------+
| 1  | Joe   | 70000  | 3         |
| 2  | Henry | 80000  | 4         |
| 3  | Sam   | 60000  | NULL      |
| 4  | Max   | 90000  | NULL      |
+----+-------+--------+-----------+
Given the Employee table, write a SQL query that finds out employees who earn more than their managers. For the above table, Joe is the only employee who earns more than his manager.
+----------+
| Employee |
+----------+
| Joe      |
+----------+
Solution
SELECT e.Name AS Employee
FROM Employee e JOIN Employee m
ON e.ManagerId = m.Id AND e.Salary > m.Salary
Alternative Approach (Faster Query):
SELECT temp.Name AS Employee
FROM (SELECT e.*, m.Salary AS MgrSalary
      FROM Employee e JOIN Employee m
      ON e.ManagerId = m.Id) AS temp
-- Compare the employee's salary with the manager's salary
WHERE temp.Salary > temp.MgrSalary
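Both forms of the self-join can be checked on the sample table. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL; the derived-table variant here compares Salary with MgrSalary under the alias t, and the helper name earners_above_manager is hypothetical.

```python
import sqlite3

# Sample Employee rows: (Id, Name, Salary, ManagerId). None stands for NULL.
EMPLOYEES = [
    (1, "Joe", 70000, 3), (2, "Henry", 80000, 4),
    (3, "Sam", 60000, None), (4, "Max", 90000, None),
]

# Plain self-join: pair each employee with the manager row.
JOIN_QUERY = """
SELECT e.Name AS Employee
FROM Employee e JOIN Employee m
ON e.ManagerId = m.Id AND e.Salary > m.Salary
"""

# Derived-table variant: carry the manager's salary as MgrSalary,
# then filter in the outer query.
DERIVED_QUERY = """
SELECT t.Name AS Employee
FROM (SELECT e.*, m.Salary AS MgrSalary
      FROM Employee e JOIN Employee m
      ON e.ManagerId = m.Id) AS t
WHERE t.Salary > t.MgrSalary
"""


def earners_above_manager(employees):
    """Return the names produced by each query on the same data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Employee ("
        "Id INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER, ManagerId INTEGER)"
    )
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", employees)
    joined = [row[0] for row in conn.execute(JOIN_QUERY)]
    derived = [row[0] for row in conn.execute(DERIVED_QUERY)]
    conn.close()
    return joined, derived


joined, derived = earners_above_manager(EMPLOYEES)
print(joined, derived)
```

Employees with a NULL ManagerId never match the join condition, so Sam and Max drop out of both queries without an explicit IS NOT NULL check.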