I hope that you played the game and try 1 or more SQL queries to answer the request from the Database Administrator provided activity.
Once again we try on an approach easy to understand with a real world use case. Do not hesitate to post a comment to discuss about it.
Using your favorite SQL server (ex : PostgreSQL), you should have the following rows :
Well this one should be handled step by step except if you have a good understanding of SQL !
Take the request and divide it so we can deal with it later :
- Pick up all students having at least 3 marks above 8 :
SELECT student.name as student, student.id, score FROM score INNER JOIN student ON score.student_id = student.id WHERE score > 8 GROUP BY student.id, student.name, score HAVING count(score) > 2 ORDER BY student.name;
Here you are :
There is more data using scores above 8. You may check the results on your own.
Now the point is that if we need to show the relative disciplines names, we will end up with 3 or more lines for each student (actually only Megane is concerned :)).
Megane has got 3 disciplines on which she obtained a score of 10.
How do we list them along with the rest ?
Well that is not so obvious using the last query; think about it :
Do you think you can have 1 discipline having more than 2 scores above 9 for the same student ?
Basically the HAVING clause is filtering on group elements and if the last doesn’t match, then no result is returned.
More details about the Group By and Having clauses are available on DataCamp.
Following is what I propose as a solution, you may use the former query as a SUB QUERY (you could also use a WITH Query however it is not necessary for such simple use case) :
SELECT discipline.name, student_name, scoring FROM discipline INNER JOIN score ON score.discipline_id = discipline.id INNER JOIN (SELECT st.name as student_name, s.score as scoring , st.id as student_id FROM score s INNER JOIN student st ON s.student_id = st.id WHERE s.score > 9 GROUP BY st.id, st.name, scoring HAVING count(s.score) > 2 ORDER BY st.name) as sub_score ON sub_score.scoring = score.score AND sub_score.student_id = score.student_id;
Now using the WITH AS query (just for fun !) :
WITH sub_score AS (SELECT st.name as student_name, s.score as scoring , st.id as student_id FROM score s INNER JOIN student st ON s.student_id = st.id WHERE s.score > 9 GROUP BY st.id, st.name, scoring HAVING count(s.score) > 2 ORDER BY st.name) SELECT discipline.name, student_name, scoring FROM discipline INNER JOIN score ON score.discipline_id = discipline.id INNER JOIN sub_score ON sub_score.scoring = score.score AND sub_score.student_id = score.student_id;
Both queries have the same query plan so pick up your favorite syntax !
As usual, let’s see if we can improve the overall performance.
There are 2 spots which seems to be interesting :
- the score is used for aggregation
- The student name is used for ordering
CREATE INDEX idx_score ON score(score);
Using the above index, we notice that Postgres is using another path to process the request, there is a slight improvement.
CREATE INDEX idx_name ON student(name);
Now we are talking ! The cost of the query has been reduced significantly, you may now play with billions of students ;).
If you take a look at the query plan of each configuration, you will notice the change about the cost of the Sort process, it went from ~70 to approximately ~3.55.
That’s it for today, I’ll be welcoming any constructive comment to help us improve our lines.