Ranking in SQL

The Ordering

Lets get a table with some data. Imagine a table of revenue💵 generated by each sales representative👔. We want to find the top performers🌟

Id Team Name Revenue
101 Angel's Wings Abraham 6005
102 Boosters John 5611
103 Devil's Cry Helena 8461
104 Redemption Kate 8180
105 Angel's Wings Jacob 9327
106 Devil's Cry Mary 8461
107 Boosters Menalaus 8461
108 Redemption Adam 8461
109 Redemption Noah 7483
110 Devil's Cry Jenny 6000
SELECT
    Id,
    Name,
    Revenue
FROM
    salesData_tbl
ORDER BY
    Revenue DESC;

The default ordering in SQL is ascending (keyword: ASC). To get descending we used the DESC

Professionally, its better to be explicit and mention ASC (even if its default) to avoid confusion while understanding the code

To, us the result looks something like this:

Id Team Name Revenue
105 Angel's Wings Jacob 9327
103 Devil's Cry Helena 8461
106 Devil's Cry Mary 8461
107 Boosters Menalaus 8461
108 Redemption Adam 8461
104 Redemption Kate 8180
109 Redemption Noah 7483
101 Angel's Wings Abraham 6005
110 Devil's Cry Jenny 6000
102 Boosters John 5611

The ranking

What we did above is sufficient for cases like getting the top performer. In many cases people end up doing ranking. While it has its use cases, it is not necessary right now

Then, when is it necessary🤔?

We have 3 different window functions in SQL for ranking. They are:

  1. Row number
  2. Rank
  3. Dense Rank

Most practically, we use Row number in Data engineering to remove duplicates, so you might have heard and used it before😃. But today, we shall explore where to use all of them

But before that, lets actually look at what they do to avoid confusion and a side-by-side comparison. So, lets run the query:

SELECT
    Id,
    Name,
    Revenue,
    ROW_NUMBER() OVER (ORDER BY Revenue Desc) as RowNum,
    RANK() OVER (ORDER BY Revenue Desc) as Rnk,
    DENSE_RANK() OVER (ORDER BY Revenue Desc) as DRnk
FROM
    salesData_tbl;

Notice how we didn't specify the ORDER BY Revenue in the end because its part of window function syntax

Id Team Name Revenue RowNum Rnk Drnk
105 Angel's Wings Jacob 9327 1 1 1
103 Devil's Cry Helena 8461 2 2 2
106 Devil's Cry Mary 8461 3 2 2
107 Boosters Menalaus 8461 4 2 2
108 Redemption Adam 8461 5 2 2
104 Redemption Kate 8180 6 6 3
109 Redemption Noah 7483 7 7 4
101 Angel's Wings Abraham 6005 8 8 5
110 Devil's Cry Jenny 6000 9 9 6
102 Boosters John 5611 10 10 7

The real deal is when there is a tie. Obviously, there is a tie between Helena, Mary, Menalaus and Adam

Row Number

Row number ensures that it does not contain a duplicate value. So, when there is a tie among them, it puts the tied people in random order

If we want a specific order for tied people we can do that as ROW_NUMBER() OVER (ORDER BY Revenue DESC, Id ASC)

Imagine a scenario where I held this as an examination/interview and these are their scores. I want to select 4 people for my company🏭. Row number does the job perfect for me

Rank

Rank is used to find how many people are better than you. Suppose, Kate wants to login and see how many people out-performed her. Since, her rank is 6 it means 5 people out-performed her. Similarly for Menalaus at rank 2, 1 person out performed him

Dense Rank

Dense rank does not leave gaps in between because someone tied above you. It is not as ruthless as Rank

Suppose I am giving bonus to TOP 3 performers. Using row number means injustice to Menalaus, Adam and Kate. Using rank means injustice to Kate. Enter Dense rank and it fixes everything

Catch the duplicates

So, now you know how and where to use ranking in SQL

More commonly we see row number in our code. It's most commonly used to remove duplicates in Data Engineering. Imagine you are tracking the log-ins and log-outs of employees. You want to see if there are any duplicates in the table (it shouldn't. Otherwise you are wasting your storage for nothing). To keep it simple, lets say there is only 1 employee

Id Action Time
101 In 10:01
101 Out 12:07
101 In 12:49
101 Out 17:57
101 In 18:09
101 Out 18:32
101 Out 18:32
101 Out 18:32

Now, its acceptable for a person to log out and log back in for some left over task. But the last 3 entries are true duplicates. To automatically find such entries we can run the query

SELECT
    *,
    ROW_NUMBER() OVER (ORDER BY Time Desc) as RowNum
FROM
    logs_tbl
WHERE
    Action = 'Out';

Its a bit unfair we knew before hand that the duplicates are in Action = 'Out'. But bear with me ...

Id Action Time RowNum
101 Out 18:32 1
101 Out 18:32 2
101 Out 18:32 3
101 Out 17:57 4
101 Out 12:07 5

And for the magic🪄, we add PARTITION BY. Partition... well makes partitions based on a criteria. Here we want to create a partition/compartment of particular times

SELECT
    *,
    ROW_NUMBER() OVER (
        PARTITION BY Time
        ORDER BY Time Desc
    ) as RowNum
FROM
    logs_tbl
WHERE
    Action = 'Out';
Id Action Time RowNum
101 Out 18:32 1
101 Out 18:32 2
101 Out 18:32 3
101 Out 17:57 1
101 Out 12:07 1

Hence, if we find any row number > 1, it means that the table has duplicates. As for the reason why I chose Action = 'Out' (even though in this case it makes no difference), is to ensure I don't catch cases where the person logs in and logs out within the minute. In that case, the system would have noted down the same time for both log in and log out, and this would have shown up as duplicate

The Partitioning

Since, you already learnt partitioning above, it does not require another introduction. Remember the very first table, with all the crazy team names. Imagine, you want the best performer from each team to be made the captain🤠 or something... In this case we can use partition on the team name

SELECT
    Id,
    Name,
    Revenue,
    ROW_NUMBER() OVER (
        PARTITION BY Team
        ORDER BY Revenue Desc
    ) as RowNum
FROM
    salesData_tbl;
Id Team Name Revenue RowNum
105 Angel's Wings Jacob 9327 1
101 Angel's Wings Abraham 6005 2
103 Devil's Cry Helena 8461 1
106 Devil's Cry Mary 8461 2
110 Devil's Cry Jenny 6000 3
107 Boosters Menalaus 8461 1
102 Boosters John 5611 2
108 Redemption Adam 8461 1
104 Redemption Kate 8180 2
109 Redemption Noah 7483 3

So, in Devil's Cry, even though Helena and Mary were tied, Helena got the captaincy🤠