Tag finder:

Few words about Postgresql's window functions

 Posted by greg on June 30, 2011 6:44 PM |  Comments are deactivated for this post
Tags : geek  computing  postgresql 

Ranking

Understanding the principle of window functions is easy as soon as you realize it has to do with rankings. Let’s take a practical example:

CREATE TABLE 
    production (
            company_name VARCHAR PRIMARY KEY,
            country VARCHAR NOT NULL,
            production INTEGER NOT NULL
        );

This table will store some company’s production statistics. Let’s fill it with data

INSERT INTO production (company_name ,country, production) VALUES 
('futur corp', 'états-unis', 12),
('eep techno', 'états-unis', 14),
('keystop', 'états-unis', 8),
('vehiculo longo', 'brésil', 16),
('Kastarok', 'russie', 13),
('Moldonov', 'russie', 7),
('Glaß gmbh', 'allemagne', 8),
('Gien zu', 'chine', 11),
('chang intl', 'chine', 15),
('medong', 'chine', 12)
;

Of course it is easy to display the company list sorted by production but why not adding the ranking information ?

SELECT
    company_name,
    country,
    production,
    rank() OVER (ORDER BY production DESC)
FROM
    production
    ;

  company_name  |  country   | production | rank 
----------------+------------+------------+------
 vehiculo longo | brésil     |         16 |    1
 chang intl     | chine      |         15 |    2
 eep techno     | états-unis |         14 |    3
 Kastarok       | russie     |         13 |    4
 futur corp     | états-unis |         12 |    5
 medong         | chine      |         12 |    5
 Gien zu        | chine      |         11 |    7
 Glaß gmbh      | allemagne  |          8 |    8
 keystop        | états-unis |          8 |    8
 Moldonov       | russie     |          7 |   10

You can see here the obvious advantages of using window functions, two companies have by example the same production, they have rown 8 and there is no 9th rank ! (This behavior can be changed by using the _dense_rank()_ function.). The rank() window function has here been invoqued over a unique partition of all results order by production DESC. This means we can add a rank information per country by example:

SELECT
    company_name,
    country,
    production,
    rank() OVER (PARTITION BY country ORDER BY production DESC) as country_rank
FROM
    production
    ;

  company_name  |  country   | production | country_rank 
----------------+------------+------------+--------------
 Glaß gmbh      | allemagne  |          8 |            1
 vehiculo longo | brésil     |         16 |            1
 chang intl     | chine      |         15 |            1
 medong         | chine      |         12 |            2
 Gien zu        | chine      |         11 |            3
 eep techno     | états-unis |         14 |            1
 futur corp     | états-unis |         12 |            2
 keystop        | états-unis |          8 |            3
 Kastarok       | russie     |         13 |            1
 Moldonov       | russie     |          7 |            2

Here you see the two first results are marked with a rank of 1 because these countries own one factory. Note Postgresql has naturally sorted the results by country then ranking ASC. Of course you can filter on a rank but you have to use a sub select:

SELECT
    company_name,
    country,
    production,
    country_rank
FROM (
    SELECT
        company_name,
        country,
        production,
        rank() OVER (PARTITION BY country ORDER BY production DESC) as country_rank
    FROM
        production
    ) sub_select
WHERE 
    sub_select.country_rank = 1
    ;

Window functions are not just about ranking. What if we want to indicate the maximum production for each country next to the _country_rank_ ?

SELECT
   company_name,
    country,
    production,
    first_value(production) OVER (PARTITION BY country ORDER BY production DESC) as max_country_production,
    rank() OVER (PARTITION BY country ORDER BY production DESC) as country_rank
FROM
    production
ORDER BY country, country_rank ASC
;

  company_name  |  country   | production | max_country_production | country_rank 
----------------+------------+------------+------------------------+--------------
 Glaß gmbh      | allemagne  |          8 |                      8 |            1
 vehiculo longo | brésil     |         16 |                     16 |            1
 chang intl     | chine      |         15 |                     15 |            1
 medong         | chine      |         12 |                     15 |            2
 Gien zu        | chine      |         11 |                     15 |            3
 eep techno     | états-unis |         14 |                     14 |            1
 futur corp     | états-unis |         12 |                     14 |            2
 keystop        | états-unis |          8 |                     14 |            3
 Kastarok       | russie     |         13 |                     13 |            1
 Moldonov       | russie     |          7 |                     13 |            2

Or maybe you want to display the total production for each country:

SELECT
    company_name,
    country,
    production,
    first_value(production) OVER (PARTITION BY country ORDER BY production DESC) as max_country_production,
    sum(production) OVER (PARTITION BY country) AS country_production,
    rank() OVER (PARTITION BY country ORDER BY production DESC) as country_rank
FROM
    production
ORDER BY country_production DESC, country_rank ASC 
;
  company_name  |  country   | production | max_country_production | country_production | country_rank 
----------------+------------+------------+------------------------+--------------------+--------------
 chang intl     | chine      |         15 |                     15 |                 38 |            1
 medong         | chine      |         12 |                     15 |                 38 |            2
 Gien zu        | chine      |         11 |                     15 |                 38 |            3
 eep techno     | états-unis |         14 |                     14 |                 34 |            1
 futur corp     | états-unis |         12 |                     14 |                 34 |            2
 keystop        | états-unis |          8 |                     14 |                 34 |            3
 Kastarok       | russie     |         13 |                     13 |                 20 |            1
 Moldonov       | russie     |          7 |                     13 |                 20 |            2
 vehiculo longo | brésil     |         16 |                     16 |                 16 |            1
 Glaß gmbh      | allemagne  |          8 |                      8 |                  8 |            1

You may notice we have several partitions with the same definition above. We can regroup these definitions into one window statement:

SELECT
    company_name,
    country,
    production,
    first_value(production) OVER country_prod as max_country_production,
    sum(production) OVER (PARTITION BY country) AS country_production,
    rank() OVER country_prod as country_rank
FROM
    production
WINDOW 
    country_prod AS (PARTITION BY country ORDER BY production DESC)
ORDER BY 
    country_production DESC, 
    country_rank ASC
;

Playing with partitions

As the window function knows its row it is possible to create cumulative statistics:

SELECT
company_name,
    country,
    production,
    first_value(production) OVER country_prod as max_country_production,
    rank() OVER country_prod as country_rank,
    sum(production) OVER (PARTITION BY country ORDER BY production DESC RANGE UNBOUNDED PRECEDING) as cumul_prod
FROM
    production
WHERE
    country = 'chine'
WINDOW 
    country_prod AS (PARTITION BY country ORDER BY production DESC)
ORDER BY 
    country DESC, 
    country_rank ASC
    ;

Or cumulative production percentage:

SELECT
    company_name,
    country,
    production,
    rank() OVER (PARTITION BY country ORDER BY production DESC) AS country_rank,
    sum(production) OVER (PARTITION BY country ORDER BY production DESC RANGE UNBOUNDED PRECEDING) / sum(production) OVER (PARTITION BY country) as cumul_prod
FROM
    production
WHERE
    country = 'chine'
ORDER BY 
    country DESC, 
    country_rank ASC

This is it, hope you have found this article useful. Take care !

Recent Links 

 sulzer young regulator

Bookmarked by greg on February 21, 2013 1:55 PM..
Tags: geek electronic

Follow link

 Méthode musique

Bookmarked by greg on February 2, 2013 3:00 PM..
Tags: geek

Follow link

 Pg modeler

Bookmarked by greg on September 29, 2012 6:28 AM..
Tags: geek computing postgresql

Follow link

 Parsing binary strings with PHP

Bookmarked by greg on September 24, 2012 3:29 PM..
Tags: geek computing dev php

Follow link

 Sprite map packager

Bookmarked by greg on September 24, 2012 2:24 PM..
Tags: geek computing dev

Follow link

 Rasp. Pi supercompteur

Bookmarked by greg on September 13, 2012 11:54 AM..
Tags: geek computing

Follow link

 Crew - code review for Git projects.

Bookmarked by greg on May 6, 2012 7:57 AM..
Tags: geek computing git

Follow link

 responsive grid system

Bookmarked by greg on May 5, 2012 12:38 PM..
Tags: geek computing dev

Follow link

 Vim "improved

Bookmarked by greg on March 8, 2012 9:26 AM..
Tags: geek computing vim

Follow link

 Highly scalable NoSql blog

Bookmarked by greg on March 3, 2012 9:12 AM..
Tags: geek computing dev

Follow link