PostgreSQL inheritance

On stackoverflow and elsewhere I stumbled over questions about inheritance in PostgreSQL a couple of times.1 People are confused about how this feature is related to inheritance in object oriented programming (OOP) and how to use it.

This article provides a short example, showing that inheritance on PostgreSQL can not only be used for partitioning of large tables, but also to achieve a polymorphy-like behavior in database functions. In the application I currently work on, I experienced this as a nice way to avoid duplication of code. In the second part of this post I will show what happens under the surface, when table structures that use inheritance are queried. We will take a closer look at how to index those tables and how query performance can be optimized by using pg_class, a PostgreSQL system table and the EXECUTE FORMAT command to dynamically decide, what tables are queried. The whole sample code used in my example can be downloaded from github and I used the database version 9.6 to run it, but the inheritance feature is available since version 9.2. Since I use UUIDs in this example to assure I have unique identifiers inside the schema, you have to install the uuid-ossp extension before the code will work:

    CREATE EXTENSION "uuid-ossp";

This extension provides us with a couple of functions to create UUIDs.2 You will see later in this article, that the unique identifiers are required, because we have to identify relations uniquely over different tables.

But lets start and get our hands dirty. Let’s assume you want to implement a small blog system, where you have two kinds of information, that should be published: blog articles and user profiles. Our table schema can be something like this:

inheritance_blog_v1
Database tables of our blog system

In our example we have a small blog system, where users can write blog articles, that are saved in the blog_article table. Additionally they can have user profiles to keep some information about themselves. They are stored in the user_profile table. Since we want our users to be able to save new information, but not to publish it at the same time, we have a table publishable with a boolean column is_public. The tables blog_article and user_profile inherit this table.

To achieve this on our database, we first have to create the table publishable:

CREATE TABLE publishable
(
id uuid NOT NULL,
time_modified timestamp without time zone,
is_public boolean,
CONSTRAINT publishable_pkey PRIMARY KEY (id)
);

Then we can create the two other tabels inheriting from publishable:

CREATE TABLE blog_article
(
 title TEXT,
 content TEXT,
 CONSTRAINT blog_article_pkey PRIMARY KEY (id)
) INHERITS (publishable);
CREATE TABLE user_profile
(
 firstname text,
 lastname text,
 about_me text,
 CONSTRAINT user_profile_pkey PRIMARY KEY (id)
) INHERITS (publishable);

As you can see in the highlighted lines of the two CREATE statements, all we have to do to let the tables inherit from publishable is adding “INHERITS (publishable) at the end.

If you now run

SELECT column_name, data_type, character_maximum_length
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'blog_article';

and

SELECT conname
FROM pg_constraint
WHERE conrelid =
(SELECT oid
FROM pg_class
WHERE relname LIKE 'blog_article');

you can see that your inherited tables have the columns we defined in the CREATE statement as well as the columns from publishable. Even the primary key constraint on the id-column inherited from publishable is in place.

Ok, lets get back to our blog application and our previously defined feature to be able to publish articles and user profiles. If we had written an application with pre-inheritance SQL, we had to implement the same function for each and every table using an is_publishable flag. This is not only an annoying effort but also error prone, since every time we have to do a change or bug-fix, we have to apply it to each of those functions. In object oriented inheritance this problem can be solved by using polymorphism, which is basically defined as the provision of a single interface to entities of different types. 3 Technically, we would achieve this by defining a class with a method publish() and inherit this method to our subclasses representing a blog article, user profile, etc.

oop_inheritance
Object oriented class model of publish-method inheritance

Since a relation in an relational database does not contain any behavior, it is obvious that polymorphism can’t be applied to a relational model the same way as to an object oriented. But we can achieve a kind of polymorphism when we define a SQL-function like this:

--
--	Change the publicity of publishable entities on the inherited table.
--
--	@params _id		The id of the entity
--	@params _is_public	TRUE if new status is public, FALSE if not
--
CREATE FUNCTION change_publicity_on_publishable(_id UUID, _is_public BOOLEAN)
RETURNS VOID AS
$$
BEGIN
    UPDATE publishable SET is_public = _is_public WHERE id = _id;
END;
$$
LANGUAGE plpgsql;

This function accepts a row identifier and a boolean type as parameters. The boolean type specifies if the entity identified by the row identifier has to be set to be published or not. As you might have noticed, I run the update query to achieve this on the publishable table, that the tables blog_article and user_profile inherit from. If you call this method (e.g. in pgAdmin) with an existing id you will see, that the publishable flag is set, regardless to if the identified row is from blog_article, user_profile or any other table inheriting publishable. This is great in a way, that it fulfills our definition of polymorphism and shows, that we definitely can do more than just partitioning a big table with PostgreSQL’s inheritance feature.

OK, you might think, but in OOP a blog article “is a” publishable. So what happens if I change our change_publicity_on_publishable-function to accept a publishable as parameter instead of the UUID?

--
--	Change the publicity of publishable entities on the inherited table.
--
--	@params _publishable	The publishable entity
--	@params _is_public	TRUE if new status is public, FALSE if not
--
CREATE FUNCTION change_publicity_on_publishable(_publishable publishable, _is_public BOOLEAN)
RETURNS VOID AS
$$
BEGIN
	UPDATE publishable SET is_public = _is_public WHERE id = _publishable.id;
END;
$$
LANGUAGE plpgsql;

If you run this with

SELECT * FROM change_publicity_publishable(('ffff3122-2280-4eb5-bac5-d57f7608067d','2016-11-26 16:27:56.420925',FALSE,
'Lorem ipsum dolor sit 186207','Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat,sed diam voluptua. 186207')::blog_article,
true);

giving a blog article as a parameter, you can see that there is no error thrown and that the update still works. (If you use my import scripts to generate sample data, please don’t forget to use one of the generated ids in this query, since you will not have the same ids, as in my example.)

As promised at the beginning of my article I will show you how this works inside the PostgreSQL database. When we run EXPLAIN ANALYSE on the UPDATE query inside the above function, we get the following query plan:

ea_change_publicity_on_publishable

qp_update_publishable

So let’s have a look at it in detail. There are two interesting things, that catch the eye. First, the query plan says, there is a sequence scan on publishable. This is a bit weird because the WHERE-condition of our UPDATE-query is set on the id of the publishable table and since we have defined a primary key on that, publishable should be indexed. Second, there are index scans on the tables blog_article and user_profile. If one has a look at the costs and the time required by the scans, it becomes pretty clear, that selecting the row to be changed is achieved by joining each of the tables inheriting from version. If the sequence scan on publishable was actually done, it had taken a significantly larger amount of time than each of the two other scans, since I have loaded a million lines into each of the sub-tables. But the sequence scan is estimated to have both costs and time close to zero. By looking at the subsequent entries, we can see there are index scans on blog_article and user_profile, which are estimated with plausible costs and time. So obviously, the select on publishable is achieved by joining each of the sub-tables and the data is selected, using the indexes that exist because of our primary key constraints on the sub-tables. This makes sense, since we do not have defined other indexes so far and with PostgreSQL a primary key constraint always generates an index on the primary key column.4

This system of selecting from a super-table by automatically joining all the child-tables seems to be an easy way to implement database inheritance, but it also comes with a downside. What if we make extensive use of this feature in a way, that we have selects on super tables, that join other super-tables. In a scenario like this we might end up with a huge amount of joins per select statement. For this case we can use another nice feature of PostgreSQL: Execution of dynamic commands.5 This technique enables us to dynamically tell the database, what tables to query/join while running a query.

To show how this works, we enhance our application with user permissions. Our schema gets a new super-table permission and sub-tables for the permissions of the single entites like blog_article and user_profile. A new table user holds users that could be used to log into the system.

inheritance_blog_v2

With this new constellation we can check, if a user has permissions on an entity with a function like this:

--
--	Get users permission on an publishable entity.
--
--	@params _publishable_id		The id of the entity
--	@params _username		The username to get the permissions for
CREATE FUNCTION permissions_by_username_pg_class(_publishable_id UUID, _username CHARACTER VARYING)
    RETURNS SETOF permission AS
$$
    DECLARE _relname CHARACTER VARYING;
BEGIN
    -- Join the postgresql system table pg_class to find out the type of relation for the given identifier _id.
    SELECT p.relname INTO _relname
    FROM publishable v
    INNER JOIN pg_class p ON v.tableoid = p.oid
    WHERE v.id = _publishable_id;

    RETURN QUERY
    -- Use execute format to decide at runtime on which table to update the publicity flag.
    EXECUTE FORMAT('SELECT pe.* FROM %I pe
        INNER JOIN %I pu ON pe.publishable_id = pu.id
        INNER JOIN "user" u ON pe.user_id = u.id
        WHERE u.username = $1 AND pu.id = $2',
        'permission_' || _relname, _relname) USING _username, _publishable_id;
END;

The above query has two interesting parts. In the first select statement the publishable_id, received as a  function parameter, is used to get the concrete sub-table’s name of the publishable. This table name is used afterwards to dynamically decide at runtime, which publishable to query and what permission-table to join. All you have to do to successfully work with this, is to define a naming convention for your permission sub-tables and stick to it. And as you can see, you only have one extra query to get the first sub-tables name. The second query runs with no extra join. But be aware, that none of this two ways is per se better. You have to check the query plans for your actual situation, to find out what makes the most sense for you.

Footnotes

1 The PostgreSQL Global Development Group, “Basic Statements”, accessed on 08 January 2017
2 The PostgreSQL Global Development Group, “Unique Indexes”, accessed on 17 December 2016
3 Bjarne Stroustrup (October 3, 2012). “Bjarne Stroustrup’s C++ Glossary”, accessed on 08 January 2017
4 The PostgreSQL Global Development Group, “Unique Indexes”, accessed on 17 December 2016
5 The PostgreSQL Global Development Group, “Basic Statements”, accessed on 08 January 2017