Tree structure in relational db’s: theory and use in Symfony

Encountering the problem

I’m not exactly Joomla’s biggest fan, but somehow it’s worked out that the latest project I’ve been working on is a Joomla3 project. For one part of the site we used a third party component. The client provided us with data (a list of categories and companies) which we needed to import to this component. We experienced a problem though. The data had complicated structures and many nested levels. When we imported this structure into the third party component, we got very slow queries to our MySql database. This was because the third party component had a very large, unoptimized sql query and together with the data, it became very complicated very fast.

We still went ahead and used this component and eventually resolved the problem with caching our queries, but after it was done I decided to find out how to create the same structure on Symfony 2 Framework (or even on Symfony 3!) so that it would be faster and less complicated.

Looking at Theory

It’s not a new problem, the “parent” problem in tree-based data sets has been around for a long time. Essentially the task is to quickly retrieve a dataset containing parents and children, without resorting to a recursive or massive nested join query.

Of course, relational databases are not really intended for management of hierarchical data. The parent-child relationship inherent in hierarchical data is not naturally represented in a relational database table. As you can see in the picture below, in hierarchical data each item has a single parent and zero or more children (with the exception of the root item, which has no parent):

Screen Shot 2015 12 10 at 18.09.59

There are a number of different approaches to the problem, each involving a different model of hierarchical data implementations. Below I’ll try and summarise a few of them, but if you’re interested in a more detailed explanation of the different models, I found this post very helpful for understanding the adjacency list and nested set models and the following articles for the closure tables and materialized path models. Those are the four main models which I’ll be looking at here.

The Adjacency List Model

This model has each table item contain a pointer to its parent (with the root item having a NULL value for its parent). This is a tricky model to implement with pure SQL as you need to know the level of a category before you can see its full path. Also, there’s a risk of orphaning sub-trees when deleting nodes, so extra care needs to be taken!

So, SQL queries to populate parent fields for each row and select data from tables will look like this:

CREATE TABLE category (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(20) NOT NULL,
parent INT DEFAULT NULL
);

INSERT INTO category VALUES
(1,‘Databases’,NULL),
(2,‘Relational’,1),
(3,‘Document’,1),
(4,‘MySQL’,2),
(5,‘PostgreSQL’,2),
(6,‘MongoDB’,3);

Retrieving children of parent node
SELECT `id`,`name` FROM `category`
WHERE `category`.`parent` = 2;

Result:
***MySQL
***PostgreSQL

The Materialized Paths model

In the Materialized Paths model, a document stores each tree node along with a string of the nodes path. This pattern is more flexible in allowing work with the path, but does involve working with strings and regular expressions.

So SQL queries to populate field paths for each row and select data from tables will look like this:

CREATE TABLE category (
name VARCHAR(20) NOT NULL,
path VARCHAR(256) DEFAULT NULL
);
INSERT INTO category VALUES
(‘Databases’, NULL),
(‘Relational’, ‘,Databases,’),
(‘Document’, ‘,Databases,’),
(‘MySQL’, ‘,Databases,Relational,’),
(‘PostgreSQL’, ‘,Databases,Relational,’),
(‘MongoDB’, ‘,Databases,Document,’);

Retrieving a Sub-tree Path:
SELECT name FROM category
WHERE path LIKE “,Databases,Relational,%”;

Result:
***MySQL
***PostgreSQL

The Closure Table Model
In the Closure Table model, all paths from one node in the tree to another are stored. Of course, this requires a lot of space, as you are storing a lot more pathing data, but it’s more flexible than something like the Nested Set model (see below) in cases where ordering doesn’t matter. You need to decide what your priority is when choosing a model. So in a closure table we store for each node its “ancestor” (parent), “descendant” (child), and “depth”, which is the level from the tree’s root.

So SQL queries to populate a closure table and select data from tables will look like this:
CREATE TABLE category (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(20) NOT NULL,
parent INT DEFAULT NULL
);

INSERT INTO category VALUES
(1,‘Databases’,NULL),
(2,‘Relational’,1),
(3,‘Document’,1),
(4,‘MySQL’,2),
(5,‘PostgreSQL’,2),
(6,‘MongoDB’,3);

CREATE TABLE category_closure (
ancestor INT DEFAULT NULL,
descendant INT DEFAULT NULL,
depth INT DEFAULT NULL
);

INSERT INTO comment_closure VALUES
(1,1,0),(0,1,0),(2,2,0),(1,2,1),(3,3,0),(1,3,1),(0,3,1),(4,4,0),(2,4,1),(1,4,2),(0,4,2),(5,5,0),(2,5,1),
(1,5,2),(0,5,2),(6,6,0),(3,6,1),(1,6,2),(0,6,2);

Retrieving a Sub-tree Path:
SELECT `id`,`parent` FROM `category`
JOIN `category_closure` `closure`
ON `category`.`id` = `closure`.`descendant`
WHERE `closure`.`ancestor` = 2;

Result:
**Relational
***MySQL
***PostgreSQL

The Nested Set Model
The Nested Set Model uses an approach called the modified preorder tree traversal algorithm. What this means is that we work from the left to the right of a tree, one layer at a time, descending to each node’s children before assigning a right-hand number and moving on to the right.

Screen Shot 2015 12 10 at 18.09.30

The scheme above illustrates a tree described by all the rules of The Nested Tree Model. Numbers near nodes are the left and right keys in which are contained the main information about the node. Red numbers are the levels of nodes. If information about node keys is added to the database, working with the tree becomes easier. If you go through keys from 1 to 12 in your mind, you have gone through all tree nodes from left to right. When we use this structure of tree, it is easy for us to select particular nodes, sub-trees, parent sub-trees and others.

So SQL queries to populate fields, lft – left key, rgt – right key, lvl – level and select data from tables will look like this:

CREATE TABLE category (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(20) NOT NULL,
lft INT NOT NULL,
rgt INT NOT NULL,
lvl INT NOT NULL,
);

INSERT INTO category VALUES
(1, ‘Databases’, 1, 12, 1),
(2, ‘Relational’, 2, 7, 2),
(3, ‘Document’, 8, 11, 2),
(4, ‘MySQL’, 3, 4, 3),
(5, ‘PostgreSQL’, 5, 6, 3),
(6, ‘MongoDB’, 9, 10, 3);

Retrieving a Full Tree
We can retrieve the full tree with ordering and levels (where number of *s is the level):

SELECT name, lvl FROM category ORDER BY lft;

Result:
*Databases
**Relational
***MySQL
***PostgreSQL
**Document
***MongoDB

Retrieving a Sub-tree Path:
SELECT name, lvl FROM category WHERE lft >= 2 AND rgt <= 7 ORDER BY lft;

Result:
**Relational
***MySQL

Retrieving a Parent Sub-tree Path:
SELECT name, lvl FROM category WHERE lft <= 2 AND rgt >= 7 ORDER BY lft;

Result:
*Databases
**Relational

Retrieving a Sub-Tree Path in which our node is contained:
SELECT name, lvl FROM category WHERE rgt > 2 AND lft < 7 ORDER BY lft;

Result:
*Databases
**Relational
***MySQL
***PostgreSQL

In this simple example we saw how using the nested set model makes our work with tree easy.

How does this get applied in Symfony Framework?

We’ve talked about methods of implementing hierarchical data, but we haven’t gone into a lot of depth. We haven’t spoken about algorithms for populating additional information on a tree when some of its nodes are added or removed. And we haven’t reviewed all the algorithms which give profit for select sub-tree data from a tree. In some models these are not at all easy tasks. Luckily we now live in the world of OpenSource, and there’s no need to reinvent the wheel. Some smart guys out there have already written convenient tools which resolve these exact problems. Doctrine2 behavioral extension DoctrineExtensions, for example, provides us with nested sets behavior and implements the tree models on Entities.

In Symfony Framework we can use StofDoctrineExtensionsBundle, which implements all the features of DoctrineExtensions. In our projects we’ve used parts of this bundle’s functions many times, including Sluggable, Timestampable and more. One other big job that it does, is provide us with Tree support in Symfony Entities. The bundle has good documentation and examples of use. It gives us the ability to select one of the tree model structures for our entity and easily implement it. So let’s work with Tree Hierarchical Structures in Symfony Framework, without giving ourselves any headaches!