There aren't many things that make me feel warm and fuzzy inside, but Recursion and SQL are two of them. It was back in 2004 when, young and naive, I first had need of a technique for recursively querying a tree of data in SQL. It was what is probably the most common scenario requiring this technique, given a Product Category ID, return it and the IDs of all child categories so that I could look up the corresponding Products. After much research the technique I hit upon was to get a Stored Procedure with the help of a Cursor to execute itself for each level of recursion, then store the results in a temporary table.

While this worked fine at the time, when I looked at it again recently to address a similar problem, I recoiled in shock at how horribly slow and inefficient it was - unsurprising really. Temporary tables are horribly slow to create and query, and as a rule of thumb, whenever you're using a Cursor in a SQL query, alarm bells should ring, although occasionally necessary it tends to be a sign that you're approaching your query with a function-based programming mindset rather than query language.

So after a bit of research I found Common Table Expressions (CTE) - a feature added to Microsoft SQL Server 2005 (pretty sure I've seen it in Oracle SQL too) which offers an efficient way to create a named result set which can be referenced.
Since CTEs can be referenced, they can reference themselves, which means they can be recursive, and here's how it's done:

A recursive CTE is basically structured in this form:

WITH MyResultSetName (my, result, fields) AS
(
/* a select statement which gets our base query data */
UNION ALL
/* a select statement which gets the next level of recursion */
)

then we select from our result set

SELECT
*
FROM
MyResultSetName

Looks simple enough, how about an example...

Create our example table:

CREATE TABLE [dbo].[Category](
[id] [int] IDENTITY(1,1) NOT NULL,
[name] [nvarchar](50) NULL,
[parentId] [int] NULL,
CONSTRAINT [PK_Category] PRIMARY KEY CLUSTERED
(
[id] ASC
) ON [PRIMARY]
) ON [PRIMARY]

GO

Add our example data (a category tree):

INSERT INTO [Category] ([name],[parentId]) VALUES('All Products',null);
INSERT INTO [Category] ([name],[parentId]) VALUES('Televisions',1);
INSERT INTO [Category] ([name],[parentId]) VALUES('Laundry',1);
INSERT INTO [Category] ([name],[parentId]) VALUES('Small Appliances',1);
INSERT INTO [Category] ([name],[parentId]) VALUES('Plasma TVs',2);
INSERT INTO [Category] ([name],[parentId]) VALUES('LCD TVs',2);
INSERT INTO [Category] ([name],[parentId]) VALUES('Projectors',2);
INSERT INTO [Category] ([name],[parentId]) VALUES('Washing Machines',3);
INSERT INTO [Category] ([name],[parentId]) VALUES('Tumble Dryers',3);
INSERT INTO [Category] ([name],[parentId]) VALUES('Washer Dryers',3);
INSERT INTO [Category] ([name],[parentId]) VALUES('Toasters',4);
INSERT INTO [Category] ([name],[parentId]) VALUES('Breadmakers',4);
INSERT INTO [Category] ([name],[parentId]) VALUES('Blenders',4);
INSERT INTO [Category] ([name],[parentId]) VALUES('40"-50"',5);
INSERT INTO [Category] ([name],[parentId]) VALUES('51"-65"',5);
INSERT INTO [Category] ([name],[parentId]) VALUES('66"+',5);
INSERT INTO [Category] ([name],[parentId]) VALUES('14"-22"',6);
INSERT INTO [Category] ([name],[parentId]) VALUES('23"-40"',6);
INSERT INTO [Category] ([name],[parentId]) VALUES('42"+',6);
INSERT INTO [Category] ([name],[parentId]) VALUES('1000-2000 Lumens',7);
INSERT INTO [Category] ([name],[parentId]) VALUES('2000-3000 Lumens',7);
INSERT INTO [Category] ([name],[parentId]) VALUES('3000+ Lumens',7);
INSERT INTO [Category] ([name],[parentId]) VALUES('1000 Spin',8);
INSERT INTO [Category] ([name],[parentId]) VALUES('1200 Spin',8);
INSERT INTO [Category] ([name],[parentId]) VALUES('1400 Spin',8);

and now our example CTE:

/* SET UP THE TARGET CATEGORY ID FOR THIS TEST (SET THIS TO WHICHEVER CATEGORY ID YOU WANT)... */
DECLARE @categoryId int;
SET @categoryId = 2;


--AND HERE'S THE CTE...
WITH CategoryTree (id, name, parentId) AS
(
--THE BASE QUERY...
SELECT
id,
name,
parentId
FROM
Category
WHERE
id = @categoryId
UNION ALL
--THE RESURSIVE QUERY...
SELECT
c.parentId
FROM
Category c
INNER JOIN CategoryTree CT -- JOIN TO THE RESULTS IN THE CTE
ON c.parentId = CT.id
)

--FINALLY, SELECT ALL FROM THE RESULTS IN THE CTE
SELECT
id,
name,
parentId
FROM
CategoryTree

And here are the results of the query:

And there we have it, the joyous union of those two most wonderful things... Recursion and SQL, and best of all, it's FAST!

More on Recursive Queries Using Common Table Expressions here:

0 comments:

Post a Comment

top