The Problem


Today one of my most talented developers spent the morning putting together yet another address form and it set me off wondering why, as developers, we seem to spend so much time writing out the same methods, concepts and programs; faff around with the same config files getting them to do the same thing and bang our heads against the wall trying to achieve the same system & server setup to run our code over and over and over again.


Assumptions

If a developer were to start a new development project today, what is it likely to look like?
  • It's probably a Web Project
  • It's probably hosted on a server with url pointed at it
  • It's probably going to need source control and backup
  • It's probably going to need a live version and one or two staging versions
  • It's going to need documenting
  • Generally speaking, it's probably going to consist of a series of common objects which can be both stored in a data store and represented on a page with get, create, edit, update and delete methods
  • More specifically, it's probably going to need a system of user authentication, with logins, registration, passwords, forgot password functions, etc.
  • it's going to need the ability to create groups for users and to grant and deny those groups of users access certain pages and resources
  • It's probably going to need outbound (and possibly inbound) email and if so it'll need a message queueing system
  • It's probably going to need controls for collecting commonly required data sets (and the ability to store them in the usual way)
    • Names
    • Dates
      • date of birth
      • start date and end date
    • Addresses (one or multiple)
    • Card Payment information (incorporating addresses)
    • Captchas
    • Blog-style posts
    • Comments
    • Events
    • Geo-locations
  • Input fields are going to need client side and server side validation
  • Should be able to collect and report on traffic statistics
  • It's probably going to need the ability to interface with commonly used frameworks
    • OpenID
    • OAuth
  • It's probably going to need the ability to interface with at least one or two of the more commonly used APIs
    • Facebook
    • Twitter
    • Windows Live (maybe)
    • Google Adwords
    • Google Adsense
    • Google Base
    • Google Maps
    • Youtube
    • Amazon ecommerce
    • Google Analytics
    • Flickr
    • Sharethis
  • Probably going to need to integrate with a payment service too
    • Paypal
    • Worldpay
    • Google Checkout
    • a few banks...
I'm just scratching the surface there, but you get the idea. These are all components that we shouldn't have to write; tweak maybe, extend certainly, but we shouldn't have to be writing these from scratch over and over.

The Solution?

So here's my vision: A programming environment which is itself web-based, the software is installed on the server (or even better, is pre-installed on the server) and all administration, setup and programming can be done via a web interface. It handles source control, backup, documentation, data storage, publishing (live versions and test versions - this should be really simple since the software is already online), access control and server setup through this interface; it does away with the idea of a separate database and programming environment, handling all data storage through the same interface to minimize time-consuming configuration and complications.

It should be able to handle all the basic data types and also provide objects representing the common collections of those data types (Names, dates of birth, Addresses, Events, etc.) and be easy to extend them, create new ones and share them.
In fact, the idea of sharing and integrating objects and code with others should be central to the design, so as well as providing a range of reusable objects like those listed in the assumptions above, it should be trivially easy to post a component online and for others to include it in their project (whilst maintaining proper attribution, of course).

Don't get me wrong, I can see that there are movements in this direction, ASP.Net provides ready made controls, simplifies the process of creating objects from your database, etc.; Ruby on Rails follows a philosophy similar to what I'm proposing, helping with the setting up of your database and corresponding business layer objects and utilizing syntax which minimizes the amount of code that needs to be written, and (although I haven't had a chance to play with it yet) I understand Python also offers an approach which is also very high level; however I can't say I've found a programming language which completely solves the problem of writing the same code over and over.


Simplifying and streamlining software development also helps address the problem of software creation being alien to most people, even though they use software all the time. But most importantly, by making software development less repetitive and laborious we can achieve greater productivity and get on with the more interesting jobs, you know, the ones where you actually have to think.
There aren't many things that make me feel warm and fuzzy inside, but Recursion and SQL are two of them. It was back in 2004 when, young and naive, I first had need of a technique for recursively querying a tree of data in SQL. It was what is probably the most common scenario requiring this technique, given a Product Category ID, return it and the IDs of all child categories so that I could look up the corresponding Products. After much research the technique I hit upon was to get a Stored Procedure with the help of a Cursor to execute itself for each level of recursion, then store the results in a temporary table.

While this worked fine at the time, when I looked at it again recently to address a similar problem, I recoiled in shock at how horribly slow and inefficient it was - unsurprising really. Temporary tables are horribly slow to create and query, and as a rule of thumb, whenever you're using a Cursor in a SQL query, alarm bells should ring, although occasionally necessary it tends to be a sign that you're approaching your query with a function-based programming mindset rather than query language.

So after a bit of research I found Common Table Expressions (CTE) - a feature added to Microsoft SQL Server 2005 (pretty sure I've seen it in Oracle SQL too) which offers an efficient way to create a named result set which can be referenced.
Since CTEs can be referenced, they can reference themselves, which means they can be recursive, and here's how it's done:

A recursive CTE is basically structured in this form:

WITH MyResultSetName (my, result, fields) AS
(
/* a select statement which gets our base query data */
UNION ALL
/* a select statement which gets the next level of recursion */
)

then we select from our result set

SELECT
*
FROM
MyResultSetName

Looks simple enough, how about an example...

Create our example table:

CREATE TABLE [dbo].[Category](
[id] [int] IDENTITY(1,1) NOT NULL,
[name] [nvarchar](50) NULL,
[parentId] [int] NULL,
CONSTRAINT [PK_Category] PRIMARY KEY CLUSTERED
(
[id] ASC
) ON [PRIMARY]
) ON [PRIMARY]

GO

Add our example data (a category tree):

INSERT INTO [Category] ([name],[parentId]) VALUES('All Products',null);
INSERT INTO [Category] ([name],[parentId]) VALUES('Televisions',1);
INSERT INTO [Category] ([name],[parentId]) VALUES('Laundry',1);
INSERT INTO [Category] ([name],[parentId]) VALUES('Small Appliances',1);
INSERT INTO [Category] ([name],[parentId]) VALUES('Plasma TVs',2);
INSERT INTO [Category] ([name],[parentId]) VALUES('LCD TVs',2);
INSERT INTO [Category] ([name],[parentId]) VALUES('Projectors',2);
INSERT INTO [Category] ([name],[parentId]) VALUES('Washing Machines',3);
INSERT INTO [Category] ([name],[parentId]) VALUES('Tumble Dryers',3);
INSERT INTO [Category] ([name],[parentId]) VALUES('Washer Dryers',3);
INSERT INTO [Category] ([name],[parentId]) VALUES('Toasters',4);
INSERT INTO [Category] ([name],[parentId]) VALUES('Breadmakers',4);
INSERT INTO [Category] ([name],[parentId]) VALUES('Blenders',4);
INSERT INTO [Category] ([name],[parentId]) VALUES('40"-50"',5);
INSERT INTO [Category] ([name],[parentId]) VALUES('51"-65"',5);
INSERT INTO [Category] ([name],[parentId]) VALUES('66"+',5);
INSERT INTO [Category] ([name],[parentId]) VALUES('14"-22"',6);
INSERT INTO [Category] ([name],[parentId]) VALUES('23"-40"',6);
INSERT INTO [Category] ([name],[parentId]) VALUES('42"+',6);
INSERT INTO [Category] ([name],[parentId]) VALUES('1000-2000 Lumens',7);
INSERT INTO [Category] ([name],[parentId]) VALUES('2000-3000 Lumens',7);
INSERT INTO [Category] ([name],[parentId]) VALUES('3000+ Lumens',7);
INSERT INTO [Category] ([name],[parentId]) VALUES('1000 Spin',8);
INSERT INTO [Category] ([name],[parentId]) VALUES('1200 Spin',8);
INSERT INTO [Category] ([name],[parentId]) VALUES('1400 Spin',8);

and now our example CTE:

/* SET UP THE TARGET CATEGORY ID FOR THIS TEST (SET THIS TO WHICHEVER CATEGORY ID YOU WANT)... */
DECLARE @categoryId int;
SET @categoryId = 2;


--AND HERE'S THE CTE...
WITH CategoryTree (id, name, parentId) AS
(
--THE BASE QUERY...
SELECT
id,
name,
parentId
FROM
Category
WHERE
id = @categoryId
UNION ALL
--THE RESURSIVE QUERY...
SELECT
c.parentId
FROM
Category c
INNER JOIN CategoryTree CT -- JOIN TO THE RESULTS IN THE CTE
ON c.parentId = CT.id
)

--FINALLY, SELECT ALL FROM THE RESULTS IN THE CTE
SELECT
id,
name,
parentId
FROM
CategoryTree

And here are the results of the query:

And there we have it, the joyous union of those two most wonderful things... Recursion and SQL, and best of all, it's FAST!

More on Recursive Queries Using Common Table Expressions here:
top