Suppose you have the following table:

It looks like this:

001_splitting_rows

But you want it to look like this:

How do you fix this using SQL?

Let’s start by figuring out the number of commas in each of the PEOPLE fields.  We do this by calculating the length of the field, removing all commas, calculating the length of the new field, and taking the difference of the two lengths.

Perfect.  We now know the number of people in each list (# of commas + 1).  So let’s use a recursive CTE, and create an index for each name.

003_splitting_rows

The new NAME_INDEX is zero-based, just like an array in most whatever programming language you’ve dealt with.

Now, our objective is to substring out the a name in each row.  So first, let’s figure out where we want to start substringing.

Firstly, if our NAME_INDEX = 0 (meaning we’re looking for the very first name in the list) we don’t want to remove anything from the start of the string, so we say ‘if NAME_INDEX = 0, then just start substringing at the beginning of the field.’  If, however, NAME_INDEX is not equal to 0, then we use the INSTR() function to search for the next comma.  Let’s digest the INSTR() function–the first argument, PEOPLE, is the field we’re searching a comma for.  The argument, ','  is saying that we’re searching for a comma.  The third argument, 1, says “start looking from the very beginning of the string,” and the fourth argument, NAME_INDEX, says let’s look for the <NAME_INDEX>nth occurrence of a comma.

Next, let’s worry about determining how we’re going to figure out where to END the substring. We do this by searching for the next comma after the one we just found.

But, we don’t actually want to use the position of that character, because that’s the actual comma.  We want the position of the very last character that’s right before the comma.

Now, if we’re looking at the very last name in the group, there will be no trailing comma…so the INSTR() function will return 0, and if we subtract -1 from it, then we don’t actually want to use -1.  We just want to use the full length of the string instead.  So most people would write it like this:

That’s a very explicit way of saying what’s going on, but it’s long and redundant, so I’m going to shorten it slightly:

We start by calculating the usual instr(…) – 1 function.  We use NULLIF() to null the value out if it’s -1, meaning COALESCE() will skip over the first value in the expression and use the second, which is the LENGTH() of the string.

So we add our newly created field back into our select:

Great!  Now we’re ready to apply SUBSTR().  We’ll nest the existing select in an inline view.

Now we’re ready to apply SUBSTR.  Except, SUBSTR() doesn’t accept start and end positions like we have.  Substr accepts a start position and a length.  So we already have the start position, but to get the length, we need to do some subtraction.  Plus I added one.