Jim Horn

Freelance Microsoft SQL Server database developer, architect, and trainer specializing in Business Intelligence, ETL, and Data Warehouse solutions. Full-time contract assignments, part-time consulting and mentoring, and technical article author and editor in the Minneapolis, Minnesota USA area or remote.  

*** Currently on contract assignment and only available for part time and after hours development ***

Click here for a current resume

 
 

T-SQL:  Normalized data to comma delineated string and back


Many times as a report developer I've been asked to display normalized data as a comma-separated string.  Here's how to do it. 

Let's start with some sample data


-- Create and populate tables used in this demo
IF EXISTS(SELECT name FROM sys.tables WHERE name='customer')
   DROP TABLE customer
GO
IF EXISTS(SELECT name FROM sys.tables WHERE name='customer_type')
   DROP TABLE customer_type
GO
-- Create the tables
CREATE TABLE customer_type(
   id int PRIMARY KEY NOT NULL, 
   name varchar(50)) 
CREATE TABLE customer (
   id int identity(1,1) PRIMARY KEY NOT NULL, 
   customer_type_id int REFERENCES customer_type(id), 
   name varchar(50)) 
ALTER TABLE customer
ADD CONSTRAINT fk_customer_customer_type
FOREIGN KEY (customer_type_id) 
REFERENCES customer_type(id) 
INSERT INTO customer_type (id, name) 
VALUES (1, 'Movie Characters'), (2, 'Rock Stars'), (3, 'Cartoon Characters')
INSERT INTO customer (customer_type_id, name ) 
VALUES 
   (1, 'Johnny B. Goode'), (1,'Buckaroo Banzai'), (1,'Back to the Future'),
   (2, 'Rick Springfield'), (2,'Bono'), (2,'Sammy Hagar'),
   (3, 'Wile E. Coyote'), (3,'Officer Barbrady'), (3,'Sweet Polly Purebred'),  (3,'Bart Simpson') 
-- Show the tables
SELECT * FROM customer
SELECT * FROM customer_type

-- Create and populate tables used in this demo
IF EXISTS(SELECT name FROM sys.tables WHERE name='customer')
   DROP TABLE customer
GO
IF EXISTS(SELECT name FROM sys.tables WHERE name='customer_type')
   DROP TABLE customer_type
GO


-- Create the tables
CREATE TABLE customer_type(
   id int PRIMARY KEY NOT NULL, 
   name varchar(50)) 
CREATE TABLE customer (
   id int identity(1,1) PRIMARY KEY NOT NULL, 
   customer_type_id int REFERENCES customer_type(id), 
   name varchar(50)) 


ALTER TABLE customer
ADD CONSTRAINT fk_customer_customer_type
FOREIGN KEY (customer_type_id) 
REFERENCES customer_type(id) 

-- Add data
INSERT INTO customer_type (id, name) 
VALUES (1, 'Movie Characters'), (2, 'Rock Stars'), (3, 'Cartoon Characters')
INSERT INTO customer (customer_type_id, name ) 
VALUES 
   (1, 'Johnny B. Goode'), (1,'Buckaroo Banzai'), (1,'Back to the Future'),
   (2, 'Rick Springfield'), (2,'Bono'), (2,'Sammy Hagar'),
   (3, 'Wile E. Coyote'), (3,'Officer Barbrady'), (3,'Sweet Polly Purebred'),  (3,'Bart Simpson') 

-- Show the tables
SELECT * FROM customer
SELECT * FROM customer_type

Normalized data to comma delineated string

For starters, here's a query that uses FOR XML PATH takes all column values in a table and returns a single string separated with a comma-space.

BUT this throws in a comma space in the beginning.  

Same query as above, with STUFF used to remove the first comma space.
The completed query, which is a subquery FOR XML PATH that is correlated on a main query of DISTINCT ct.type in order to return only the names for that type. 

 

Comma delineated string to normalized data

Now let's go back.Using the above tables and data let's create a new table with the above results, and just for kicks and giggles add a few more rows.

IF OBJECT_ID('tempdb..#c') IS NOT NULL
DROP TABLE #c
GO


CREATE TABLE #c (type varchar(100), names varchar(1000)) 
INSERT INTO #c (type, names) 
SELECT DISTINCT ct.type,
  STUFF(
         (SELECT ', ' + convert(varchar(50), c2.name)
          FROM customer c2
          WHERE c1.ct_id = c2.ct_id
          FOR XML PATH('')  -- SELECT it as XML
), 1, 2, '')   -- This is done to remove the first comma-space (,) FROM the result
as names
FROM customer c1
JOIN customer_type ct ON c1.ct_id = ct.id
ORDER BY ct.type


-- Add a few more just for kicks and giggles
INSERT INTO #c (type, Names) 
VALUES 
('Actors', 'Kevin Costner, Meg Ryan, Peter Weller'), 
('Athletes', 'Michael Jordan, Bo Jackson'), 
('Fictional Characters', 'Santa Clause, The Easter Bunny, The Tooth Fairy') 

 

-- Show the results
SELECT * FROM #c

So here's the new starting point

The final result.  Uses a Common Table Expression (CTE) that contains a correlated UNION ALL to return a single row for each type and name.

Name is determined by..

First CTE Select Statement (orange)

Column DataItem uses the #c table, Names column and CHARINDEX to get the LEFT-most characters from the beginning of the string to just before the comma. 

Column Data is the rest of the string. 

Second CTE Select Statement (blue)

Same actions as the first, but uses the CTE table tmp, making it recursive so that the tmp table is now populated with a row for each name in table #c.

Outer Query (Purple)

Simply calls the type and dataitem as names of the cte. 

 

The end.  Thank you for reading my article, and please email me at jim at jimhorn dot biz with any feedback.  I look forward to hearing from you. 

Jim  ( LinkedIn )  ( Twitter ) ( Experts Exchange ) ( Stack Overflow )


The entire T-SQL script is below.  

-- Create and populate tables used in this demo
IF EXISTS(SELECT name FROM sys.tables WHERE name='customer')
   DROP TABLE customer
GO
IF EXISTS(SELECT name FROM sys.tables WHERE name='customer_type')
   DROP TABLE customer_type
GO

-- Create the tables
CREATE TABLE customer_type(
   id int PRIMARY KEY NOT NULL, 
   type varchar(50)) 
CREATE TABLE customer (
   id int identity(1,1) PRIMARY KEY NOT NULL, 
   ct_id int REFERENCES customer_type(id), 
   name varchar(50)) 


ALTER TABLE customer
ADD CONSTRAINT fk_customer_customer_type
FOREIGN KEY (ct_id) 
REFERENCES customer_type(id) 


INSERT INTO customer_type (id, type) 
VALUES (1, 'Movie Characters'), (2, 'Rock Stars'), (3, 'Cartoon Characters')
INSERT INTO customer (ct_id, name ) 
VALUES 
   (1, 'Johnny B. Goode'), (1,'Buckaroo Banzai'), (1,'Marty McFly'),
   (2, 'Rick Springfield'), (2,'Bono'), (2,'Sammy Hagar'),
   (3, 'Wile E. Coyote'), (3,'Officer Barbrady'), (3,'Sweet Polly Purebred'),  (3,'Bart Simpson') 

-- Show the tables
SELECT * FROM customer
SELECT * FROM customer_type

-- XML PATH returns a single row with all values, separated by comma space
SELECT ', ' + convert(varchar(50), name)
FROM customer
FOR XML PATH('')


-- STUFF removes the first comma space and formats this with ,
SELECT DISTINCT STUFF((
SELECT ', ' + convert(varchar(50), c2.name)
FROM customer c2
FOR XML PATH('')), 1, 1, '')
FROM customer c1


-- THIS WORKS
SELECT DISTINCT ct.type,
  STUFF(
         (SELECT ', ' + convert(varchar(50), c2.name)
          FROM customer c2
          WHERE c1.ct_id = c2.ct_id
          FOR XML PATH('') ), 1, 2, '') as names
FROM customer c1
JOIN customer_type ct ON c1.ct_id = ct.id
ORDER BY ct.type

-- Throw the above results into a temp table

IF OBJECT_ID('tempdb..#c') IS NOT NULL

   DROP TABLE #c GO

CREATE TABLE #c (type varchar(100), names varchar(1000)) 
INSERT INTO #c (type, names) 
SELECT DISTINCT ct.type,
  STUFF(
         (SELECT ', ' + convert(varchar(50), c2.name)
          FROM customer c2
          WHERE c1.ct_id = c2.ct_id
          FOR XML PATH('')  -- SELECT it as XML
), 1, 2, '')   -- This is done to remove the first comma-space (,) FROM the result
as names
FROM customer c1
JOIN customer_type ct ON c1.ct_id = ct.id
ORDER BY ct.type


-- Add a few more just for kicks and giggles
INSERT INTO #c (type, Names) 
VALUES 
('Actors', 'Kevin Costner, Meg Ryan, Peter Weller'), 
('Athletes', 'Michael Jordan, Bo Jackson'), 
('Fictional Characters', 'Santa Clause, The Easter Bunny, The Tooth Fairy') 
SELECT * FROM #c


-- THIS WORKS convert the set back to normalized
;with tmp(type, names1, names2) as (
SELECT 
type, 
CAST(LTRIM(LEFT(Names, CHARINDEX(',', Names + ',') -1 )) as varchar(100)),
CAST(STUFF(Names, 1, CHARINDEX(',', Names + ','), '') as varchar(1000))
FROM #c
UNION ALL
SELECT 
type, 
CAST(LTRIM(LEFT(names2, CHARINDEX(',', names2 + ',') -1)) as varchar(100)),
CAST(STUFF(names2, 1, CHARINDEX(',', names2 + ','), '') as varchar(1000))
FROM tmp
where names2 > ''
)
SELECT type, names1 as names
FROM tmp
ORDER BY type

 
 

 

 
 

Email jim@jimhorn.biz, Cell 612.910.5236, Twitter @sqljimbo