Cluster Analysis with R

Recently in one of my classes, we’ve been experimenting with using R to run cluster analysis. Clustering is a way of assigning rows or records (ultimately, respondents) into groups. The person running the analysis controls how many groups to create, though depending on the data and the number of rows, you’ll usually set up no more than 10. At any rate, this is an extremely powerful tool because it lets the people in your organization see what your customers have in common, which allows for segmentation and a tailored approach to working with them. This isn’t just a greedy way of figuring out how to squeeze more money out of people; it can be used to create different products, user experiences, subscription packages, or any other offering that ultimately makes the relationship you have with your customer just a little bit better.
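To make the idea of "assigning rows into groups" concrete, here is a minimal sketch in R. It is not the script from class. It assumes k-means as the clustering method (the post doesn't name one) and uses made-up `age` and `income` columns as stand-in data.

```r
# Synthetic stand-in data (hypothetical columns): three loose groups of 20 rows.
set.seed(42)
respondents <- data.frame(
  age    = c(rnorm(20, 25, 3), rnorm(20, 45, 3), rnorm(20, 65, 3)),
  income = c(rnorm(20, 30, 5), rnorm(20, 60, 5), rnorm(20, 90, 5))
)

# Scale the columns so neither one dominates the distance calculation.
scaled <- scale(respondents)

# Ask for 3 groups; nstart reruns from several random starts and keeps the best fit.
fit <- kmeans(scaled, centers = 3, nstart = 25)

# Attach each row's group label, then summarise the groups.
respondents$cluster <- fit$cluster
table(respondents$cluster)
aggregate(cbind(age, income) ~ cluster, data = respondents, FUN = mean)
```

The per-cluster averages at the end are the kind of summary you would use to describe each segment to someone else.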

I’ve been learning a bit of R lately, as my class requires us to use it, and I can tell that it’s not only extremely powerful but also has a lot of depth. I haven’t even scratched the surface. I’ve been using an IDE called RStudio. Out of the box, all R does is open a command prompt window. RStudio gives you that command line plus a script editor (the script I’m using is modified from one we used in class; I’m not that good yet :)), a view of all the variables you’ve assigned and all the data you’re using, and a window for your plots or any other output you’ve asked R to create. It’s quite fancy.


The data we used was fake demographic data that a fake automobile company had collected. Our assignment was to divide the respondents into as many clusters as we thought were appropriate (I chose 3, but you could certainly make a case for 4), and then provide recommendations to the fake marketing manager on how this information could be used to better market to our customers.
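One common way to weigh 3 clusters against 4 is an "elbow" plot. This is only a sketch: it again assumes k-means and uses synthetic stand-in data rather than the class dataset. The idea is to fit the model for several values of k and look for the point where adding another cluster stops reducing the within-cluster variation by much.

```r
# Synthetic stand-in data (hypothetical columns), scaled before clustering.
set.seed(1)
demo <- scale(cbind(
  age    = c(rnorm(30, 25, 3), rnorm(30, 45, 3), rnorm(30, 65, 3)),
  income = c(rnorm(30, 30, 5), rnorm(30, 60, 5), rnorm(30, 90, 5))
))

# Fit k-means for k = 1..8 and record the total within-cluster sum of squares.
ks  <- 1:8
wss <- sapply(ks, function(k) kmeans(demo, centers = k, nstart = 25)$tot.withinss)

# Plot k against the within-cluster sum of squares and look for the bend.
plot(ks, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")
```

The bend is rarely sharp on real data, which is part of why two people can reasonably land on different answers.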

Cluster analysis is a lot of fun and should definitely be a tool in every marketer’s toolbox. That being said, choosing the number of clusters is subjective, and you could make a case for as many or as few clusters of customers as you’d like. Also, it’s important to remember that even though it’s convenient for you to assign individuals to groups, you’ll still need to deal with people on an individual level and train your staff not to pigeonhole people or make assumptions just because someone fits your model. This all reminds me of an article we read for our class about a mathematics Ph.D. who applied clustering to his dating life. Sometimes, you can overthink things.

As always, if you want to view my script or the raw data (and suggest changes or make recommendations), you can do that on GitHub.
