Included with this assignment is an Excel spreadsheet that contains two-dimensional data (two values per data point).
The purpose of this assignment is to demonstrate the steps performed in a k-means cluster analysis.
Review the “k-MEANS CLUSTERING ALGORITHM” section in Chapter 4 of the Sharda et al. textbook for additional background.
Use Excel to perform the following data analysis.
1. Plot the data on a scatter plot.
2. Determine the ideal number of clusters.
3. Choose random center points (centroids) for each cluster. (Note: Each student will select a different random set of centroids.)
4. Using a standard distance formula, measure the distance from each data point to each center point.
5. Assign each data point to an initial cluster region based on closeness, that is, to the cluster whose center point is nearest.
6. For each cluster, calculate a new center point.
7. Repeat steps 4 through 6.
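For orientation only, here is a minimal Python sketch of what one pass through steps 4 through 6 computes. It is not a substitute for the Excel work this assignment requires. The data points, the starting centroids, the function names, and the choice of Euclidean distance as the "standard distance formula" are all assumptions made for illustration; none of them come from the assignment spreadsheet.

```python
import math

def distance(p, q):
    """Step 4: Euclidean distance between two 2-D points (assumed formula)."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def assign_clusters(points, centroids):
    """Step 5: for each point, the index of the nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: distance(p, centroids[k]))
        for p in points
    ]

def update_centroids(points, labels, centroids):
    """Step 6: each new centroid is the mean of the points assigned to it.

    A cluster with no points keeps its previous centroid.
    """
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            mean_x = sum(x for x, _ in members) / len(members)
            mean_y = sum(y for _, y in members) / len(members)
            new_centroids.append((mean_x, mean_y))
        else:
            new_centroids.append(old)
    return new_centroids

# Made-up sample data and starting centroids (hypothetical values).
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]

# One pass through steps 4 to 6.
labels = assign_clusters(points, centroids)
centroids = update_centroids(points, labels, centroids)
```

Repeating the steps corresponds to calling `assign_clusters` and `update_centroids` again with the updated centroids. In the spreadsheet, each of these functions maps to a block of cells: one distance column per centroid, one column for the assigned cluster, and one pair of averages per cluster.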
You will use Excel to help with the calculations, but only standard functions should be used (i.e., do not use a plug-in to perform the analysis for you). You need to show your work by doing this analysis the long way. If you were to keep repeating steps 4 through 6, what would likely happen to the cluster centroids? The rubric for this assignment can be viewed by clicking on the assignment link.
Here is a link to an example spreadsheet using a smaller data set. It contains two tabs. The first tab is the raw data. The second tab contains the analysis that was performed. Make sure that you use different starting center points from those in the example.