# K-means
An iterative clustering technique, initialized from K random points (the means).
Assignment: assign each point to the cluster whose mean is nearest.
Update: recompute each cluster's mean as the average of the points assigned to it.
Iterate until assignments no longer change. The result may be only a local optimum, not the global one.
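As a concrete illustration, here is a minimal NumPy sketch of the two-step loop above (the function name, the iteration cap `n_iter`, and the seeded random initialization are my own choices, not from the source):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal K-means on an (N, D) array X; returns (labels, means)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    means = X[rng.choice(len(X), size=k, replace=False)]  # K random data points
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster with the nearest mean.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # no assignment changed -> converged
        labels = new_labels
        # Update step: each mean becomes the average of its cluster's points.
        for j in range(k):
            if np.any(labels == j):
                means[j] = X[labels == j].mean(axis=0)
    return labels, means
```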
# Fuzzy Clustering (Fuzzy C-Means)
Each sample can belong partially to every cluster, with membership degrees that sum to 1. Fuzzy C-Means can be considered a fuzzy version of K-means, with a similar iterative process: update the cluster centers as membership-weighted means, then update the memberships from the distances to the centers, as in the sketch below.
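Below is a minimal NumPy sketch of the standard Fuzzy C-Means updates. The fuzzifier `m` (defaulting to 2), the random membership initialization, and the function name are my own assumptions, not details from the source:

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, n_iter=100, seed=0):
    """Minimal Fuzzy C-Means; returns (U, centers), where each row of U sums to 1."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Random initial memberships, normalized so each row sums to 1.
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Center update: membership-weighted means with weights U^m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Membership update from distances to centers.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)            # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centers
```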
# K-medoids
Instead of using the mean of each cluster as in K-means, K-medoids selects one of the actual points in each cluster (the medoid) as its representative.
This places fewer requirements on the data: points need not lie in the same Euclidean space and may even be categorical, as long as a dissimilarity matrix can be constructed to define the distance between points.
Pro: avoids the impact of outliers, since the representative must be one of the actual points in the cluster.
Con: time complexity is O(N^2), since it operates on pairwise dissimilarities.
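A minimal sketch of one common variant (Voronoi iteration, not the full PAM swap search), assuming a precomputed (N, N) dissimilarity matrix `D`; the naming and defaults are my own illustration:

```python
import numpy as np

def kmedoids(D, k, n_iter=100, seed=0):
    """Minimal K-medoids over a precomputed (N, N) dissimilarity matrix D."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(D), size=k, replace=False)
    for _ in range(n_iter):
        # Assignment: each point joins the cluster of its nearest medoid.
        labels = D[:, medoids].argmin(axis=1)
        # Update: within each cluster, pick the point minimizing the total
        # dissimilarity to all other points in that cluster.
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members):
                costs = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[costs.argmin()]
        if np.array_equal(new_medoids, medoids):
            break  # medoids stable -> stop
        medoids = new_medoids
    labels = D[:, medoids].argmin(axis=1)
    return labels, medoids
```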
# Gaussian Mixture Model (GMM)
Assumes the data distribution is a linear combination of multiple Gaussian densities: p(x) = \sum_{k=1}^K \pi_k \mathcal{N}(x|\mu_k, \Sigma_k).
Rather than assigning each point to a single cluster, GMM gives the probability of the point belonging to each cluster.
For each Gaussian k, the parameters \pi_k, \mu_k, \Sigma_k need to be estimated from the sampled data; the Expectation-Maximization (EM) method is used to find them.
Step 1 (E-step, soft assignment): for each data point x_i, the probability that it was generated by the k-th Gaussian component is
\displaystyle \gamma(i, k) = \frac{\pi_k \mathcal{N}(x_i|\mu_k, \Sigma_k)}{\sum_{j=1}^K \pi_j\mathcal{N}(x_i|\mu_j, \Sigma_j)}
\gamma(i, k) can also be read as the fraction of x_i contributed by the k-th component; equivalently, an amount \gamma(i, k)x_i of x_i is generated by component k.
Step 2 (M-step, parameter estimation): now treat the data belonging to the k-th Gaussian as \gamma(1, k)x_1, \ldots, \gamma(N, k)x_N; by maximum likelihood estimation, the parameters are
\displaystyle \begin{aligned} \mu_k & = \frac{1}{N_k}\sum_{i=1}^N\gamma(i, k)x_i \\ \Sigma_k & = \frac{1}{N_k}\sum_{i=1}^N\gamma(i, k)(x_i-\mu_k)(x_i-\mu_k)^T \end{aligned}
where N_k = \sum_{i=1}^N \gamma(i, k) and \pi_k = N_k/N.
Iterate until convergence.
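A minimal NumPy/SciPy sketch of the two EM steps above; the initialization from random data points and the small ridge added to each \Sigma_k for numerical stability are my own choices, not from the source:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, K, n_iter=100, seed=0):
    """Minimal EM for a GMM on an (N, D) array X; returns (pi, mu, Sigma, gamma)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(N, size=K, replace=False)]
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(D)] * K)
    for _ in range(n_iter):
        # E-step: responsibilities gamma(i, k) as in the formula above.
        dens = np.stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                         for k in range(K)], axis=1)        # shape (N, K)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate pi_k, mu_k, Sigma_k from the responsibilities.
        Nk = gamma.sum(axis=0)                              # N_k
        pi = Nk / N
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
            Sigma[k] += 1e-6 * np.eye(D)                    # keep positive-definite
    return pi, mu, Sigma, gamma
```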
Pro: not only clusters the data but also provides density estimation, i.e. the probability that a data point belongs to a given class.
Con: computationally heavy (K-means is often run first for a rough clustering before GMM), and convergence to the global optimum is not guaranteed.
Adapted from: http://blog.pluskid.org/?page_id=78