Clustering (also known as cluster analysis) is the task of grouping a set of objects so that objects in the same group (a cluster) are more similar to each other than to objects in other groups (following this definition). Because it needs no class labels, it belongs to the family of unsupervised learning techniques.
Clustering has a variety of applications. A common one is thinning out data: a group of similar samples is replaced by a single representative of its cluster, e.g. the cluster's mean. Clustering can also be used to find well-distributed points in a feature space, which more advanced algorithms can then use, e.g. as an initialization.
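To illustrate the data-thinning use case, here is a minimal sketch of plain k-means (Lloyd's algorithm) using only the Python standard library. The names `kmeans` and `thin_out` and the toy data are hypothetical and not taken from the notebooks. Picking k random samples as the initial centroids is also an assumption; the notebook may initialize differently.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points: Sequence[Point], k: int, iterations: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Plain k-means (Lloyd's algorithm); returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct samples drawn at random from the data.
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids no longer move
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            new_centroids.append(mean(members) if members else centroids[j])
        centroids = new_centroids
    return centroids, labels


def thin_out(points: Sequence[Point], k: int) -> List[Tuple[Point, int]]:
    """Replace the data by k representatives (cluster means), each paired
    with the number of samples it stands for."""
    centroids, labels = kmeans(points, k)
    return [(c, labels.count(j)) for j, c in enumerate(centroids)]


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
print(sorted(thin_out(data, k=2)))
# → [((0.5, 0.5), 4), ((10.5, 10.5), 4)]
```

Here eight samples are reduced to two representatives, each standing in for four points. The same centroids could also serve as starting values for a more advanced method, for example as the initial means of an EM-based mixture model.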
- [slides] Clustering for unsupervised Classification
- [notebook] Clustering with the Expectation-Maximization algorithm
- [notebook] Clustering with the k-means algorithm