Fuzzy C-means based clustering for linearly and nonlinearly separable data

 

ABSTRACT

 

The fuzzy C-means (FCM) algorithm is one of the most popular clustering techniques. Conventional FCM uses the Euclidean distance as its similarity criterion, which makes it suitable only for clustering hyper-spherically distributed data groups. Its performance may therefore degrade when individual clusters have uneven densities or non-hyperspherical shapes. In this study, we present a new distance metric that incorporates the distance variation within a cluster to regularize the distance between a data point and the cluster centroid.
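
For orientation, the conventional baseline can be sketched as follows. This is a minimal illustration of standard FCM with the squared Euclidean distance, not the regularized metric proposed in this study. The function names (`fcm`, `sq_euclidean`), the fuzzifier default `m=2.0`, the random membership initialization, and the stopping tolerance are all assumptions made for the sketch.

```python
import random


def sq_euclidean(x, v):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def fcm(points, c, m=2.0, iters=200, tol=1e-6, seed=0):
    """Standard fuzzy C-means (assumed defaults; not the proposed metric).

    Returns (centers, u), where u[i][j] is the membership of point i
    in cluster j and every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])

    centers = []
    for _ in range(iters):
        # Centroid update: mean of the points weighted by u_ij^m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Membership update: u_ij proportional to d_ij^(-2/(m-1)),
        # written here in terms of the squared distance.
        shift = 0.0
        for i in range(n):
            d2 = [max(sq_euclidean(points[i], centers[j]), 1e-12)
                  for j in range(c)]
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift,
                        max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        if shift < tol:  # memberships have stopped changing
            break

    return centers, u
```

Because every cluster is measured with the same isotropic distance, a point's membership depends only on how far it lies from each centroid, which is why elongated clusters or clusters of unequal density are handled poorly by this baseline.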

 

Conventional FCM also works only for linearly separable data points. The proposed distance metric is applied to both fuzzy C-means clustering in data space and kernel fuzzy C-means (KFCM) clustering in feature space. We use the RBF (radial basis function) kernel to implicitly define the mapping function from data space to feature space so that nonlinear separation of clusters can be achieved. We also propose an adaptive bandwidth setting for the RBF kernel, which removes the need to guess the RBF parameter value blindly for unlabeled data sets.
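
To illustrate how an RBF kernel defines distances in feature space without computing the mapping explicitly, the sketch below uses the kernel trick: the squared feature-space distance expands to K(x,x) - 2K(x,v) + K(v,v), and since K(x,x) = 1 for an RBF kernel, it reduces to 2(1 - K(x,v)). The parameterization exp(-||x-v||^2 / (2*sigma^2)), the function names, and the assumption that the prototype `v` lives in data space are illustrative choices. The bandwidth `sigma` is passed in by hand here; the adaptive setting proposed in this study is not reproduced.

```python
import math


def rbf_kernel(x, y, sigma):
    """RBF kernel, assuming the form exp(-||x - y||^2 / (2 * sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def feature_space_sq_distance(x, v, sigma):
    """Squared distance between the feature-space images of x and v.

    ||phi(x) - phi(v)||^2 = K(x,x) - 2*K(x,v) + K(v,v) = 2 * (1 - K(x,v)),
    because K(z,z) = 1 for any z under an RBF kernel.
    """
    return 2.0 * (1.0 - rbf_kernel(x, v, sigma))
```

The resulting distance is bounded between 0 and 2 and saturates for points far apart relative to `sigma`, which is why the bandwidth choice matters so much for unlabeled data.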

 

Experiments on two-dimensional artificial data sets, real data sets from public data libraries, and color image segmentation show that the proposed FCM and KFCM with the new distance metric generally perform better on non-spherically distributed data with uneven density, for both linear and nonlinear separation.

 

Keywords: Fuzzy C-means, Clustering, Distance metric, Nonlinear separation, Color image segmentation