Online-Academy
Look, Read, Understand, Apply

Data Mining And Data Warehousing

DBSCAN

Density-Base Spatial Clustering of Applications with Noise (DBSCAN)

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm that groups together points that are close to each other in dense regions, and marks points in low-density areas as outliers (noise).
It is used when:
  • Clusters are of arbitrary shapes
  • Automatically detect the number of clusters
  • Data contains noise or outliers
Key Concepts
  1. Core Point: A point with at least min_samples neighbors within a distance eps.
  2. eps distance: minimum distance from core point to another point.
  3. min points: number of points close to given point such that the point is core point
  4. Density reachable and density connected: Density-reachable refers to a point being reached via a chain of core points within a specified radius, while density-connected means two points can be reached from a common core point, also within the specified radius.
  5. Border Point: A point within eps of a core point, but not a core point itself.
  6. Noise (Outlier): A point that is not a core point and not within eps of any core point.
Working of DBSCAN
  • For each point, find its eps-neighborhood (nearby points within a radius eps).
  • If it has greater than min_samples neighbors, it becomes a core point and starts forming a new cluster.
  • Expand the cluster by recursively adding:
  • Core points\' neighbors
  • Border points
  • Label points as noise that do not belong to any cluster.
Advantages
  • No need to specify the number of clusters (unlike K-Means).
  • Can find arbitrarily shaped clusters.
  • Automatically detects outliers.
  • Works well for spatial and noisy datasets.
Limitations
  • Not good with varying densities (some where very dense data, somewhere low dense data) in the same dataset.
  • Sensitive to parameter settings (eps, min_samples).
  • Not ideal for high-dimensional data (distance becomes less meaningful).