Volume: 15, Issue: 3(2005)
pp. 239-260 DOI: 10.1142/S0218195905001683
|
|
Abstract |
Full Text (PDF, 2,229KB)
|
References
|
 |
| Title: |
GEOMETRIC ALGORITHMS FOR DENSITY-BASED DATA CLUSTERING |
| Author(s): |
DANNY Z. CHEN
The work of these authors was supported in part by Lockheed Martin Corporation and
by the National Science Foundation under Grant CCR-9988468. Department of Computer Science and Engineering,
University of Notre Dame, Notre Dame, Indiana 46556, USAMICHIEL SMID
The work of this author was supported in part by NSERC. School of Computer Science,
Carleton University, Ottawa, Ontario, Canada K1S 5B6, CanadaBIN XU
The work of these authors was supported in part by Lockheed Martin Corporation and
by the National Science Foundation under Grant CCR-9988468.
Corresponding author. Department of Computer Science and Engineering,
University of Notre Dame, Notre Dame, Indiana 46556, USA
|
| History: |
Received 12 February 2003 Revised 30 March 2005
|
| Abstract: |
Data clustering is a fundamental problem arising in many practical
applications. In this paper, we present new geometric approximation and
exact algorithms for the density-based data clustering problem
in d-dimensional space ℝd (for any constant
integer d ≥ 2). Previously known algorithms for this problem
are efficient only when the specified range around each input point, called
the δ-neighborhood, contains on average a constant number of
input points. Different distributions of the input data points have significant
impact on the efficiency of these algorithms. In the worst case when the data
points are highly clustered, these algorithms run in quadratic time,
although such situations might not occur very frequently on real data.
By using computational geometry techniques, we develop
faster approximation and exact algorithms for the density-based data
clustering problem in ℝd. In particular, our
approximation algorithm based on the ∊-fuzzy distance function
takes O(n log n) time for any given fixed value
∊>0, and our exact algorithms take sub-quadratic
time. The running times and output quality of our algorithms do not
depend on any particular data distribution. We believe that our fast
approximation algorithm is of considerable practical importance, while
our sub-quadratic exact algorithms are more of theoretical interest.
We implemented our approximation algorithm and the experimental results show
that our approximation algorithm is efficient on arbitrary input point sets. |
| Keywords: |
Density-based clustering; geometric algorithms; approximation algorithms; range search; nearest neighbor search; BBD trees; graph traversal
|
|
|