Smaller Coresets for k Median and k Means Clustering
15 pages
English

Découvre YouScribe en t'inscrivant gratuitement

Je m'inscris

Smaller Coresets for k Median and k Means Clustering

Découvre YouScribe en t'inscrivant gratuitement

Je m'inscris
Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus
15 pages
English
Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus

Description

Smaller Coresets for k-Median and k-Means Clustering? Sariel Har-Peled† Akash Kushal‡ November 29, 2004 Abstract In this paper, we show that there exists a (k, ?)-coreset for k-median and k-means clustering of n points in IRd, which is of size independent of n. In particular, we con- struct a (k, ?)-coreset of size O(k2/?d) for k-median clustering, and of size O(k3/?d+1) for k-means clustering. 1 Introduction Clustering is a widely used technique in Computer Science with applications to unsupervised learning, classification, data mining and other fields. We study two variants of the clustering problem in the geometric setting. The geometric k-median clustering problem is the follow- ing: Given a set P of n points in IRd, compute a set of k points (i.e., medians) such that the sum of the distances of the points in P to their respective nearest median is minimized. The k-means differs from the above in that instead of the sum of distances, we minimize the sum of squares of distances. Interestingly the 1-mean is the center of mass of the points, while the 1-median problem, also known as the Fermat-Weber problem, has no such closed form.

  • such line

  • ?opt

  • clustering

  • problem has

  • smaller coresets

  • resulting batches

  • line into


Sujets

Informations

Publié par
Nombre de lectures 6
Langue English

Extrait

SmallerCoresetsfork-Medianandk-MeansClusteringSarielHar-PeledAkashKushalNovember29,2004AbstractInthispaper,weshowthatthereexistsa(k,ε)-coresetfork-medianandk-meansclusteringofnpointsinIRd,whichisofsizeindependentofn.Inparticular,wecon-structa(k,ε)-coresetofsizeO(k2d)fork-medianclustering,andofsizeO(k3d+1)fork-meansclustering.1IntroductionClusteringisawidelyusedtechniqueinComputerSciencewithapplicationstounsupervisedlearning,classification,dataminingandotherfields.Westudytwovariantsoftheclusteringprobleminthegeometricsetting.Thegeometrick-medianclusteringproblemisthefollow-ing:GivenasetPofnpointsinIRd,computeasetofkpoints(i.e.,medians)suchthatthesumofthedistancesofthepointsinPtotheirrespectivenearestmedianisminimized.Thek-meansdiffersfromtheaboveinthatinsteadofthesumofdistances,weminimizethesumofsquaresofdistances.Interestinglythe1-meanisthecenterofmassofthepoints,whilethe1-medianproblem,alsoknownastheFermat-Weberproblem,hasnosuchclosedform.Assuchtheproblemshaveusuallybeenstudiedseparatelyfromeachotherevenintheapproximatesetting.Thebasicquestionunderlyingapproximationalgorithms,iswhatportionofthedataisnecessarytocompute(approximately)acertainquantity.Thesmallerthisportionis,themoreefficienttheresultingalgorithmwouldbe.Acoresetisasmallportionofthedata,suchthatrunningaclusteringalgorithmonit,generatesaclusteringforthewholedata,whichisapproximatelyoptimal.Inparticular,asmallcoresetindicatesthataproblemiseasytoapproximate.Furthermore,itimpliesthatonecansummarizeandsketchthedataefficiently.Thisisusefulfordatabaseapplications,whereonecanstoresuchsketchesefficiently,andperformefficientclusteringonadatabase,orportionsofitusingthesketches.Seehttp://www.uiuc.edu/˜sariel/papers/04/smallcoreset/forthemostrecentversionofthispaper.DepartmentofComputerScience;UniversityofIllinois;201N.GoodwinAvenue;Urbana,IL,61801,USA;sariel@uiuc.edu;http://www.uiuc.edu/~sariel/.WorkonthispaperwaspartiallysupportedbyaNSFCAREERawardCCR-0132901.DepartmentofComputerScience;UniversityofIllinois;201N.GoodwinAvenue;Urbana,IL,61801,USA;kushal@uiuc.edu;http://www-cvr.ai.uiuc.edu/~kushal/.1
  • Univers Univers
  • Ebooks Ebooks
  • Livres audio Livres audio
  • Presse Presse
  • Podcasts Podcasts
  • BD BD
  • Documents Documents