
Partitioning-based clustering is also called the partitioning method. Given a database of m data objects or tuples, a partitioning method constructs k partitions of the data, where each partition represents a cluster and k ≤ m. That is, it divides the data into k groups that together satisfy the following requirements: (1) each group contains at least one object; (2) each object belongs to exactly one group. In some fuzzy partitioning techniques the second requirement may be relaxed.
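
The two requirements can be written as a small check. This is only an illustrative sketch: the function name is_valid_partition is hypothetical, and it assumes the objects are distinct, hashable values.

```python
def is_valid_partition(groups, objects):
    """Check the two requirements of a hard (non-fuzzy) partition.

    Hypothetical helper; assumes `objects` are distinct and hashable.
    """
    # Requirement (1): every group contains at least one object.
    if any(len(g) == 0 for g in groups):
        return False
    # Requirement (2): every object belongs to exactly one group,
    # so no object is repeated and no object is left out.
    assigned = [obj for g in groups for obj in g]
    return len(assigned) == len(set(assigned)) and set(assigned) == set(objects)
```

Note that k ≤ m follows from the two requirements: k non-empty, non-overlapping groups need at least k objects.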
Given the number of partitions k to construct, a partitioning method first creates an initial partition. It then applies an iterative relocation technique that tries to improve the partition by moving objects from one group to another. The general criterion for a good partition is that the resulting clusters optimize an objective partitioning criterion (often called a similarity function), so that objects in the same cluster are "similar" while objects in different clusters are "dissimilar". Two heuristic partitioning methods are currently the most popular:
(1) The k-means algorithm. Given an integer k, it partitions n objects into k clusters so that similarity within a cluster is high and similarity between clusters is low. Similarity is computed against the mean of the objects in a cluster (regarded as the cluster center); that is, each cluster is represented by the mean of its objects.
(2) The k-medoids algorithm. This algorithm uses medoids: a reference point replaces the cluster mean used in k-means, and partitioning proceeds on the principle of minimizing the sum of distances between each object and its reference point; that is, each cluster is represented by an object located near the cluster center.
These heuristic clustering methods are well suited to finding spherical clusters in small and medium-sized databases. Clustering large data sets, or handling clusters of complex shape, requires extending the partitioning-based methods further.
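
The initial partition, relocation, and mean update described for k-means can be sketched roughly as follows. The names kmeans and medoid, the Euclidean distance, the random choice of starting centers, and the max_iter and seed parameters are all assumptions made for illustration; the text does not prescribe them.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Minimal k-means sketch on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Initial partition (assumed strategy): pick k objects as starting centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Relocation step: assign each object to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # no object changed cluster, so the partition is stable
        labels = new_labels
        # Update step: each center becomes the mean of its cluster's objects.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster became empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


def medoid(members):
    """k-medoids style representative: the member object with the smallest
    total distance to all members of its cluster."""
    return min(members, key=lambda c: sum(math.dist(c, p) for p in members))
```

A k-medoids variant would replace the mean in the update step with a representative object such as the one returned by medoid, which is always an actual member of the cluster.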
hierarchical method
A hierarchical clustering algorithm divides the database into several levels and then clusters the data at the different levels; the output is a hierarchical classification tree. Hierarchical methods are either agglomerative or divisive. The agglomerative method, also called the bottom-up method, starts with each object as its own cluster and then merges nearby atomic clusters into larger and larger clusters, until all objects are in a single cluster or some termination condition is satisfied. Most hierarchical clustering methods belong to this category; they differ only in how similarity between clusters is defined. The divisive method, also called the top-down method, starts with all objects in one cluster and gradually subdivides it into smaller and smaller clusters, until each object forms its own cluster or some termination condition is reached.
The weakness of hierarchical methods is that choosing a merge or split point is often difficult. The decision is critical, because once a step (merge or split) is completed, the next step operates on the newly generated clusters: what has been done cannot be undone, and objects cannot be exchanged between clusters. A poorly chosen merge or split at any step may therefore lead to low-quality clusters. These methods also do not scale well. The rigidity is useful in one respect: because the method does not have to consider a combinatorial number of alternative choices, its computational cost is smaller. Its main problem, however, is that it cannot correct erroneous decisions. Two approaches can improve the results of hierarchical clustering. The first is to analyze the "linkages" between objects carefully at each hierarchical partitioning, as in the CURE algorithm.
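
The bottom-up procedure, and the fact that a merge is never undone, can be illustrated with a short sketch. It assumes Euclidean distance and single linkage (the distance between the closest pair of members) as the cluster similarity; both are illustrative choices, and the function name agglomerative is hypothetical.

```python
import math


def agglomerative(points, num_clusters=1):
    """Bottom-up clustering sketch using single linkage (an assumed choice)."""
    # Start with every object in its own cluster (stored as lists of indices).
    clusters = [[i] for i in range(len(points))]
    merges = []  # merge history; read in order, it describes the hierarchy

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    # Termination condition: stop once num_clusters clusters remain.
    while len(clusters) > num_clusters:
        # Find the closest pair of clusters.
        a, b = min(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merges.append((clusters[a], clusters[b]))
        # The merge is irreversible: both clusters are replaced by their union.
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return clusters, merges
```

Swapping the linkage function for another definition of cluster similarity (for example, the distance between the farthest pair) gives a different agglomerative method without changing the rest of the loop.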
The CURE algorithm adopts a middle strategy between the centroid-based and the representative-object-based approaches: it selects a fixed number of representative points in the data space. Having several representative points per cluster lets CURE adapt to non-spherical shapes, and shrinking (condensing) the cluster helps control the influence of outliers.
The second approach is to combine hierarchical agglomeration with iterative relocation: first apply a bottom-up hierarchical algorithm, then refine the result with iterative relocation, as in the BIRCH algorithm. BIRCH is an integrated hierarchical clustering method. It first uses a tree structure to partition the objects hierarchically, then uses other clustering algorithms to refine the result. The algorithm introduces two concepts: the clustering feature (CF) and the clustering feature tree (CF tree). It scales linearly with the number of objects and is also very effective for incremental or dynamic clustering. It uses a multiphase clustering technique: a single scan of the data set produces a basic clustering, and one or more additional scans can improve the clustering quality. Its computational time complexity is O(n). However, because it uses the notion of radius or diameter to control the boundary of a cluster, it does not work well when clusters are not spherical.
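
A clustering feature can be sketched as a small summary record. The text above does not define it; the triple (N, LS, SS) used here, meaning the number of objects, their linear sum, and their sum of squares, follows the standard BIRCH formulation, and the class and method names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusteringFeature:
    n: int      # N: number of objects summarized
    ls: tuple   # LS: linear sum of the objects, per dimension
    ss: float   # SS: sum of squared norms of the objects

    @classmethod
    def from_point(cls, p):
        return cls(1, tuple(p), sum(x * x for x in p))

    def merge(self, other):
        # CFs are additive, so two sub-clusters can be merged
        # without revisiting the underlying objects.
        return ClusteringFeature(
            self.n + other.n,
            tuple(a + b for a, b in zip(self.ls, other.ls)),
            self.ss + other.ss,
        )

    def centroid(self):
        return tuple(x / self.n for x in self.ls)

    def radius(self):
        # Root-mean-square distance of the members from the centroid,
        # computed from the summary alone.
        c2 = sum(x * x for x in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))
```

Because the radius is derived from the summary, a threshold on it can bound the size of a sub-cluster during a single scan, which is also why the resulting clusters tend toward spherical shapes.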
