Title: | "MST-Based Clustering" |
---|---|
Description: | Implements a minimum-spanning-tree-based heuristic for k-means clustering using a union-find disjoint set and the algorithm in Kruskal (1956) <doi:10.1090/S0002-9939-1956-0078686-7>. |
Authors: | Kevin Michael Frick <[email protected]> |
Maintainer: | Kevin Michael Frick <[email protected]> |
License: | AGPL (>= 3) |
Version: | 1.0.0.0 |
Built: | 2025-02-17 05:15:34 UTC |
Source: | https://github.com/cran/mstclustering |
Find the set an element belongs to.
find.set(i, ufds)
find.set(i, ufds)
i |
The element to check. |
ufds |
A data.table representing a UFDS. |
An integer: the root node of the set the element belongs to.
Generate an adjacency list
gen.child.list.mst(clust.edge.list, m)
gen.child.list.mst(clust.edge.list, m)
clust.edge.list |
The return value of the kruskal() function. |
m |
Number of nodes. |
An adjacency list in the form of a list of vectors.
Generate edge list from a distance matrix Duplicates are not deleted, because they will not be counted by Kruskal's algorithm If a check is O(1), this only adds an O(E) overhead, which is negligible
gen.edge.list(mat)
gen.edge.list(mat)
mat |
The distance matrix. |
A data frame with three columns: 'from', 'to', 'dist'.
Check if two elements are in the same set
is.same.set(i, j, ufds)
is.same.set(i, j, ufds)
i |
The first element in the tuple. |
j |
The second element in the tuple. |
ufds |
A data.table representing a UFDS. |
TRUE if the elements are in the same set, FALSE otherwise.
Kruskal's algorithm for MST computation.
kruskal(edge.list, m)
kruskal(edge.list, m)
edge.list |
A data frame with columnns 'from', 'to', 'dist'. |
m |
Number of nodes. |
A list of edges in the MST, in the same format as the input argument edge.list.
Run clustering using MST. Before calling this function, remove some edges from the MST, for example the k-1 heaviest.
mst.cluster(child.list.mst, m, k)
mst.cluster(child.list.mst, m, k)
child.list.mst |
The return value of the gen.child.list.mst() function with k-1 edges removed. |
m |
Number of nodes. |
k |
The number of clusters. |
A vector whose k-th element is the cluster the k-th point belongs to.
iris.clean <- iris[,-5] iris.dist <- as.matrix(dist(iris.clean)) iris.edge.list <- gen.edge.list(iris.dist) m <- nrow(iris.dist) iris.mst.edge.list <- kruskal(iris.edge.list, m) k <- 3 n.edges <- nrow(iris.mst.edge.list) iris.mst.edge.list <- iris.mst.edge.list[1:(n.edges - (k - 1)),] iris.child.list.mst <- gen.child.list.mst(iris.mst.edge.list, m) iris.clust.mst <- mst.cluster(iris.child.list.mst, m, k)
iris.clean <- iris[,-5] iris.dist <- as.matrix(dist(iris.clean)) iris.edge.list <- gen.edge.list(iris.dist) m <- nrow(iris.dist) iris.mst.edge.list <- kruskal(iris.edge.list, m) k <- 3 n.edges <- nrow(iris.mst.edge.list) iris.mst.edge.list <- iris.mst.edge.list[1:(n.edges - (k - 1)),] iris.child.list.mst <- gen.child.list.mst(iris.mst.edge.list, m) iris.clust.mst <- mst.cluster(iris.child.list.mst, m, k)
Initialize UFDS
reset.ufds(m)
reset.ufds(m)
m |
Number of elements. |
A data table containing a 'rank' column and a 'parent' column.
Join the sets the two elements passed as arguments belong to.
union.set(i, j, ufds)
union.set(i, j, ufds)
i |
The first element in the tuple. |
j |
The second element in the tuple. |
ufds |
A data.table representing a UFDS. |
No return value, called for side effects on rank and p.