Title: | Create Dendrograms and Tree Diagrams Using 'ggplot2' |
---|---|
Description: | This is a set of tools for dendrograms and tree plots using 'ggplot2'. The 'ggplot2' philosophy is to clearly separate data from the presentation. Unfortunately the plot method for dendrograms plots directly to a plot device without exposing the data. The 'ggdendro' package resolves this by making available functions that extract the dendrogram plot data. The package provides implementations for 'tree', 'rpart', as well as diana and agnes (from 'cluster') diagrams. |
Authors: | Andrie de Vries [aut, cre], Brian D. Ripley [aut] (author of package tree) |
Maintainer: | Andrie de Vries <[email protected]> |
License: | GPL-2|GPL-3 |
Version: | 0.2.0 |
Built: | 2024-10-27 04:35:16 UTC |
Source: | https://github.com/andrie/ggdendro |
This package enables you to create dendrograms and tree plots using ggplot2::ggplot().
The ggplot2 philosophy is to clearly separate data from the presentation. Unfortunately the plot method for dendrograms (plot.dendrogram()) plots directly to a plot device without exposing the data. The ggdendro package resolves this by making available functions that extract the dendrogram plot data. This data can be used with ggplot.
The function dendro_data() extracts data from different objects that contain dendrogram information. It is a generic function with methods for:
hclust: dendro_data.hclust()
dendrogram: dendro_data.dendrogram()
regression trees: dendro_data.tree()
partition trees: dendro_data.rpart()
agnes and diana: dendro_data.twins()
These methods create an object of class dendro, consisting of a list of data frames. To extract the relevant data frames from the list, you can use the accessor functions:
segment(): the line segment data
label(): the text for each end segment
leaf_label(): the leaf labels of a tree diagram
To plot a dendrogram, either construct a plot with ggplot2::ggplot() or use the function ggdendrogram().
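As a rough illustration of the workflow described above, the following sketch extracts the plot data from an hclust model and inspects the data frames returned by the accessors. It assumes the ggdendro package is installed; the dataset (USArrests) and the "ave" linkage are arbitrary choices taken from the package examples.

```r
library(ggdendro)

# Fit a hierarchical clustering on a built-in dataset
hc <- hclust(dist(USArrests), "ave")

# Extract the plot data; the result is an object of class dendro
ddata <- dendro_data(hc)
class(ddata)

# Each accessor returns a plain data frame that can be inspected or modified
head(segment(ddata))  # line segments: x, y, xend, yend
head(label(ddata))    # one row per leaf: x, y, label
```

Because the accessors return ordinary data frames, they can be filtered, joined, or recoloured before being handed to ggplot2::ggplot().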
Andrie de Vries - [email protected]
Method for coercing object to class dendro.
as.dendro(segments, labels, leaf_labels = NULL, class)
segments | data.frame with segment data |
labels | data.frame with labels data |
leaf_labels | data.frame with leaf label data |
class | The class of the original model object, e.g. "hclust". This is used by |
dendro_data() and ggdendro-package()
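One possible use of as.dendro() is to rebuild a dendro object after editing its data frames. The sketch below assumes that as.dendro() is exported by the installed version of ggdendro; the label truncation is a hypothetical tweak chosen only for illustration.

```r
library(ggdendro)

hc <- hclust(dist(USArrests), "ave")
ddata <- dendro_data(hc)

# Hypothetical tweak: shorten each leaf label to at most 10 characters
labs <- label(ddata)
labs$label <- substr(as.character(labs$label), 1, 10)

# Reassemble a dendro object from the segment data and the edited labels
rebuilt <- as.dendro(
  segments = segment(ddata),
  labels = labs,
  class = "hclust"  # class of the original model object
)
is.dendro(rebuilt)
```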
This function provides a generic mechanism to extract relevant plotting data, typically line segments and labels, from a variety of cluster models.
Extract line segment and label data from a stats::dendrogram() or stats::hclust() object. The resulting object is a list of data frames containing line segment data and label data.
dendro_data(model, ...)

## Default S3 method:
dendro_data(model, ...)

## S3 method for class 'dendrogram'
dendro_data(model, type = c("rectangle", "triangle"), ...)

## S3 method for class 'hclust'
dendro_data(model, type = c("rectangle", "triangle"), ...)

## S3 method for class 'twins'
dendro_data(model, type = c("rectangle", "triangle"), ...)
model | object of class "dendrogram", e.g. the output of as.dendrogram() |
... | ignored |
type | The type of plot, indicating the shape of the dendrogram. "rectangle" will draw rectangular lines, while "triangle" will draw triangular lines. |
For stats::dendrogram() and tree::tree() models, extracts line segment data and labels.
A list of data frames that contain the data appropriate to each cluster model.
A list with components:
segments | Line segment data |
labels | Label data |
There are several implementations for specific cluster algorithms:
To extract the data for line segments, labels or leaf labels use:
segment(): the line segment data
label(): the text for each end segment
leaf_label(): the leaf labels of a tree diagram
Other dendro_data methods: dendro_data.rpart(), dendro_data.tree(), dendrogram_data(), rpart_labels()
Other dendrogram/hclust functions: dendrogram_data()
require(ggplot2)

### Demonstrate dendro_data.dendrogram
model <- hclust(dist(USArrests), "ave")
dendro <- as.dendrogram(model)

# Rectangular lines
ddata <- dendro_data(dendro, type = "rectangle")
ggplot(segment(ddata)) +
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
  coord_flip() +
  scale_y_reverse(expand = c(0.2, 0)) +
  theme_dendro()

# Triangular lines
ddata <- dendro_data(dendro, type = "triangle")
ggplot(segment(ddata)) +
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
  theme_dendro()

# Demonstrate dendro_data.hclust
require(ggplot2)
hc <- hclust(dist(USArrests), "ave")

# Rectangular lines
hcdata <- dendro_data(hc, type = "rectangle")
ggplot(segment(hcdata)) +
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
  coord_flip() +
  scale_y_reverse(expand = c(0.2, 0)) +
  theme_dendro()

# Triangular lines
hcdata <- dendro_data(hc, type = "triangle")
ggplot(segment(hcdata)) +
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
  theme_dendro()

### Demonstrate the twins of agnes and diana, from package cluster
if (require(cluster)) {
  model <- agnes(votes.repub, metric = "manhattan", stand = TRUE)
  dg <- as.dendrogram(model)
  ggdendrogram(dg)
}

if (require(cluster)) {
  model <- diana(votes.repub, metric = "manhattan", stand = TRUE)
  dg <- as.dendrogram(model)
  ggdendrogram(dg)
}
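The examples above draw only the line segments. As a complementary sketch, the label data returned by label() can be added as a text layer. The text size, justification, and expansion values here are illustrative assumptions, not package defaults.

```r
library(ggplot2)
library(ggdendro)

hc <- hclust(dist(USArrests), "ave")
hcdata <- dendro_data(hc, type = "rectangle")

# Segments and leaf labels come from separate data frames,
# so each layer receives its own data argument
p <- ggplot() +
  geom_segment(
    data = segment(hcdata),
    aes(x = x, y = y, xend = xend, yend = yend)
  ) +
  geom_text(
    data = label(hcdata),
    aes(x = x, y = y, label = label),
    hjust = 0, size = 3
  ) +
  coord_flip() +
  scale_y_reverse(expand = c(0.2, 0)) +
  theme_dendro()

print(p)
```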
Extracts data to plot line segments and labels from an rpart::rpart() classification tree object. This data can then be manipulated or plotted, e.g. using ggplot2::ggplot().
## S3 method for class 'rpart'
dendro_data(
  model,
  uniform = FALSE,
  branch = 1,
  compress = FALSE,
  nspace,
  minbranch = 0.3,
  ...
)
model | object of class "rpart", e.g. the output of rpart::rpart() |
uniform | if TRUE, uniform vertical spacing of the nodes is used; this may be less cluttered when fitting a large plot onto a page. The default is to use a non-uniform spacing proportional to the error in the fit. |
branch | controls the shape of the branches from parent to child node. Any number from 0 to 1 is allowed. A value of 1 gives square-shouldered branches, a value of 0 gives V-shaped branches, with other values being intermediate. |
compress | if FALSE, the leaf nodes will be at the horizontal plot coordinates of 1:nleaves. If TRUE, the routine attempts a more compact arrangement of the tree. The compaction algorithm assumes uniform=TRUE; surprisingly, the result is usually an improvement even when that is not the case. |
nspace | the amount of extra space between a node with children and a leaf, as compared to the minimal space between leaves. Applies to compressed trees only. The default is the value of branch. |
minbranch | set the minimum length for a branch to minbranch times the average branch length. This parameter is ignored if uniform=TRUE. Sometimes a split will give very little improvement, or even (in the classification case) no improvement at all. A tree with branch lengths strictly proportional to improvement leaves no room to squeeze in node labels. |
... | ignored |
This code is in essence a copy of rpart::plot.rpart(), retaining the plot data but without plotting to a plot device.
A list of three data frames:
segments | a data frame containing the line segment data |
labels | a data frame containing the label text data |
leaf_labels | a data frame containing the leaf label text data |
Other dendro_data methods: dendro_data(), dendro_data.tree(), dendrogram_data(), rpart_labels()
Other rpart functions: rpart_labels(), rpart_segments()
### Demonstrate rpart
if (require(rpart)) {
  require(ggplot2)
  fit <- rpart(Kyphosis ~ Age + Number + Start, method = "class", data = kyphosis)
  fitr <- dendro_data(fit)
  ggplot() +
    geom_segment(
      data = fitr$segments,
      aes(x = x, y = y, xend = xend, yend = yend)
    ) +
    geom_text(data = fitr$labels, aes(x = x, y = y, label = label)) +
    geom_text(data = fitr$leaf_labels, aes(x = x, y = y, label = label)) +
    theme_dendro()
}
Extracts data to plot line segments and labels from a tree::tree() object. This data can then be manipulated or plotted, e.g. using ggplot2::ggplot().
## S3 method for class 'tree'
dendro_data(model, type = c("proportional", "uniform"), ...)
model | object of class "tree", e.g. the output of tree() |
type | Either "proportional" or "uniform" |
... | ignored |
A list of three data frames:
segments | a data frame containing the line segment data |
labels | a data frame containing the label text data |
leaf_labels | a data frame containing the leaf label text data |
Andrie de Vries, using code modified from original by Brian Ripley
Other dendro_data methods: dendro_data(), dendro_data.rpart(), dendrogram_data(), rpart_labels()
Other tree functions: get_data_tree_leaf_labels(), tree_labels(), tree_segments()
### Demonstrate tree
if (require(tree)) {
  require(ggplot2)
  require(MASS)
  data(cpus, package = "MASS")
  cpus.ltr <- tree(log10(perf) ~ syct + mmin + mmax + cach + chmin + chmax,
    data = cpus
  )
  tree_data <- dendro_data(cpus.ltr)
  ggplot(segment(tree_data)) +
    geom_segment(
      aes(x = x, y = y, xend = xend, yend = yend, linewidth = n),
      colour = "lightblue"
    ) +
    scale_size("n") +
    geom_text(
      data = label(tree_data),
      aes(x = x, y = y, label = label), vjust = -0.5, size = 4
    ) +
    geom_text(
      data = leaf_label(tree_data),
      aes(x = x, y = y, label = label), vjust = 0.5, size = 3
    ) +
    theme_dendro()
}
This is a convenience function for plotting a dendrogram with ggplot2.
ggdendrogram(
  data,
  segments = TRUE,
  labels = TRUE,
  leaf_labels = TRUE,
  rotate = FALSE,
  theme_dendro = TRUE,
  ...
)
data | Either a dendro object or an object that can be coerced to class dendro using the |
segments | If TRUE, shows line segments |
labels | If TRUE, shows segment labels |
leaf_labels | If TRUE, shows leaf labels |
rotate | If TRUE, rotates plot by 90 degrees |
theme_dendro | If TRUE, applies a blank theme to plot (see |
... | other parameters passed to |
A ggplot2::ggplot() object
### Demonstrate ggdendrogram
library(ggplot2)
hc <- hclust(dist(USArrests), "ave")

# Demonstrate plotting directly from object class hclust
p <- ggdendrogram(hc, rotate = FALSE)
print(p)
ggdendrogram(hc, rotate = TRUE)

# demonstrate converting hclust to dendro using dendro_data first
hcdata <- dendro_data(hc)
ggdendrogram(hcdata, rotate = TRUE, size = 2) +
  labs(title = "Dendrogram in ggplot2")
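Since ggdendrogram() returns a ggplot object, the blank theme can be switched off when the axes are wanted. The following is a minimal sketch of that option; the axis title "Height" is an illustrative assumption and is not supplied by the package.

```r
library(ggplot2)
library(ggdendro)

hc <- hclust(dist(USArrests), "ave")

# theme_dendro = FALSE skips the blank theme, so the axes stay visible
p <- ggdendrogram(hc, rotate = TRUE, theme_dendro = FALSE) +
  labs(x = NULL, y = "Height")

print(p)
```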
Is a dendro? Tests whether an object is of class dendro.
is.dendro(x)
x | Object to check |
dendro_data() and ggdendro-package()
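A typical use of is.dendro() is as a guard before converting a model. The helper to_dendro() below is hypothetical and not part of ggdendro; it only sketches how the test might be combined with dendro_data().

```r
library(ggdendro)

hc <- hclust(dist(USArrests), "ave")

# Hypothetical helper: convert only when the input is not already a dendro
to_dendro <- function(x) {
  if (is.dendro(x)) x else dendro_data(x)
}

is.dendro(hc)             # a raw hclust model
d <- to_dendro(hc)
is.dendro(d)              # after extraction with dendro_data()
identical(to_dendro(d), d)  # an existing dendro object passes through unchanged
```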
segment extracts line segments, label extracts labels, and leaf_label extracts leaf labels from a dendro object.
segment(x)
label(x)
leaf_label(x)
x | dendro object |
Sets most of the ggplot options to blank, by returning blank theme elements for the panel grid, panel background, axis title, axis text, axis line and axis ticks.
theme_dendro()