Basic Static Plotting

datamapplot.create_plot(*args, **kwargs)

Create a static plot from data_map_coords with text labels provided by labels. This is the primary function for DataMapPlot and provides the easiest interface to the static plotting functionality. Beyond its own options, it passes any further keyword arguments through to the lower-level render_plot function, so check the documentation for render_plot to discover further keyword arguments that can be used here as well.

Parameters:
data_map_coords: ndarray of floats of shape (n_samples, 2)

The 2D coordinates for the data map. Usually this is produced via a dimension reduction technique such as UMAP, t-SNE, PaCMAP, PyMDE, etc.

labels: ndarray of strings (object) of shape (n_samples,)

A string label for each data point in the data map. Ideally there should be no more than 64 unique labels. Noise or unlabelled points should have the same label as noise_label, which is “Unlabelled” by default.
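As a rough sketch of preparing the labels argument, the snippet below converts integer cluster ids into an object array of strings. It assumes, as many clustering tools such as HDBSCAN do, that -1 marks noise; the helper cluster_ids_to_labels and the example cluster names are hypothetical and not part of datamapplot.

```python
import numpy as np

NOISE_LABEL = "Unlabelled"  # the documented default for noise_label


def cluster_ids_to_labels(cluster_ids, names, noise_label=NOISE_LABEL):
    """Hypothetical helper: map integer cluster ids to string labels.

    Assumes -1 (or any id missing from ``names``) denotes noise.
    """
    cluster_ids = np.asarray(cluster_ids)
    # Start with every point marked as noise, then fill in the named clusters.
    labels = np.full(cluster_ids.shape[0], noise_label, dtype=object)
    for cluster_id, name in names.items():
        labels[cluster_ids == cluster_id] = name
    return labels


cluster_ids = np.array([0, 0, 1, -1, 1, 2])
names = {0: "Astronomy", 1: "Biology", 2: "Chemistry"}
labels = cluster_ids_to_labels(cluster_ids, names)

# Count the real (non-noise) labels to check against the suggested limit of 64.
n_named = len(set(labels) - {NOISE_LABEL})
if n_named > 64:
    print(f"{n_named} unique labels; consider merging some clusters")
```

The resulting array has one entry per row of data_map_coords and can be passed as the labels argument.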

title: str or None (optional, default=None)

A title for the plot. If None then no title is used for the plot. The title should be succinct: three to seven words.

sub_title: str or None (optional, default=None)

A sub-title for the plot. If None then no sub-title is used for the plot. The sub-title can be significantly longer than the title and provide more information about the plot and data sources.

noise_label: str (optional, default="Unlabelled")

The string used in the labels array to identify the unlabelled or noise points in the dataset.

noise_color: str (optional, default="#999999")

The colour to use for unlabelled or noise points in the data map. This should usually be a muted or neutral colour to distinguish background points from the labelled clusters.

color_label_text: str or bool (optional, default=True)

Whether to use colours for the text labels generated in the plot. If False then the text labels will default to either black or white depending on darkmode. If a string is provided it should be a valid matplotlib colour specification and all text labels will be this colour.

color_label_arrows: str or bool (optional, default=True)

Whether to use colours for the arrows between the text labels and clusters. If False then the arrows will default to either black or white depending on darkmode. If a string is provided it should be a valid matplotlib colour specification and all arrows will be this colour.

label_wrap_width: int (optional, default=16)

The line width, in characters, at which label text is wrapped for display in the plot. Long words are not broken, so you can choose relatively small values if you want tight text-wrapping.
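To preview how labels might wrap, Python's standard textwrap module with break_long_words=False reproduces the behaviour described above: lines wrap at roughly label_wrap_width characters while long words stay whole. This is an approximation for illustration; we are assuming, not asserting, that datamapplot wraps text in exactly the same way. The helper name preview_wrapped_label is hypothetical.

```python
import textwrap


def preview_wrapped_label(text, label_wrap_width=16):
    # break_long_words=False keeps words intact even if they exceed the width.
    return textwrap.fill(text, width=label_wrap_width, break_long_words=False)


print(preview_wrapped_label("Machine learning for protein structure prediction"))
# A word longer than the width ends up alone on its own line, unbroken.
print(preview_wrapped_label("Extraordinarily long cluster name", label_wrap_width=8))
```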

label_color_map: dict or None (optional, default=None)

A colour mapping to use to colour points/clusters in the data map. The mapping should be keyed by the unique cluster labels in labels and take values that are hex-string representations of colours. If None then a colour mapping will be auto-generated.
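If you want to supply label_color_map yourself, it only needs to be a dict from each unique label to a hex string. The sketch below spaces hues evenly with the standard colorsys module; the helper make_label_color_map is hypothetical, and it leaves the noise label out on the assumption that noise_color handles noise points.

```python
import colorsys


def make_label_color_map(labels, noise_label="Unlabelled", saturation=0.65, value=0.85):
    """Hypothetical helper: assign evenly spaced hues to each unique label."""
    unique_labels = sorted(set(labels) - {noise_label})
    color_map = {}
    for i, label in enumerate(unique_labels):
        # Evenly spaced positions around the colour wheel (hue in [0, 1)).
        hue = i / max(len(unique_labels), 1)
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        color_map[label] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return color_map


label_color_map = make_label_color_map(
    ["Astronomy", "Biology", "Unlabelled", "Chemistry", "Biology"]
)
```

Evenly spaced hues are a simple choice; the auto-generated palette described by the palette_* and cmap parameters is likely to give more pleasing results.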

figsize: (int, int) (optional, default=(12,12))

How big to make the figure in inches (actual pixel size will depend on dpi).

dynamic_label_size: bool (optional, default=False)

Whether to dynamically resize the text labels based on the relative sizes of the clusters. This can be useful to help highlight larger clusters.

dpi: int (optional, default=plt.rcParams["figure.dpi"])

The dots-per-inch setting used when rendering the plot.
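The rendered pixel size is figsize multiplied by dpi. The helper below is hypothetical and simply makes that arithmetic explicit; note that matplotlib's savefig can apply its own dpi, so a saved file may differ from the on-screen figure.

```python
def figure_pixel_size(figsize=(12, 12), dpi=100):
    """Return (width_px, height_px) for a figure of `figsize` inches at `dpi`."""
    width_in, height_in = figsize
    return round(width_in * dpi), round(height_in * dpi)


# matplotlib's stock rcParams["figure.dpi"] is 100, giving 1200 x 1200 pixels;
# a print-quality dpi of 300 gives 3600 x 3600 pixels for the same figsize.
default_size = figure_pixel_size((12, 12), dpi=100)
print_size = figure_pixel_size((12, 12), dpi=300)
```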

force_matplotlib: bool (optional, default=False)

Force using matplotlib instead of datashader for rendering the scatterplot of the data map. This can be useful if you wish to have a different marker_type, or variably sized markers based on a marker_size_array, neither of which are supported by the datashader based renderer.

darkmode: bool (optional, default=False)

Whether to render the plot in darkmode (with a dark background) or not.

highlight_labels: list of str or None (optional, default=None)

A list of unique labels that should have their text highlighted in the resulting plot. Keyword arguments accepted by render_plot (such as highlight_label_keywords) control how highlighted labels are rendered. By default they are simply rendered in bold text.

palette_hue_shift: float (optional, default=0.0)

The amount, in degrees clockwise, by which to shift the hue channel when generating the colour palette and colour mapping for the labels.

palette_hue_radius_dependence: float (optional, default=1.0)

A setting that determines how strongly the hue channel depends on the radius. Larger values will result in more hue variation where there are more outlying points.

palette_theta_range: float (optional, default=np.pi/16)

A setting that determines how restrictive the radius mask used will be. Larger values will result in a less restrictive mask.

use_medoids: bool (optional, default=False)

Whether to use medoids instead of centroids to determine the “location” of the cluster, both for the label indicator line, and for palette colouring. Note that medoids are more computationally expensive, especially for large plots, so use with some caution.
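To see why medoids cost more, compare the two computations below. This is a generic numpy sketch of the two notions, not datamapplot's internal code: a centroid is a single mean over the points, while a medoid requires all pairwise distances within the cluster.

```python
import numpy as np


def cluster_centroid(points):
    # Mean position: cheap (O(n)), but it may fall outside a non-convex cluster.
    return points.mean(axis=0)


def cluster_medoid(points):
    # The actual data point minimising total distance to all other points.
    # This builds an n x n distance matrix, so time and memory grow as O(n^2).
    diffs = points[:, None, :] - points[None, :, :]
    distances = np.sqrt((diffs ** 2).sum(axis=-1))
    return points[distances.sum(axis=1).argmin()]


points = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
centroid = cluster_centroid(points)
medoid = cluster_medoid(points)
```

Here the outlying point at x=10 drags the centroid to x≈3.67, where no data lies, while the medoid stays on a real point at x=1.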

cmap: matplotlib cmap or None (optional, default=None)

A matplotlib colormap to use as the base for a generated colour mapping. This should be a smooth, linear and cyclic colormap (see the colorcet package for some good options). If the colormap is not cyclic it will be “made” cyclic by reflecting it. If None then a custom method will be used instead.
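The idea of making a non-cyclic colormap cyclic by reflection can be illustrated on a list of sampled colours: run through the samples and then back again, so the sequence ends next to where it began. This is a hedged sketch of the concept only, with a hypothetical helper name; datamapplot may sample and reflect the colormap differently.

```python
def make_cyclic_by_reflection(color_samples):
    """Append the samples in reverse, omitting both endpoints, so that the
    sequence wraps back smoothly to its first colour."""
    samples = list(color_samples)
    return samples + samples[-2:0:-1]


ramp = ["#000000", "#555555", "#aaaaaa", "#ffffff"]
cyclic = make_cyclic_by_reflection(ramp)
```

Dropping both endpoints from the reversed half avoids repeating the extreme colours, so the cycle has no visible flat spots where it turns around.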

cvd_safer: bool (optional, default=False)

Whether to use a colour palette that is safer for colour vision deficiency (CVD). This will override any provided cmap and use a CVD safer palette instead.

marker_color_array: np.ndarray or None (optional, default=None)

An array of colours for each of the points in the data map scatterplot. If provided this will override any colouring provided by the labels array.
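One way to build per-point colours is to map a numeric variable onto a two-colour gradient. The sketch below uses only the standard library; the helpers lerp_hex and values_to_colors and the two endpoint colours are hypothetical choices for illustration.

```python
def lerp_hex(low_hex, high_hex, t):
    """Linearly interpolate between two '#rrggbb' colours for t in [0, 1]."""
    low = [int(low_hex[i:i + 2], 16) for i in (1, 3, 5)]
    high = [int(high_hex[i:i + 2], 16) for i in (1, 3, 5)]
    mixed = [round(a + (b - a) * t) for a, b in zip(low, high)]
    return "#{:02x}{:02x}{:02x}".format(*mixed)


def values_to_colors(values, low_hex="#2c7bb6", high_hex="#d7191c"):
    """Hypothetical helper: one hex colour per point from a numeric variable."""
    vmin, vmax = min(values), max(values)
    span = (vmax - vmin) or 1.0  # avoid dividing by zero for constant input
    return [lerp_hex(low_hex, high_hex, (v - vmin) / span) for v in values]


marker_color_array = values_to_colors([0.0, 5.0, 10.0])
```

Since the parameter is documented as an np.ndarray, converting the list with np.asarray before passing it is the cautious choice.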

**render_plot_kwds

All other keyword arguments are passed through to render_plot, which provides significant further control over the aesthetics of the plot.

Returns:
fig: matplotlib.Figure

The figure that the resulting plot is rendered to.

ax: matplotlib.Axes

The axes contained within the figure that the plot is rendered to.

Advanced Static Plotting Options

datamapplot.render_plot(*args, **kwargs)

Render a static data map plot with the given colours, label locations and label text. This is a lower-level function and usually should not be used directly unless you have specific reasons for digging in, such as needing direct control over label locations, altering label text to suit specific needs, or directly controlling point colouring in the scatterplot.

Any additional keyword arguments given to create_plot are passed on to render_plot, so the keyword arguments documented here are also valid keyword arguments for create_plot.

Parameters:
data_map_coords: ndarray of floats of shape (n_samples, 2)

The 2D coordinates for the data map. Usually this is produced via a dimension reduction technique such as UMAP, t-SNE, PaCMAP, PyMDE, etc.

color_list: iterable of str of len n_samples

A list of hex-string colours, one per sample, for colouring points in the scatterplot of the data map.

label_text: list of str

A list of label text strings, one per unique label.

label_locations: ndarray of floats of shape (n_labels, 2)

An array giving the “location” (usually the centroid) of the cluster associated with each text label (see label_text).
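When calling render_plot directly you need to build color_list, label_text and label_locations yourself. The minimal sketch below assumes a label_color_map dict like the one create_plot accepts and uses cluster centroids as label locations; the helper prepare_render_inputs is hypothetical.

```python
import numpy as np


def prepare_render_inputs(data_map_coords, labels, label_color_map,
                          noise_label="Unlabelled", noise_color="#999999"):
    """Hypothetical helper building color_list, label_text and label_locations."""
    labels = np.asarray(labels, dtype=object)
    # One colour per sample; noise points get the muted noise colour.
    color_list = [noise_color if label == noise_label else label_color_map[label]
                  for label in labels]
    # One entry per unique (non-noise) label, in a stable sorted order.
    label_text = sorted(set(labels) - {noise_label})
    # Use each cluster's centroid as the location for its label.
    label_locations = np.array(
        [data_map_coords[labels == text].mean(axis=0) for text in label_text]
    )
    return color_list, label_text, label_locations


coords = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0], [5.0, 5.0]])
labels = np.array(["A", "A", "B", "B", "Unlabelled"], dtype=object)
color_map = {"A": "#1f77b4", "B": "#ff7f0e"}
color_list, label_text, label_locations = prepare_render_inputs(coords, labels, color_map)
```

Following the parameter order documented here, these could then be passed as render_plot(coords, color_list, label_text, label_locations); label_text could also be rewrapped or edited before the call.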

title: str or None (optional, default=None)

A title for the plot. If None then no title is used for the plot. The title should be succinct: three to seven words.

sub_title: str or None (optional, default=None)

A sub-title for the plot. If None then no sub-title is used for the plot. The sub-title can be significantly longer than the title and provide more information about the plot and data sources.

figsize: (int, int) (optional, default=(12,12))

How big to make the figure in inches (actual pixel size will depend on dpi).

dynamic_label_size: bool (optional, default=False)

Whether to use dynamic label sizing based on the sizes of the clusters.

dynamic_label_size_scaling_factor: float (optional, default=0.75)

The scaling factor to use when using dynamic label sizing based on the sizes of the clusters.
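The exact formula datamapplot uses for dynamic sizing is not stated here, so the following is only a hypothetical rule showing the general shape of the idea: font size grows with relative cluster size via an exponent, and the result is clipped to the min_font_size and max_font_size bounds documented below. The helper and its exponent parameter are illustrative stand-ins, not datamapplot's implementation.

```python
def dynamic_font_sizes(cluster_sizes, base_font_size=12.0, exponent=0.75,
                       min_font_size=4.0, max_font_size=24.0):
    """Hypothetical rule: base_font_size * (size / largest) ** exponent,
    clipped to [min_font_size, max_font_size]."""
    largest = max(cluster_sizes)
    sizes = []
    for n in cluster_sizes:
        scaled = base_font_size * (n / largest) ** exponent
        sizes.append(min(max(scaled, min_font_size), max_font_size))
    return sizes


font_sizes = dynamic_font_sizes([1000, 250, 10])
```

With an exponent below 1, small clusters shrink less than proportionally, which keeps their labels readable; the lower clip then guarantees a legible minimum.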

font_family: str (optional, default="DejaVu Sans")

The font family to use for the plot: the labels, plus the title and sub-title unless explicitly overridden by title_keywords or sub_title_keywords.

label_linespacing: float (optional, default=0.95)

The line-spacing to use when rendering multi-line labels in the plot. The default of 0.95 keeps multi-line labels compact, but can be less than ideal for some fonts.

label_font_size: float or None (optional, default=None)

The font-size (in pts) to use for the text labels in the plot. If this is None then a heuristic will be used to try to find the best font size that can fit all the labels in.

label_text_colors: str or list of str or None (optional, default=None)

The colours of the text labels, one per text label. If None then the text labels will be either black or white depending on darkmode. If just a single string then it is assumed to be a fixed colour for all labels.

label_arrow_colors: str or list of str or None (optional, default=None)

The colours of the arrows between the text labels and clusters, one per text label. If None then the arrows will be either black or white depending on darkmode. If just a single string then it is assumed to be a fixed colour for all arrows.

highlight_colors: list of str or None (optional, default=None)

The colours used when highlighted text labels are drawn with a bounding box around them. For example, create_plot uses the cluster colours from the colour mapping that was passed in or generated.

point_size: int or float (optional, default=1)

How big to make points in the scatterplot rendering of the data map. Depending on whether you are in datashader mode or matplotlib mode this can be either an int (datashader) or a float (matplotlib). In datashader mode this is explicitly the radius, in pixels, of each point. In matplotlib mode this is the matplotlib scatterplot size, which can be relative to the plot size and other factors.

alpha: float (optional, default=1.0)

The alpha transparency value to use when rendering points.

dpi: int (optional, default=plt.rcParams["figure.dpi"])

The dots-per-inch to use when rendering the plot.

label_over_points: bool (optional, default=False)

Whether to attempt to place text labels directly on top of the points in clusters. This can result in severe over-packing, which is remedied via pylabeladjust, and that can end up moving labels some distance. This is likely a good choice for small numbers of labels; more than 20 labels will require a small font, and for larger numbers of labels still it may be sub-optimal.

label_base_radius: float or None (optional, default=None)

Labels are placed in rings around the data map. This value can explicitly control the radius (in data coordinates) of the innermost such ring.

label_margin_factor: float (optional, default=1.5)

The expansion factor to use when creating a bounding box around the label text to compute whether overlaps are occurring during the label placement adjustment phase.

min_font_size: float (optional, default=4.0)

The minimum font size to use when estimating the font size for the labels.

max_font_size: float (optional, default=24.0)

The maximum font size to use when estimating the font size for the labels.

min_font_weight: int (optional, default=200)

The minimum font weight to use when using dynamic label sizing (font weights will vary as well).

max_font_weight: int (optional, default=800)

The maximum font weight to use when using dynamic label sizing (font weights will vary as well).

highlight_labels: list of str or None (optional, default=None)

A list of the labels to be highlighted.

highlight_label_keywords: dict (optional, default={"fontweight": "bold"})

Keywords for how to highlight the labels. This dict will be passed on as keyword arguments to the matplotlib annotate function. See the matplotlib documentation for more details on what can be done.
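Because highlight_labels and highlight_label_keywords are render_plot keywords, they can be collected in a dict and passed through create_plot. The label names and styling values below are hypothetical; the keys inside highlight_label_keywords (fontweight, fontsize, bbox) are standard matplotlib text and annotate keyword arguments. How datamapplot combines a user-supplied bbox with highlight_colors is not specified here.

```python
# Illustrative values only; keys inside highlight_label_keywords are
# ordinary matplotlib annotate/text keyword arguments.
highlight_kwargs = {
    "highlight_labels": ["Biology", "Chemistry"],
    "highlight_label_keywords": {
        "fontweight": "bold",
        "fontsize": 14,
        # Draw a rounded, semi-transparent box behind each highlighted label.
        "bbox": {"boxstyle": "round,pad=0.3", "facecolor": "white", "alpha": 0.8},
    },
}

# With datamapplot installed, these could be passed through create_plot:
# fig, ax = datamapplot.create_plot(data_map_coords, labels, **highlight_kwargs)
```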

add_glow: bool (optional, default=True)

Whether to add a glow-effect using KDEs.

noise_color: str (optional, default="#999999")

The colour to use for unlabelled or noise points in the data map. This should usually be a muted or neutral colour to distinguish background points from the labelled clusters.

glow_keywords: dict (optional, default={"kernel": "gaussian", "kernel_bandwidth": 0.25})

Keyword arguments that will be passed along to the add_glow_to_scatterplot function. See that function for more details.

darkmode: bool (optional, default=False)

Whether to render the plot in darkmode (with a dark background) or not.

logo: ndarray or None (optional, default=None)

A numpy array representation of an image (suitable for matplotlib’s imshow) to be used as a logo placed in the bottom right corner of the plot.

logo_width: float (optional, default=0.15)

The width, as a fraction of the total figure width, of the logo.

force_matplotlib: bool (optional, default=False)

Force using matplotlib instead of datashader for rendering the scatterplot of the data map. This can be useful if you wish to have a different marker_type, or variably sized markers based on a marker_size_array, neither of which are supported by the datashader based renderer.

label_direction_bias: float or None (optional, default=None)

When placing labels in rings, how much bias to place toward east-west compass points as opposed to north-south. A value of 1.0 provides no bias (uniform placement around the circle). Values larger than one will place more labels in the east-west areas.

marker_type: str (optional, default="o")

The type of marker to use for rendering the scatterplot. This is only valid if matplotlib mode is being used. Any matplotlib marker string is a valid marker_type; see the matplotlib marker documentation for more details.

marker_size_array: ndarray of shape (n_samples,) or None (optional, default=None)

The (variable) size of markers to use. This is only valid if matplotlib mode is being used. This should be an array of (matplotlib) marker sizes as you would use for the s argument in matplotlib.pyplot.scatter.
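A common way to fill marker_size_array is to rescale a per-point weight (counts, scores, and so on) into a sensible range of matplotlib s values, which are areas in points squared. The helper below and its size range are hypothetical choices for illustration.

```python
import numpy as np


def scale_marker_sizes(weights, min_size=2.0, max_size=40.0):
    """Linearly map per-point weights to matplotlib scatter sizes (points**2)."""
    weights = np.asarray(weights, dtype=float)
    span = weights.max() - weights.min()
    if span == 0:
        # Constant weights: give every point the same (minimum) size.
        return np.full(weights.shape, min_size)
    return min_size + (weights - weights.min()) / span * (max_size - min_size)


marker_size_array = scale_marker_sizes([1, 5, 9])
```

Since this only takes effect in matplotlib mode, it would be used together with force_matplotlib=True.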

arrowprops: dict (optional, default={})

A dict of keyword arguments to pass through to the arrowprops argument of matplotlib.pyplot.annotate. This allows for control of arrow-styles, connection-styles, linewidths, colours etc. See the documentation of matplotlib’s annotate function for more details.

title_keywords: dict or None (optional, default=None)

A dictionary of keyword arguments to pass through to matplotlib’s suptitle function. This includes things like fontfamily, fontsize, fontweight, color, etc.

sub_title_keywords: dict or None (optional, default=None)

A dictionary of keyword arguments to pass through to matplotlib’s title function. This includes things like fontfamily, fontsize, fontweight, color, etc.
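The title, sub-title and arrow styling parameters all take plain dicts of matplotlib keyword arguments. The values below are illustrative only: the title dicts use standard text properties accepted by suptitle and title, and the arrowprops entries are standard annotate arrow options.

```python
# Illustrative styling; every inner key is an ordinary matplotlib keyword.
style_kwargs = {
    "title_keywords": {"fontsize": 28, "fontweight": "bold", "color": "#333333"},
    "sub_title_keywords": {"fontsize": 14, "color": "#666666"},
    "arrowprops": {
        "arrowstyle": "-",                  # plain line, no arrow head
        "linewidth": 0.5,
        "connectionstyle": "arc3,rad=0.1",  # slightly curved connector
    },
}

# With datamapplot installed, these could be passed through create_plot:
# fig, ax = datamapplot.create_plot(data_map_coords, labels,
#                                   title="My data map", **style_kwargs)
```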

pylabeladjust_speed: None or float (optional, default=None)

pylabeladjust speed for adjusting label positioning when doing labels over points. If label_over_points is False then this will have no effect. If None then a good choice of speed will be approximated from the data.

pylabeladjust_max_iterations: int (optional, default=500)

The maximum number of pylabeladjust iterations for adjusting label positioning when doing labels over points. If label_over_points is False then this will have no effect.

pylabeladjust_adjust_by_size: bool (optional, default=True)

Whether to adjust the labels based on the size of the rectangles for adjusting label positioning when doing labels over points. If label_over_points is False then this will have no effect.

pylabeladjust_margin_percentage: float (optional, default=7.5)

The margin percentage for the repulsion radius for adjusting label positioning when doing labels over points. If label_over_points is False then this will have no effect.

pylabeladjust_radius_scale: float (optional, default=1.05)

The scale factor for the repulsion radius for adjusting label positioning when doing labels over points. If label_over_points is False then this will have no effect.

label_font_stroke_width: float (optional, default=3)

The width of the stroke to use when rendering the font. This is used to create an outline that distinguishes the text from the background. Larger values will make text more visible against the background at some loss of font legibility. You may need to change this value when rendering at particularly high resolutions.

label_font_outline_alpha: float (optional, default=0.5)

The alpha value to use when rendering the font outline. This is used to create an outline that distinguishes the text from the background. Larger values will make text more visible against the background at some loss of font legibility.

verbose: bool (optional, default=False)

Print progress as the plot is being created.

ax: None or matplotlib.axes.Axes (optional, default=None)

If not None, render the plot to this axis, otherwise create a new figure and axis.

Returns:
fig: matplotlib.Figure

The figure that the resulting plot is rendered to.

ax: matplotlib.Axes

The axes contained within the figure that the plot is rendered to.