Basic Interactive Plotting

datamapplot.create_interactive_plot(*args, **kwargs)
Parameters:
data_map_coords: ndarray of floats of shape (n_samples, 2)

The 2D coordinates for the data map. Usually this is produced via a dimension reduction technique such as UMAP, t-SNE, PacMAP, PyMDE etc.

*label_layers: np.ndarray

All remaining positional arguments are assumed to be labels, each at a different level of resolution. Ideally these should be ordered such that the most fine-grained resolution is first, and the coarsest resolution is last. The individual labels-layers should be formatted the same as for create_plot.

hover_text: list or np.ndarray or None (optional, default=None)

An iterable (usually a list or numpy array) of text strings, one for each data point in data_map_coords, that can be used in a tooltip when hovering over points.
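As a rough sketch of how the positional arguments and hover_text fit together, the following builds toy inputs with NumPy. The random coordinates, the label names, and the helper make_plot are illustrative assumptions; the only library call is create_interactive_plot with the parameters documented above.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 1000

# Stand-in for a real 2D embedding (e.g. the output of UMAP); shape (n_samples, 2).
data_map_coords = rng.normal(size=(n_samples, 2))

# Two label layers, finest resolution first and coarsest last.
# Object dtype avoids truncating strings when longer labels are assigned later.
fine_labels = np.array([f"Topic {i % 8}" for i in range(n_samples)], dtype=object)
coarse_labels = np.array([f"Theme {i % 2}" for i in range(n_samples)], dtype=object)
fine_labels[::10] = "Unlabelled"  # matches the default noise_label

# One tooltip string per point.
hover_text = [f"Document {i}" for i in range(n_samples)]


def make_plot():
    """Build the interactive plot; requires datamapplot to be installed."""
    import datamapplot

    return datamapplot.create_interactive_plot(
        data_map_coords,
        fine_labels,
        coarse_labels,
        hover_text=hover_text,
    )
```

Calling make_plot() in a notebook should display the figure, provided datamapplot is available in the environment.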

inline_data: bool (optional, default=True)

Whether to include data inline in the HTML file (compressed and base64 encoded) or whether to write data to separate files that will then be referenced by the HTML file – in the latter case you will need to ensure all the files are co-located and served over an http server or similar. Inline is the best default choice for easy portability and simplicity, but can result in very large file sizes.

noise_label: str (optional, default=”Unlabelled”)

The string used in the labels array to identify the unlabelled or noise points in the dataset.

noise_color: str (optional, default=”#999999”)

The colour to use for unlabelled or noise points in the data map. This should usually be a muted or neutral colour to distinguish background points from the labelled clusters.

color_label_text: bool (optional, default=True)

Whether to use colours for the text labels generated in the plot. If False then the text labels will default to either black or white depending on darkmode.

label_wrap_width: int (optional, default=16)

The number of characters to apply text-wrapping at when creating text labels for display in the plot. Note that long words will not be broken, so you can choose relatively small values if you want tight text-wrapping.

label_color_map: dict or None (optional, default=None)

A colour mapping to use to colour points/clusters in the data map. The mapping should be keyed by the unique cluster labels in labels and take values that are hex-string representations of colours. If None then a colour mapping will be auto-generated.
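A minimal sketch of building such a mapping by hand is shown below. The label names and hex colours are hypothetical, and leaving the noise label out of the mapping is an assumption made here on the grounds that noise_color already covers those points.

```python
import re

# Hypothetical cluster labels; in practice take the unique values of your label array.
labels = ["Biology", "Physics", "Unlabelled", "Biology", "Chemistry", "Physics"]
noise_label = "Unlabelled"

# Hand-picked hex colours, purely illustrative.
palette = ["#1f77b4", "#ff7f0e", "#2ca02c"]

# Key the mapping by the unique (non-noise) cluster labels.
cluster_names = sorted(set(labels) - {noise_label})
label_color_map = dict(zip(cluster_names, palette))

# Sanity check: every value is a hex string such as "#1f77b4".
hex_pattern = re.compile(r"^#[0-9a-fA-F]{6}$")
assert all(hex_pattern.match(color) for color in label_color_map.values())
```

The resulting dictionary can then be passed as label_color_map=label_color_map.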

width: int or str (optional, default=”100%”)

The width of the plot when rendered in a notebook. This should be a valid HTML iframe width specification – either an integer number of pixels, or a string that can be properly interpreted in HTML.

height: int or str (optional, default=800)

The height of the plot when rendered in a notebook. This should be a valid HTML iframe height specification – either an integer number of pixels, or a string that can be properly interpreted in HTML.

darkmode: bool (optional, default=False)

Whether to render the plot in darkmode (with a dark background) or not.

palette_hue_shift: float (optional, default=0.0)

A setting, in degrees clockwise, to shift the hue channel when generating a colour palette and color_mapping for the labels.

palette_hue_radius_dependence: float (optional, default=1.0)

A setting that determines how dependent on the radius the hue channel is. Larger values will result in more hue variation where there are more outlying points.

palette_theta_range: float (optional, default=np.pi/16)

A setting that determines how restrictive the radius mask used will be. Larger values will result in a less restrictive mask.

cmap: matplotlib cmap or None (optional, default=None)

A linear matplotlib cmap colour map to use as the base for a generated colour mapping. This should be a matplotlib cmap that is smooth and linear, and cyclic (see the colorcet package for some good options). If not a cyclic cmap it will be “made” cyclic by reflecting it. If None then a custom method will be used instead.
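To illustrate what "reflecting" a non-cyclic colour ramp means, here is a small sketch on a plain list of colours. It shows only the idea; it is not the library's implementation, and the colour values are illustrative.

```python
# A few colours sampled from a non-cyclic ramp; values are illustrative.
ramp = ["#0d0887", "#7e03a8", "#cc4778", "#f89540", "#f0f921"]

# Walk up the ramp and back down, dropping the duplicated end colour,
# so the sequence returns to where it started.
cyclic_ramp = ramp + ramp[-2::-1]
```

Because the reflected sequence starts and ends on the same colour, hues assigned around a circle meet without a visible jump.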

marker_size_array: np.ndarray or None (optional, default=None)

An array of sizes for each of the points in the data map scatterplot.

marker_alpha_array: np.ndarray or None (optional, default=None)

An array of alpha values for each of the points in the data map scatterplot.
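One possible way to derive both arrays from a per-point quantity is sketched below. The scores, the size range, and the assumption that alpha values lie in [0, 1] are illustrative choices, not requirements stated in this reference.

```python
import numpy as np

# Hypothetical per-point importance scores, e.g. citation counts.
scores = np.array([1.0, 4.0, 9.0, 16.0, 100.0])

# Square-root scaling compresses the dynamic range; the size range 2..10 is arbitrary.
marker_size_array = 2.0 + 8.0 * np.sqrt(scores / scores.max())

# Fade low-scoring points; alpha is kept within [0.2, 1.0] (assumed 0..1 scale).
marker_alpha_array = np.clip(scores / scores.max(), 0.2, 1.0)
```

Both arrays must have one entry per point in data_map_coords.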

use_medoids: bool (optional, default=False)

Whether to use medoids instead of centroids to determine the “location” of the cluster, both for the label indicator line, and for palette colouring. Note that medoids are more computationally expensive, especially for large plots, so use with some caution.

cluster_boundary_polygons: bool (optional, default=False)

Whether to draw alpha-shape generated boundary lines around clusters. This can be useful in highlighting clusters at different resolutions when using many different label_layers.

polygon_alpha: float (optional, default=0.1)

The alpha value to use when generating alpha-shape based boundaries around clusters.

cvd_safer: bool (optional, default=False)

Whether to use a colour palette that is safer for colour vision deficiency (CVD). This will override any provided cmap and use a CVD safer palette instead.

jupyterhub_api_token: str or None (optional, default=None)

The JupyterHub API token to use when rendering the plot inline in a notebook via jupyterhub. This should not be necessary for most users, but can be useful in some environments where the default token is not available.

enable_table_of_contents: bool (optional, default=False)

Whether to build and display a table of contents with the label hierarchy.

**render_html_kwds:

All other keyword arguments will be passed through to the render_html function. Please see the docstring of that function for further options that can control the aesthetic results.
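As a sketch of this pass-through, the options below are all render_html parameters documented in the next section, supplied through create_interactive_plot. The specific values and the helper make_styled_plot are assumptions for illustration.

```python
# Options that create_interactive_plot does not name itself are forwarded to render_html.
render_html_kwds = {
    "title": "My Data Map",
    "sub_title": "A toy example",
    "font_family": "Roboto",
    "enable_search": True,
    "initial_zoom_fraction": 0.99,
}


def make_styled_plot(data_map_coords, labels, hover_text):
    """Requires datamapplot; inputs are as described for create_interactive_plot."""
    import datamapplot

    return datamapplot.create_interactive_plot(
        data_map_coords,
        labels,
        hover_text=hover_text,
        darkmode=True,
        **render_html_kwds,
    )
```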

Returns:

Advanced Interactive Plotting Options

datamapplot.render_html(*args, **kwargs)

Given data about points, and data about labels, render to an HTML file using Deck.GL to provide an interactive plot that can be zoomed, panned and explored.

Parameters:
point_dataframe: pandas.DataFrame

A DataFrame containing point information for rendering. At a minimum this should include columns “x”, “y”, “r”, “g”, “b” and “a” that provide the x,y position and r,g,b,a colour for each point. Note that r,g,b,a values should be uint8 values.
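A minimal sketch of a point_dataframe with the required columns follows. The random positions and the single fixed colour are placeholders; real data would normally come from an embedding and a colour mapping.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_points = 500

point_dataframe = pd.DataFrame(
    {
        "x": rng.normal(size=n_points),
        "y": rng.normal(size=n_points),
        # Colour channels must be uint8, i.e. integers in 0..255.
        "r": np.full(n_points, 31, dtype=np.uint8),
        "g": np.full(n_points, 119, dtype=np.uint8),
        "b": np.full(n_points, 180, dtype=np.uint8),
        "a": np.full(n_points, 180, dtype=np.uint8),
    }
)
```

Checking point_dataframe.dtypes before rendering is a cheap way to confirm the colour columns really are uint8.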

label_dataframe: pandas.DataFrame

A DataFrame containing information about labels, and optionally bounding polygons, of clusters. At a minimum this should include columns “x”, “y”, “r”, “g”, “b” and “a” that provide the x,y position and r,g,b,a colour of the label.

inline_data: bool (optional, default=True)

Whether to include data inline in the HTML file (compressed and base64 encoded) or whether to write data to separate files that will then be referenced by the HTML file – in the latter case you will need to ensure all the files are co-located and served over an http server or similar. Inline is the best default choice for easy portability and simplicity, but can result in very large file sizes.

title: str or None (optional, default=None)

A title for the plot, to be placed in the top left corner. The title should be brief and to the point. More detail can be provided in the sub_title if required.

sub_title: str or None (optional, default=None)

A sub_title for the plot, to be placed in the top left corner.

title_font_size: int (optional, default=36)

The font-size of the title in points.

sub_title_font_size: int (optional, default=18)

The font-size of the sub-title in points.

text_collision_size_scale: float (optional, default=3.0)

How to scale text labels for the purpose of collision detection to determine which labels to display.

text_min_pixel_size: float (optional, default=12.0)

The minimum pixel size of label text. If text would be smaller than this in size then render the text to be at least this size.

text_max_pixel_size: float (optional, default=36.0)

The maximum pixel size of label text. If text would be larger than this in size then render the text to be at most this size.

font_family: str (optional, default=”Roboto”)

The font family to use for label text and titles. If the font family is a google font then the required google font api handling will automatically make the font available, so any google font family is acceptable.

font_weight: str or int (optional, default=600)

The font weight to use for the text labels within the plot. Either weight specification such as “thin”, “normal”, or “bold” or an integer value between 0 (ultra-thin) and 1000 (ultra-black).

tooltip_font_family: str (optional, default=”Roboto”)

The font family to use in tooltips/hover text. If the font family is a google font then the required google font api handling will automatically make the font available, so any google font family is acceptable.

tooltip_font_weight: str or int (optional, default=400)

The font weight to use for the tooltip/hover text within the plot. Either weight specification such as “thin”, “normal”, or “bold” or an integer value between 0 (ultra-thin) and 1000 (ultra-black).

logo: str or None (optional, default=None)

A logo image to include in the bottom right corner of the map. This should be a URL to the image.

logo_width: int (optional, default=256)

The width, in pixels, of the logo to be included in the bottom right corner. The logo will retain its aspect ratio, so choose the width accordingly.

color_label_text: bool (optional, default=True)

Whether the text labels for clusters should be coloured or not. If set to False the labels will be either black or white depending on whether darkmode is set.

line_spacing: float (optional, default=0.95)

Line height spacing in label text.

min_fontsize: float (optional, default=12)

The minimum font size (in points) of label text. In general label text is scaled based on the size of the cluster the label is for; this will set the minimum value for that scaling.

max_fontsize: float (optional, default=24)

The maximum font size (in points) of label text. In general label text is scaled based on the size of the cluster the label is for; this will set the maximum value for that scaling.

text_outline_width: float (optional, default=8)

The size of the outline around the label text. The outline, in a contrasting colour, can make text more readable against the map background. Choosing larger sizes can help if text is less legible.

text_outline_color: str (optional, default=”#eeeeeedd”)

The colour of the outline around the label text. The outline should be a contrasting colour to the colour of the label text. By default this is white when darkmode is False and black when darkmode is True.

point_size_scale: float or None (optional, default=None)

The size scale of points. If None the size scale will be determined from the data.

point_hover_color: str (optional, default=”#aa0000bb”)

The colour of the highlighted point a user is hovering over.

point_radius_min_pixels: float (optional, default=0.01)

The minimum number of pixels in radius of the points in the map; if zoomed out enough that a point would be smaller than this, it is instead rendered at this radius. This allows points to remain visible when zoomed out.

point_radius_max_pixels: float (optional, default=24)

The maximum number of pixels in radius of the points in the map; if zoomed in enough that a point would be larger than this, it is instead rendered at this radius. This allows zooming in to differentiate points that are otherwise on top of one another.

point_line_width_min_pixels: float (optional, default=0.001)

The minimum pixel width of the outline around points.

point_line_width_max_pixels: float (optional, default=3)

The maximum pixel width of the outline around points.

point_line_width: float (optional, default=0.001)

The absolute line-width in common coordinates of the outline around points.

cluster_boundary_line_width: float (optional, default=1.0)

The linewidth to use for cluster boundaries. Note that cluster boundaries scale with respect to cluster size, so this is a scaling factor applied over this.

initial_zoom_fraction: float (optional, default=1.0)

The fraction of the data that should be visible in the initial zoom level state. Sometimes data maps can have extreme outliers, and lowering this value to prune those out can result in a more useful initial view.

background_color: str or None (optional, default=None)

A background colour (as a hex-string) for the data map. If None a background colour will be chosen automatically based on whether darkmode is set.

background_image: str or None (optional, default=None)

A background image to use for the data map. If None no background image will be used. The image should be a URL to the image.

background_image_bounds: list or None (optional, default=None)

The bounds of the background image. If None the image will be scaled to fit the data map. If a list of four values is provided then the image will be scaled to fit within those bounds.

darkmode: bool (optional, default=False)

Whether to use darkmode.

offline_data_prefix: str or None (optional, default=None)

If inline_data=False a number of data files will be created storing data for the plot and referenced by the HTML file produced. If not None, this will be used as a prefix on the filename of all the files created.

tooltip_css: str or None (optional, default=None)

Custom CSS used to fine-tune the properties of the tooltip. If None a default CSS style will be used. This should simply be the required CSS directives specific to the tooltip.

hover_text_html_template: str or None (optional, default=None)

An html template allowing fine grained control of what is displayed in the hover tooltip. This should be HTML with placeholders of the form {hover_text} for the supplied hover text and {column_name} for columns from extra_point_data (see below).

extra_point_data: pandas.DataFrame or None (optional, default=None)

A dataframe of extra information about points. This should be a dataframe with one row per point. The information in this dataframe can be referenced by column-name by either hover_text_html_template or on_click for use in tooltips or on-click actions.
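The sketch below shows a template that mixes {hover_text} with columns from extra_point_data, together with a quick pre-check that every placeholder is backed by a column. The column names are hypothetical, and Python's string.Formatter is used here only as a convenient way to list placeholders; the library performs its own substitution when rendering.

```python
from string import Formatter

# Hypothetical extra columns, one value per point; in practice wrap these
# in a pandas.DataFrame and pass it as extra_point_data.
extra_columns = {"author": ["Ada", "Grace"], "year": [1843, 1952]}

hover_text_html_template = (
    "<div><b>{hover_text}</b><br/>"
    "<i>{author}</i>, {year}</div>"
)

# Collect the placeholder names used in the template.
placeholders = {
    name for _, name, _, _ in Formatter().parse(hover_text_html_template) if name
}

# Every placeholder should be hover_text or a column of extra_point_data.
unknown = placeholders - {"hover_text"} - set(extra_columns)
```

An empty unknown set suggests that the template only references data that will actually be supplied.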

enable_search: bool (optional, default=False)

Whether to enable a text search that can highlight points with hover_text that include the given search string.

search_field: str (optional, default=”hover_text”)

If enable_search is True and extra_point_data is not None, then search this column of the extra_point_data dataframe, or use hover_text if set to "hover_text".

histogram_data: list, pandas.Series, or None (optional, default=None)

The data used to generate a histogram. The histogram data can be passed as a list or Pandas Series; if None, the histogram is disabled. The length of the list or Series must match the number of rows in point_dataframe. The values within the list or Series must be of type unsigned integer, signed integer, floating-point number, string, or a date string in the format YYYY-MM-DD.

histogram_n_bins: int (optional, default=20)

The number of bins in the histogram. It is the maximum number of bins if binning categorical data. If the number of unique values in the data is less than or equal to histogram_n_bins, the number of bins will be the number of unique values.

histogram_group_datetime_by: str or None (optional, default=None)

The time unit to group the datetime data by. If None, the datetime data will not be grouped. The time unit can be one of the following: year, quarter, month, week, day, hour, minute, or second.

histogram_range: tuple or None (optional, default=None)

The range of the histogram. If None, the range is automatically determined from the histogram data. If a tuple, it should contain two values representing the minimum and maximum values of the histogram.

histogram_settings: dict or None (optional, default={})

A dictionary containing custom settings for the histogram, if enabled. If histogram_data is provided, this dictionary allows you to customize the appearance of the histogram. The dictionary can include the following keys:

  • “histogram_width”: str

    The width of the histogram in pixels.

  • “histogram_height”: str

    The height of the histogram in pixels.

  • “histogram_bin_count”: int

    The number of bins in the histogram.

  • “histogram_title”: str

    The title of the histogram.

  • “histogram_bin_fill_color”: str

    The fill HEX color of the histogram bins (e.g. #6290C3).

  • “histogram_bin_selected_fill_color”: str

    The fill HEX color of the selected histogram bins (e.g. #2EBFA5).

  • “histogram_bin_unselected_fill_color”: str

    The fill HEX color of the unselected histogram bins (e.g. #9E9E9E).

  • “histogram_bin_context_fill_color”: str

    The fill HEX color of the contextual bins in the histogram (e.g. #E6E6E6).

  • “histogram_log_scale”: bool

    Whether to use a log scale for y-axis of the histogram.
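As a sketch, the histogram options above might be assembled as follows for date-valued data. The dates, the title, and the grouping choice are illustrative assumptions; only keys documented above are used.

```python
from datetime import date, timedelta

n_points = 365

# One YYYY-MM-DD date string per point; the length must match point_dataframe.
start = date(2023, 1, 1)
histogram_data = [(start + timedelta(days=i)).isoformat() for i in range(n_points)]

histogram_kwds = {
    "histogram_data": histogram_data,
    "histogram_group_datetime_by": "month",
    "histogram_settings": {
        "histogram_title": "Publication date",
        "histogram_bin_fill_color": "#6290C3",
        "histogram_log_scale": False,
    },
}
```

The dictionary can be unpacked into the call with **histogram_kwds, alongside the other rendering options.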

on_click: str or None (optional, default=None)

A javascript action to be taken if a point in the data map is clicked. The javascript can reference {hover_text} or columns from extra_point_data. For example one could provide "window.open(`http://google.com/search?q="{hover_text}"`)" to open a new window with a google search for the hover_text of the clicked point.
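A variation on the example above, sketched under the assumption that extra_point_data contains a hypothetical column named url holding a link for each point:

```python
# Hypothetical column of extra_point_data holding a link for each point.
url_column = "url"

# JavaScript snippet; {url} is a placeholder filled in with that column's
# value for the clicked point.
on_click = "window.open(`{" + url_column + "}`)"
```

The string is built by concatenation so that the braces around the column name are left untouched by Python.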

selection_handler: instance of datamapplot.selection_handlers.SelectionHandlerBase or None (optional, default=None)

A selection handler to be used to handle selections in the data map. If None, the interactive selection will not be enabled. If a selection handler is provided, the selection handler will be used to determine how to react to selections made on the data map. Selection handlers can be found in the datamapplot.selection_handlers module, or custom selection handlers can be created by subclassing the SelectionHandlerBase class.

colormaps: dict or None (optional, default=None)

A dictionary containing information about the colormaps to use for the data map. The dictionary should be keyed by a descriptive name for the field, and the value should be an array of values to use for colouring the field. Datamapplot will try to infer data-types and suitable colormaps for the fields. If you need more control you should instead use colormap_rawdata and colormap_metadata which allow you to specify more detailed information about the colormaps to use.

colormap_rawdata: list of numpy.ndarray or None (optional, default=None)

A list of numpy arrays containing the raw data to be used for the colormap. Each array should be the same length as the number of points in the data map. If None, the colormap will not be enabled.

colormap_metadata: list of dict or None (optional, default=None)

A list of dictionaries containing metadata about the colormap. Each dictionary should contain the following keys: “field” (str), “description” (str), and “cmap” (str). If None, the colormap will not be enabled. The field should be a short (one word) name for the metadata field, the description should be a longer description of the field, and the cmap should be the name of the colormap to use, and must be available in the matplotlib colormap registry.
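A minimal sketch of a matching colormap_rawdata and colormap_metadata pair is given below. The field names and random values are hypothetical; "viridis" and "magma" are standard matplotlib colormap names.

```python
import numpy as np

n_points = 200
rng = np.random.default_rng(1)

# One array per colourable field, each with one value per point.
colormap_rawdata = [
    rng.integers(1990, 2024, size=n_points),  # e.g. publication year
    rng.random(size=n_points),                # e.g. a score in [0, 1)
]

# One metadata dictionary per array, in the same order.
colormap_metadata = [
    {"field": "year", "description": "Publication year", "cmap": "viridis"},
    {"field": "score", "description": "Relevance score", "cmap": "magma"},
]
```

Keeping the two lists in the same order makes it clear which metadata entry describes which array.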

cluster_layer_colormaps: bool (optional, default=False)

Whether to use per-layer cluster colormaps. If True, a separate colormap in the colormaps dropdown will be created for each layer of the label data. This is useful when the label data is split into multiple layers, and you would like users to be able to select individual clustering resolutions to colour by.

enable_table_of_contents: bool (optional, default=False)

Whether to enable a table of contents that highlights label heirarchy and aids navigation in the datamap.

table_of_contents_kwds: dict (optional, default={“title”:”Topic Tree”, “font_size”:”12pt”, “max_width”:”30vw”, “max_height”:”42vh”, “color_bullets”:False, “button_on_click”:None, “button_icon”:”&#128194”})

A dictionary containing custom settings for the table of contents. The dictionary can include the following keys:

  • “title”: str

    The title of the table of contents.

  • “font_size”: str

    The font size of the table of contents.

  • “max_width”: str

    The max width of the table of contents.

  • “max_height”: str

    The max height of the table of contents.

  • “color_bullets”: bool

    Whether to use cluster colors for the bullets.

  • “button_on_click”: str or None

    An optional javascript action to be taken if a button in the table of contents is selected. If None, there will be no buttons, otherwise they will be added with the “button_icon” setting. Each button will be related to a label, and can access the points related to that label. This javascript can reference {hover_text} or columns from extra_point_data, at which point an array is built with those values for each point that the label describes. For example one could provide "console.log({hover_text})" to log the hover_text of all points related to the label.

  • “button_icon”: str

    The text to appear on the table of contents buttons. These buttons do not appear unless “button_on_click” is defined.
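A sketch of a customised table of contents follows. All keys are supplied explicitly because this reference does not say whether a partial dictionary is merged with the defaults; the button text and the logging action are illustrative.

```python
table_of_contents_kwds = {
    "title": "Topic Tree",
    "font_size": "12pt",
    "max_width": "30vw",
    "max_height": "42vh",
    "color_bullets": True,
    # Log the hover text of every point under the selected label.
    "button_on_click": "console.log({hover_text})",
    "button_icon": "Log",
}

toc_kwds = {
    "enable_table_of_contents": True,
    "table_of_contents_kwds": table_of_contents_kwds,
}
```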

custom_css: str or None (optional, default=None)

A string of custom CSS code to be added to the style header of the output HTML. This can be used to provide custom styling of other features of the output HTML as required.

custom_html: str or None (optional, default=None)

A string of custom HTML to be added to the body of the output HTML. This can be used to add other custom elements to the interactive plot, including elements that can be interacted with via the on_click action for example.

custom_js: str or None (optional, default=None)

A string of custom Javascript code that is to be added after the code for rendering the scatterplot. This can include code to interact with the plot which is stored as deckgl.

minify_deps: bool (optional, default=True)

Whether to minify the JavaScript and CSS dependency files before embedding in the HTML template.

cdn_url: str (optional, default=”unpkg.com”)

The URL of the CDN to use for fetching JavaScript dependencies.

offline_mode: bool (optional, default=False)

Whether to use offline mode for embedding data and fonts in the HTML template. If True, the data and font files will be embedded in the HTML template as base64 encoded strings.

offline_mode_js_data_file: str or None (optional, default=None)

The name of the JavaScript data file to be embedded in the HTML template in offline mode. If None a default location used by dmp_offline_cache will be used, and if the file doesn’t exist it will be created.

offline_mode_font_data_file: str or None (optional, default=None)

The name of the font data file to be embedded in the HTML template in offline mode. If None a default location used by dmp_offline_cache will be used, and if the file doesn’t exist it will be created.

cluster_colormap: list of str or None (optional, default=None)

The colormap to use for cluster colors; if None we try to infer this from point data.

splash_warning: str or None (optional, default=None)

A warning message to be displayed in a splash screen when the plot is first loaded. This can be used to warn users about the volume of data, or the nature of the data, or to provide other information that might be useful to the user. This will only be active for inline_data=False and will be displayed before data is loaded, and data loading will not proceed until the user has dismissed the warning.
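Since the splash warning only applies when data is written to separate files, the related options might be combined as in this sketch. The prefix and the message text are hypothetical.

```python
offline_data_kwds = {
    # Write data to separate files instead of embedding it in the HTML.
    "inline_data": False,
    # Hypothetical prefix for the generated data files.
    "offline_data_prefix": "my_datamap",
    # Shown before any data is loaded; loading waits until it is dismissed.
    "splash_warning": "This map loads a large dataset. Continue?",
}
```

With inline_data=False the HTML and data files need to be served together, for example from the output directory with python -m http.server.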

Returns:
interactive_plot: InteractiveFigure

An interactive figure with hover, pan, and zoom. This will display natively in a notebook, and can be saved to an HTML file via the save method.
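Saving the returned figure might look like the sketch below. The helper name and default filename are assumptions, and the call assumes that the save method accepts a file path as its first argument.

```python
def save_interactive_plot(plot, path="datamap.html"):
    """Write an interactive figure to an HTML file via its save method."""
    plot.save(path)
    return path
```

For example, save_interactive_plot(plot, "my_map.html") would write the figure returned by render_html or create_interactive_plot to my_map.html.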