Interactive DataMapPlots

As of version 0.2, DataMapPlot also supports the creation of interactive data map plots that let you zoom in, pan around, and see hover tooltips with data specific to each point. The interactive plots are rendered as HTML, using JavaScript and DeckGL to provide smooth interactivity. This notebook walks you through the basics of what can be achieved with the interactive plotting tools in DataMapPlot. As with the static plots, these tools aim to be as simple to use as possible, taking care of most of the difficult aesthetic challenges for you while still providing enough options and flexibility to let you generate a plot with your own custom style.

[1]:
import datamapplot

To demonstrate what DataMapPlot can do we’ll need some data. The examples directory of the DataMapPlot repository contains some pre-prepared datasets for experimenting with, so we’ll grab one of those. In practice we need a data map (a set of 2D coordinates, one per data sample we are mapping) and a set, or sets, of labels identifying the “topic” of each data sample, usually based on clusters in the data map. In this case we’ll use data from the titles and abstracts of papers from the machine learning section of the ArXiv preprint server.

Unlike the static plots, where we don’t wish to have too many labels, an interactive plot can make use of multiple layers of labelling granularity, revealing more detailed labels as you zoom in closer to specific regions of the data map. This means that rather than loading a single set of labels, we’ll load multiple layers of labels with differing cluster resolutions: lower layers provide very detailed, fine-grained clusters, while upper layers provide large-scale, broad clusters.

[2]:
import numpy as np
import requests
import io

base_url = "https://github.com/TutteInstitute/datamapplot"
data_map_file = requests.get(
    f"{base_url}/raw/main/examples/arxiv_ml_data_map.npy"
)
arxivml_data_map = np.load(io.BytesIO(data_map_file.content))
arxivml_label_layers = []
for layer_num in range(5):
    label_file = requests.get(
        f"{base_url}/raw/interactive/examples/arxiv_ml_layer{layer_num}_cluster_labels.npy"
    )
    arxivml_label_layers.append(np.load(io.BytesIO(label_file.content), allow_pickle=True))
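Before plotting, it can help to confirm that the data map and the label layers line up as described above: one 2D coordinate per sample, and one label per sample in every layer. The following is a minimal sketch using only NumPy; the helper name check_label_layers and the toy arrays are hypothetical and are not part of DataMapPlot or the example datasets.

```python
import numpy as np


def check_label_layers(data_map, label_layers):
    """Check shapes and return the number of distinct labels in each layer.

    Hypothetical helper for illustration; not part of DataMapPlot.
    """
    data_map = np.asarray(data_map)
    if data_map.ndim != 2 or data_map.shape[1] != 2:
        raise ValueError(f"expected an (n_samples, 2) data map, got {data_map.shape}")
    counts = []
    for layer_num, layer in enumerate(label_layers):
        layer = np.asarray(layer)
        # Every layer must supply exactly one label per data point.
        if layer.shape != (data_map.shape[0],):
            raise ValueError(
                f"layer {layer_num} has shape {layer.shape}, "
                f"expected ({data_map.shape[0]},)"
            )
        counts.append(len(np.unique(layer)))
    return counts


# Synthetic stand-in data with the same structure as the downloaded arrays.
rng = np.random.default_rng(0)
toy_data_map = rng.normal(size=(6, 2))
toy_fine_labels = np.array(
    ["GANs", "Diffusion", "GANs", "Bandits", "Q-learning", "Bandits"], dtype=object
)
toy_coarse_labels = np.array(
    ["Generative models"] * 3 + ["Reinforcement learning"] * 3, dtype=object
)
toy_counts = check_label_layers(toy_data_map, [toy_fine_labels, toy_coarse_labels])
```

Calling check_label_layers(arxivml_data_map, arxivml_label_layers) on the arrays loaded above would return one count per layer; given the description of the layers, one would expect the finest layer to have the most distinct labels.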

Now that we have some data to work with, let’s create a simple interactive plot and display it inline in the notebook. For this we need the create_interactive_plot function. At a basic level this operates very similarly to create_plot: we pass it a data map and an array of labels, one per data point. In contrast to create_plot, however, we get an interactive plot that you can zoom and pan around. Note that the interactive plot avoids overlapping cluster labels, so some cluster labels are only revealed once you have zoomed in sufficiently.

[3]:
plot = datamapplot.create_interactive_plot(
    arxivml_data_map,
    arxivml_label_layers[2],
)
plot

Since we can zoom in to see individual points, each corresponding to a different paper on ArXiv, it would be nice to be able to get details associated with each point, such as the title of the paper. This information has also been made available in the DataMapPlot repository, so we will fetch that data and put it in an array for use.

[4]:
hover_data_file = requests.get(
    f"{base_url}/raw/interactive/examples/arxiv_ml_hover_data.npy"
)
arxiv_hover_data = np.load(io.BytesIO(hover_data_file.content), allow_pickle=True)
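For your own data, the hover text only needs to be an array of strings with one entry per point, in the same order as the data map. As a rough sketch of how such an array might be prepared, the helper below truncates long titles so tooltips stay compact; build_hover_text, its max_length parameter, and the placeholder titles are all hypothetical and are not part of DataMapPlot.

```python
import numpy as np


def build_hover_text(titles, max_length=120):
    """Return an object array of titles, truncating any that are too long.

    Hypothetical helper for illustration; not part of DataMapPlot.
    """
    shortened = [
        # Keep short titles as they are; cut long ones and mark the cut.
        title if len(title) <= max_length
        else title[: max_length - 1].rstrip() + "…"
        for title in titles
    ]
    return np.array(shortened, dtype=object)


# Placeholder titles standing in for real paper metadata.
toy_titles = [
    "Example paper 0",
    "Example paper 1 with a considerably longer and more descriptive title",
]
toy_hover_text = build_hover_text(toy_titles, max_length=30)
```

The resulting array should have the same length as the data map, so that each point gets exactly one tooltip.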

Furthermore, that example used only a single layer of labels. Since we have both more detailed labels and a higher-level overview, let’s make use of them. The create_interactive_plot function takes an arbitrary number of positional arguments after the data map, each providing another layer of clustering and labelling.

Lastly, let’s mix things up and change the font_family we are using to make things look a little nicer.

[9]:
plot = datamapplot.create_interactive_plot(
    arxivml_data_map,
    arxivml_label_layers[0],
    arxivml_label_layers[2],
    arxivml_label_layers[4],
    hover_text=arxiv_hover_data,
    font_family="Playfair Display SC",
)
plot

Now, when initially zoomed out, we see only the top, high-level labels, but as we zoom in we reveal further and further detail, getting to quite specific sub-topics in the field of machine learning. Moreover, since we added the hover data, mousing over the points will bring up a tooltip providing the title of the individual papers. This makes for a rich way to explore the landscape of machine learning.

With many layers of clusters stacked on top of one another, it can be a little harder to discern the individual clusters. To help with this we can draw in cluster boundaries.

[6]:
plot = datamapplot.create_interactive_plot(
    arxivml_data_map,
    arxivml_label_layers[0],
    arxivml_label_layers[2],
    arxivml_label_layers[4],
    hover_text=arxiv_hover_data,
    font_family="Playfair Display SC",
    cluster_boundary_polygons=True,
    cluster_boundary_line_width=6,
)
plot

What more can we do? The create_interactive_plot function takes many of the same options as create_plot, allowing you to add titles, sub-titles and logos to the interactive output. It also supports a wide variety of extra options specific to the interactive plot, such as the ability to add JavaScript actions to be taken when clicking on points. In this case we’ll have the on-click action open a window with a Google search for the title of the paper that was clicked on. It’s also possible to enable an interactive text search function applied to the hover_text.

[7]:
plot = datamapplot.create_interactive_plot(
    arxivml_data_map,
    arxivml_label_layers[0],
    arxivml_label_layers[2],
    arxivml_label_layers[4],
    hover_text=arxiv_hover_data,
    font_family="Playfair Display SC",
    title="ArXiv Machine Learning Landscape",
    sub_title="A data map of papers from the Machine Learning section of ArXiv",
    logo="https://upload.wikimedia.org/wikipedia/commons/thumb/b/bc/ArXiv_logo_2022.svg/512px-ArXiv_logo_2022.svg.png",
    logo_width=180,
    on_click="window.open(`http://google.com/search?q=\"{hover_text}\"`)",
    enable_search=True,
    darkmode=True,
)
plot
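The on_click value used above is a JavaScript snippet held in a Python string, and the nested quoting can be hard to read. In the example, {hover_text} appears to act as a placeholder for the hover text of the clicked point. The sketch below simply assembles the same string piece by piece; the helper make_search_on_click and its search_url parameter are hypothetical, not part of DataMapPlot, and whether other search URLs behave well is an assumption to verify.

```python
def make_search_on_click(search_url="http://google.com/search?q="):
    """Build a JavaScript snippet that opens a quoted search for the hover text.

    Hypothetical helper for illustration; it only assembles a string.
    """
    # Backticks delimit a JavaScript template literal; the double quotes ask
    # the search engine for an exact-phrase match on the substituted text.
    return "window.open(`" + search_url + '"{hover_text}"`)'


# With the default URL this reproduces the string used in the cell above.
google_on_click = make_search_on_click()
```

The result could then be passed as on_click=google_on_click in place of the hand-written string.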

While the interactive plots work well in-line in a notebook, it is often useful to be able to send plots to others. The save method allows you to save the plot to a single HTML file that will embed (compressed) copies of the data. You can then share the HTML as required.

[8]:
plot.save("ArXiv_data_map_example.html")

The plot objects also have a string representation (calling str on the object) which is simply the raw HTML, so you can also gain access to, and manipulate, the final results as needed.
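As an illustration of manipulating that raw HTML, the sketch below inserts a small attribution note just before the closing body tag. It works on any HTML string; the helper add_footer_note, its styling, and the toy page are hypothetical, and the assumption that the generated page contains a closing body tag should be checked against real output.

```python
import html


def add_footer_note(page_html, note):
    """Insert a fixed-position note just before the last </body> tag.

    Hypothetical helper for illustration; not part of DataMapPlot.
    """
    footer = (
        '<p style="position:fixed;bottom:0;right:0;margin:4px;font-size:10px;">'
        + html.escape(note)  # escape so the note cannot break the markup
        + "</p>"
    )
    head, sep, tail = page_html.rpartition("</body>")
    if not sep:
        # No closing body tag found: fall back to appending at the end.
        return page_html + footer
    return head + footer + sep + tail


# Toy page standing in for the output of str(plot).
toy_page = "<html><body><div id='map'></div></body></html>"
annotated_page = add_footer_note(toy_page, "Data: ArXiv <ML>")

# With a real plot object one might write, for example:
# annotated_page = add_footer_note(str(plot), "Data: ArXiv")
# with open("annotated_data_map.html", "w", encoding="utf-8") as f:
#     f.write(annotated_page)
```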

This provides a very quick introduction to some of the capabilities of the create_interactive_plot function. Further notebooks will provide more details on the various options available and how best to make use of them.