Fluid: Data-Linked Visualisations

explorable, self-explanatory research outputs

Overview

With traditional print media, the figures, text and other content are disconnected from the underlying data, making them hard to understand, evaluate and trust. Digital media, such as online papers and articles, present an opportunity to make visual artifacts which remain connected to their data and can reveal the fine-grained relationships between data and output to an interested user. This would enable research outputs, news articles and other data-driven artifacts to be more transparent, self-explanatory and explorable. It is a nice goal, but one which would be impractical if we had to implement these features by hand for each output. Fluid is a programming language that tracks how data flows through a computation, making it possible to author computational outputs with various transparency features built in.

When the figures below have finished loading, click on the button in the margin to reveal the data pane. A record is shown only if at least one of its fields is used somewhere in either of the two figures. Records that are completely unused, together with any unused fields of the remaining records (which are greyed out), are called inert. By hiding inert data, we can present the reader with a view containing only the used data (a useful default setting).
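
The distinction between hidden records and greyed-out fields can be sketched in a few lines of Python. This is only an illustration, not how Fluid implements it: the function `data_pane`, the set `used` of (record index, field name) pairs and the sample rows are all assumptions made for the example; only the field name nuclearCap comes from the figures.

```python
from typing import Any


def data_pane(records: list[dict[str, Any]],
              used: set[tuple[int, str]]) -> list[tuple[int, dict[str, bool]]]:
    """Return the records to show, mapping each field name to True if inert.

    A record is shown only if at least one of its fields is used; the
    unused fields of a shown record are flagged inert (greyed out).
    """
    pane = []
    for i, record in enumerate(records):
        inert = {field: (i, field) not in used for field in record}
        if not all(inert.values()):  # at least one field is used
            pane.append((i, inert))
    return pane


# Made-up rows for illustration (hypothetical values and countries).
records = [
    {"year": 2015, "country": "A", "nuclearCap": 10, "windCap": 3},
    {"year": 2015, "country": "B", "nuclearCap": 20, "windCap": 5},
    {"year": 2016, "country": "A", "nuclearCap": 11, "windCap": 4},
]
# Hypothetical usage information: the cells read by some figure.
used = {(0, "year"), (0, "nuclearCap"), (2, "nuclearCap")}

for index, inert in data_pane(records, used):
    greyed = [field for field, is_inert in inert.items() if is_inert]
    print(f"record {index}: greyed-out fields {greyed}")
```

Here record 1 is dropped from the pane entirely because none of its fields is used, while records 0 and 2 are shown with their unused fields flagged.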

Tracking how data flows through a computation allows us to do more than just hide away or grey out unused data. Try some of the following:

Mouse over the nuclearCap data. Various points in the scatter plot will be highlighted in blue as you move around: these are the outputs which consume the data element under your mouse.
As you move over nuclearCap values, notice that three other values are also highlighted with a blue border. These are its related inputs: the other inputs needed to compute the scatter plot point. Each point in the scatter plot represents a year; for a given year, the nuclear capacities of four countries were added together to compute the x-coordinate, which is why in this case we see four mutually related values.
Now try the same with nuclearOut. This time the related inputs also include the coal, gas and petrol output for that country. That's because these data are also used to compute the bar segments in the bar chart. You'll see the various bar segments being highlighted with a thin border as you move around in that column.
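
The idea of consuming outputs and related inputs can be sketched with a toy form of dependency tracking in Python. This is a simplified illustration under stated assumptions, not Fluid's actual analysis: the `Tracked` class, the helper names and the capacity figures are all hypothetical. Each value carries the set of input cells it was computed from, and addition merges those sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tracked:
    """A number paired with the names of the input cells it depends on."""
    value: float
    deps: frozenset = frozenset()

    def __add__(self, other: "Tracked") -> "Tracked":
        # The sum depends on everything either operand depends on.
        return Tracked(self.value + other.value, self.deps | other.deps)


def cell(name: str, value: float) -> Tracked:
    """An input cell depends only on itself."""
    return Tracked(value, frozenset({name}))


# Made-up nuclear capacities for four hypothetical countries.
raw = {
    2015: {"A": 10.0, "B": 20.0, "C": 30.0, "D": 40.0},
    2016: {"A": 11.0, "B": 21.0, "C": 31.0, "D": 41.0},
}


def x_coordinate(year: int) -> Tracked:
    """Sum the capacities of all countries for one year."""
    cells = [cell(f"nuclearCap[{c},{year}]", v) for c, v in raw[year].items()]
    return sum(cells, Tracked(0.0))


# One scatter plot point per year.
points = {year: x_coordinate(year) for year in raw}


def consumers(name: str) -> set:
    """Outputs (years) whose x-coordinate uses the given input cell."""
    return {year for year, x in points.items() if name in x.deps}


def related_inputs(name: str) -> set:
    """Other inputs used by the outputs that consume the given cell."""
    related = set()
    for year in consumers(name):
        related |= points[year].deps
    return related - {name}


print(consumers("nuclearCap[A,2015]"))
print(sorted(related_inputs("nuclearCap[A,2015]")))
```

In this sketch, hovering over the 2015 capacity of country A picks out the 2015 point as its only consumer, and the other three 2015 capacities as its related inputs.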

[Interactive figures: bar chart and scatter plot]

You can also interact with the output. Try the following:

Move your mouse over segments in the bar chart. You'll see the data needed to compute the height of the segment. You'll also see that there is always a scatter plot point highlighted on the right. This secondary highlighting picks out the related outputs: the other outputs that make use of the selected data. This feature is called brushing and linking in data visualisation, but here we can do it in a transparent way. Related inputs and related outputs are dual notions.
Now click on one of the bar segments. That turns the highlighting on the bar into a persistent selection; the bar will become darker and the corresponding input selection will turn green. Now move your mouse over to the scatter plot. With the first selection still active, you can interact with different scatter plot points to see how the data they use intersects with the data for the bar segment. In particular, any scatter plot point which is “related” to the selected bar segment will demand data which overlaps with the data needed for the bar segment.
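
Related outputs, and the overlap between a persistent selection and a hovered output, can be sketched in Python as set operations over the inputs each output demands. The `demands` table below is invented for the example (the output labels, the cell names and the field names other than nuclearCap and nuclearOut are hypothetical), and the sketch says nothing about how Fluid computes these sets.

```python
# Hypothetical demand table: for each output, the input cells it needs.
demands = {
    "bar:A": {
        "nuclearOut/A/2015", "coalOut/A/2015",
        "gasOut/A/2015", "petrolOut/A/2015",
    },
    "point:2015": {
        "nuclearCap/A/2015", "nuclearCap/B/2015",
        "nuclearOut/A/2015", "nuclearOut/B/2015",
    },
    "point:2016": {
        "nuclearCap/A/2016", "nuclearCap/B/2016",
        "nuclearOut/A/2016", "nuclearOut/B/2016",
    },
}


def related_outputs(output: str) -> set:
    """Other outputs that share at least one input with the given output."""
    needed = demands[output]
    return {o for o, d in demands.items() if o != output and d & needed}


def overlap(selected: str, hovered: str) -> set:
    """Inputs demanded by both the selected and the hovered output."""
    return demands[selected] & demands[hovered]


print(related_outputs("bar:A"))
print(overlap("bar:A", "point:2015"))
print(overlap("bar:A", "point:2016"))
```

With the bar segment selected, hovering over the 2015 point yields a non-empty overlap, so the two are related; the 2016 point shares no data with the segment and yields the empty set.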

Computational transparency as infrastructure

This only scratches the surface of what is possible, but hints at how we might help a reader understand or evaluate a research paper or news article in situ. The key idea is that the transparency features are provided automatically; the author of the content need only express their visualisation as a pure function of the input. As the infrastructure improves, the transparency benefits become available to all users, with no additional effort required on the part of authors. Here is the Fluid code for the figures above:

non-renewables.fld

The current system has several limitations, and there are many directions in which we plan to take it forward; see the FAQ for details.