CAREER: Enabling Reproducibility of Interactive Visual Data Analysis

Abstract

Reproducibility and justifiability are widely recognized as critical aspects of data-driven decision making in fields as varied as scientific research, business, healthcare, and intelligence analysis. This project is concerned with enabling reproducibility and justifiability of decisions in the data analysis process, specifically as it relates to visual data analysis. Visualization is an important tool for discovery, yet decisions made by humans based on visualizations of data are difficult to capture and to justify. This project will develop methods to justify, communicate, and audit decisions made based on visual analysis. This, in turn, will lead to better outcomes, achieved with less effort and cost. The increasing use of visual analysis tools for decision making will make data analysis accessible to a broad variety of people, as visual analysis tools are generally easier to use than scripting languages and do not require extensive computational and statistical training. This research and its related activities increase accessibility and enhance the data analysis infrastructure for research and education.

To achieve these goals, this research will develop a framework for making visual analysis sessions not only reproducible but also reusable. The approach is based on tracking semantically meaningful provenance data during an interactive visual analysis session. Once a discovery is made, analysts can use this history to curate a succinct analysis story, adding justifications and explanations to make their analysis reproducible by others. Using a semi-automatic process, analysts will be able to make their actions data-aware, so that their analysis processes become robust to changes, such as updates in the data. A second contribution of the proposed work is the integration of visual analysis into computational analysis processes. While visualization is commonly used to present computational analysis results, the results of a visual analysis session are rarely used to feed into further computational processes. The techniques developed in this project will allow analysts to feed analysis results (selections, aggregations, filters, etc.) back into a computational environment. This will make it possible to use interactive visualization at any point in the data analysis process while maintaining reproducibility and enabling reuse. The expected results include new methods to capture user intent, to create data stories from analysis processes, and to integrate computational and visual data analysis, leveraging the strengths of both human abilities and computational power. The results will be disseminated in publications and as open source software, and will be accessible via this website.
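The idea of data-aware provenance described above can be illustrated with a minimal sketch. It is not the API of the project's provenance library; the names `Step`, `Provenance`, `record`, `replay`, and `story`, as well as the sample data, are hypothetical and chosen only for illustration. The sketch assumes that each recorded action is stored as a rule over the data (a predicate) rather than as a fixed list of selected items, which is what would allow the history to be re-applied after the data changes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, float]


@dataclass
class Step:
    """One semantically meaningful action in the analysis history (hypothetical)."""
    label: str                        # human-readable description of the action
    predicate: Callable[[Row], bool]  # data-aware rule instead of fixed row ids
    note: str = ""                    # analyst's justification for the action


@dataclass
class Provenance:
    """A linear analysis history; real provenance is often a branching graph."""
    steps: List[Step] = field(default_factory=list)

    def record(self, label: str, predicate: Callable[[Row], bool], note: str = "") -> None:
        self.steps.append(Step(label, predicate, note))

    def replay(self, rows: List[Row]) -> List[Row]:
        """Re-apply every recorded filter, in order, to a possibly updated dataset."""
        for step in self.steps:
            rows = [r for r in rows if step.predicate(r)]
        return rows

    def story(self) -> List[str]:
        """Turn the history into a short, annotated analysis story."""
        return [f"{i + 1}. {s.label}: {s.note}" for i, s in enumerate(self.steps)]


history = Provenance()
history.record("Keep adults", lambda r: r["age"] >= 18, "Protocol covers adults only")
history.record("Keep high scores", lambda r: r["score"] > 0.5, "Low scores treated as noise")

original = [{"age": 30, "score": 0.9}, {"age": 12, "score": 0.8}]
updated = original + [{"age": 45, "score": 0.7}, {"age": 50, "score": 0.1}]

print(history.replay(original))  # result on the data the analyst originally saw
print(history.replay(updated))   # same analysis re-applied after a data update
print("\n".join(history.story()))
```

Because the steps are stored as rules, the filtered result of `replay` is also the kind of artifact that could be handed back to a computational environment for further processing.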

Reproducibility Framework Concept

Software

We are developing a provenance tracking library for integration with web applications. The source code is available here, and a blog post is also available.

We are also working on a visualization tool that captures analysis intent using the provenance library discussed above. Find the code here, and a live demo of the system at this page.

The following image illustrates the interface:

The predicting intent visualization user interface

Check out the two core papers for this project, on predicting intent and reusing workflows.

Publications

Kiran Gadhave, Zach Cutler, Alexander Lex
Persist: Persistent and Reusable Interactions in Computational Notebooks
Computer Graphics Forum (EuroVis), 2024

Devin Lange, Shaurya Sahai, Jeff M. Phillips, Alexander Lex
Ferret: Reviewing Tabular Datasets for Manipulation
Computer Graphics Forum (EuroVis), 2023

Haihan Lin, Derya Akbaba, Miriah Meyer, Alexander Lex
Data Hunches: Incorporating Personal Knowledge into Visualizations
IEEE Transactions on Visualization and Computer Graphics (VIS), 2022

Kiran Gadhave, Zach Cutler, Alexander Lex
Reusing Interactive Analysis Workflows
Computer Graphics Forum (EuroVis), 2022

Kiran Gadhave, Jochen Görtler, Zach Cutler, Carolina Nobre, Oliver Deussen, Miriah Meyer, Jeff Phillips, Alexander Lex
Predicting Intent Behind Selections in Scatterplot Visualizations
Information Visualization, 2021

Carolina Nobre, Dylan Wootton, Zach Cutler, Lane Harrison, Hanspeter Pfister, Alexander Lex
reVISit: Looking Under the Hood of Interactive Visualization Studies
SIGCHI Conference on Human Factors in Computing Systems (CHI), 2021

Zach Cutler, Kiran Gadhave, Alexander Lex
Trrack: A Library for Provenance-Tracking in Web-Based Visualizations
IEEE Visualization Conference (VIS), 2020

Katarina Furmanova, Samuel Gratzl, Holger Stitz, Thomas Zichner, Miroslava Jaresova, Alexander Lex, Marc Streit
Taggle: Scalable Visualization of Tabular Data through Aggregation
Information Visualization, 2019

Alex Bigelow, Carolina Nobre, Miriah Meyer, Alexander Lex
Origraph: Interactive Network Wrangling
IEEE Conference on Visual Analytics Science and Technology (VAST), 2019

Carolina Nobre, Marc Streit, Miriah Meyer, Alexander Lex
The State of the Art in Visualizing Multivariate Networks
Computer Graphics Forum (EuroVis), 2019

Carolina Nobre, Marc Streit, Alexander Lex
Juniper: A Tree+Table Approach to Multivariate Graph Visualization
IEEE Transactions on Visualization and Computer Graphics (InfoVis), 2019

Jen Rogers, Nicholas Spina, Ashley Neese, Rachel Hess, Darrel Brodke, Alexander Lex
Composer: Visual Cohort Analysis of Patient Outcomes
Applied Clinical Informatics, 2019

Jen Rogers, Nicholas Spina, Ashley Neese, Rachel Hess, Darrel Brodke, Alexander Lex
Composer: Visual Cohort Analysis of Patient Outcomes
Workshop on Visual Analytics in Healthcare at AMIA (VAHC 2018), 2018