Ferret: Reviewing Tabular Datasets for Manipulation

Abstract

How do we ensure the veracity of science? The act of manipulating or fabricating scientific data has led to many high-profile fraud cases and retractions. Detecting manipulated data, however, is a challenging and time-consuming endeavor. Automated detection methods are limited due to the diversity of data types and manipulation techniques. Furthermore, patterns automatically flagged as suspicious can have reasonable explanations. Instead, we propose a nuanced approach where experts analyze tabular datasets, e.g., as part of the peer-review process, using a guided, interactive visualization approach. In this paper, we present an analysis of how manipulated datasets are created and the artifacts these techniques generate. Based on these findings, we propose a suite of visualization methods to surface potential irregularities. We have implemented these methods in Ferret, a visualization tool for data forensics work. Ferret makes potential data issues salient and provides guidance on spotting signs of tampering and differentiating them from truthful data.

Citation

Devin Lange, Shaurya Sahai, Jeff M. Phillips, Alexander Lex
Ferret: Reviewing Tabular Datasets for Manipulation
Computer Graphics Forum (EuroVis), 42(3): 187-198, doi:10.1111/cgf.14822, 2023.

BibTeX

@article{2023_eurovis_ferret,
  title = {Ferret: Reviewing Tabular Datasets for Manipulation},
  author = {Devin Lange and Shaurya Sahai and Jeff M. Phillips and Alexander Lex},
  journal = {Computer Graphics Forum (EuroVis)},
  doi = {10.1111/cgf.14822},
  volume = {42},
  number = {3},
  pages = {187-198},
  year = {2023}
}

Acknowledgements

We wish to thank Holger Stitz, Michael Pühringer, and the LineUp authors for their support using the library, the Retraction Watch Project for access to their database, Zach Cutler and Jack Wilburn for technical help, the interview participants for their time and expertise, and the Visualization Design Lab for feedback. This work was supported by NSF IIS 1751238 and CCF-2115677.

Images

These images are not part of the original paper and are licensed under CC BY 4.0. If you use these images, please cite the paper. Click on the images for full resolution.