Abstract
The degree of correlation between variables is used in many data analysis applications as a key measure of interdependence. However, the most common techniques for exploratory analysis of pairwise correlation in multivariate datasets, such as scatterplot matrices and clustered heatmaps, do not scale well to large datasets, either computationally or visually. We present the s-CorrPlot, a new visualization capable of encoding pairwise correlation between hundreds of thousands of variables. The s-CorrPlot encodes correlation between variables spatially, as points on a scatterplot, using the geometric structure underlying Pearson's correlation. Furthermore, we extend the s-CorrPlot with interactive techniques that enable animation of the scatterplot to new projections of the correlation space, as illustrated in the companion video. We provide the s-CorrPlot as an open-source R package and validate its effectiveness through a variety of methods, including a case study with a biology collaborator.
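The geometric structure mentioned in the abstract can be illustrated with a small sketch. Once each variable is centered and scaled to unit length, Pearson's correlation between two variables equals the dot product of their standardized vectors, so correlations can be read off as coordinates in a projection. The Python code below is a minimal illustration under that assumption, not the s-CorrPlot package's implementation: the function names (`standardize`, `project_to_plane`) and the choice of a plane spanned by two reference variables are hypothetical and chosen only for this example.

```python
import math


def standardize(values):
    """Center a variable and scale it to unit Euclidean length."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm < 1e-12:
        raise ValueError("a constant variable has no defined correlation")
    return [c / norm for c in centered]


def dot(a, b):
    """Dot product; for standardized vectors this is Pearson's r."""
    return sum(x * y for x, y in zip(a, b))


def project_to_plane(variables, primary, secondary):
    """Project standardized variables onto a 2D plane (illustrative only).

    The x-axis is the standardized primary variable. The y-axis is the
    part of the standardized secondary variable that is orthogonal to
    the primary one (a single Gram-Schmidt step).
    """
    p = standardize(primary)
    s = standardize(secondary)
    overlap = dot(s, p)
    residual = [si - overlap * pi for si, pi in zip(s, p)]
    norm = math.sqrt(dot(residual, residual))
    if norm < 1e-12:
        raise ValueError("primary and secondary variables are collinear")
    q = [r / norm for r in residual]

    points = []
    for variable in variables:
        z = standardize(variable)
        # x is the correlation with the primary variable.
        points.append((dot(z, p), dot(z, q)))
    return points


if __name__ == "__main__":
    primary = [1, 2, 3, 4]
    secondary = [1, -1, -1, 1]
    data = [[2, 4, 6, 8], [4, 3, 2, 1], [1, -1, -1, 1]]
    for x, y in project_to_plane(data, primary, secondary):
        print(round(x, 6), round(y, 6))
```

In this sketch every point falls inside the unit circle, and the x-coordinate of a point is its correlation with the primary variable: a variable perfectly correlated with the primary one lands at approximately (1, 0), and a perfectly anti-correlated one at approximately (-1, 0). Choosing a different pair of reference variables gives a different projection of the same correlation space.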
Citation
Sean McKenna, Miriah Meyer, Christopher Gregg, Samuel Gerber
s-CorrPlot: An Interactive Scatterplot for Exploring Correlation
Journal of Computational and Graphical Statistics, 25(2): 445–463, doi:10.1080/10618600.2015.1021926, 2016.
BibTeX
@article{2015_jcgs_s-corrplot,
  title = {s-CorrPlot: An Interactive Scatterplot for Exploring Correlation},
  author = {Sean McKenna and Miriah Meyer and Christopher Gregg and Samuel Gerber},
  journal = {Journal of Computational and Graphical Statistics},
  doi = {10.1080/10618600.2015.1021926},
  volume = {25},
  number = {2},
  pages = {445--463},
  year = {2016}
}