Application of Persistent Homology to Mass Cytometry Data

Clustering is a data mining technique that partitions high-dimensional datasets into more accessible subgroups while preserving the geometric properties of the data; however, complex interactions in biological systems make it challenging to identify and analyze single-cell data accurately. Improved strategies for applying clustering algorithms in biology have become important as the number of biological parameters that can be measured in a single experiment grows. Mass cytometry routinely detects ~40 biological parameters per experiment, with the promise of more in the future. Imaging mass cytometry has also grown in popularity because it additionally captures spatial information. Robust and accurate assessment of both mass cytometry and imaging mass cytometry data is key to advancing knowledge and delivering benefits to society in fields such as medicine, agriculture, and economics. In this project, we apply persistent homology to a well-understood mass cytometry dataset in order to compare the performance of topological data analysis with that of the generally accepted clustering approach.