Evaluating clustering algorithms for genetic regulatory network structural inference

Abstract: Modern research increasingly recognises the importance of genome-wide gene regulatory network inference; however, a range of statistical, technological and biological factors make it a difficult and intractable problem. One approach used in some research is to cluster the data and then infer a structural model of the clusters. When taking this approach, the choice of clustering algorithm is critical. In this paper we explicitly analyse the attributes that make a clustering algorithm appropriate, and we also consider how to measure the quality of the identified clusters. Our analysis leads us to develop three novel cluster quality measures based on regulatory overlap. Using these measures we evaluate two modern candidate algorithms, FLAME and KMART. Although FLAME was developed specifically for clustering gene expression profile data, we find that KMART is probably the better algorithm to use if the goal is to infer a structural model of the clusters.

  author    = {Christopher Fogelberg and Vasile Palade},
  title     = {Evaluating Clustering Algorithms for Genetic Regulatory Network Structural Inference},
  booktitle = {Research and Development in Intelligent Systems XXVI: Proceedings of AI-2009, The Twenty-Ninth SGAI
               International Conference on Innovative Techniques and Applications of Artificial Intelligence (AI2009)},
  pages     = {137--150},
  year      = 2009,
  month     = {December},
  publisher = {Springer-Verlag},
  address   = {Cambridge, UK},
  keywords  = {clustering, grn, inference, grn inference, large-network, further inference, cluster evaluation,
               partially supervised}
