A routine approach to inferring the functions of a gene set is functional enrichment analysis based on curated terms and pathways from Gene Ontology (GO) or the Kyoto Encyclopedia of Genes and Genomes (KEGG). However, such analysis requires that the query gene set share genes with those annotated by GO/KEGG. Furthermore, the GO/KEGG databases maintain only a very restricted vocabulary. Here, we developed an algorithm called "CoCiter" (implemented in this web service), based on literature co-citations, to address these disadvantages of conventional gene set and gene function analysis. Co-citation analysis is widely used in ranking articles and predicting protein-protein interactions (PPIs). Our algorithm can further assess the significance of the co-citation of a gene set with any other user-defined gene set or with any free terms.
CoCiter can assess the significance of a query gene set's co-citation with two types of gene or term sets:
(1) Any pre-defined/manually-curated gene sets (e.g. gene sets from GO/KEGG);
(2) Any user-defined, free term sets (e.g. "diabetes" or "leukemia").
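
To make the idea of co-citation significance concrete, the following Python sketch counts the papers that cite genes from both of two sets and estimates an empirical p-value by comparing that count against randomly drawn gene sets of the same size. This is an illustration under stated assumptions, not a description of CoCiter's actual implementation: the gene-to-paper mapping, the function names, and the choice of a random-sampling test are all hypothetical.

```python
import random


def cocitation_count(set_a, set_b, gene_to_papers):
    """Count distinct papers citing at least one gene from each set.

    gene_to_papers is an assumed mapping: gene symbol -> set of paper IDs.
    """
    papers_a = set().union(*(gene_to_papers.get(g, set()) for g in set_a))
    papers_b = set().union(*(gene_to_papers.get(g, set()) for g in set_b))
    return len(papers_a & papers_b)


def empirical_p_value(set_a, set_b, gene_to_papers, n_perm=1000, seed=0):
    """Estimate how often a random gene set of the same size as set_a
    reaches at least the observed co-citation count with set_b.

    Assumption: random-sampling null model; set_a must not be larger
    than the number of genes in gene_to_papers.
    """
    rng = random.Random(seed)
    background = sorted(gene_to_papers)  # all genes with literature
    observed = cocitation_count(set_a, set_b, gene_to_papers)
    hits = 0
    for _ in range(n_perm):
        random_a = rng.sample(background, len(set_a))
        if cocitation_count(random_a, set_b, gene_to_papers) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical gene symbols and paper IDs).
toy_literature = {
    "G1": {1, 2},
    "G2": {2, 3},
    "G3": {4},
    "G4": {5},
    "G5": {6},
    "G6": {3, 7},
}
observed, p_value = empirical_p_value(["G1", "G2"], ["G6"], toy_literature)
```

Under the same assumptions, a free term such as "diabetes" could be handled by replacing the second gene set's papers with the set of papers that mention the term.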