Genome-wide studies have estimated that at least 90% of human genes undergo some degree of alternative splicing, which is tightly regulated across tissues and developmental stages. Disruption of splicing regulation is therefore a common cause of human diseases such as cancer. Alternative splicing is generally controlled by trans-acting splicing factors that specifically bind cis-elements in pre-mRNA to promote or suppress splicing reactions. Typical splicing factors have a modular domain configuration, containing one or several RNA-binding domains that specifically recognize pre-mRNA targets and one or more functional domains that control splicing. While the human genome encodes hundreds of RNA-binding proteins with potential roles in regulating splicing, current knowledge of splicing factor activities is based mainly on studies of a few canonical splicing factors, such as the SR protein family and the hnRNP family. A deeper understanding of the splicing regulatory activity of all RNA-binding proteins will provide a basis for scientists to further study and synthesize splicing factors with specific activities.
Recently, Zefeng Wang’s group at the CAS-MPG Partner Institute of Computational Biology (PICB) published a research article, “Modeling and predicting the activities of trans-acting splicing factors with machine learning”, in Cell Systems (published online on Nov 7th). In this study, the researchers developed a machine learning approach to classify and predict the activities of RNA-binding proteins (RBPs) and revealed the association between the sequence composition of RBPs and their activities in regulating splicing, enabling de novo engineering of artificial splicing factors.
It was previously known that many RNA-binding proteins contain a large number of low-complexity regions, and that some of these low-complexity fragments can affect splicing. Based on this observation, the researchers used an engineered splicing factor system to conduct a systematic survey of the splicing regulatory activities of low-complexity regions in RNA-binding proteins (up to 12 representative low-complexity regions). They then used the survey results as a training dataset and applied a machine learning approach to learn the hidden rules by which protein sequences determine activity. This approach yielded a predictive model for the splicing regulatory activity of peptides. With this framework, they discovered new splicing factors with sequence features that had never been reported. Based on these sequence features, they achieved the first de novo synthesis of artificial splicing factors with customized activity, with a very high success rate. These findings also pave the way for the development of gene therapy methods based on artificial splicing factors.
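As a rough illustration of the general idea of linking sequence composition to activity, the sketch below turns a peptide into a vector of amino acid fractions and assigns it to the nearest class centroid. It is not the model from the published study: the nearest-centroid classifier, the function names, the "promote"/"suppress" labels, and the toy peptides are all assumptions made up for this example, and the peptides are placeholders rather than experimental data.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition(peptide):
    """Return the fraction of each standard amino acid in the peptide."""
    counts = Counter(peptide.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("peptide contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def train(labelled_peptides):
    """Average the composition vectors per label; returns {label: centroid}."""
    groups = {}
    for peptide, label in labelled_peptides:
        groups.setdefault(label, []).append(composition(peptide))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in groups.items()
    }


def predict(model, peptide):
    """Return the label whose centroid is closest (Euclidean) to the peptide."""
    vector = composition(peptide)

    def distance(centroid):
        return sqrt(sum((a - b) ** 2 for a, b in zip(vector, centroid)))

    return min(model, key=lambda label: distance(model[label]))


# Hypothetical toy training set: invented sequences with invented labels,
# used only to make the sketch runnable. Not data from the study.
TOY_TRAINING_SET = [
    ("RSRSRSRSRSRS", "promote"),
    ("SRSRSPRSRSRS", "promote"),
    ("GGGYGGGNGGGY", "suppress"),
    ("GYGGGSGGGGNG", "suppress"),
]

model = train(TOY_TRAINING_SET)
```

Calling `predict(model, peptide)` on a new sequence returns whichever toy label has the closest average composition. A real analysis would need measured activities for many more regions and a properly validated model.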
This work was mainly carried out by Miaowei Mao, from East China University of Science and Technology (ECUST) and now a postdoctoral fellow at the CAS-MPG Partner Institute of Computational Biology (PICB), and Yue Hu, from PICB, under the guidance of Dr. Zefeng Wang (PICB). Professor Yi Yang (ECUST) and senior investigator Xiaoling Li from the National Institute of Environmental Health Sciences (NIEHS/NIH) also participated in the work.
This work was supported by the National Natural Science Foundation of China, the Science and Technology Commission of Shanghai Municipality, and the China Scholarship Council, among others.
RNA-binding proteins regulate pre-mRNA alternative splicing