Information theory and signal processing methodology to identify nucleic acid-protein binding sequences in RNA-protein interactions
RNA binding proteins modulate a wide array of cellular processes. Recent studies have used a variety of techniques to analyze RNA-protein (RBP) complex formation, including NMR, X-ray crystallography, and mass spectrometry. To explore the factors that regulate RBP formation, we developed a computational method as a step prior to biochemical validation of RBP complexes by mass spectrometry. Here we describe a methodology to predict the sequences involved in RNA-protein complex formation, including transient interactions. The approach uses an information entropy-based algorithm calibrated against known ΔG values and binding probabilities for RNA nucleotide-amino acid residue pairs. The method is then used to predict the binding sites of specific RNA-associated proteins identified by mass spectrometry. The estimates of specific nucleotide-peptide interactions were based on the Gibbs free energy of nucleotide-peptide fragments in a given RBP complex, and on a dynamic model that uses multiple binding sites within a nucleotide-peptide fragment to quantify the binding affinity of weak and transient RNA-protein interactions. Information entropy, a concept originally described by Claude Shannon, is used here to assist the search for specific RNA-protein binding sites. In this paper we detail the following, in order: 1. An information-theoretic approach to modelling RNA-protein interactions, down to specific RNA-protein complex motifs, based on information entropy; 2. The theory applied to a calibration dataset of known RNA-protein interactions to predict RNA-protein binding motifs; 3. A prediction of RNA-protein binding motifs for a set of co-immunoprecipitation assays.
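The two quantities the abstract combines, Shannon entropy and Gibbs free energy, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names, the four-letter RNA alphabet, the temperature, and the ΔG values in the usage note are all assumptions chosen for illustration. The sketch computes the per-position information content of a set of aligned RNA sites, and converts a table of ΔG values into Boltzmann-weighted relative binding probabilities.

```python
import math
from collections import Counter

# Gas constant in kcal/(mol*K); the unit choice is an assumption for this sketch.
R_KCAL = 1.987e-3


def shannon_entropy(symbols):
    """Shannon entropy, in bits, of the symbol frequencies in `symbols`."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def column_information(aligned_sites, alphabet_size=4):
    """Information content per alignment column: log2(alphabet size) minus entropy.

    `aligned_sites` is a list of equal-length strings (hypothetical input format).
    High values mark conserved positions, low values mark variable ones.
    """
    h_max = math.log2(alphabet_size)
    length = len(aligned_sites[0])
    return [
        h_max - shannon_entropy([site[i] for site in aligned_sites])
        for i in range(length)
    ]


def boltzmann_probabilities(delta_g, temperature=298.15):
    """Convert ΔG values (kcal/mol) into normalized Boltzmann weights.

    `delta_g` maps a label (e.g. a nucleotide-residue pair) to its ΔG.
    A more negative ΔG yields a higher relative binding probability.
    """
    weights = {
        key: math.exp(-dg / (R_KCAL * temperature)) for key, dg in delta_g.items()
    }
    partition = sum(weights.values())
    return {key: w / partition for key, w in weights.items()}
```

For example, `column_information(["ACGU", "AGGU"])` scores the second column lower than the others because it is the only position that varies, and `boltzmann_probabilities({"pair_a": -2.0, "pair_b": -1.0})` (made-up ΔG values) assigns the larger probability to `pair_a`.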
Shaw, Harry; Pattabiraman, Nagarajan; Preston, Deborah; Ammosova, Tatiana; Obukhov, Yuri; Nekhai, Sergei; and Kumar, Ajit, "Information theory and signal processing methodology to identify nucleic acid-protein binding sequences in RNA-protein interactions" (2019). College of Medicine Faculty Publications. 323.