Proteases are enzymes that hydrolyze peptide bonds at specific recognition sites in their protein substrates and play a central role in "life and death" processes, such as neural, endocrine and cardiovascular signalling, digestion, degradation of misfolded or unwanted proteins, immunity, cell division, and apoptosis.
The key to understanding the physiological role of a protease is to identify its natural substrates. Many proteases have the potential to cleave multiple proteins in different physiological compartments, with cleavage influenced by factors such as substrate sequence, substrate conformation and accessibility. Knowledge of the substrate specificity of a protease can dramatically improve our ability to predict target protein substrates; however, this information can at present only be derived from experimental approaches. In the absence of such data, the targets of a protease cannot be deduced a priori from the structure or sequence of the protease. Solving the "substrate identification" problem is fundamental both for understanding protease biology and for developing therapeutics that target specific protease-regulated pathways.
To address this problem, we developed PROSPER (PROtease Specificity Prediction servER), an integrated feature-based server for predicting novel substrates and their cleavage sites for 24 different protease families from primary sequences. The PROSPER server uses support vector regression together with a bi-profile Bayesian feature extraction approach to perform predictions from the primary sequence and from structural characteristics inferred or predicted from the amino acid sequence. Features used for prediction include binary-encoded amino acid sequences, predicted secondary structure, predicted solvent accessibility and predicted native disorder. In particular, the benchmarking results indicate that the use of predicted solvent accessibility and native disorder information significantly improves the prediction accuracy, thus enabling our method to outperform other state-of-the-art predictors. Based on these features, PROSPER offers important advantages over traditional substrate specificity prediction servers in its ability to identify novel substrates, and achieves greater coverage and accuracy than previous predictors. The PROSPER server is freely available at https://prosper.erc.monash.edu.au.
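As an illustration of the binary sequence encoding mentioned among the features, the following minimal Python sketch turns each candidate cleavage site into a fixed-length 0/1 vector. It is not the PROSPER implementation: the window of four residues on each side of the scissile bond (P4 to P4'), the alphabet ordering, and the names AMINO_ACIDS, one_hot_residue and encode_windows are all assumptions made for this example.

```python
# Illustrative sketch only; window size and all names are assumptions,
# not taken from the PROSPER server.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, assumed ordering


def one_hot_residue(residue):
    """Return a 20-element 0/1 list; non-standard residues map to all zeros."""
    vec = [0] * len(AMINO_ACIDS)
    idx = AMINO_ACIDS.find(residue)
    if idx >= 0:
        vec[idx] = 1
    return vec


def encode_windows(sequence, half_width=4):
    """Binary-encode a window around every candidate scissile bond.

    Each result is (site, vector), where `site` is the number of residues
    N-terminal to the bond, and the window covers `half_width` residues on
    each side of it (P4..P1 and P1'..P4' when half_width is 4).
    """
    encoded = []
    for site in range(half_width, len(sequence) - half_width + 1):
        window = sequence[site - half_width:site + half_width]
        vec = []
        for residue in window:
            vec.extend(one_hot_residue(residue))
        encoded.append((site, vec))
    return encoded


if __name__ == "__main__":
    for site, vec in encode_windows("ACDEFGHIKL"):
        print(site, len(vec), sum(vec))
```

Vectors of this kind could then be concatenated with the predicted structural features and passed to a regression model; how PROSPER combines them is not specified here.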
To our knowledge, PROSPER is the first comprehensive server capable of predicting the cleavage sites of multiple proteases within a single substrate sequence using machine learning techniques. It is anticipated to be a valuable tool for in silico identification of protease cleavage sites.