Fig. 8 | BMC Systems Biology

Fig. 8

From: Systems biology of the structural proteome

Panel a shows K-means clustering of all E. coli and T. maritima proteins by their structural properties (29 features, including SASA and the percentages of polar, nonpolar, buried, surface, and charged residues, among others). The algorithm partitions the proteins into four clusters; the number of clusters was chosen with the elbow method, based on the percent variance explained (see Additional file 1). Interestingly, the proteins of different metabolic subsystems in E. coli show distinct structural characteristics. For each cluster, the subsystem with the most proteins in that cluster is reported. Panel b reports the main structural characteristics that distinguish proteins across clusters. The numbers are scaled property values averaged across all proteins within a given cluster (see Additional file 1). The property values generally represent the percentage of the protein that is described by a given property (e.g., the percentage of the protein that is nonpolar). Panel c shows the percentages of the E. coli and T. maritima proteomes within each cluster. Surprisingly, some clusters are enriched in E. coli proteins (cluster 0) and others in T. maritima proteins (cluster 2). The total numbers of proteins in clusters 0–3 are 154, 318, 592, and 763, respectively. Panel d shows an example of a homolog (pgk) that falls into entirely different clusters in the two organisms (cluster 2 for E. coli and cluster 1 for T. maritima). The structural differences can mainly be explained by the fact that in T. maritima, pgk (PDB 1VPE) is fused with tpi (PDB 1B9B), creating a protein three times as long as its E. coli counterpart (PDB entry 1ZMR).
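The cluster-count selection described in the caption can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the feature vectors are already scaled, uses a toy two-feature data set in place of the 29 structural properties, and swaps in a deterministic farthest-first initialization so the result is reproducible. All function names (`sq_dist`, `farthest_first`, `kmeans`, `explained_variance_curve`) are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic initialization (an assumption of this sketch):
    start from the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(nxt)
    return [list(c) for c in centroids]


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm. Returns (labels, centroids, inertia), where
    inertia is the within-cluster sum of squared distances."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        # An empty cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


def explained_variance_curve(points, max_k):
    """Fraction of total variance explained for k = 1..max_k.
    The 'elbow' is the k after which this curve flattens."""
    total_ss = kmeans(points, 1)[2]  # with k = 1, inertia is the total sum of squares
    return {k: 1.0 - kmeans(points, k)[2] / total_ss for k in range(1, max_k + 1)}


# Toy data: four tight, well-separated groups in two (already scaled) features.
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
offsets = [(-0.1, -0.1), (0.1, -0.1), (-0.1, 0.1), (0.1, 0.1), (0.0, 0.0)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

curve = explained_variance_curve(points, max_k=6)
for k, frac in curve.items():
    print(f"k={k}: {100 * frac:.2f}% variance explained")
```

On this toy data the curve rises steeply up to k = 4 and then gains almost nothing, which is the flattening the elbow method looks for. With real structural features the elbow is usually less sharp, and a randomized initialization with several restarts would be a more typical choice than the deterministic one used here.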