8. Proteomics
8.5. Applications and limitations
Shot-gun proteomics can be used to investigate responses to pesticides, pathogens, nutritional stress, aging, and a wide array of other conditions (Arad et al., 2024). Proteomics has even been used to discover specific protein markers suitable for guiding selective breeding for varroa resistance mechanisms (Guarna et al., 2017). LFQ proteomics is best suited to comparing conditions in which the majority of proteins can be assumed to be expressed at the same abundance, with a smaller fraction of the proteome changing in response to a stimulus. Nevertheless, meaningful results can be obtained from experiments with more dramatic proteomic shifts, such as comparisons between castes or across developmental stages: LFQ has been demonstrated to achieve accurate relative quantification even when approximately one-third of the proteome is changing in abundance (Cox et al., 2014).
Although proteomics is an increasingly powerful technique, interpretation of the results can be challenging owing to limited functional annotation of the honey bee proteome (Elsik et al., 2018). Each protein can have one or more biological functions, which are associated with unique Gene Ontology (GO) terms (Ashburner et al., 2000). These terms help derive biological meaning from the hundreds or thousands of proteins that are often differentially expressed in proteomics experiments. According to HymenopteraMine, a database of genomic resources for hymenopterans, only 7,929 out of 15,314 sequences (52%) in the honey bee official gene set (v3.2) were linked to GO terms as of 2018 (Elsik et al., 2018), meaning that the remaining sequences have poorly characterized functions. This figure has since increased (Walsh et al., 2022), but many uncharacterized genes remain. Our limited understanding of honey bee gene and protein functions means that high-throughput datasets can be difficult to interpret, as we are blind to the roles of a large fraction of the very targets we are analyzing.
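The annotation coverage quoted above is a simple ratio, and the same calculation can be applied to the proteins identified in one's own experiment to gauge how much of a dataset can be interpreted functionally. The following is a minimal sketch, not an established tool: the function name annotation_coverage and the identifiers in the toy example are hypothetical, and the GO mapping is assumed to be a plain dictionary from sequence identifier to a set of GO terms.

```python
from typing import Dict, Iterable, Set


def annotation_coverage(ids: Iterable[str], go_map: Dict[str, Set[str]]) -> float:
    """Return the fraction of unique identifiers linked to at least one GO term."""
    unique_ids = set(ids)
    if not unique_ids:
        return 0.0
    annotated = sum(1 for i in unique_ids if go_map.get(i))
    return annotated / len(unique_ids)


# Coverage reported for the official gene set v3.2 (Elsik et al., 2018)
print(f"{7929 / 15314:.1%}")  # 51.8%, reported as 52%

# Toy example with hypothetical identifiers: two of four sequences are annotated
toy_map = {"seqA": {"GO:0005524"}, "seqB": {"GO:0016301"}, "seqC": set()}
print(annotation_coverage(["seqA", "seqB", "seqC", "seqD"], toy_map))  # 0.5
```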
Because shot-gun proteomics requires protein digestion into peptides, and peptide sequences can be shared between different proteins, it is often difficult to say definitively to which protein a given peptide belongs. MaxQuant deals with this problem by reporting “protein groups” rather than individual proteins; each group offers the most parsimonious set of proteins that could explain the peptides observed in the sample. However, this also complicates GO term assignment: since multiple different proteins can be listed in a single protein group, which GO terms should the protein group be given? A simple, though imperfect, heuristic is to assign to each protein group the GO terms associated with its leading protein. Since both GO terms and protein groups are defined based on sequence similarities, it is a reasonable assumption that proteins within a group will share GO terms. Alternatively, though more laboriously, the GO terms associated with all proteins in the group can be linked to it.
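The two assignment strategies described above can be sketched as follows. This is an illustrative example rather than part of MaxQuant: it assumes that a protein group is given as a semicolon-separated string of identifiers with the leading protein listed first, and that GO annotations are available as a dictionary from protein identifier to a set of GO terms. The function names and the protein identifiers are hypothetical.

```python
from typing import Dict, List, Set


def split_group(protein_ids: str) -> List[str]:
    """Split a semicolon-separated protein group string into member IDs."""
    return [p.strip() for p in protein_ids.split(";") if p.strip()]


def go_terms_leading(protein_ids: str, go_map: Dict[str, Set[str]]) -> Set[str]:
    """Heuristic 1: use only the GO terms of the first (leading) protein."""
    members = split_group(protein_ids)
    if not members:
        return set()
    return set(go_map.get(members[0], set()))


def go_terms_union(protein_ids: str, go_map: Dict[str, Set[str]]) -> Set[str]:
    """Heuristic 2: pool the GO terms of every protein in the group."""
    terms: Set[str] = set()
    for member in split_group(protein_ids):
        terms |= go_map.get(member, set())
    return terms


# Hypothetical group of three proteins; protC has no annotation
example_map = {
    "protA": {"GO:0005524"},
    "protB": {"GO:0005524", "GO:0016301"},
}
group = "protA;protB;protC"
print(sorted(go_terms_leading(group, example_map)))  # ['GO:0005524']
print(sorted(go_terms_union(group, example_map)))    # ['GO:0005524', 'GO:0016301']
```

The leading-protein version yields fewer terms per group and is therefore more conservative, whereas the union version retains annotations that only some members carry, at the risk of attributing a function to a protein that is not actually present in the sample.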
Unfortunately, for proteins that are poorly characterized, it is difficult to generate functional information without a high-throughput way to produce gene knock-out (a gene is deleted or rendered nonfunctional) or knock-in (a gene is inserted) mutant organisms. Organisms such as Drosophila melanogaster and Mus musculus have benefitted from decades of detailed genetic and biochemical research into specific genes and proteins, but such work has only recently become possible for honey bees and is still far from routine (Kohno & Kubo, 2019). While much information can be borrowed from what is known about homologous proteins in other species, honey bees diverged from flies about 300 million years ago (Honeybee Genome Sequencing Consortium, 2006) and have therefore experienced considerable sequence divergence. Until we know more about the functions of all honey bee proteins, we will not be able to interpret high-throughput differential expression data to their full potential.