Introduction to Omics

.title[
# Introduction to Omics
]
.author[
### <a href="https://shibalytics.com/">Max Qiu, PhD</a> Bioinformatician/Computational Biologist <a href="mailto:maxqiu@unl.edu" class="email">maxqiu@unl.edu</a>
]
.institute[
### <a href="https://biotech.unl.edu/bioinformatics">Bioinformatics Research Core Facility, Center for Biotechnology</a> <a href="https://ncibc.unl.edu/">Data Life Science Core, NCIBC</a>
]
.date[
### 02-06-2023
]

---

# Omics

* Omics **informally** refers to a field of study in biology ending in **-omics**
  + Subject of study is the **"ome"** 
  + Aims at the **collective characterization and quantification of pools of biological molecules** that translate into the structure, function, and dynamics of an organism or organisms

???
Branches of science known **informally** as omics are various disciplines in biology whose names end in the suffix -omics. "Ome" denotes the object of study of an omics field and refers to a **totality** of some sort. Omics has two parts: **analytical techniques and informatics** (instrumental and data analytical).

---

# Mass Spectrometry-generated Omics

.footnote[
[Gorrochategui et al, Trends in Analytical Chemistry, Volume 82, 2016, Pages 425-442](https://www.sciencedirect.com/science/article/pii/S0165993616300425)

]

???
This class focuses on mass spec-generated omics (from an analytical methodology point of view), i.e., proteomics and metabolomics. Mass spectrometry has been one of the most **powerful research and analytical tools since the inception of proteomics and metabolomics as sciences**. It is **highly sensitive, precise, rapid, and selective, with a high dynamic range**, and it has proven particularly effective and informative when combined with **separation techniques, or as part of a tandem experiment**. We can say that mass spectrometry laid the foundation for a number of omics sciences. Indeed, many of them took shape as sciences after the invention of highly effective methods of ionization and analysis of ions in the 1980s, among which were ESI and MALDI.

Peptidomics should be considered a branch of proteomics; it is **the study of all possible biological properties of low-molecular-weight endogenous peptides**. Lipidomics is usually considered a branch of metabolomics, focusing on lipids in the cell and their interactions with other metabolites, lipids, and proteins.

---

# Proteomics

[Nature](https://www.nature.com/subjects/proteomics):

Proteomics refers to **the study of proteomes**, but is also used to describe the **techniques** used to determine the entire set of proteins of an organism or system, such as protein purification and mass spectrometry.

[EMBL-EBI](https://www.ebi.ac.uk/training/online/courses/proteomics-an-introduction/what-is-proteomics/):

Proteomics is the **large-scale study of proteomes**.

[ScienceDirect](https://www.sciencedirect.com/topics/medicine-and-dentistry/proteomics):

Proteomics can be defined as “a **large-scale** study of protein properties, e.g., **expression level, posttranslational modification and protein interaction**, in order to obtain a global view of disease processes or cellular processes at the protein level.”

---

# "Proteomics is the attempt to understand which proteins are doing what, when, with whom, and why."

Proteomics is used to investigate:

* when and where proteins are expressed

* rates of protein production, degradation, and steady-state abundance

* how proteins are modified (for example, post-translational modifications (PTMs) such as phosphorylation)

* the movement of proteins between subcellular compartments

* the involvement of proteins in metabolic pathways

* how proteins interact with one another
]

???

Proteomic experiments generally **collect data on three properties of proteins** in a sample: **location, abundance/turnover and post-translational modifications**. Depending on the experimental design, researchers may be directly interested in these data, or may use them to infer additional information. For example, it may be possible to infer a protein’s interaction partners, or to assess whether a protein is active or inhibited from its post-translational modifications.

---

# "Proteomics is the attempt to understand which proteins are doing what, when, with whom, and why."

Proteomics can provide significant biological information for many biological problems, such as:

* which proteins interact with a particular protein of interest (for example, the tumor suppressor protein p53)?

* which proteins are localized to a subcellular compartment (for example, the mitochondrion)?

* which proteins are involved in a biological process (for example, circadian rhythm)?

]

???
Examples of proteomics application:

---

# "Proteomics is the attempt to understand which proteins are doing what, when, with whom, and why."

### Proteome

The proteome is **not constant**; it differs from cell to cell and changes over time. To some degree, the proteome **reflects the underlying transcriptome**. However, **protein activity** (often assessed by the reaction rate of the processes in which the protein is involved) **is also modulated by many factors** in addition to the expression level of the relevant gene.
]

---

# Methods in proteomics

[Gregorich ZR, Chang YH, Ge Y. Pflugers Arch. 2014 Jun; 466(6):1199-209. ](https://doi.org/10.1007/s00424-014-1471-9)
]

]

[Catherman AD, Skinner OS, Kelleher NL. Biochem Biophys Res Commun. 2014 Mar 21;445(4):683-93. ](https://doi.org/10.1016/j.bbrc.2014.02.041)
]

]

???

The two principal approaches to identifying and characterizing proteins using MS are “bottom-up”, which analyzes **peptides produced by proteolytic digestion**, and “top-down”, which analyzes **intact proteins**.

In “bottom-up” proteomics, proteins are **hydrolyzed by proteases** (for example, trypsin digestion), and the resulting sets of peptides are sequenced using LC-MS. Proteins are usually separated on a gel before proteolytic digestion.

**Shotgun proteomics**: there is no initial gel separation after protein extraction; the protease hydrolyzes the entire mixture of extracted proteins.

In the top-down version, a mixture of intact proteins, without preliminary separation or proteolysis, is fed into a mass spectrometer operating in tandem mode.
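
To make the digestion step of bottom-up proteomics concrete, here is a minimal in-silico sketch in Python. It assumes the commonly used trypsin rule (cleave after K or R unless the next residue is P); the function name `tryptic_digest`, the `missed_cleavages` parameter, and the toy sequence are illustrative and do not come from any particular proteomics package.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return peptides from an in-silico trypsin digest of a protein sequence.

    Assumed rule: cleave C-terminal to K or R, except when followed by P.
    """
    # Step 1: cut the sequence into fully cleaved fragments.
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal remainder

    # Step 2: join neighbouring fragments to model missed cleavages.
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


# Toy sequence (hypothetical): the R before P is not a cleavage site.
print(tryptic_digest("AKRPGKTTR"))
print(tryptic_digest("AKRPGKTTR", missed_cleavages=1))
```

In a shotgun experiment the same rule is applied to every protein in the extract at once, which is why the resulting peptide mixture is so complex.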

---

# [Comparison of bottom-up and top-down proteomics](https://www.creative-proteomics.com/pdf/untargeted-vs-targeted-metabolomics.pdf)

|                             |Bottom-up                                                                                |Top-down                                                                                            |Peptidomics         |
|:----------------------------|:----------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------|:-------------------|
|Starting point               |Protein extraction                                                                       |Protein extraction                                                                                  |Peptide extraction  |
|Gel separation               |Possibly. If no, then shotgun proteomics                                                 |Possibly                                                                                            |No                  |
|Proteolytic digestion        |YES                                                                                      |NO                                                                                                  |NO                  |
|Analytical platform          |LC-MS                                                                                    |LC-MS                                                                                               |LC-MS               |
|What is being analyzed by MS |Peptides (proteolytic digestion of proteins)                                             |Intact proteins                                                                                     |Endogenous peptides |
|Limits                       |Limited sequence coverage results in loss of information about PTMs and protein isoform. |High cost; Low efficiency; Complex spectra; Only small proteins; Not enough bioinformatics support. |                    |

???
Bottom-up proteomics is the most mature and most widely used approach for protein identification and characterization. It does not require sophisticated instrumentation or expertise, and it can achieve high-resolution separations. Its limitations are limited sequence coverage, which causes the **loss of much information about PTMs**; **the complexity of “assembling” the initial peptide from its MS/MS fragments**; and the difficulty of **protein identification from very complex peptide mixtures**.

The two major advantages of the top-down strategy are the potential access to the **complete protein sequence and the ability to locate and characterize PTMs**. Limitations include **very complex spectra generated by multiply charged proteins**; the top-down approach does not work well with intact proteins larger than about 50 kDa; protein dissociation is poorly understood compared to peptide dissociation; and bioinformatics tools for top-down are less developed than those for bottom-up.
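
The "limited sequence coverage" of bottom-up proteomics can be illustrated with a small Python sketch. It computes the fraction of residues in a protein that are covered by at least one identified peptide; the function name `sequence_coverage` and the toy sequence and peptides are hypothetical, and real pipelines handle modified and ambiguous peptides in more detail.

```python
def sequence_coverage(protein, peptides):
    """Fraction of protein residues covered by at least one peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue
        # Mark every occurrence of the peptide in the protein sequence.
        start = protein.find(peptide)
        while start != -1:
            for k in range(start, start + len(peptide)):
                covered[k] = True
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)


# Toy example (hypothetical): only two of the three tryptic peptides were identified.
protein = "AKRPGKTTR"
identified = ["AK", "TTR"]
print(f"coverage = {sequence_coverage(protein, identified):.2f}")
```

Any PTM located on a residue that no identified peptide covers (here, the stretch `RPGK`) is invisible to the experiment, which is the information loss described above.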

---

# Metabolomics

[Nature](https://www.nature.com/subjects/metabolomics):

Metabolomics refers to the systematic identification and quantification of the **small molecule metabolic products** (the **metabolome**) of a biological system (Cell, tissue, organ, biological fluid, or organism) at a **specific point in time**.

[EMBL-EBI](https://www.ebi.ac.uk/training/online/courses/metabolomics-introduction/what-is/):

Metabolomics is the **large-scale study** of **small molecules**, commonly known as metabolites, within cells, biofluids, tissues or organisms. Collectively, these small molecules and their interactions within a biological system are known as the **metabolome**.

[ScienceDirect](https://www.sciencedirect.com/topics/medicine-and-dentistry/metabolomics):

Metabolomics is defined as the systematic study of all chemical processes concerning metabolites, providing characteristic chemical fingerprints that specific cellular processes yield, by means of the study of their **small-molecule metabolite** profiles.

[Science](https://www.science.org/content/article/big-data-big-picture-metabolomics-meets-systems-biology):

Metabolomics—the study of the **collection of an organism's metabolites**—provides a molecular measurement of **phenotype**, or the characteristics resulting from the **genotype's interaction with the environment**.

???
Metabolomics is regarded as the younger sibling of the omics sciences, after genomics, transcriptomics and proteomics. But the study of metabolites and metabolism has actually been performed for more than 100 years. The metabolomics field is **strongly founded within the "realm of biochemistry",** that is, the **study of chemical processes within living organisms**.

---

# Metabolomics and Metabolome

### Metabolomics

* Attempts to measure all of the small-molecule metabolites (the metabolome)
* High-throughput (large-scale)
* Reflects phenotype: the genotype's interaction with the environment
* Snapshot (a specific time)
]

???

Metabolomics is a non-biased experimental approach that attempts to measure all of the metabolites in a biological sample. It is a **high-throughput** approach to **characterize and quantify the metabolome** present in a system or physiological state. And it reflects the characteristics of **phenotype, which is genotype's interaction with the environment**.

--
.pull-right[

### Metabolic profiling (basically same thing)

* No one analytical method can identify all the metabolites. 
  + Combination of analytical approaches to maximize the number of metabolites and increase coverage. 
* Detection of a wide range of metabolites
]

???

Although the approach is designed to analyze all metabolites, **no single analytical technique or even combination of analytical techniques can detect all of the metabolites present in a complex sample**. Some groups therefore define the approach as **metabolic profiling**, which simply aims to detect a wide range of metabolites. (More on this later.)

---

# Metabolomics and Metabolome

### Metabolomics

* Attempts to measure all of the small-molecule metabolites (the metabolome)
* High-throughput (large-scale)
* Reflects phenotype: the genotype's interaction with the environment
* Snapshot

### Metabolites
* Low molecular weight biochemicals (< 1.5 kDa), including
 + carbohydrates, amino acids, organic acids, nucleotides, lipids ...

]

### Metabolic profiling (basically same thing)

???

Metabolites are low molecular weight biochemicals, generally small molecules of less than 1.5 kDa.

Metabolites **inhabit a diverse chemical space**. And that includes carbohydrates, amino acids, organic acids, nucleotides, and lipids.

---

# Metabolomics and Metabolome

### Metabolomics

* Attempts to measure all of the small-molecule metabolites (the metabolome)
* High-throughput (large-scale)
* Reflects phenotype: the genotype's interaction with the environment
* Snapshot

### Metabolites
* Low molecular weight biochemicals (< 1.5 kDa), including
 + carbohydrates, amino acids, organic acids, nucleotides, lipids ...

]

### Metabolic profiling (basically same thing)

### Metabolome

* Entire qualitative collection of metabolites in a biological sample
]

???

The entire qualitative collection of metabolites in a biological sample is called the metabolome. The human body contains many different types of metabolomes, **representing different biofluids and tissues**, and each of these metabolomes is unique in which metabolites are present and in the concentration of each metabolite.

---

# Methods in metabolomics

.footnote[
[San-Martin, Breno Sena De et al. Archives of Endocrinology and Metabolism (online, accessed 3 January 2023) ](https://doi.org/10.20945/2359-3997000000300)

]

???
Metabolomics can be divided into untargeted (non-targeted) and targeted metabolomics. Both seek to find differences in the biological system's metabolome, but **targeted metabolomics focuses on one or a few specific metabolites, while untargeted metabolomics focuses on a broad variety of metabolites**.

Non-targeted metabolomics can analyze metabolites **comprehensively and systemically**. It is an **unbiased** metabolomics analysis that can **discover new biomarkers**. Targeted metabolomics is the study and analysis of specific metabolites.

Both have their own advantages and disadvantages, and are often used in combination for the discovery and accurate weight determination of differential metabolites, and in-depth research and analysis of subsequent metabolic molecular markers.

---

# [Comparison of untargeted and targeted metabolomics](https://www.creative-proteomics.com/pdf/untargeted-vs-targeted-metabolomics.pdf)

|                   |Untargeted                                                                                                                                             |Targeted                                                                                                         |
|:------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------|
|Objective          |Discovery; Hypothesis generating; Qualitative identification; Relative quantification; > 1000s metabolites measured; No metabolite standards required. |Validation; Hypothesis testing; Absolute quantification; ~ 20 metabolites measured; Metabolite standards needed. |
|Sample preparation |Global metabolite extraction                                                                                                                           |Extraction protocol for specific metabolites.                                                                    |
|Data acquisition   |Full-scan MS with data-dependent (DDA) or data-independent acquisition (DIA)                                                                           |Multiple reaction monitoring (MRM).                                                                              |
|Data preprocessing |Noise filtering, retention time correction, peak detection, chromatogram alignment, unknown feature/metabolite identification                          |                                                                                                                 |
|Data processing    |Data integrity check; IS normalization; Compound identification.                                                                                       |Absolute quantitation of metabolite concentrations                                                               |

???

Through an untargeted approach, we can **simultaneously measure hundreds to thousands of metabolites by combining multiple analytical methods to increase metabolite coverage** and analyze **thousands of different samples in a short amount of time**, and then use a combination of **univariate and multivariate statistical analysis** to interrogate the data and determine the significant metabolic changes between different experimental conditions. Ultimately, we can integrate the data to draw new biological inferences.
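
As a rough illustration of the univariate part of that analysis, the Python sketch below computes a log2 fold change and a Welch t-statistic for a single feature measured in two experimental conditions. The function names and the intensity values are made up for illustration; a real study would also compute p-values, correct for multiple testing across thousands of features, and usually rely on a statistics library.

```python
import math
from statistics import mean, variance


def log2_fold_change(control, treated):
    """log2 of the ratio of mean intensities (treated / control)."""
    return math.log2(mean(treated) / mean(control))


def welch_t(control, treated):
    """Welch's t-statistic for two groups with possibly unequal variances."""
    standard_error = math.sqrt(
        variance(control) / len(control) + variance(treated) / len(treated)
    )
    return (mean(treated) - mean(control)) / standard_error


# Hypothetical peak intensities of one feature in three replicates per group.
control = [100, 110, 90]
treated = [200, 220, 180]

print("log2 fold change:", log2_fold_change(control, treated))
print("Welch t-statistic:", round(welch_t(control, treated), 3))
```

Applying the same two functions to every feature and ranking the results is the simplest way to shortlist features that change between conditions.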

---

# Metabolome vs Proteome

Proteome

+ **Linking genotype to phenotype**
  + Reflecting underlying transcriptome
  + Modulated by many genome-unrelated factors (in forms of PTMs)
  
**Metabolome provides the closest link to the phenotype of an organism**
  + Often end products of biochemical process
  + Fast turnover (snapshot)
  + Easier to detect changes (amplified compared to other upstream omics)
  + Sensitive to endogenous and exogenous stimuli
  + Can reveal transient changes closely aligned with disease state of a system

]

.footnote[
[Amer and Baidoo, Front. Bioeng. Biotechnol., 23 February 2021](https://doi.org/10.3389/fbioe.2021.613307)

]

???

* Metabolites are transferable between different biological systems; a given metabolite is the same compound regardless of the biological system it comes from.
* Metabolism is highly conserved across biology, which enables the use of model organisms.
* Non-invasive and high-throughput

---

# Future of Omics: Integrated Omics

Integrated Omics - Integrate two or more Omics datasets

* The study of biological interactions between components in a system can be investigated at a single functional level or in different functional levels

* Study the components and their interactions in a **holistic systematic** approach rather than a reductionist approach

* Objective - From *individual omics dataset* To *biologically meaningful context*

* Goal - **Systems Biology**
]

![](data:image/png;base64,#./img/workflow_integrated_omics.jpeg)
]

.footnote[
[Biswapriya B Misra et al. J Mol Endocrinol. 2018 Jul 13:JME-18-0055.](https://jme.bioscientifica.com/view/journals/jme/62/1/JME-18-0055.xml)
]

???

**Metabolomics only investigates one functional level**, as do genomics, transcriptomics, and proteomics. If we can **combine all the different functional levels, we can investigate with a "holistic and systematic" approach**. It's like looking at an elephant from all angles instead of just one.

Ultimately, our goal is systems biology.

---

background-image: url("data:image/png;base64,#./img/challenges_integrated_omics.jpg")
background-size: 65%

.footnote[
[Biswapriya B Misra et al. J Mol Endocrinol. 2018 Jul 13:JME-18-0055.](https://jme.bioscientifica.com/view/journals/jme/62/1/JME-18-0055.xml)
]

???
Data scaling & reduction – **order of magnitude difference** in dataset dimensions between genomics and metabolomics.

Variances among samples across omics – **large and sparse**, rendering cluster analysis uninformative.
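
One simple way to picture the data scaling problem is block scaling, sketched below in Python. Each feature is autoscaled (mean-centered and divided by its standard deviation), and each omics block is then down-weighted by the square root of its number of features so that a block with many features does not dominate a small one. This is one common option chosen here for illustration, not necessarily the approach discussed by Misra et al.; the function names and numbers are hypothetical.

```python
import math
from statistics import mean, stdev, variance


def autoscale(column):
    """Mean-center one feature and scale it to unit variance."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s if s > 0 else 0.0 for x in column]


def block_scale(block):
    """Autoscale every feature, then weight the block by 1/sqrt(n_features).

    `block` is a list of features; each feature is a list of per-sample values.
    """
    weight = 1.0 / math.sqrt(len(block))
    return [[value * weight for value in autoscale(column)] for column in block]


def total_variance(block):
    """Sum of the per-feature sample variances in a block."""
    return sum(variance(column) for column in block)


# Hypothetical data: 3 samples, a "large" block (4 features) and a "small" one (2).
large_block = [[1, 2, 3], [10, 20, 40], [5, 5, 8], [0, 1, 0]]
small_block = [[0.2, 0.4, 0.9], [7, 3, 5]]

for name, block in [("large", large_block), ("small", small_block)]:
    print(name, round(total_variance(block_scale(block)), 6))
```

After scaling, both blocks carry the same total variance, so a downstream multivariate method sees them with equal weight despite the difference in dimensions.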

**The major bottleneck in omics is becoming less about our ability to generate data and more about our ability to process, analyze, interpret and store the data.**

Welcome to the age of BIG DATA.

---

background-image: url(data:image/png;base64,#https://media.springernature.com/full/springer-static/image/art%3A10.1186%2Fs13024-018-0304-2/MediaObjects/13024_2018_304_Fig1_HTML.png?as=webp)
background-size: 75%

# MS Omics Workflow

.footnote[
[Shao, Y., Le, W. Mol Neurodegeneration 14, 3 (2019)](https://doi.org/10.1186/s13024-018-0304-2)

]

???

This is a general workflow for an MS omics study. We will follow this workflow and discuss every step over the next two weeks of this class.

If you are sitting here, you should already know the general steps of a scientific process. This workflow simply expands the general scientific process with more detailed steps in the context of an omics study.

Hopefully, at the end of these two weeks you'll have a clear picture of how to design and conduct a whole omics study, and of what to pay attention to in each step. Today we will talk about the first step: formulating a valid study question.

It sounds trivial, but what I want to point out here is **the type of your study question**: is it a **discovery phase** question or a **validation phase** question? That choice changes everything that follows, starting with the experimental design.

---

# Study question

#### Study question will influence the experiment design. 
* Define the study question clearly ahead of time
* **WILL** influence analytical approaches.

**Hypothesis-generating studies** (Discovery)
* **Untargeted metabolomics and shotgun proteomics**: maximize the number of metabolites or proteins detected
* Identify the statistically significant features and construct the hypothesis

]

]

???
Study question will influence the experimental design, therefore define the study question clearly ahead of time.

A hypothesis-generating study will often be the **first phase (stage 1) of a study**. A hypothesis-generating study measures a **wide diversity of metabolites or proteins**, hundreds to thousands of them, so we can investigate **global changes** in the metabolome or proteome. **The objective is usually to determine one feature or a group of features that change as a result of a perturbation** to a biological system.

---

# Study question

#### Study question will influence the experiment design. 
* Define the study question clearly ahead of time
* **WILL** influence analytical approaches.

Hypothesis-generating studies (Discovery)
* Untargeted metabolomics and shotgun proteomics: maximize the number of metabolites or proteins detected
* Identify the statistically significant features and construct the hypothesis

]

**Hypothesis-testing studies** (Validation)
* Based on **biological context**
  + Biologically perturb the system: knock-out or enhance a reaction
  + Different functional level: post-translational modification; enzyme activity

]

???

A hypothesis-generating study will often be the first phase (stage 1 of a study), which is then followed by **one or more hypothesis-testing studies to validate the discoveries** from the first stage.

**How we design a validation study depends heavily on the biological context**. For example, what do we do if we **have identified a metabolic reaction** that is important in a biological mechanism? We can test this hypothesis biologically **by perturbing the system**. The perturbation could be to **knock out or enhance the metabolic reaction** and measure changes in the phenotype.

We could also measure changes at **different functional levels**, for example, investigate **whether a change in a metabolic reaction is a consequence of changes at the proteomic level**.

---

# Study question

#### Study question will influence the experiment design. 
* Define the study question clearly ahead of time
* **WILL** influence analytical approaches.

]

**Hypothesis-testing studies** (Validation)
* Based on biological context
  + Biologically perturb the system: knock-out or enhance a reaction
  + Different functional level: post-translational modification; enzyme activity
* **Targeted**: precise and accurate in the appropriate biological matrix

]

???
In a validation study, the objective is no longer to measure as many features as we can. In the validation phase, the features of interest **are known**, so we can apply a **targeted** analytical method to detect **only the features of interest**, and the **analytical method should be precise and accurate** in identifying the list of features in the appropriate biological matrix.

---

# Study question

#### Study question will influence the experiment design. 
* Define the study question clearly ahead of time
* **WILL** influence analytical approaches.

]

]

???

**One validation step may not be sufficient to fully validate the results**. The route from discovery to validation may have to be repeated. For example, **the validation of a biomarker to apply in clinical practice** will require a greater level of validation and take longer to complete than **validation of a biological mechanism in yeast**.

---

# Study question

#### Study question will influence the experiment design. 
* Define the study question clearly ahead of time
* **WILL** influence analytical approaches.

]

Hypothesis-testing studies (Validation)
* Based on biological context
  + Biologically perturb the system: knock-out or enhance a metabolic reaction
  + Different functional level: post-translational modification; enzyme activity
* Targeted: precise and accurate in the appropriate biological matrix

**Translation**
* Translate your discovery to the relevant working environment

]

???

The final step of a study, following discovery and validation, is to **translate your discovery to the relevant working environment**. Validation and translation can take many years to complete. The translation phase is very important: it is the final output of our research, it benefits the human population or your area of study, and it provides the impact of our research.

---

# Next: Experimental design, Sample Collection and Preparation

Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan).

---

# Resources:

* [Data carpentry](https://datacarpentry.org/lessons/)
  + [Data Analysis and Visualization in R](https://datacarpentry.org/R-genomics/index.html)
  + [Data Analysis and Visualization in R for Ecologists](https://datacarpentry.org/R-ecology-lesson/index.html)

* [Software carpentry](https://software-carpentry.org/lessons/)
  + [Programming with R](http://swcarpentry.github.io/r-novice-inflammation/)
  + [R for Reproducible Scientific Analysis](https://swcarpentry.github.io/r-novice-gapminder/)