---
title: "amanida_UserGuide"
subtitle: "`r paste0('v',packageVersion('amanida'))`"
author: "Maria Llambrich (maria.llambrich@urv.cat)"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{amanida_UserGuide}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
set.seed(123)
```

This vignette illustrates the `Amanida` R package, which contains a collection of functions for computing a weighted meta-analysis in R using significance, relative change and study size. It explains how to conduct the meta-analysis, visualize the results and draw conclusions.

## Introduction

The widespread use of metabolomics as a potential tool for clinical diagnosis has increased the number of systematic reviews and meta-analyses on this topic. Meta-analysis, a common practice in medical research, is the statistical combination into a single estimate of results from primary studies that answer the same question. When raw data are not available to perform a meta-analysis, several approaches can be applied, but they require the standard deviation, which is used for effect size estimation and for weighted methods. Currently, there is no tool to perform a weighted meta-analysis on a dataset of overall results without the standard deviation. Meta-analysis tools base the effect size on the difference of means, but in metabolomics we found that the comparison between groups is usually reported as relative change. The purpose of Amanida is to perform a weighted meta-analysis that combines overall results based on statistical significance, relative change and study size. In Amanida, statistical significance and relative change are combined using weighted methods that do not require the standard deviation: weighted Fisher for p-values and weighted average for fold-change, where the weight comes from study size.

Amanida also computes a qualitative meta-analysis by vote-counting for each compound, including the option of using only identifiers and trend labels.

We have included an automated ID harmonization step based on *Villalba H, Llambrich M, Gumà J, Brezmes J, Cumeras R. A Metabolites Merging Strategy (MMS): Harmonization to Enable Studies’ Intercomparison. Metabolites. 2023; 13(12):1167. https://doi.org/10.3390/metabo13121167*. Briefly, all IDs are converted to a single type, in this case the PubChem ID, and then checked for duplicates. This step also allows retrieving multiple descriptors such as Molecular Formula, Molecular Weight or InChIKey.

### Installation

`Amanida` can be installed from the `CRAN` repository with:

```{r install,eval=FALSE}
install.packages("amanida")
```

Then load it with:

```{r}
library(amanida)
```

## Amanida meta-analysis: a tutorial

Here we show how to conduct a meta-analysis using the Amanida approach. We use a dataset obtained in a systematic review and meta-analysis: *Comprehensive Volatilome and Metabolome Signatures of Colorectal Cancer in Urine: A Systematic Review and Meta-Analysis* Mallafré et al. Cancers 2021, 13(11), 2534; https://doi.org/10.3390/cancers13112534.

Before using `Amanida`, it is important to curate the identifiers. The objective of a meta-analysis is to combine the different results reported for the same feature; therefore, all identifiers must be in the same format.

### Import data

The dataset to analyse must include the following columns: identifier, p-value, fold-change, study size (N) and reference.
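As a minimal sketch of what such an input file could look like, the block below builds a small table with the five required columns and writes it as a semicolon-separated text file using base R only. The values, the object names (`toy_input`, `toy_path`) and the compound and study labels are made up for illustration and do not come from any study; the column names simply mirror those used in this vignette.

```r
# Toy values for illustration only; they are not taken from any study.
toy_input <- data.frame(
  "Compound Name" = c("Compound A", "Compound A", "Compound B"),
  "P-value"       = c(0.01, 0.04, 0.20),
  "Fold-change"   = c(2.5, 1.8, 0.6),
  "N total"       = c(40, 25, 60),
  "References"    = c("Study 1", "Study 2", "Study 1"),
  check.names = FALSE  # keep spaces and hyphens in the column names
)

# Write a semicolon-separated file to a temporary folder.
toy_path <- file.path(tempdir(), "toy_input.csv")
write.table(toy_input, toy_path, sep = ";", row.names = FALSE, quote = FALSE)
```

Each row is one reported result, so a compound reported by two studies appears twice, once per reference.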

Once we have the dataset in a supported format (xls/xlsx, csv or txt), we import as follows:

```{r}
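# Column names in the order: identifier, p-value, fold-change, study size, reference
coln <- c("Compound Name", "P-value", "Fold-change", "N total", "References")
# Path to the sample dataset shipped with the package
input_file <- getsampleDB()
# Import in quantitative mode; the sample file is semicolon-separated
datafile <- amanida_read(input_file, mode = "quan", coln, separator = ";")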
```

This creates a tibble that contains the data needed to perform the Amanida meta-analysis.
**Note**: missing data is ignored and negative values of fold-change are transformed to positive (1/value).

Before the meta-analysis, the IDs can be checked using information from public databases. IDs given as chemical name, InChI, InChIKey or SMILES are searched in PubChem to convert them all into a common nomenclature, the PubChem ID. Duplicates are then searched.

```{r, eval = F}
datafile <- check_names(datafile)
```


### Analysis

In this step the overall results are combined, weighting by study size. The methods used to combine the results are:

* P-value: weighted p-values combination, which is a variant of Fisher’s method. A gamma distribution is used to assign non-integral weights proportional to study size to each p-value.

* Fold-change: logarithmic transformation (base 2) and then averaged with weighting by study size. 
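The two combination rules above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's internal code: the helper names `weighted_fisher_sketch` and `weighted_fc_sketch` are hypothetical, and the p-value part uses one possible formulation of a weighted Fisher method in which each p-value is mapped to a gamma quantile whose shape is a weight proportional to study size, with the weights scaled to sum to the number of studies.

```r
# Hypothetical helpers for illustration; not amanida's internal functions.

# Weighted Fisher-type combination: p = p-values, n = study sizes.
weighted_fisher_sketch <- function(p, n) {
  k <- length(p)
  w <- n / sum(n) * k  # non-integral weights proportional to study size, summing to k
  # Each p-value becomes an upper-tail quantile of a Gamma(shape = w, scale = 2)
  stat <- sum(qgamma(p, shape = w, scale = 2, lower.tail = FALSE))
  # Under the null, the sum follows a Gamma(shape = k, scale = 2)
  pgamma(stat, shape = k, scale = 2, lower.tail = FALSE)
}

# Weighted average of log2 fold-changes, returned on the original scale.
weighted_fc_sketch <- function(fc, n) {
  2^(sum(n * log2(fc)) / sum(n))
}

# Toy values, made up for illustration
weighted_fisher_sketch(p = c(0.01, 0.04), n = c(40, 25))
weighted_fc_sketch(fc = c(2, 8), n = c(10, 30))
```

With equal study sizes every weight is 1, and the first helper reduces to the classical Fisher method, which is a quick way to sanity-check the sketch.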


**Note**: The Amanida meta-analysis can only be computed on an object created by the `amanida_read` function with `mode = "quan"`.

A qualitative analysis, the vote-counting, is also included. It is computed as the sum of the votes, assigned as follows: +1 for up-regulation, -1 for down-regulation and 0 if there is no trend.
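The vote-counting rule can be sketched as follows. The helper `vote_count_sketch` is hypothetical and not part of the package, and it assumes that the trend is read from the fold-change (above 1 is up-regulation, below 1 is down-regulation); the compound names and values are made up.

```r
# Hypothetical helper, not part of amanida.
# Votes: +1 if fold-change > 1, -1 if fold-change < 1, 0 if exactly 1.
vote_count_sketch <- function(id, fold_change) {
  votes <- sign(log2(fold_change))
  tapply(votes, id, sum)  # sum of votes per compound
}

# Toy values, made up for illustration
toy_votes <- vote_count_sketch(
  id          = c("Compound A", "Compound A", "Compound B", "Compound B"),
  fold_change = c(2.5, 1.8, 0.6, 1.4)
)
toy_votes
```

Here "Compound A" collects two up-regulation votes, while the opposite reports for "Compound B" cancel out to zero.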

```{r}
amanida_result <- compute_amanida(datafile, comp.inf = F)
```

If you select the option `comp.inf = T` the package will retrieve the PubChem ID from the ID using `webchem`. Results will return the following information: PubChem ID, Molecular Formula, Molecular Weight, SMILES, InChIKey, KEGG, ChEBI, HMDB, Drugbank.

### Results and visualization

The combined results are returned in two tables:

* meta-analysis results, accessed with `amanida_result@stat`
* vote-counting results, accessed with `amanida_result@vote`

#### Quantitative results

The Amanida meta-analysis gives the combined p-value and fold-change for each feature.

The trend variable corresponds to the direction of the combination: 1 means that the feature is increased in the case group and -1 the opposite. N_total is the sum of the study sizes for that feature, and the studies are listed in the reference variable.

```{r}
head(amanida_result@stat)
```

The results of the meta-analysis can be inspected graphically with a volcano plot, where -log10(p-value) is plotted against log2(fold-change). The cut-offs can be selected by the user; for fold-change we recommend values higher than 2, where the change is considered biologically meaningful.

**Note**: cut-offs have to be given on the original scale, not the log scale.

```{r, fig.width = 7, fig.height = 9}
volcano_plot(amanida_result, cutoff = c(0.05,3.5))
```

We consider a feature statistically significant if it lies beyond the cut-off thresholds (dashed lines).
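As a rough illustration of where these thresholds fall on the plot axes, the cut-offs given on the original scale can be converted by hand. This sketch assumes the axes are -log10(p-value) and log2(fold-change); the names `p_line` and `fc_line` are only used here.

```r
# Cut-offs as passed to volcano_plot above: p-value and fold-change, original scale
cutoff <- c(0.05, 3.5)

# Position of the thresholds on the log-scaled axes (assumed axis transformations)
p_line  <- -log10(cutoff[1])  # about 1.30
fc_line <- log2(cutoff[2])    # about 1.81

c(p_line = p_line, fc_line = fc_line)
```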

#### Qualitative results

Optionally, we can inspect the qualitative results. A bar plot shows the result of the vote-counting.

**Note:** output is restricted to 30 compounds to improve readability. To show fewer compounds, increase the `counts` parameter.

```{r, fig.width = 5, fig.height = 6}
vote_plot(amanida_result, counts = 1)
```

**Note**: The output includes compound identifiers from public repositories such as PubChem or the Human Metabolome Database; if these identifiers are not needed, this can be avoided with `comp.inf = F`.

Discrepancies in compound behaviour are not visible at first glance in the vote plot, so we suggest combining it with the explore plot. Discrepancies in the results are shown when `type = "mix"` is used:

* counts parameter: the minimum number of votes a compound needs to be shown.

**Note:** output is restricted to 25 compounds to improve readability. To show fewer compounds, increase the `counts` parameter.

```{r, fig.width = 6.5, fig.height = 8.5}
explore_plot(sample_data, type = "mix", counts = 1)
```

In the sample data we observe some compounds beyond the cut-off thresholds, so we then check the consistency of the results. In this case, the compounds with more than one report and statistical significance are Hippuric acid and Phenol. Both are consistent, as all reported results follow the same trend. We can conclude that for Hippuric acid and Phenol there is evidence to reject the null hypothesis.

**Note**: compounds with both up-regulated and down-regulated reports point to low consistency.
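A simple way to flag such low-consistency compounds outside the plots is to count up-regulated and down-regulated reports per compound. The helper `mixed_trend_sketch` below is hypothetical, uses base R only, and assumes the trend is read from the fold-change (above 1 is up, below 1 is down); the input values are made up.

```r
# Hypothetical helper, not part of amanida: count up/down reports per compound
# and flag compounds reported in both directions.
mixed_trend_sketch <- function(id, fold_change) {
  up   <- tapply(fold_change > 1, id, sum)
  down <- tapply(fold_change < 1, id, sum)
  data.frame(
    id    = names(up),
    up    = as.vector(up),
    down  = as.vector(down),
    mixed = as.vector(up > 0 & down > 0)
  )
}

# Toy values, made up for illustration
toy_consistency <- mixed_trend_sketch(
  id          = c("Compound A", "Compound A", "Compound B", "Compound B"),
  fold_change = c(2.5, 1.8, 0.6, 1.4)
)
toy_consistency
```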

## Amanida report

All the computations and plots discussed above can be obtained directly in a single HTML document. Starting from the original file, `amanida_report` computes the meta-analysis and displays the results graphically.

It needs the following parameters:

* file: path to the dataset to import. It is the original file in xlsx, xls, csv or txt format.
* separator: symbol used in the dataset to separate the items, e.g. for a comma-separated csv file use separator = ",".
* analysis_type: type of meta-analysis to perform.
  * "quan-qual": for quantitative and qualitative meta-analysis
  * "quan": for quantitative meta-analysis
  * "qual": for qualitative meta-analysis
* column_id: vector with the names of the columns to select from the original dataset. They must be in this order: identifier, p-values, fold-changes, sample size and reference.
* pvalue_cutoff: numerical value for the threshold on p-values. Recommended 0.05.
* fc_cutoff: numerical value for the threshold on fold-change. Values > 2 are recommended.
* votecount_lim: numerical value for the threshold used in the qualitative plots.
* path: path where the HTML file is saved; otherwise the file is saved in a temporary folder.
* comp_inf: whether to include information about metabolites from public databases such as PubChem, HMDB or KEGG.

```{r, eval = F}
column_id <- c("Compound Name", "P-value", "Fold-change", "N total", "References")
input_file <- getsampleDB()

amanida_report(input_file,
               separator = ";",
               analysis_type = "quan-qual",
               column_id = column_id,
               pvalue_cutoff = 0.05,
               fc_cutoff = 4,
               votecount_lim = 2,
               comp_inf = FALSE)
```

The report will be saved in the working directory. 

## Session Info

```{r}
sessionInfo()
```