The R markdown is available from the pulldown menu for Code
at the upper-right, choose “Download Rmd”, or download
the Rmd from GitHub.
This vignette describes how to use data from an affinity
purification-mass spectrometry experiment to generate relevant
interaction networks, enriching the networks with information from
public resources, analyzing the networks and creating effective
visualizations.
The result of this vignette will be a visualization of a human-HIV
integrated network combining experimental data and publicly available
interaction data. This approach was use to make Figure 3 in this Jäger 2011
Nature paper.
Installation
if(!"RCy3" %in% installed.packages()){
install.packages("BiocManager")
BiocManager::install("RCy3")
}
library(RCy3)
Prerequisites
In addition to this package (RCy3), you will need:
installApp('stringApp')
installApp('enhancedGraphics')
Background
The data used for this protocol represents interactions between human
and HIV proteins by Jäger et al (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3310911/).
In this quantitative AP-MS experiment, a relatively small number of bait
proteins are used to pull down a larger set of prey proteins.
Note that this tutorial does not describe how to pre-process the raw
AP-MS data, the data used here is already scored and filtered.
Import Network and Data
Let’s start by reading in the example data file:
apms.data<-read.csv(file="https://raw.githubusercontent.com/cytoscape/cytoscape-automation/master/for-scripters/R/notebooks/AP-MS/ap-ms-demodata.csv", stringsAsFactors = FALSE)
Now we can create a data frame for the network edges (interactions)
using the imported data. We will add an interaction “AP-MS” to each
edge, which will be useful later, and we can also add the AP-MS score
from the data as an edge attribute:
edges <- data.frame(source=apms.data[,"Bait"],target=apms.data[,"Prey"],interaction="AP-MS", AP.MS.Score=apms.data[,"AP.MS.Score"],stringsAsFactors=FALSE)
Finally, we use the edge data fram to create the network. Note that
we don’t need to define a data frame for nodes, as all nodes in this
case are represented in the edge data frame.
createNetworkFromDataFrames(edges=edges, title="apms network", collection = "apms collection")
The imported network consists of multiple smaller subnetworks, each
representing a bait node and its associated prey nodes
Loading Data
There are three other columns of data and annotations for the “Prey”
proteins that we want to load into this network.
In this data, the Prey nodes are repeated for each interactions with
a Bait node, so the data contains different values for the same
attribute (for example HEKScore), for each Prey node. During import, the
last value imported will overwrite prior values and visualizations using
this attribute thus only shows the last value.
loadTableData(apms.data[,2:5], data.key.column="Prey")
Augmenting Network with Existing Protein-protein Interaction
Data
We are going to use existing protein-protein interaction data to
enrich the network, using the STRING database with the human protein
nodes as input.
Let’s collect all the UniProt identifiers from the data, and create a
text string that we can use to query STRING. For this purpose, we are
going to specify the type of network as “physical subnetwork”, since we
are looking for protein complexes:
uniprot.str<-toString(apms.data[,"UniProt"])
string.cmd<-paste('string protein query',
'query="',uniprot.str,'"',
'species="Homo sapiens"',
'limit=0',
'cutoff=0.999', 'networkType="physical subnetwork"',
sep= ' ')
commandsRun(string.cmd)
The resulting network contains known interactions between the human
proteins, with an evidence score of 0.999 or greater.
Merge Networks
To incorporate the new information into our AP-MS network, we need
merge the STRING and AP-MS networks. We can use the Uniprot IDs
available in both networks as the matching attribute, “Uniprot” in the
AP-MS network, and “query term in the String network. We will also
specify how to merge the attribute containing the node name (symbol),
which is contained in the”name” attribute for the AP-MS network and the
“display name” for the String network.
merge.cmd<-paste('network merge',
'operation=union',
'sources="apms network,STRING network (physical)"',
'nodeKeys="Uniprot,query term"',
'nodeMergeMap="{name,display name,display name, string}"')
commandsRun(merge.cmd)
Network Visualization
When the merged network first loads, it will have the STRING style
applied. However, because the HIV nodes are not from STRING, they will
be grey. The layout also makes the network hard to interpret. Let’s
change the style of the network a bit.
First, let’s set our AP-MS network as the current network:
setCurrentNetwork('union: apms network,STRING network (physical)')
setCurrentView()
Next, we can define our style and apply it to the network:
style.name = "AP-MS"
createVisualStyle(style.name)
lockNodeDimensions(TRUE)
setNodeSizeDefault('50', style.name = style.name)
setNodeColorDefault("#CCCCFF", style.name=style.name)
setNodeLabelMapping('display name', style.name=style.name)
setVisualStyle(style.name=style.name)
layoutNetwork(paste('force-directed',
'defaultSpringCoefficient=0.00001',
'defaultSpringLength=50',
'defaultNodeMass=4',
sep=' '))
STRING Enrichment
Now that we have a merged network, we can do enrichment analysis on
it, using the STRING enrichment tool.
The STRING app has built-in enrichment analysis functionality, which
includes enrichment for GO Process, GO Component, GO Function, InterPro,
KEGG Pathways, and PFAM.
commandsRun('string make string network="current"')
string.cmd<-paste('string retrieve enrichment',
'allNetSpecies="Homo sapiens"')
commandsRun(string.cmd)
The STRING enrichment results don’t open by default if run from the
command interface.
commandsRun('string show enrichment')
The STRING app includes several options for filtering and displaying
the enrichment results. We will filter the results to only show GO
Process.
commandsRun(paste('string filter enrichment',
'categories="GO Biological Process"',
'removeOverlapping="true"',
sep = ' '))
Next, we will add a split donut chart.
commandsRun('string show charts')
Visualizing Results - Jurkat Score
We can create a visualization based on the Jurkat Score, and the
different interaction types (AP-MS and STRING):
style.name = "AP-MS Jurkat Score"
createVisualStyle(style.name)
setVisualStyle(style.name)
setNodeColorDefault("#FFCC00", style.name=style.name)
setNodeLabelMapping('display name', style.name=style.name)
setEdgeColorMapping("interaction", "AP-MS", "#55AA55", "d", style.name = style.name)
setEdgeLineWidthMapping('AP.MS.Score', table.column.values=c(0,1), widths=c(1,5), mapping.type="c", style.name=style.name)
Now, we define a color gradient based on the data values in the
JurkatScore
column:
setNodeColorMapping('JurkatScore', colors = paletteColorBrewerPurples, style.name=style.name)
We now have a visualization of the network highlighting the ap-ms
experimental data (green edges), as well as additional known
interactions (grey edges), with node color indicating the Jurkat Score
from the data.
Visualizing Results - Combined
We could create a similar kind of style for the HEK score, but that
only allows for viewing each style seperately. Instead, we can create a
combined style, using the enhancedGraphics app.
For this, we will need a new column defining a new attribute that
will be used for mapping to the Custom Graphics property via the
enhancedGraphics app. This new attribute has to be in the form of
mappings recognized by the enhancedGraphics app.
We can copy the previous style to retain some of the mappings we want
to keep:
copyVisualStyle(from.style="AP-MS Jurkat Score", to.style="AP-MS CombinedScore")
setVisualStyle(style.name="AP-MS CombinedScore")
To begin adding the new column, we first define a data frame with the
new attribute formatted for enhancedGraphics:
all.nodes<-getAllNodes()
combined.df<-data.frame(all.nodes, 'piechart: showlabels=false range="0,1" arcstart=90 valuelist=".5,.5" colorlist="up:blue,zero:white,down:white;up:purple,zero:white,down:white" attributelist="HEKScore,JurkatScore"')
colnames(combined.df)<-c("name","CombinedScore")
Next, we load this dataframe into the Node Table to create and fill a
new column:
loadTableData(combined.df, data.key.column = "name", table.key.column = "name")
We now have a new column, CombinedScore, that we can use for
the mapping. This mapping does not come with a custom helper function,
se we are going to use two alternative functions to prepare the
passthrough mapping property and then update our visual style with the
new mapping:
piechart.map<-mapVisualProperty('node customgraphics 4','CombinedScore','p')
updateStyleMapping('AP-MS CombinedScore', piechart.map)
Remember that when we imported multiple values for a single node
attribute, such as the scores for human nodes interacting with more than
one HIV nodes, the last value imported will overwrite prior values and
the visualization thus only shows the last value. For EIF3A,
which interacts with both PR and POL, only the data
relevant to the PR interaction is maintained in the Node Table
because our source data was sorted alphabetically by Bait.
Saving, sharing and publishing
Saving a Cytoscape session file
Session files save everything. As with most project
software, we recommend saving often!
saveSession('AP-MS_session') #.cys
Note: If you don’t specify a complete path, the
files will be saved relative to your current working directory in R.
Saving high resolution image files
You can export extremely high resolution images, including vector
graphic formats.
exportImage('AP-MS_image', type = 'PDF') #.pdf
?exportImage
---
title: "Affinity purification-mass spectrometry network analysis"
author: "Kristina Hanspers, Alexander Pico"
package: RCy3
date: "`r Sys.Date()`"
output:
  html_notebook:
    toc_float: true
    code_folding: "none"
#  pdf_document:
#    toc: true    
---
```{r echo=FALSE}
knitr::opts_chunk$set(
  eval=FALSE
)
```

*The R markdown is available from the pulldown menu for* Code *at the upper-right, choose "Download Rmd", or [download the Rmd from GitHub](https://raw.githubusercontent.com/cytoscape/cytoscape-automation/master/for-scripters/R/notebooks/AP-MS-network-analysis.Rmd).*

<hr />
This vignette describes how to use data from an affinity purification-mass spectrometry experiment to generate relevant interaction networks, enriching the networks with information from public resources, analyzing the networks and creating effective visualizations.

The result of this vignette will be a visualization of a human-HIV integrated network combining experimental data and publicly available interaction data. This approach was use to make Figure 3 in this [Jäger 2011 Nature paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3310911/).

<center>
![](https://cytoscape.github.io/cytoscape-tutorials/protocols/AP-MS-network-analysis/final-module.png){width=60%}
</center>
<hr />

# Installation
```{r}
if(!"RCy3" %in% installed.packages()){
    install.packages("BiocManager")
    BiocManager::install("RCy3")
}
library(RCy3)
```

# Prerequisites
In addition to this package (RCy3), you will need:

  * **Latest version of Cytoscape**, which can be downloaded from https://cytoscape.org/download.html. Simply follow the installation instructions on screen.
* Complete installation wizard
* Launch Cytoscape 
For this vignette, you’ll also need the STRING app and the enhancedGraphics app: 

* Install the STRING app from https://apps.cytoscape.org/apps/stringapp
* Install the enhancedGraphics app from http://apps.cytoscape.org/apps/enhancedgraphics

```{r}
installApp('stringApp')
installApp('enhancedGraphics')
```

# Background
The data used for this protocol represents interactions between human and HIV proteins by Jäger et al (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3310911/). In this quantitative AP-MS experiment, a relatively small number of bait proteins are used to pull down a larger set of prey proteins.

Note that this tutorial does not describe how to pre-process the raw AP-MS data, the data used here is already scored and filtered.

# Import Network and Data
Let's start by reading in the example data file:

```{r}
apms.data<-read.csv(file="https://raw.githubusercontent.com/cytoscape/cytoscape-automation/master/for-scripters/R/notebooks/AP-MS/ap-ms-demodata.csv", stringsAsFactors = FALSE)
```

Now we can create a data frame for the network edges (interactions) using the imported data. We will add an interaction "AP-MS" to each edge, which will be useful later, and we can also add the AP-MS score from the data as an edge attribute: 

```{r}
edges <- data.frame(source=apms.data[,"Bait"],target=apms.data[,"Prey"],interaction="AP-MS", AP.MS.Score=apms.data[,"AP.MS.Score"],stringsAsFactors=FALSE)
```

Finally, we use the edge data fram to create the network. Note that we don't need to define a data frame for nodes, as all nodes in this case are represented in the edge data frame.
```{r}
createNetworkFromDataFrames(edges=edges, title="apms network", collection = "apms collection")
```

The imported network consists of multiple smaller subnetworks, each representing a bait node and its associated prey nodes

# Loading Data
There are three other columns of data and annotations for the "Prey" proteins that we want to load into this network. 

In this data, the Prey nodes are repeated for each interactions with a Bait node, so the data contains different values for the same attribute (for example HEKScore), for each Prey node. During import, the last value imported will overwrite prior values and visualizations using this attribute thus only shows the last value.

```{r}
loadTableData(apms.data[,2:5], data.key.column="Prey")
```

# Augmenting Network with Existing Protein-protein Interaction Data 
We are going to use existing protein-protein interaction data to enrich the network, using the STRING database with the human protein nodes as input.

Let's collect all the UniProt identifiers from the data, and create a text string that we can use to query STRING. For this purpose, we are going to specify the type of network as "physical subnetwork", since we are looking for protein complexes:

```{r}
uniprot.str<-toString(apms.data[,"UniProt"])
string.cmd<-paste('string protein query',
                  'query="',uniprot.str,'"', 
                  'species="Homo sapiens"', 
                  'limit=0', 
                  'cutoff=0.999', 'networkType="physical subnetwork"',
                  sep= ' ')
commandsRun(string.cmd)
```

The resulting network contains known interactions between the human proteins, with an evidence score of 0.999 or greater.

# Merge Networks
To incorporate the new information into our AP-MS network, we need merge the STRING and AP-MS networks. We can use the Uniprot IDs available in both networks as the matching attribute, "Uniprot" in the AP-MS network, and "query term in the String network.
We will also specify how to merge the attribute containing the node name (symbol), which is contained in the "name" attribute for the AP-MS network and the "display name" for the String network. 

```{r}
merge.cmd<-paste('network merge',
                 'operation=union',
                 'sources="apms network,STRING network (physical)"',
                 'nodeKeys="Uniprot,query term"', 
                 'nodeMergeMap="{name,display name,display name, string}"')
commandsRun(merge.cmd)
```

# Network Visualization
When the merged network first loads, it will have the STRING style applied. However, because the HIV nodes are not from STRING, they will be grey. The layout also makes the network hard to interpret. Let's change the style of the network a bit.

First, let's set our AP-MS network as the current network:
```{r}
setCurrentNetwork('union: apms network,STRING network (physical)')
setCurrentView()
```

Next, we can define our style and apply it to the network:
```{r}
style.name = "AP-MS"
createVisualStyle(style.name)
lockNodeDimensions(TRUE)
setNodeSizeDefault('50', style.name = style.name)
setNodeColorDefault("#CCCCFF", style.name=style.name)
setNodeLabelMapping('display name', style.name=style.name)
setVisualStyle(style.name=style.name)
```

```{r}
layoutNetwork(paste('force-directed', 
              'defaultSpringCoefficient=0.00001',
              'defaultSpringLength=50',
              'defaultNodeMass=4',
              sep=' '))
```

# STRING Enrichment
Now that we have a merged network, we can do enrichment analysis on it, using the STRING enrichment tool.

The STRING app has built-in enrichment analysis functionality, which includes enrichment for GO Process, GO Component, GO Function, InterPro, KEGG Pathways, and PFAM.

```{r}
commandsRun('string make string network="current"')

string.cmd<-paste('string retrieve enrichment', 
                  'allNetSpecies="Homo sapiens"')
commandsRun(string.cmd)
```

The STRING enrichment results don't open by default if run from the command interface. 
```{r}
commandsRun('string show enrichment')
```

The STRING app includes several options for filtering and displaying the enrichment results. We will filter the results to only show GO Process.

```{r}
commandsRun(paste('string filter enrichment', 
                  'categories="GO Biological Process"',
                  'removeOverlapping="true"',
                  sep = ' '))
```

Next, we will add a split donut chart.

```{r}
commandsRun('string show charts')
```

# Visualizing Results - Jurkat Score
We can create a visualization based on the Jurkat Score, and the different interaction types (AP-MS and STRING):

```{r}
style.name = "AP-MS Jurkat Score"
createVisualStyle(style.name)
setVisualStyle(style.name)
setNodeColorDefault("#FFCC00", style.name=style.name)
setNodeLabelMapping('display name', style.name=style.name)
setEdgeColorMapping("interaction", "AP-MS", "#55AA55", "d", style.name = style.name)
setEdgeLineWidthMapping('AP.MS.Score', table.column.values=c(0,1), widths=c(1,5), mapping.type="c", style.name=style.name)
```

Now, we define a color gradient based on the data values in the `JurkatScore` column:

```{r}
setNodeColorMapping('JurkatScore', colors = paletteColorBrewerPurples, style.name=style.name)
```

We now have a visualization of the network highlighting the ap-ms experimental data (green edges), as well as additional known interactions (grey edges), with node color indicating the Jurkat Score from the data.

# Visualizing Results - Combined
We could create a similar kind of style for the HEK score, but that only allows for viewing each style seperately. Instead, we can create a combined style, using the enhancedGraphics app.

For this, we will need a new column defining a new attribute that will be used for mapping to the Custom Graphics property via the enhancedGraphics app. This new attribute has to be in the form of mappings recognized by the enhancedGraphics app.

We can copy the previous style to retain some of the mappings we want to keep:

```{r}
copyVisualStyle(from.style="AP-MS Jurkat Score", to.style="AP-MS CombinedScore")
setVisualStyle(style.name="AP-MS CombinedScore")
```

To begin adding the new column, we first define a data frame with the new attribute formatted for enhancedGraphics:

```{r}
all.nodes<-getAllNodes()
combined.df<-data.frame(all.nodes, 'piechart: showlabels=false range="0,1" arcstart=90 valuelist=".5,.5" colorlist="up:blue,zero:white,down:white;up:purple,zero:white,down:white" attributelist="HEKScore,JurkatScore"')
colnames(combined.df)<-c("name","CombinedScore")
```

Next, we load this dataframe into the Node Table to create and fill a new column:

```{r}
loadTableData(combined.df, data.key.column = "name", table.key.column = "name")
```

We now have a new column, *CombinedScore*, that we can use for the mapping. This mapping does not come with a custom helper function, se we are going to use two alternative functions to prepare the passthrough mapping property and then update our visual style with the new mapping:

```{r}
piechart.map<-mapVisualProperty('node customgraphics 4','CombinedScore','p')
updateStyleMapping('AP-MS CombinedScore', piechart.map)
```

Remember that when we imported multiple values for a single node attribute, such as the scores for human nodes interacting with more than one HIV nodes, the last value imported will overwrite prior values and the visualization thus only shows the last value. For <b>EIF3A</b>, which interacts with both <b>PR</b> and <b>POL</b>, only the data relevant to the <b>PR</b> interaction is maintained in the Node Table because our source data was sorted alphabetically by Bait.

# Saving, sharing and publishing

## Saving a Cytoscape session file
Session files save *everything*. As with most project software, we recommend saving often!
```{r}
saveSession('AP-MS_session') #.cys
```
**Note:** If you don't specify a complete path, the files will be saved relative to your current working directory in R. 

## Saving high resolution image files
You can export extremely high resolution images, including vector graphic formats.
```{r}
exportImage('AP-MS_image', type = 'PDF') #.pdf
?exportImage
```
