The R Markdown source is available from the Code pulldown menu
at the upper right (choose “Download Rmd”), or you can download
the Rmd from GitHub.
This vignette will show you how to map or translate identifiers from
one database (e.g., Ensembl) to another (e.g., Entrez Gene). This is a
common requirement for data analysis. In the context of Cytoscape, for
example, identifier mapping is needed when you want to import data to
overlay on a network but you don’t have matching keys. There are three
distinct examples below, highlighting different lessons that may apply
to your use cases.
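To make the matching-keys problem concrete, here is a small base R sketch that needs neither Cytoscape nor RCy3. The node table is keyed by Ensembl gene IDs and the data table by Entrez Gene IDs, so a direct join finds nothing until a mapping table bridges the two. The `score` values and all object names are made up for illustration.

```r
# Network node table keyed by Ensembl gene IDs (TP53 and BRCA1)
nodes <- data.frame(
  name = c("ENSG00000141510", "ENSG00000012048"),
  stringsAsFactors = FALSE
)

# Data to overlay, keyed by Entrez Gene IDs; the scores are invented
scores <- data.frame(
  entrez = c("7157", "672"),
  score = c(1.8, -0.6),
  stringsAsFactors = FALSE
)

# Joining directly on the two key columns matches no rows
direct <- merge(nodes, scores, by.x = "name", by.y = "entrez")
nrow(direct)

# A mapping table bridges the two identifier systems
id.map <- data.frame(
  ensembl = c("ENSG00000141510", "ENSG00000012048"),
  entrez = c("7157", "672"),
  stringsAsFactors = FALSE
)
with.entrez <- merge(nodes, id.map, by.x = "name", by.y = "ensembl")
bridged <- merge(with.entrez, scores, by = "entrez")
nrow(bridged)
```

Identifier mapping in Cytoscape plays the role of `id.map` here: it adds a column of translated IDs to the node table so that imported data has a key to match against.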
Installation
if(!"RCy3" %in% installed.packages()){
    install.packages("BiocManager")
    BiocManager::install("RCy3")
}
library(RCy3)
Required Software
The whole point of RCy3 is to connect with Cytoscape. You will need
to install and launch Cytoscape before running the examples below.
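Once Cytoscape is running, a quick check from R confirms that RCy3 can reach it. This is a minimal sketch, assuming Cytoscape is listening on its default local port and no custom base URL is needed.

```r
library(RCy3)

# Check the connection to the running Cytoscape instance
cytoscapePing()

# Report the Cytoscape and API versions (some commands below need 3.7.0 or above)
version.info <- cytoscapeVersionInfo()
version.info
```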
Example: Species-specific considerations
When planning to import data, you need to consider the key columns
you have in your network data and in your table data. It’s always
recommended that you use proper identifiers as your keys (e.g., from
databases like Ensembl and Uniprot-TrEMBL). Relying on conventional
symbols and names is non-standard and error-prone.
Let’s start with the sample network provided by Cytoscape.
Caution: Loading a session file will discard your current
session. If you have networks or data you want to keep, save first with
saveSession('path_to_file').
openSession() #Closes current session (without saving) and opens a sample session file
You should now see a network with just over 300 nodes. If you look at
the Node Table, you’ll see that there are proper identifiers in the
name column, like “YDL194W”. These are the Ensembl-supported
IDs for Yeast.
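RCy3 exposes identifier mapping through `mapTableColumn`, which takes the source column, the species, and the source and target ID types. The following is a minimal sketch for this session, assuming Cytoscape is running with the sample session loaded and that 'Yeast', 'Ensembl' and 'Entrez Gene' are accepted species and ID type names. The point it illustrates is that the yeast locus names in the name column are declared as Ensembl IDs, even though they do not look like the usual ENSG-style identifiers.

```r
library(RCy3)

# Map the yeast IDs in the node table's 'name' column from Ensembl to Entrez Gene.
# The mapped IDs are added to the node table as a new column.
mapped.cols <- mapTableColumn(
  column   = 'name',
  species  = 'Yeast',
  map.from = 'Ensembl',
  map.to   = 'Entrez Gene'
)

# The returned data frame pairs each source ID with its mapped ID
head(mapped.cols)

# Count nodes that could not be mapped (assumes the new column is named 'Entrez Gene')
sum(is.na(mapped.cols[['Entrez Gene']]))
```

Mapping is rarely complete, so it is worth checking how many nodes came back unmapped before relying on the new column as an import key.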
Example: From proteins to genes
For this next example, you’ll need the STRING app to access the
STRING database from within Cytoscape:
* Install the STRING app from https://apps.cytoscape.org/apps/stringapp
#available in Cytoscape 3.7.0 and above
installApp('STRINGapp')
Now we can import protein interaction networks with a ton of
annotations from the STRING database with a simple commandsGET function,
like this:
string.cmd = 'string disease query disease="breast cancer" cutoff=0.9 species="Homo sapiens" limit=150'
commandsGET(string.cmd)
# for more information on string commands:
# commandsHelp('string')
# commandsHelp('string disease query')
Check out the Node Table and you’ll see display names and
identifiers. In particular, the canonical name column appears
to hold Uniprot-TrEMBL IDs. Nice, we can use that!
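Going from proteins to genes can then be sketched with `mapTableColumn`, mapping the Uniprot-TrEMBL IDs to Ensembl gene IDs. This assumes the STRING network is the current network, that the column is literally named 'canonical name' (some STRING app versions may prefix column names with a namespace, so list the columns first), and that 'Human', 'Uniprot-TrEMBL' and 'Ensembl' are accepted species and ID type names.

```r
library(RCy3)

# Confirm the exact name of the column holding the Uniprot-TrEMBL IDs
getTableColumnNames('node')

# Map protein IDs (Uniprot-TrEMBL) to gene IDs (Ensembl)
mapped.cols <- mapTableColumn(
  column   = 'canonical name',
  species  = 'Human',
  map.from = 'Uniprot-TrEMBL',
  map.to   = 'Ensembl'
)

# Inspect the protein-to-gene pairs
head(mapped.cols)
```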
Example: Mixed identifiers
From time to time, you’ll come across a case where the identifiers in
your network are of mixed types. This is a rare scenario, but here is
one approach to solving it.
First, you’ll need the WikiPathways app to access the WikiPathways
database. The pathways in WikiPathways are curated by a community of
interested researchers and citizen scientists. As such, there are times
when authors might use different sources of identifiers. They are valid
IDs, just not all from the same source. Future versions of the
WikiPathways app will provide pre-mapped columns to a single ID type.
But in the meantime (and relevant to other use cases), this
example highlights how to handle a source of mixed identifier
types.
#available in Cytoscape 3.7.0 and above
installApp('WikiPathways')
Now we can import an Apoptosis pathway from WikiPathways. Its
identifier, WP254, can be found on the web site (https://wikipathways.org),
through the Network Search Tool in the Cytoscape GUI, or with the
rWikiPathways package.
wp.cmd = 'wikipathways import-as-pathway id="WP254"'
commandsGET(wp.cmd)
# for more information on wikipathways commands:
# commandsHelp('wikipathways')
# commandsHelp('wikipathways import-as-pathway')
Take a look in the XrefId column and you’ll see a mix of
identifier types. The next column over, XrefDatasource,
conveniently names each type’s source. Ignoring the metabolites for this
example, we just have a mix of Ensembl and Entrez Gene to deal with.
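One way to unify the column is to map only the Entrez Gene entries to Ensembl and then copy the mapped values back over XrefId. The sketch below assumes the WP254 network is the current network, that 'Human', 'Entrez Gene' and 'Ensembl' are accepted by `mapTableColumn`, and that the row names of the returned data frame are node SUIDs. Entries that are already Ensembl IDs (or metabolite IDs) are not valid Entrez Gene IDs, so they come back unmapped and are left untouched.

```r
library(RCy3)

# Step 1: map the Entrez Gene entries in XrefId to Ensembl.
# Values that are not Entrez Gene IDs do not map and come back as NA.
mapped.cols <- mapTableColumn('XrefId', 'Human', 'Entrez Gene', 'Ensembl')

# Step 2: keep only the rows that mapped, and rename the Ensembl column
# so it lines up with the original XrefId column.
only.mapped.cols <- mapped.cols[complete.cases(mapped.cols), 'Ensembl', drop = FALSE]
colnames(only.mapped.cols) <- 'XrefId'

# Step 3: write the mapped values back into XrefId, keyed by node SUID
# (assumption: the row names of mapped.cols are SUIDs).
loadTableData(only.mapped.cols, table.key.column = 'SUID')
```

After this, the gene entries in XrefId should all be Ensembl IDs. Note that XrefDatasource is not updated by these steps and will still name the original sources.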
More advanced cases
RCy3’s identifier mapping function, mapTableColumn, is intended to
handle the majority of common ID mapping problems. It has limitations,
however.
If you need an ID mapping solution for species or ID types not
covered by this tool, or if you want to connect to alternative sources
of mappings, then check out the BridgeDb app: http://apps.cytoscape.org/apps/bridgedb.
#available in Cytoscape 3.7.0 and above
installApp('BridgeDb')
Then browse the available functions with
commandsHelp('bridgedb')
