This document is licensed under the Creative Commons license, 2006
Authors: The Cytoscape Collaboration
The Cytoscape project is an ongoing collaboration between:
Visit http://www.cytoscape.org for more information.
Cytoscape is released under the GNU LGPL (Lesser General Public License). The License is included as an appendix to this manual, but can also be found online: http://www.gnu.org/copyleft/lesser.txt. Cytoscape also includes a number of other open source libraries, which are detailed in the Acknowledgements section below.
Web Service client plugins for downloading networks from PathwayCommons, IntAct, and NCBI Entrez Gene.
Annotation import web service plugin for BioMart. This is mainly for ID translation/synonym mapping.
Java SE 6 is generally faster than Java SE 5. If your machine supports Java SE 6, please use it.
There are a number of options for downloading and installing Cytoscape. All options can be downloaded from the http://cytoscape.org website.
You can check out the latest and greatest software from our Subversion repository.
Cytoscape installations (regardless of platform) contain the following files and directories:
Table 3.
File | Description |
cytoscape.jar | Main Cytoscape application (Java archive) |
cytoscape.sh | Script to run Cytoscape from the command line (Linux, Mac OS X) |
cytoscape.bat | Script to run Cytoscape (Windows) |
LICENSE.txt/html | Cytoscape GNU LGPL License |
lib/ | Library jar files needed to run Cytoscape |
docs/ | Manuals in different formats, including the one you are reading now |
licenses/ | License files for the various libraries distributed with Cytoscape |
plugins/ | Directory containing Cytoscape plugins, in .jar format |
sampleData/ | Sample data files, listed below |
sampleData/galFiltered.gml | Sample molecular interaction network file * |
sampleData/galFiltered.sif | Identical network in Simple Interaction Format * |
sampleData/galExpData.pvals | Sample gene expression matrix file * |
sampleData/galFiltered.nodeAttrTable.xls | Sample node attribute file in Microsoft Excel format |
sampleData/galFiltered.cys | Sample session file created from the datasets above plus annotations from several databases * |
sampleData/BINDyeast.sif | Network of all yeast protein-protein interactions in the BIND database as of December 2006 ** |
sampleData/BINDhuman.sif | Network of all human protein-protein interactions in the BIND database as of December 2006 ** |
sampleData/yeastHighQuality.sif | Sample molecular interaction network file *** |
sampleData/interactome_merged.networkTable.gz | Human interactome network file in tab-delimited format **** |
sampleData/sampleStyles.props | Additional sample Visual Styles |
* From Ideker et al., Science 292:929 (2001)
** Obtained from data hosted at http://www.blueprint.org/bind/bind_downloads.html
*** From von Mering et al., Nature 417:399 (2002) and Lee et al., Science 298:799 (2002)
**** Created from the Cytoscape tutorial web page. The original data sets are available at http://cytoscape.org/cgi-bin/moin.cgi/Data_Sets/ from "A merged human interactome" by Andrew Garrow, Yeyejide Adeleye and Guy Warner (Unilever, Safety and Environmental Assurance Center).
Launch Cytoscape by double-clicking the icon created by the installer, by running cytoscape.sh from the command line (Linux or Mac OS X), or by double-clicking cytoscape.bat (Windows). Alternatively, you can pass the .jar file to Java directly using the command:
java -Xmx512M -jar cytoscape.jar -p plugins
The -Xmx512M flag tells Java to allocate more memory for Cytoscape, and the -p plugins option tells Cytoscape to load all of the plugins in the plugins directory. Loading the plugins is important because many key features, such as layouts, filters and the attribute browser, are included with Cytoscape as plugins in the plugins directory. See the Command Line chapter for more detail. On Windows, you can also launch Cytoscape by double-clicking the .jar file directly. However, this does not allow you to specify command-line arguments (such as the location of the plugin directory).
Once Cytoscape has launched, a window like the following appears (captured on Mac OS X 10.4):
Suggested Memory Size Without View
Table 4.
Number of Objects (nodes + edges) | Suggested Memory Size |
0 - 70,000 | 512M (default) |
70,000 - 150,000 | 800M |
Suggested Memory Size With View
When a network is loaded, Cytoscape will look something like the image below:
The main window here has several components:
usage: java -Xmx512M -jar cytoscape.jar [OPTIONS]
 -h,--help               Print this message.
 -v,--version            Print the version number.
 -s,--session <file>     Load a cytoscape session (.cys) file.
 -N,--network <file>     Load a network file (any format).
 -e,--edge-attrs <file>  Load an edge attributes file (edge attribute format).
 -n,--node-attrs <file>  Load a node attributes file (node attribute format).
 -m,--matrix <file>      Load a node attribute matrix file (table).
 -p,--plugin <file>      Load a plugin jar file, directory of jar files, plugin class name, or plugin jar URL.
 -P,--props <file>       Load cytoscape properties file (Java properties format) or individual property: -P name=value.
 -V,--vizmap <file>      Load vizmap properties file (Java properties format).
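The memory suggestions in Table 4 and the options in the usage text can be combined when you script a launch. The following Python sketch is an illustration only: the helper names suggested_heap and build_command are hypothetical, only options listed in the usage text are used, and a network of exactly 70,000 objects is treated as falling in the 512M row because the two rows of Table 4 share that boundary.

```python
from typing import Dict, List, Optional

# Suggested heap sizes from Table 4 (network loaded without a view):
# (largest number of nodes + edges covered by the row, -Xmx value).
MEMORY_TABLE = [(70_000, "512M"), (150_000, "800M")]


def suggested_heap(num_objects: int) -> Optional[str]:
    """Return the suggested -Xmx value, or None if Table 4 does not cover the size."""
    for upper_bound, heap in MEMORY_TABLE:
        if num_objects <= upper_bound:
            return heap
    return None


def build_command(num_objects: int,
                  network: Optional[str] = None,
                  node_attrs: Optional[str] = None,
                  props: Optional[Dict[str, str]] = None,
                  plugin_dir: str = "plugins") -> List[str]:
    """Assemble a Cytoscape launch command as an argument list."""
    heap = suggested_heap(num_objects)
    if heap is None:
        raise ValueError("Table 4 has no suggestion for this size; choose -Xmx manually")
    cmd = ["java", f"-Xmx{heap}", "-jar", "cytoscape.jar", "-p", plugin_dir]
    if network:
        cmd += ["-N", network]             # load a network file (any format)
    if node_attrs:
        cmd += ["-n", node_attrs]          # load a node attributes file
    for name, value in (props or {}).items():
        cmd += ["-P", f"{name}={value}"]   # set an individual property
    return cmd


if __name__ == "__main__":
    print(" ".join(build_command(5_000, network="sampleData/galFiltered.sif")))
```

To actually start Cytoscape, the list can be passed to subprocess.run from the installation directory. Because the arguments are passed as a list, a value with spaces such as defaultSpeciesName=Saccharomyces cerevisiae needs no extra shell quoting.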
Table 8.
Important! If you have used previous versions of Cytoscape, you will notice that handling of properties has changed. The most important change is that properties are no longer saved by default to the current directory or to your home directory. |
The Cytoscape Properties editor, accessed via Edit → Preferences → Properties…, is used to specify general and default properties. Properties are now stored in Cytoscape session files, so changes to general properties will be saved as part of the current session, but will only carry over to subsequent sessions if they are set as defaults or exported using the File → Export function.
Cytoscape properties are configurable via Add, Modify and Delete operations.
Some common properties are described below.
Table 9.
Property name | Default value | Valid values |
viewThreshold | 10000 | integer > 0 |
secondaryViewThreshold | 30000 | integer > 0 |
viewType | tabbed | tabbed |
defaultWebBrowser | | A path to the web browser on your system. This only needs to be specified if Cytoscape can’t find the web browser on your system. |
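Because the -P option also accepts a properties file in Java properties format, the settings from Table 9 can be kept in a small file. Below is a minimal Python sketch, assuming you want to generate such a file from a script; the helper names are hypothetical, and only backslashes and line breaks are escaped. That covers typical browser paths on Windows, but not every corner of the Java properties syntax (for example, keys containing = or :).

```python
from typing import Dict

# Properties from Table 9 whose valid values are integers > 0.
POSITIVE_INT_PROPS = ("viewThreshold", "secondaryViewThreshold")


def validate(props: Dict[str, object]) -> None:
    """Check the integer properties against the valid values listed in Table 9."""
    for key in POSITIVE_INT_PROPS:
        if key in props:
            value = props[key]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
                raise ValueError(f"{key} must be an integer > 0, got {value!r}")


def to_properties(props: Dict[str, object]) -> str:
    """Render properties as key=value lines in Java properties format."""
    validate(props)
    lines = ["# Cytoscape properties"]
    for key, value in props.items():
        # Backslash is the escape character in Java properties files, so a
        # Windows path needs each backslash doubled; newlines are escaped too.
        text = str(value).replace("\\", "\\\\").replace("\n", "\\n")
        lines.append(f"{key}={text}")
    return "\n".join(lines) + "\n"
```

The resulting text could be saved under any name (my.props is a hypothetical example) and loaded with -P my.props.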
There are 4 different ways of creating networks in Cytoscape:
Network files can be specified in any of the formats described in the Supported Network Formats chapter. Networks are imported into Cytoscape through the "Import Network" window, which can be accessed by going to File → Import → Network (multiple file types). The network file can either be located directly on the local computer, or found on a remote computer (in which case it will be referenced with a URL).
The Import Networks dialog is also capable of importing network files using a URL. To do this, set the Data Source Type to Remote and insert the appropriate URL, either manually or using URL bookmarks. Bookmarked URLs can be accessed by clicking on the arrow to the right of the text field (see the Bookmark Manager in Preferences for more details on bookmarks). You can also drag and drop links from a web browser into the URL text box. Once a URL has been specified, click on the Import button to load the network.
Importing networks from URL addresses has an important caveat. Because Cytoscape determines file type primarily (not exclusively) by file extension, it can have trouble importing networks with URLs that don't end in a human readable file name. If Cytoscape does not recognize a meaningful file name and extension in the URL, it will attempt to guess the type of file based on MIME type. If the MIME type is not recognizable to any of our import handlers, then the import will fail.
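The extension-first, MIME-type-second behavior described above can be modeled in a few lines. This Python sketch only approximates that logic: the lookup tables are hypothetical and do not reproduce Cytoscape's list of import handlers, and guess_format is a made-up name. It shows why a URL without a file name, such as a CGI query, depends entirely on the Content-Type reported by the server.

```python
import posixpath
from typing import Optional
from urllib.parse import urlparse

# Hypothetical lookup tables for illustration only.
EXTENSION_FORMATS = {".sif": "SIF", ".gml": "GML", ".xgmml": "XGMML"}
MIME_FORMATS = {"text/xml": "XGMML", "application/xml": "XGMML"}


def guess_format(url: str, content_type: Optional[str] = None) -> Optional[str]:
    """Guess a network format from the URL's file extension, then from its MIME type."""
    path = urlparse(url).path                 # drops any query string such as "?id=42"
    extension = posixpath.splitext(path)[1].lower()
    if extension in EXTENSION_FORMATS:
        return EXTENSION_FORMATS[extension]
    if content_type:
        mime = content_type.split(";")[0].strip().lower()  # drop "; charset=..."
        return MIME_FORMATS.get(mime)
    return None                               # unrecognized: the import would fail
```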
Another issue for network import is the presence of firewalls, which can affect which files are accessible to a computer. To work around this problem, Cytoscape supports the use of proxy servers. To configure the proxy server, go to Edit → Preferences → Proxy Server..., as described further in the Preferences chapter.
source	target	interaction	boolean attribute	string attribute	floating point attribute
YJR022W	YNR053C	pp	TRUE	abcd12371	1.2344543
YER116C	YDL013W	pp	TRUE	abcd12372	1.2344543
YNL307C	YAL038W	pp	FALSE	abcd12373	1.2344543
YNL216W	YCR012W	pd	TRUE	abcd12374	1.2344543
YNL216W	YGR254W	pd	TRUE	abcd12375	1.2344543
YJR022W	YNR053C
YER116C	YDL013W
YNL307C	YAL038W
YNL216W	YCR012W
YNL216W	YGR254W
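A table like the first example above can be read into nodes, edges and edge attributes with Python's csv module. This is a sketch under stated assumptions, not Cytoscape's importer: columns are tab-separated, the first row holds the column names, attribute values are kept as strings, and the function and parameter names are hypothetical. A row with no target value yields a node without edges, mirroring the source-only import described later in this section.

```python
import csv
import io
from typing import Dict, List, Optional, Sequence, Set, Tuple

# (source, interaction or None, target, {column name: value})
Edge = Tuple[str, Optional[str], str, Dict[str, str]]


def read_network_table(text: str,
                       source_col: int = 0,
                       target_col: Optional[int] = 1,
                       interaction_col: Optional[int] = None,
                       attr_cols: Sequence[int] = ()) -> Tuple[Set[str], List[Edge]]:
    """Return (nodes, edges) from a tab-delimited table with a header row."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    header, data = rows[0], rows[1:]
    nodes: Set[str] = set()
    edges: List[Edge] = []
    for row in data:
        source = row[source_col]
        nodes.add(source)
        target = row[target_col] if target_col is not None and target_col < len(row) else ""
        if not target:
            continue                      # source-only row: a node without interactions
        nodes.add(target)
        interaction = row[interaction_col] if interaction_col is not None else None
        attributes = {header[c]: row[c] for c in attr_cols}
        edges.append((source, interaction, target, attributes))
    return nodes, edges
```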
Unique ID A	Unique ID B	Alternative ID A	Alternative ID B	Aliases A	Aliases B	Interaction detection methods	First author surnames	Pubmed IDs	species A	species B	Interactor types	Source database	Interaction ID	Interaction labels	Cross-references	Associated Files	Experiment files	Experiment labels	Different techniques	Different Pubmed articles	Different sources	Weight
7205	5747	TRIP6	PTK2	Q15654	Q05397-1	vv|HPRD	Currently not available	14688263|15892868(Marcotte)	Mammalia	Homo sapiens	protein|protein	HPRD|Marcotte	0	Thyroid hormone receptor interactor 6-FAK-|PTK2-TRIP6	NA(HPRD)|NA(Marcotte)	HPRD/02859_psimi.xml|other/ORIGINAL_DATA_MARCOTTE.txt	vv(HPRD/02859_psimi.xml)|HPRD(other/ORIGINAL_DATA_MARCOTTE.txt)	17651(ExptRef)|Marcotte	2	2	2	2
4174	7311	MCM5	UBA52	P33992	P62987	neighbouring_reaction	Currently not available	15608231(Reactome)	Homo sapiens	Homo sapiens	protein|protein	Reactome	1	P33992-P62988	Reaction:68944<->Reaction:68946(Reactome)|Reaction:68946<->Reaction:68944(Reactome)	other/ORIGINAL_DATA_MARCOTTE.txt	neighbouring_reaction(other/REACTOMEhomo_sapiens.interactions.txt)	Reactome	1	1	1	1
7040	7040	TGFB1	TGFB1	P01137	P01137	nmr: nuclear magnetic resonance	Currently not available	8679613	Homo sapiens	Homo sapiens	protein|protein	BIND	2	TGFB1-TGFB1-	72085(BIND)	BIND/bind_taxid9606.1.psi.xml	nmr: nuclear magnetic resonance(BIND/bind_taxid9606.1.psi.xml)	NotAvailable	1	1	1	1
The network import function cannot import node attributes - only edge attributes. To import node attributes from this table, please see the Attributes section of this manual.
Note (1): This data is taken from the "A merged human interactome" datasets by Andrew Garrow, Yeyejide Adeleye and Guy Warner (Unilever, Safety and Environmental Assurance Center, 12 October 2006). The data files are available at http://cytoscape.org/cgi-bin/moin.cgi/Data_Sets/
To import network text/Excel tables, please follow these steps:
Select File → Import → Network from Table (Text/MS Excel)...
Enable/Disable Attribute Column - By left-clicking on a column header in the preview table, you can enable/disable edge attributes. If the header is checked and entries are blue, the column will be imported as an edge attribute. For example, the table below shows that columns 1 through 3 will be used as network data, column 4 will not be imported, and columns 5 and 6 will be imported as edge attributes.
Change Attribute Name and Data Types - If you right-click on a column header in the preview table, you can modify the attribute name and data type. For more detail, see "Modify Attribute Name/Type" below.
The Table Import feature also supports lists of nodes without edges. If you select only a source column, it creates a network without interactions. This feature is useful with the node expansion function available in some web service clients. Please read the section Importing Networks from External Database for more detail.
You can select several options by checking the Show Text File Import Options checkbox.
Cytoscape can read network/pathway files written in the following formats:
nodeA <relationship type> nodeB
nodeC <relationship type> nodeA
nodeD <relationship type> nodeE nodeF nodeB
nodeG
...
nodeY <relationship type> nodeZ
node1 typeA node2
node2 typeB node3 node4 node5
node0
node1 xx node2
node1 xx node2
node1 yy node2
Edges connecting a node to itself (self-edges) are also allowed:
node1 xx node1
Every node and edge in Cytoscape has an identifying name, most commonly used with the node and edge data attribute structures. Node names must be unique, because identically named nodes are treated as the same node. By default, the name of each node is the name given in this file (unless another string is mapped to display on the node using the visual mapper, as discussed in the section on visual styles). The name of each edge is formed from the names of the source and target nodes plus the interaction type, for example: sourceName (edgeType) targetName.
The tag <relationship type> can be any string. Whole words or concatenated words may be used to define types of relationships, e.g. geneFusion, cogInference, pullsDown, activates, degrades, inactivates, inhibits, phosphorylates, upRegulates, etc.
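The SIF rules above (a source, an interaction type and one or more targets per line; a lone name for a node without edges; edge names of the form sourceName (edgeType) targetName) can be sketched as a small Python parser. It is illustrative only: it assumes whitespace-separated fields, parse_sif is a hypothetical name rather than part of Cytoscape, and repeated lines are simply kept as repeated entries.

```python
from typing import List, Tuple


def parse_sif(text: str) -> Tuple[List[str], List[str]]:
    """Return (node names in first-seen order, edge names) for SIF text."""
    nodes: List[str] = []
    seen = set()
    edges: List[str] = []

    def add_node(name: str) -> None:
        if name not in seen:              # identically named nodes are the same node
            seen.add(name)
            nodes.append(name)

    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        if len(fields) == 2:
            raise ValueError(f"line {line_number}: interaction type without a target")
        source = fields[0]
        add_node(source)
        if len(fields) == 1:
            continue                      # a node with no edges
        interaction = fields[1]
        for target in fields[2:]:         # one edge per listed target
            add_node(target)
            edges.append(f"{source} ({interaction}) {target}")
    return nodes, edges
```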
Some common interaction types used in the Systems Biology community are as follows:
pp .................. protein – protein interaction
pd .................. protein -> DNA (e.g. transcription factor binding upstream of a regulated gene)
Some less common interaction types used are:
pr .................. protein -> reaction
rc .................. reaction -> compound
cr .................. compound -> reaction
gl .................. genetic lethal relationship
pm .................. protein-metabolite interaction
mp .................. metabolite-protein interaction
http://www.infosun.fmi.uni-passau.de/Graphlet/GML/
It is generally not necessary to modify the content of a GML file directly. Once a network is built in SIF format and then laid out, the layout is preserved by saving to and loading from GML. Visual attributes specified in a GML file will result in a new visual style named Filename.style when that GML file is loaded.
http://www.cs.rpi.edu/~puninj/XGMML/
XGMML is now preferred to GML because it offers the flexibility associated with all XML document types. If you're unsure about which to use, choose XGMML.
Cytoscape has native support for Microsoft Excel files (.xls) and delimited text files. The tables in these files can contain network data and edge attributes. During file import, users can specify which columns contain source nodes, target nodes, interaction types, and edge attributes. Some other network analysis tools, such as igraph (http://cneurocvs.rmki.kfki.hu/igraph/), can export graphs as simple text files, and Cytoscape can read these text files and build networks from them. For more detail, please read the Import Free-Format Tables section of the Creating Networks chapter.
Network generated by igraph's Watts-Strogatz small-world model (50k nodes and 250k edges) and visualized by Cytoscape. You can import networks created by other applications using the Table Import feature.
To handle data sources with different sets of names, as is usually the case when manually integrating gene information from different sources, Cytoscape needs a data server that provides synonym information (see the chapter on Annotation). A synonym table gives a canonical name for each object in a given organism and one or more recognized synonyms for that object. Note that the synonym table itself defines which set of names are the "canonical" names. For example, in budding yeast, the ORF names are commonly used as the canonical names.
If a synonym server is available, then by default Cytoscape will convert every name that appears in a data file to the associated canonical name. Unrecognized names will not be changed. This conversion of names to a common set allows Cytoscape to connect the genes present in different data sources, even if they have different names – as long as those names are recognized by the synonym server.
For this to work, Cytoscape must also be provided with the species to which the objects belong, since the data server requires the species in order to uniquely identify the object referred to by a particular name. This is usually done by specifying the species name on the command line with the -P option (cytoscape.sh -P "defaultSpeciesName=Saccharomyces cerevisiae") or by editing the properties (under Edit → Preferences → Properties...).
The automatic canonicalization of names can be turned off using the -P option (cytoscape.sh -P "canonicalizeName=false") or by editing the properties (under Edit → Preferences → Properties...). This canonicalization of names currently does not apply to expression data. Expression data should use the same names as the other data sources or use the canonical names as defined by the synonym table.
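The canonicalization described above amounts to a per-species lookup that leaves unrecognized names unchanged. The Python sketch below illustrates the idea only; it is not the synonym server's interface. The synonym entries reuse the yeast aliases shown in Table 12 (with ORF names as canonical names), and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Illustrative synonym table: species -> {recognized name -> canonical name}.
SYNONYMS: Dict[str, Dict[str, str]] = {
    "Saccharomyces cerevisiae": {
        "AAC3": "YBR085W", "ANC3": "YBR085W",
        "BIK1": "YCL029C", "ARM5": "YCL029C", "PAC14": "YCL029C",
    },
}


def canonicalize(name: str, species: str,
                 synonyms: Dict[str, Dict[str, str]] = SYNONYMS) -> str:
    """Map a name to its canonical form; unrecognized names are left unchanged."""
    return synonyms.get(species, {}).get(name, name)


def merge_edges(sources: Iterable[List[Tuple[str, str]]],
                species: str) -> Set[Tuple[str, str]]:
    """Combine edge lists from several data sources using canonical node names."""
    return {(canonicalize(a, species), canonicalize(b, species))
            for edges in sources for (a, b) in edges}
```

Two sources that call the same genes by different names end up sharing nodes once both are canonicalized; skipping the canonicalize step corresponds to running with canonicalizeName=false.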
Interaction networks are useful as stand-alone models. However, they are most powerful for answering scientific questions when integrated with additional information. Cytoscape allows the user to add arbitrary node, edge and network information to Cytoscape as node/edge/network attributes. This could include, for example, annotation data on a gene or confidence values in a protein-protein interaction. These attributes can then be visualized in a user-defined way by setting up a mapping from data attributes to visual attributes (colors, shapes, and so on). The section on visual styles discusses this in greater detail.
FunctionalCategory
YAL001C = metabolism
YAR002W = apoptosis
YBL007C = ribosome
InteractionStrength
YAL001C (pp) YBR043W = 0.82
YMR022W (pd) YDL112C = 0.441
YDL112C (pd) YMR022W = 0.9013
Note: In order to import network attributes in Cytoscape 2.4, please go to File → Import → Attribute from Table (text/MS Excel)... or encode them in an XGMML network file (see Supported File Formats for more details).
attributeName (class=formal.class.of.value)
floatingPointAttribute
firstName = 1
secondName = 2.5
floatingPointAttribute (class=Double)
firstName = 1
secondName = 2.5
floatingPointAttribute
firstName = 1.0
secondName = 2.5
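The three floatingPointAttribute examples differ only in how the class of the values is determined. Below is a Python sketch of one plausible reading, assuming that without a (class=...) header the class is inferred from the first value. Under that assumption the sketch rejects the first example, because 1 is read as an integer and 2.5 then does not fit, while the other two succeed. The set of accepted class names and the function names are assumptions for illustration.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Class names accepted in the header: an assumed subset, for illustration.
CLASSES: Dict[str, Callable[[str], object]] = {
    "Integer": int, "java.lang.Integer": int,
    "Double": float, "java.lang.Double": float,
    "String": str, "java.lang.String": str,
}
HEADER = re.compile(r"(.*?)\s*\(class=([\w.]+)\)\s*")


def infer_class(value: str) -> Callable[[str], object]:
    """Guess a converter from one value: integer, then floating point, then string."""
    for convert in (int, float):
        try:
            convert(value)
            return convert
        except ValueError:
            pass
    return str


def parse_attribute_file(text: str) -> Tuple[str, Dict[str, object]]:
    """Parse 'name = value' lines under a header line; return (attribute name, values)."""
    lines = [line for line in text.splitlines() if line.strip()]
    match = HEADER.fullmatch(lines[0].strip())
    convert: Optional[Callable[[str], object]] = None
    if match:
        name, convert = match.group(1), CLASSES[match.group(2)]
    else:
        name = lines[0].strip()
    values: Dict[str, object] = {}
    for line in lines[1:]:
        key, _, raw = line.partition("=")   # edge names such as "A (pp) B" contain no "="
        key, raw = key.strip(), raw.strip()
        if convert is None:
            convert = infer_class(raw)      # assumed: the first value decides the class
        values[key] = convert(raw)          # raises ValueError if a later value doesn't fit
    return name, values
```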
Edge names are all of the form:
sourceName (edgeType) targetName
Table 10.
sourceName space openParen edgeType closeParen space targetName |
Note that tabs are not allowed in edge names. Tabs can be used to separate the edge name from the "=" delimiter, but not within the edge name itself. Also note that this format is different from the specification of interactions in the SIF file format. To be explicit: a SIF entry for the previous interaction would look like
sourceName edgeType targetName
or
Table 11.
sourceName whiteSpace edgeType whiteSpace targetName |
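The difference between the two notations (parentheses in attribute-file edge names, plain whitespace in SIF) is easy to capture in code. Below is a minimal Python sketch, assuming node names and edge types contain no whitespace; the function names are hypothetical.

```python
import re
from typing import Tuple

# Edge name grammar from Table 10; assumes no whitespace inside names or types.
EDGE_NAME = re.compile(r"(\S+) \((\S+)\) (\S+)")


def edge_name(source: str, edge_type: str, target: str) -> str:
    """Build 'sourceName (edgeType) targetName' as used in edge attribute files."""
    name = f"{source} ({edge_type}) {target}"
    if "\t" in name:
        raise ValueError("tabs are not allowed in edge names")
    return name


def parse_edge_name(name: str) -> Tuple[str, str, str]:
    """Split an edge name back into (source, edge type, target)."""
    match = EDGE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not an edge name: {name!r}")
    return match.group(1), match.group(2), match.group(3)


def sif_line(source: str, edge_type: str, target: str, sep: str = " ") -> str:
    """The same interaction written as a SIF entry (Table 11): no parentheses."""
    return sep.join((source, edge_type, target))
```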
To specify lists of values, use the following syntax:
listAttributeName (class=java.lang.String)
firstObjectName = (firstValue::secondValue::thirdValue)
secondObjectName = (onlyOneValue)
This example shows an attribute whose value is defined as a list of text strings. The first object has three strings, and thus three elements in its list, while the second object has a list with only one element. In the case of a list every attribute value uses list syntax (i.e. parentheses), and each element is of the same class. Again, the class will be inferred if it is not specified in the header line. Lists are not supported by the visual mapper and so can’t be mapped to visual attributes.
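The list syntax (parentheses around values separated by ::, with every element of the same class) can be parsed and written with two small functions. This Python sketch is an illustration; the function names are hypothetical.

```python
from typing import Callable, List


def parse_list_value(raw: str, element_class: Callable[[str], object] = str) -> List[object]:
    """Parse '(a::b::c)' into a list whose elements all share one class."""
    raw = raw.strip()
    if not (raw.startswith("(") and raw.endswith(")")):
        raise ValueError(f"list values must be wrapped in parentheses: {raw!r}")
    return [element_class(item) for item in raw[1:-1].split("::")]


def format_list_value(items: List[object]) -> str:
    """Inverse of parse_list_value: render a list in attribute-file syntax."""
    return "(" + "::".join(str(item) for item in items) + ")"
```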
Table 12.
Object Key | Alias | SGD ID |
AAC3 | YBR085W|ANC3 | S000000289 |
AAT2 | YLR027C|ASP5 | S000004017 |
BIK1 | YCL029C|ARM5|PAC14 | S000000534 |
The attribute table file should contain a primary key column and at least one attribute column. There is no limit on the number of attribute columns. The Alias column is optional, as is using the first row of data as attribute names. Alternatively, you can specify each attribute name from the File → Import → Attribute from Table (text/MS Excel)... user interface.
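A table like Table 12 can be read with an alias index so that a node matches a row either by its primary key or by one of its aliases. The Python sketch below assumes that this is the role of the Alias column, that the file is tab-delimited with a header row, and that aliases are separated by |; the function names are hypothetical.

```python
import csv
import io
from typing import Dict, Optional, Tuple


def read_attribute_table(text: str, key_col: int = 0,
                         alias_col: Optional[int] = 1) -> Tuple[Dict[str, Dict[str, str]], Dict[str, str]]:
    """Return (attributes by primary key, index from key or alias to primary key)."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    header, data = rows[0], rows[1:]
    attributes: Dict[str, Dict[str, str]] = {}
    alias_index: Dict[str, str] = {}
    for row in data:
        key = row[key_col]
        attributes[key] = {header[i]: row[i] for i in range(len(row))
                           if i not in (key_col, alias_col)}
        alias_index[key] = key
        if alias_col is not None:
            for alias in row[alias_col].split("|"):
                if alias:
                    alias_index[alias] = key
    return attributes, alias_index


def lookup(node_name: str, attributes: Dict[str, Dict[str, str]],
           alias_index: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Find the attribute row for a node by primary key or alias."""
    key = alias_index.get(node_name)
    return attributes.get(key) if key is not None else None
```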
Select File → Import → Attribute from Table (text/MS Excel)...
Identifier [CommonName] value1 value2 ... valueN [pval1 pval2 ... pvalN]
Brackets [ ] indicate fields that are optional.
The next field is an optional common name. It is not used by Cytoscape, and is provided strictly for the user's convenience. With this common name field, the input format is the same as for commonly used expression data analysis packages such as SAM (http://www-stat.stanford.edu/~tibs/SAM/).
The next set of columns represent expression values, one per experiment. These can be either absolute expression values or fold change ratios. Each experiment is identified by its experiment name, given in the first line.
Optionally, significance measures such as P values may be provided. These values, generated by many microarray data analysis packages, indicate whether the observed level of gene expression or fold change is likely to be greater than would be expected by random chance. If you are using significance measures, your expression file should contain them in a second set of columns after the expression values. The column names for the significance measures must exactly match those of the expression values.
For example, here is an excerpt from the file galExpData.pvals in the Cytoscape sampleData directory:
GENE COMMON gal1RG gal4RG gal80R gal1RG gal4RG gal80R
YHR051W COX6 -0.034 0.111 -0.304 3.75720e-01 1.56240e-02 7.91340e-06
YHR124W NDT80 -0.090 0.007 -0.348 2.71460e-01 9.64330e-01 3.44760e-01
YKL181W PRS1 -0.167 -0.233 0.112 6.27120e-03 7.89400e-04 1.44060e-01
YGR072W UPF3 0.245 -0.471 0.787 4.10450e-04 7.51780e-04 1.37130e-05
This indicates that there is data for three experiments: gal1RG, gal4RG, and gal80R. These names appear two times in the header line: the first time gives the expression values, and the second gives the significance measures. For instance, the second line tells us that in Experiment gal1RG, the gene YHR051W has an expression value of -0.034 with significance measure 3.75720e-01.
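The pairing of values and significance measures described above can be sketched in Python. This handles only the basic form shown in the excerpt (whitespace-separated fields, a common name present, and an optional second block of columns whose names repeat the first), not the other header variants in the formal specification; parse_expression is a hypothetical name.

```python
def parse_expression(text):
    """Parse a whitespace-separated expression matrix.

    Returns (conditions, data), where data[gene] is
    {"common": common_name, "values": {condition: (value, significance)}}
    and significance is None when the file has no significance columns.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    conditions = rows[0][2:]              # skip the GENE and COMMON header fields
    half = len(conditions) // 2
    # Repeated condition names mean a second block of significance columns.
    has_sig = half > 0 and conditions[:half] == conditions[half:]
    if has_sig:
        conditions = conditions[:half]
    n = len(conditions)
    width = 2 * n if has_sig else n
    data = {}
    for fields in rows[1:]:
        gene, common = fields[0], fields[1]
        numbers = [float(x) for x in fields[2:2 + width]]
        sigs = numbers[n:] if has_sig else [None] * n
        data[gene] = {"common": common,
                      "values": dict(zip(conditions, zip(numbers[:n], sigs)))}
    return conditions, data
```

For the excerpt above, data["YHR051W"]["values"]["gal1RG"] pairs the expression value -0.034 with the significance measure 3.75720e-01.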
Some variations on this basic format are recognized; see the formal file format specification below for more information. Expression data files commonly have the file extensions ".mrna" or ".pvals", and these file extensions are recognized by Cytoscape when browsing for data files.
For the sample network file sampleData/galFiltered.sif:
GENE COMMON gal1RG gal4RG gal80R gal1RG gal4RG gal80R
YHR051W COX6 -0.034 0.111 -0.304 3.75720e-01 1.56240e-02 7.91340e-06
YHR124W NDT80 -0.090 0.007 -0.348 2.71460e-01 9.64330e-01 3.44760e-01
YKL181W PRS1 -0.167 -0.233 0.112 6.27120e-03 7.89400e-04 1.44060e-01
Probeset
YHR051W = probeset2
YHR124W = probeset3
YKL181W = probeset4
GENE COMMON gal1RG gal4RG gal80R gal1RG gal4RG gal80R
probeset2 COX6 -0.034 0.111 -0.304 3.75720e-01 1.56240e-02 7.91340e-06
probeset3 NDT80 -0.090 0.007 -0.348 2.71460e-01 9.64330e-01 3.44760e-01
probeset4 PRS1 -0.167 -0.233 0.112 6.27120e-03 7.89400e-04 1.44060e-01
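In the example above, the node attribute Probeset links each gene name to a probeset ID, and the second expression file is keyed by those probeset IDs instead of gene names. Assuming that is the intent of the example, the join can be sketched in Python as follows; the function name is hypothetical, and genes whose probeset has no expression row are skipped.

```python
from typing import Dict, Mapping


def attach_by_probeset(probeset_of: Mapping[str, str],
                       expression: Mapping[str, dict]) -> Dict[str, dict]:
    """Re-key expression rows from probeset IDs to node names via a node attribute."""
    return {node: expression[probeset]
            for node, probeset in probeset_of.items()
            if probeset in expression}        # nodes without a matching row are skipped
```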
The first line is a header line with one of the following three header formats:
<text> <text> cond1 cond2 ... cond1 cond2 ... [NumSigConds]
<text> <text> cond1 cond2 ...
<tab><tab>RATIOS<tab><tab>...LAMBDAS
Each line after the first is a data line with the following format:
FormalGeneName CommonGeneName ratio1 ratio2 ... [lambda1 lambda2 ...] [numSigConds]
Optionally, the last line of the file may be a special footer line with the following format:
NumSigGenes int1 int2 ...
This enables developers to write programs that access these services. The Cytoscape core development team has developed several sample web service clients using this framework. Currently, Cytoscape supports the following web services:
IntAct: an open source database of protein interaction data, hosted at EMBL-EBI.
Pathway Commons: an open source portal, providing access to multiple integrated data sets, including: Reactome, IntAct, HPRD, HumanCyc, MINT, the MSKCC Cancer Cell Map, and the NCI/Nature Pathway Interaction database.
NCBI Entrez Gene: a public database of genes, including annotation, sequence and interactions.
BioMart: an open source biological database engine, useful for ID/name mapping.
All of these clients are available as plugins, and users can install them through the Plugin Manager.
The following sections describe how to import networks from external databases.
To get started, select: File → Import → Network from web services...
Then, follow the three-step process outlined below:
You can configure access options from the Options tab. There are two retrieval options:
Simplified Binary Model: Retrieve a simplified binary network, as inferred from the original BioPAX representation. In this representation, nodes within a network refer to physical entities only, and edges refer to inferred interactions.
Full Model: Retrieve the full model, as stored in the original BioPAX representation. In this representation, nodes within a network can refer to physical entities and interactions.
Some of the web service clients can import attributes from external databases; the BioMart client is one example. You can install it from the Plugin Manager.
Load a network. In this example, we use galFiltered.sif in the sampleData directory.
Select a data source. Since galFiltered.sif is a yeast network, select the yeast dataset.
In the Key Attribute section, select ID for Attribute; the Data Type should be Ensembl Gene ID. Attribute is the list of attributes available in the current Cytoscape session, and Data Type is the type of ID used by that attribute. In this case, Cytoscape uses ID as the key for mapping. Because the sample network galFiltered.sif uses Ensembl Gene IDs (such as YOR072W) as its node IDs, you need to select Ensembl Gene ID as the Data Type. In general, you need to know which type of ID (Entrez Gene ID, UniProt Unified Acc. Number, Ensembl Gene ID, etc.) is used by the attribute selected in the Attribute box.
Select the attributes you want to import. (Note: you cannot select too many attributes at once, because the BioMart server limits the number of annotations that can be selected.)
Press Import.
The newly imported attributes now appear in the Attribute Browser. If a node has multiple values for an attribute, you may see attribute names ending with -TOP; such an attribute holds the first entry of the original list attribute.
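The -TOP behavior can be pictured with a short Python sketch. This is a simplification, not the BioMart client's code: it assumes the lookup results have already been fetched into a dictionary keyed by the key attribute (for example, node IDs such as YOR072W), it decides per node rather than per attribute column, and the function name and sample values are hypothetical.

```python
from typing import Dict, Iterable, List


def add_annotation(node_ids: Iterable[str],
                   annotations: Dict[str, List[str]],
                   attr_name: str) -> Dict[str, Dict[str, object]]:
    """Attach looked-up values to nodes; multi-valued results also get a -TOP attribute."""
    result: Dict[str, Dict[str, object]] = {}
    for node in node_ids:
        values = annotations.get(node, [])
        if not values:
            continue                          # no match for this key: leave the node alone
        if len(values) == 1:
            result[node] = {attr_name: values[0]}
        else:
            result[node] = {attr_name: list(values),
                            attr_name + "-TOP": values[0]}  # first entry of the list
    return result
```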
Web services are useful when you combine the result from multiple data sources.
Import network from IntAct using keyword. In this example, type p53 AND species:mouse.
Import human orthologs from BioMart.
Show the orthologs as a list of Ensembl Gene IDs in the Data Panel. Copy them and use them as the query for IntAct.
Import Entrez Gene IDs from BioMart. Use the ensembl attribute as the mapping key.
Import annotations from NCBI. The resulting network looks like the following:
Cytoscape uses a Zoomable User Interface for navigating and viewing networks. ZUIs use two mechanisms for navigation: zooming and panning. Zooming increases or decreases the magnification of a view based on how much or how little a user wants to see. Panning allows users to move the focus of a screen to different parts of a view.
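Zooming and panning can both be expressed as a scale-and-offset transform between network (world) coordinates and screen coordinates. The Python sketch below is a generic illustration of that idea, not Cytoscape's rendering code; the View class is hypothetical. Zooming about the cursor position keeps the network point under the cursor fixed, which is what makes a zoomable interface feel anchored.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class View:
    scale: float = 1.0      # magnification: screen pixels per network unit
    offset_x: float = 0.0   # screen position of the network origin
    offset_y: float = 0.0

    def to_screen(self, x: float, y: float) -> Tuple[float, float]:
        return x * self.scale + self.offset_x, y * self.scale + self.offset_y

    def to_world(self, sx: float, sy: float) -> Tuple[float, float]:
        return (sx - self.offset_x) / self.scale, (sy - self.offset_y) / self.scale

    def pan(self, dx: float, dy: float) -> None:
        """Move the focus of the screen by (dx, dy) pixels."""
        self.offset_x += dx
        self.offset_y += dy

    def zoom(self, factor: float, sx: float, sy: float) -> None:
        """Change magnification while keeping the point under (sx, sy) in place."""
        wx, wy = self.to_world(sx, sy)
        self.scale *= factor
        self.offset_x = sx - wx * self.scale
        self.offset_y = sy - wy * self.scale
```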
yFiles layouts are a set of commercial layouts which are provided courtesy of yWorks. Due to license restrictions, the detailed parameters for these layouts are not available (there are no yFiles entries in the Settings... dialog). The main layout algorithms provided by yFiles are:
The force-directed layout is a new layout based on the "force-directed" paradigm. This layout is based on the algorithm implemented as part of the excellent prefuse toolkit provided by Jeff Heer. The algorithm is very fast and with the right parameters can provide a very pleasing layout. The Force Directed Layout will also accept a numeric edge attribute to use as a weight for the length of the spring, although this will often require more use of the Settings... dialog to achieve the best layout. This algorithm is available by selecting Layout → Cytoscape Layouts → Force-Directed Layout → (unweighted) or the edge attribute you want to use as a weight. A sample screen shot showing a portion of the galFiltered network provided in sample data is provided below:
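To make the force-directed paradigm concrete, here is a generic spring-electrical sketch in Python. It is not the prefuse implementation that Cytoscape uses, and the parameter names and defaults are assumptions: every pair of nodes repels, each edge acts as a spring, and an optional edge weight scales the spring's rest length (one reasonable reading of "a weight for the length of the spring").

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple


def force_directed_layout(nodes: Sequence[str],
                          edges: List[Tuple[str, str, Optional[float]]],
                          iterations: int = 200, spring_length: float = 50.0,
                          spring_k: float = 0.05, repulsion: float = 5000.0,
                          step: float = 0.5, max_force: float = 10.0,
                          seed: int = 0) -> Dict[str, Tuple[float, float]]:
    """Return node positions after simulating repulsion plus spring forces."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-100, 100), rng.uniform(-100, 100)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Pairwise repulsion, inversely proportional to squared distance (O(n^2)).
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d2 = dx * dx + dy * dy + 1e-9
                d = math.sqrt(d2)
                fx, fy = repulsion / d2 * dx / d, repulsion / d2 * dy / d
                force[a][0] += fx; force[a][1] += fy
                force[b][0] -= fx; force[b][1] -= fy
        # Springs pull connected nodes toward their rest length.
        for s, t, weight in edges:
            rest = spring_length * (weight if weight is not None else 1.0)
            dx, dy = pos[t][0] - pos[s][0], pos[t][1] - pos[s][1]
            d = math.hypot(dx, dy) + 1e-9
            f = spring_k * (d - rest)          # positive when stretched
            force[s][0] += f * dx / d; force[s][1] += f * dy / d
            force[t][0] -= f * dx / d; force[t][1] -= f * dy / d
        for n in nodes:
            fx, fy = force[n]
            magnitude = math.hypot(fx, fy)
            if magnitude > max_force:          # cap each move for stability
                fx, fy = fx * max_force / magnitude, fy * max_force / magnitude
            pos[n][0] += step * fx
            pos[n][1] += step * fy
    return {n: (p[0], p[1]) for n, p in pos.items()}
```

The quadratic repulsion loop keeps the sketch short; production layouts typically approximate it to handle large networks quickly.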
Several other layout algorithms, including a selection from the JGraph project (http://jgraph.sourceforge.net), are also available under the Layout menu.
Table 14.
Description of Align Options
Vertical Align Top - The tops of the selected nodes are aligned with the top-most node.
Vertical Align Center - The centers of the selected nodes are aligned along a line defined by the midpoint between the top-most and bottom-most nodes.
Vertical Align Bottom - The bottoms of the selected nodes are aligned with the bottom-most node.
Horizontal Align Left - The left-hand sides of the selected nodes are aligned with the left-most node.
Horizontal Align Center - The centers of the selected nodes are aligned along a line defined by the midpoint between the left-most and right-most nodes.
Horizontal Align Right - The right-hand sides of the selected nodes are aligned with the right-most node.
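The vertical align options can be sketched in Python as simple coordinate updates. This is an illustration under assumptions, not Cytoscape's code: each node is described by its center, width and height; y grows downward as on screen; and "the midpoint between the top-most and bottom-most nodes" is read as the midpoint between their centers. The Node class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    x: float  # center x
    y: float  # center y (assumed to grow downward, as on screen)
    w: float  # width
    h: float  # height


def align_top(nodes: List[Node]) -> None:
    """Move every node so its top edge matches the top-most top edge."""
    top = min(n.y - n.h / 2 for n in nodes)
    for n in nodes:
        n.y = top + n.h / 2


def align_vertical_center(nodes: List[Node]) -> None:
    """Put every center on the line halfway between the top-most and bottom-most centers."""
    middle = (min(n.y for n in nodes) + max(n.y for n in nodes)) / 2
    for n in nodes:
        n.y = middle


def align_bottom(nodes: List[Node]) -> None:
    """Move every node so its bottom edge matches the bottom-most bottom edge."""
    bottom = max(n.y + n.h / 2 for n in nodes)
    for n in nodes:
        n.y = bottom - n.h / 2
```

The horizontal options follow the same pattern with x and w in place of y and h.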
Table 15.
Description of Distribute Options
Vertical Distribute Top - The tops of the selected nodes are distributed evenly between the top-most and bottom-most nodes, which stay in place.
Vertical Distribute Center - The centers of the selected nodes are distributed evenly between the top-most and bottom-most nodes, which stay in place.
Vertical Distribute Bottom - The bottoms of the selected nodes are distributed evenly between the top-most and bottom-most nodes, which stay in place.
Horizontal Distribute Left - The left-hand sides of the selected nodes are distributed evenly between the left-most and right-most nodes, which stay in place.
Horizontal Distribute Center - The centers of the selected nodes are distributed evenly between the left-most and right-most nodes, which stay in place.
Horizontal Distribute Right - The right-hand sides of the selected nodes are distributed evenly between the left-most and right-most nodes, which stay in place.
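The vertical distribute options can be sketched as one Python function parameterized by which edge (top, center or bottom) is spaced evenly. As with any sketch here, the Node class and function name are hypothetical, nodes are described by center and size, y grows downward, and the two extreme nodes keep their positions.

```python
from dataclasses import dataclass
from typing import List

# Position of the chosen edge relative to the center, as a fraction of height.
OFFSETS = {"top": -0.5, "center": 0.0, "bottom": 0.5}


@dataclass
class Node:
    x: float
    y: float
    w: float
    h: float


def distribute_vertical(nodes: List[Node], edge: str = "center") -> None:
    """Space the chosen edge of the nodes evenly; the two extreme nodes stay in place."""
    if len(nodes) < 3:
        return                                # nothing between the extremes to move
    offset = OFFSETS[edge]
    ordered = sorted(nodes, key=lambda n: n.y + offset * n.h)
    first = ordered[0].y + offset * ordered[0].h
    last = ordered[-1].y + offset * ordered[-1].h
    step = (last - first) / (len(ordered) - 1)
    for i, n in enumerate(ordered):
        target = first + i * step             # where this node's chosen edge should be
        n.y = target - offset * n.h
```

The horizontal options are analogous, using x and w.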
Table 16.
Description of Stack Options
Vertical Stack Left - Vertically stacked below the top-most node, with the left-hand sides of the selected nodes aligned.
Vertical Stack Center - Vertically stacked below the top-most node, with the centers of the selected nodes aligned.
Vertical Stack Right - Vertically stacked below the top-most node, with the right-hand sides of the selected nodes aligned.
Horizontal Stack Top - Horizontally stacked to the right of the left-most node, with the tops of the selected nodes aligned.
Horizontal Stack Center - Horizontally stacked to the right of the left-most node, with the centers of the selected nodes aligned.
Horizontal Stack Bottom - Horizontally stacked to the right of the left-most node, with the bottoms of the selected nodes aligned.
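The vertical stack options can be sketched in Python as follows. The interpretation is an assumption: nodes are ordered by their center y, the top-most node stays put, each following node is placed directly below the previous one (the gap parameter is hypothetical and defaults to 0), and sides or centers are aligned with the top-most node. The Node class and function name are also hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    x: float  # center x
    y: float  # center y (grows downward)
    w: float
    h: float


def vertical_stack(nodes: List[Node], align: str = "center", gap: float = 0.0) -> None:
    """Stack nodes below the top-most one, aligning left sides, centers or right sides."""
    ordered = sorted(nodes, key=lambda n: n.y)   # top-most (smallest y) first
    anchor = ordered[0]                          # the top-most node does not move
    cursor = anchor.y + anchor.h / 2 + gap       # y of the next node's top edge
    for n in ordered[1:]:
        n.y = cursor + n.h / 2
        cursor = n.y + n.h / 2 + gap
        if align == "left":
            n.x = anchor.x - anchor.w / 2 + n.w / 2
        elif align == "right":
            n.x = anchor.x + anchor.w / 2 - n.w / 2
        else:
            n.x = anchor.x
```

Horizontal stacking follows the same pattern with the roles of x/w and y/h swapped.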
- Specify a default color and shape for all nodes.
- Use specific line types to indicate different types of interactions.
- Encode specific physical entities as different node shapes.
- Set node sizes based on the degree of connectivity of the nodes. You can visually see the hub of a network...
- ...or, set the font size of the node labels instead.
- Set node widths and heights based on label size.
- Visualize gene expression data along a color gradient.
- Control edge transparency (opacity) using edge weights.
- Control multiple edge visual properties using edge score.
- Browse extremely dense networks by controlling the opacity of nodes.
- Show module locations in a large network.
- Overlay a subnetwork on a huge interactome using opacity and color control.
- Main Panel
- This panel allows you to create/delete/view/switch between different visual styles using the Current Visual Style options. The Visual Mapping Browser at the bottom displays the mapping details for a given visual style and is used to edit these details as well.
- Default Appearance Editor
- Clicking on the section labelled "Defaults" on the Main Panel will bring up this editor, which allows users to visually edit the default appearance of nodes and edges for the selected visual style.
- Continuous Editors
- These are editors for continuous mappings, which map numerical values to visual attributes. They are accessed through the Visual Mapping Browser on the Main Panel and let users edit continuous mappings more intuitively.
- Color Gradient Editor
- Continuous-to-Discrete Editor
- Continuous-to-Continuous Editor
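The continuous editors above define mappings through a few boundary points. A common way to turn such points into values is piecewise-linear interpolation, sketched below in Python for a continuous-to-continuous mapping (for example, a numeric attribute to node size) and for a color gradient. This is an illustration, not Cytoscape's VizMapper code; the function names are hypothetical, and clamping values outside the outermost points is an assumption.

```python
from typing import Sequence, Tuple

Color = Tuple[int, int, int]


def interpolate(value: float, points: Sequence[Tuple[float, float]]) -> float:
    """Piecewise-linear mapping through (attribute value, visual value) points sorted by attribute."""
    if value <= points[0][0]:
        return points[0][1]                  # assumed: clamp below the first point
    if value >= points[-1][0]:
        return points[-1][1]                 # assumed: clamp above the last point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= value <= x1:
            if x1 == x0:
                return y0                    # duplicate boundary: avoid dividing by zero
            t = (value - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("points must be sorted by attribute value")


def interpolate_color(value: float, points: Sequence[Tuple[float, Color]]) -> Color:
    """Color gradient: interpolate each RGB channel separately, then round."""
    return tuple(round(interpolate(value, [(x, rgb[i]) for x, rgb in points]))
                 for i in range(3))
```

For example, a gradient from green at -1.0 to red at 1.0 maps an expression value of 0.5 to a mostly red color.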
Step 2. Switch between different Visual Styles
Finally, if you select Solid, you can see the graphics below:
Additional sample styles are available in the sampleStyles.props file in the sampleData directory. You can import the sample file from File → Import → Vizmap Property File.
The Cytoscape VizMapper uses three core concepts:
Table 18.
Visual Attributes Associated with Nodes
Node Color
Node Opacity
Node Border Color
Node Border Opacity
Node Border Line Style. Solid and dashed lines are supported.
Node Border Line Width
Node Shape. The following options are available:
Node Size: the width and height of each node.
Node Label: the text label for each node.
Node Label Color
Node Label Opacity
Node Label Position: the position of the label relative to the node.
Node Font: node label font and size.
Table 19.
Visual Attributes Associated with Edges
Edge Color
Edge Opacity
Edge Line Style. Solid or dashed lines are supported.
Edge Line Width
Edge Source and Target Arrow Shape: The following options are available:
Edge Source and Target Arrow Color
Edge Source and Target Arrow Opacity
Edge Label: the text label for each edge.
Edge Label Color
Edge Label Opacity
Edge Font: edge label font and size.
Table 20.
Global Visual Properties
Background Color
Selected Node Color
Selected Edge Color
For each visual attribute, you can specify a default value or define a dynamic visual mapping. Cytoscape currently supports three different types of visual mappers:
Passthrough Mapper
- The values of network attributes are passed directly through to visual attributes. A passthrough mapper is used only for text properties such as node/edge labels and tooltips. For example, a passthrough mapper can label all nodes with their common gene names.
Discrete Mapper
- Discrete network attributes are mapped to discrete visual attributes. For example, a discrete mapper can map all protein-protein interactions to the color blue.
Continuous Mapper
- Continuous (numerical) network attributes are mapped to visual attributes. Depending on the visual attribute, there are three kinds of continuous mappers:
Continuous-to-Continuous Mapper: for example, you can map a continuous numerical value to a node size.
Color Gradient Mapper: This is a special case of continuous-to-continuous mapping. Continuous numerical values are mapped to a color gradient.
Continuous-to-Discrete Mapper: for example, all values below 0 are mapped to square nodes, and all values above 0 are mapped to circular nodes.
- However, note that there is no way to smoothly morph between circular nodes and square nodes.
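To make the differences between the mapper types concrete, here is a minimal Python sketch of what each one computes for a single attribute value. It is not Cytoscape's Java API. The function names are hypothetical, and the sketch assumes that continuous mapping points are sorted by attribute value and that values outside the defined range are clamped to the nearest end point.

```python
from bisect import bisect_right

def passthrough(value):
    """Passthrough: the attribute value itself becomes the label text."""
    return str(value)

def discrete(value, table, default):
    """Discrete: look the attribute value up in a value -> visual table."""
    return table.get(value, default)

def continuous_to_discrete(value, thresholds, outputs):
    """Continuous-to-discrete: `thresholds` is sorted and `outputs` has one
    more entry than `thresholds`. A value equal to a threshold falls into
    the upper range."""
    return outputs[bisect_right(thresholds, value)]

def continuous_to_continuous(value, points):
    """Continuous-to-continuous: linear interpolation between sorted (x, y)
    points, clamped to the end values outside the range."""
    xs = [x for x, _ in points]
    if value <= xs[0]:
        return points[0][1]
    if value >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, value)  # points[i - 1].x <= value < points[i].x
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (value - x0) * (y1 - y0) / (x1 - x0)

def color_gradient(value, points):
    """Color gradient: interpolate each RGB channel separately."""
    return tuple(
        int(round(continuous_to_continuous(value, [(x, rgb[k]) for x, rgb in points])))
        for k in range(3)
    )
```

For example, `continuous_to_discrete(-0.5, [0.0], ["square", "circle"])` reproduces the "below 0 is square, above 0 is circle" mapping described above. Note that the discrete output jumps at the threshold and does not morph.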
The table below shows visual mapper support for each visual property.
Legend
Table 21.
Symbol | Description |
- | Mapping is not supported for the specified visual property. |
X | Mapping is fully supported for the specified visual property. |
o | Mapping is partially supported for the specified visual property: continuous-to-continuous mapping is not supported. |
Node Visual Mappings
Table 22.
Node Visual Properties | Passthrough Mapper | Discrete Mapper | Continuous Mapper |
Color | | | |
Node Color | - | X | X |
Node Opacity | - | X | X |
Node Border Color | - | X | X |
Node Border Opacity | - | X | X |
Node Label Color | - | X | X |
Node Label Opacity | - | X | X |
Numeric | | | |
Node Size | - | X | X |
Node Font Size | - | X | X |
Node Line Width | - | X | X |
Other | | | |
Node Border Type | - | X | o |
Node Shape | - | X | o |
Node Label | X | X | o |
Node Tooltip | X | X | o |
Node Font Family | - | X | o |
Edge Visual Mappings
Table 23.
Edge Properties | Passthrough Mapper | Discrete Mapper | Continuous Mapper |
Color | | | |
Edge Color | - | X | X |
Edge Opacity | - | X | X |
Edge Target Arrow Color | - | X | X |
Edge Source Arrow Color | - | X | X |
Edge Target Arrow Opacity | - | X | X |
Edge Source Arrow Opacity | - | X | X |
Edge Label Color | - | X | X |
Edge Label Opacity | - | X | X |
Numeric | | | |
Edge Line Width | - | X | X |
Edge Font Size | - | X | X |
Other | | | |
Edge Line Type | - | X | o |
Edge Source Arrow Shape | - | X | o |
Edge Target Arrow Shape | - | X | o |
Edge Label | X | X | o |
Edge Tooltip | X | X | o |
Edge Font Family | - | X | o |
The following tutorial demonstrates new features in Cytoscape 2.5. The new VizMapper user interface includes utilities that help users edit discrete mappings. The goal of this section is to learn how to set and adjust values for discrete mappings automatically.
Load a sample network: From the main menu, select File → Import → Network, and select sampleData/galFiltered.sif.
Apply a layout to the network: From the main menu, select Layout → Cytoscape Layouts → Degree Sorted Circle Layout. This layout algorithm arranges nodes in a circle, sorted by degree. After you apply it, each node's degree is stored in a node attribute named Degree.
Click the VizMap button on the toolbar.
Click the Defaults panel on the VizMapper main panel. The Default Appearance Editor pops up (see below).
Edit the following visual properties and press Apply. Because you changed the node opacity, you can see the nodes behind the front node (see below).
- Node Opacity - 100
- Edge Color - White
- Background Color - Black
Create a Discrete Node Color Mapping. Select Degree as the controlling attribute.
Select Node Color, then right-click to show the popup menu. Select Generate Discrete Values → Rainbow 1. This generates a different color for each attribute value, as shown below.
Create a Discrete Node Size Mapping. Select Degree as the controlling attribute.
Select Node Size and right-click to show the popup menu. Select Generate Discrete Values → Series (Numbers Only). Type 30 for the first value and click OK, then enter 15 for the increment.
- Apply the Force-Directed layout. The final view of the window looks like the following.
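The effect of the two discrete mappings in this tutorial can be approximated outside Cytoscape. The Python sketch below counts node degrees from SIF lines (`source interaction target [target ...]`) and then assigns the Series values 30, 45, 60, ... to the distinct Degree values. It assumes that the Series generator walks the distinct attribute values in ascending order. That ordering, and both function names, are illustrative assumptions, not documented Cytoscape behavior.

```python
def degrees_from_sif(lines):
    """Count edges touching each node in SIF lines."""
    degree = {}
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        source = tokens[0]
        degree.setdefault(source, 0)  # a lone node has degree 0
        for target in tokens[2:]:
            degree[source] += 1
            degree[target] = degree.get(target, 0) + 1
    return degree

def series_mapping(values, start=30, increment=15):
    """Hypothetical Series generator: start, start + increment, ...
    assigned to the distinct values in ascending order (an assumption)."""
    return {v: start + i * increment for i, v in enumerate(sorted(set(values)))}

degree = degrees_from_sif(["A pp B", "A pp C", "B pp C", "A pp D"])
sizes = series_mapping(degree.values())
node_size = {node: sizes[d] for node, d in degree.items()}
print(node_size)
```

Here the hub "A" (degree 3) receives the largest size.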
Table 24.
Editor Type | Supported Data Type | Visual Attributes |
Color Gradient Editor | Color | node/edge/border/label colors |
Continuous-Continuous Editor | Numbers | size/width/opacity |
Continuous-Discrete Editor | All others | font/shape/text |
Cytoscape includes a Quick Find feature, which enables you to quickly find nodes and edges.
Using Quick Find is very simple. Here is how it works:
The Advanced panel can be opened by clicking on the plus (+) sign.
There are three rows in the advanced panel:
3. Negation checkbox. If this checkbox is checked, the result of the filter will be negated.
In the option menu pulldown, there are menu items “Create new topology filter”, “Create new NodeInteraction filter” and “Create new EdgeInteraction filter”.
Topology Filter
A topology filter selects nodes based on the properties of their nearby nodes (neighbors). To create a topology filter, choose the menu item “Create new topology filter” from the option menu. See below:
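The Python sketch below illustrates the kind of neighborhood test a topology filter performs. It assumes the criterion reads "at least N neighbors within D hops, optionally counting only neighbors that pass another filter". That wording, and the parameter names `min_neighbors`, `distance`, and `passes`, are hypothetical. The search is a plain breadth-first search that stops expanding at the distance limit.

```python
from collections import deque

def topology_filter(adjacency, min_neighbors, distance, passes=lambda node: True):
    """Select nodes that have at least `min_neighbors` other nodes within
    `distance` hops for which `passes(node)` is true."""
    selected = set()
    for start in adjacency:
        seen = {start}
        queue = deque([(start, 0)])
        count = 0
        while queue:
            node, depth = queue.popleft()
            if depth == distance:
                continue  # do not expand beyond the distance limit
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    if passes(neighbor):
                        count += 1
                    queue.append((neighbor, depth + 1))
        if count >= min_neighbors:
            selected.add(start)
    return selected

# A star network: hub "H" with three leaves
star = {"H": ["A", "B", "C"], "A": ["H"], "B": ["H"], "C": ["H"]}
print(topology_filter(star, 3, 1))  # only the hub has 3 direct neighbors
```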
Interaction filter
Interaction filters are used to select nodes/edges based on the properties of their neighboring edges/nodes. See below for a node interaction filter.
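A node interaction filter can be pictured as follows: a node is selected when one of its edges passes an edge test. This Python sketch assumes the filter can restrict the node's role to the source or target end of the edge. The `role` parameter and the function name are hypothetical, chosen only for illustration.

```python
def node_interaction_filter(edges, edge_passes, role="any"):
    """edges: iterable of (source, interaction, target) tuples.
    Select nodes touching an edge for which edge_passes(...) is true;
    role is "source", "target", or "any"."""
    selected = set()
    for source, interaction, target in edges:
        if not edge_passes(source, interaction, target):
            continue
        if role in ("any", "source"):
            selected.add(source)
        if role in ("any", "target"):
            selected.add(target)
    return selected

edges = [("A", "pp", "B"), ("B", "pd", "C")]
is_pd = lambda s, i, t: i == "pd"
print(node_interaction_filter(edges, is_pd))  # nodes on a protein-DNA edge
```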
Basic filters allow the selection of multiple nodes or edges according to the values of a single attribute:
Compound filters allow selection based on the application of pre-existing filters:
Example filters are shipped with the plugin to help you get started.
If the first filter is selected, then the window looks as shown:
There are three panels in the Filters window:
- The right-hand panel: An existing or newly created filter can be edited in this area. Each filter type has its own user interface for editing.
- The lower left panel: All available filters are shown in this list. Initially, this list will contain only sample filters, but as you create more, they will be added here.
- The upper left panel: Pressing the Create new filter button adds a filter to the “Available Filters” box, and the Remove selected filter button deletes the currently selected filter.
You can abort the drawing of the edge by clicking on an empty spot on the palette.
For plugin developers: to make your plugin available for automatic download by Cytoscape users, the plugin must be compatible with Cytoscape 2.5, and its jar/zip files must be uploaded to the Cytoscape plugin web site at http://cytoscape.org/plugins25/index.php.
Note: If you do not have Internet access enabled, you will not see the list of available plugins or be able to automatically update existing ones; however, you will still be able to view and delete previously installed plugins.
If an installation error appears, automatic installation of the plugin may not be supported. To install the plugin manually, go to the Cytoscape plugins page (http://cytoscape.org/plugins25/index.php), scroll down to find the plugin, click the appropriate link to download the file, and then save it in the Cytoscape/plugins folder on your hard drive.
- Cytoscape will require a restart in order to load the manually installed plugin.
- If the plugin does not appear in the Currently Installed folder of the Plugin Manager, then Cytoscape was unable to load the plugin. Your command line will display the error message generated.
CytoPanels are floatable/dockable panels designed to cut down on the number of pop-up windows within Cytoscape and to create a more unified user experience. These panels used to be called CytoPanel 1, 2, and 3; starting with version 2.5, they are named after their functions.
The following screenshot shows Cytoscape after loading the file yeastHighQuality.sif and GO annotations, applying the Force-Directed layout, enabling the Align and Distribute tools, and running the MCODE plugin on the data sets. In the Control Panel (on the left-hand side of the screen), the Network Manager, Network Overview, VizMapper, Filters, and Cytoscape Editor have been loaded. At the bottom of this panel is another CytoPanel, called the Tool Panel. In the Data Panel, the Attribute Browser has been loaded. In addition, the results of the MCODE analysis are shown in the Result Panel (on the right-hand side).
The user can then choose to resize, hide or float CytoPanels. For example, in the screenshot below, the user has chosen to float all panels and toolbar:
Table 26.
Large Network with Low LOD
Large Network with High LOD
With low LOD values, all nodes are displayed as squares and anti-aliasing is turned off. With high LOD values, anti-aliasing is turned on and nodes are displayed with the actual shapes specified in the Visual Style.
GO 8150 biological_process
GO 7582 physiological processes
GO 8152 metabolism
GO 44238 primary metabolism
GO 19538 protein metabolism
GO 6412 protein biosynthesis
Graphical View of GO Term 6412: protein biosynthesis
Sample OBO File - gene_ontology.obo: http://www.geneontology.org/ontology/gene_ontology_edit.obo
format-version: 1.2
date: 27:11:2006 17:12
saved-by: midori
auto-generated-by: OBO-Edit 1.002
subsetdef: goslim_generic "Generic GO slim"
subsetdef: goslim_goa "GOA and proteome slim"
subsetdef: goslim_plant "Plant GO slim"
subsetdef: goslim_yeast "Yeast GO slim"
subsetdef: gosubset_prok "Prokaryotic GO subset"
default-namespace: gene_ontology
remark: cvs version: $Revision: 5.49 $
[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:11389764]
synonym: "mitochondrial inheritance" EXACT []
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution
[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process
def: "The maintenance of the structure and integrity of the mitochondrial genome." [GOC:ai]
is_a: GO:0007005 ! mitochondrion organization and biogenesis
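The following is a minimal Python sketch of how the `[Term]` stanzas above can be read into a dictionary and walked up the `is_a` hierarchy. This is how a path such as the one shown for GO Term 6412 is derived. It is not Cytoscape's OBO reader. It keeps only `[Term]` stanzas, treats every line as `tag: value`, and strips the `! name` comment only from `is_a` values. OBO escapes, `{...}` qualifiers, and `relationship:` lines are ignored. The function names are hypothetical.

```python
def parse_obo_terms(text):
    """Return {term id: {tag: [values]}} for the [Term] stanzas in text."""
    terms = {}
    current = None  # header lines before the first stanza are skipped
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            current = {} if line == "[Term]" else None  # skip [Typedef] etc.
            continue
        if current is None or ":" not in line:
            continue
        tag, value = line.split(":", 1)
        value = value.strip()
        if tag == "is_a":
            value = value.split("!", 1)[0].strip()  # drop "! term name"
        current.setdefault(tag, []).append(value)
        if tag == "id":
            terms[value] = current
    return terms

def ancestors(terms, term_id):
    """All terms reachable from term_id by following is_a links."""
    found = set()
    stack = list(terms.get(term_id, {}).get("is_a", []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(terms.get(parent, {}).get("is_a", []))
    return found

sample = """format-version: 1.2
[Term]
id: GO:0000001
name: mitochondrion inheritance
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution
[Term]
id: GO:0000002
name: mitochondrial genome maintenance
is_a: GO:0007005 ! mitochondrion organization and biogenesis
"""
terms = parse_obo_terms(sample)
print(terms["GO:0000001"]["name"], ancestors(terms, "GO:0000001"))
```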
Table 28.
Ontology Name | Description |
Gene Ontology Full | This data source contains the full GO DAG, including all GO terms. This OBO file is written in version 1.2 format. |
Generic GO slim | A subset of general GO terms, including higher-level terms only. |
Yeast GO slim | A subset of GO terms for annotating yeast data sets, maintained by SGD. |
Molecule role (INOH Protein name/family name ontology) | A structured controlled vocabulary of concrete and abstract (generic) protein names. This ontology is an INOH pathway annotation ontology, one of a set of ontologies intended to be used in pathway data annotation to ease data integration. This ontology is used to annotate protein names, protein family names, and generic/concrete protein names in the INOH pathway data. INOH is part of the BioPAX working group. |
Event (INOH pathway ontology) | A structured controlled vocabulary of pathway-centric biological processes. This ontology is an INOH pathway annotation ontology, one of a set of ontologies intended to be used in pathway data annotation to ease data integration. This ontology is used to annotate biological processes, pathways, and sub-pathways in the INOH pathway data. INOH is part of the BioPAX working group. |
Protein-protein interaction | A structured controlled vocabulary for the annotation of experiments concerned with protein-protein interactions. |
Pathway Ontology | The Pathway Ontology is a controlled vocabulary for pathways that provides standard terms for the annotation of gene products. |
PATO | PATO is an ontology of phenotypic qualities, intended for use in a number of applications, primarily phenotype annotation. For more information, please visit the PATO wiki (http://www.bioontology.org/wiki/index.php/PATO:Main_Page). |
Mouse pathology | The Mouse Pathology Ontology (MPATH) is an ontology for mutant mouse pathology. This is Version 1. |
Human disease | This ontology is a comprehensive hierarchical controlled vocabulary for human disease representation. For more information, please visit the Disease Ontology website (http://diseaseontology.sourceforge.net/). |
Although Cytoscape can import all kinds of ontologies in OBO format, annotation files are associated with specific ontologies. Therefore, you need to provide the correct ontology-specific annotation file to annotate nodes/edges/networks in Cytoscape. For example, while you can annotate human network data using the GO Full ontology with human Gene Association files, you cannot use a combination of the human Disease Ontology file and human Gene Association files, because the Gene Association file is only compatible with GO.
- Note 1: Some ontologies have a lot of terms. For example, the full Gene Ontology contains more than 20,000 terms. To save memory, you can remove an ontology DAG from the Network Panel (right-click on the ontology name on the left-hand side of the screen and select Destroy Network).
- Note 2: All ontology DAGs are saved in the session file. To minimize the session file size, you can delete the ontology DAG before saving the session.
Sample Gene Association File (gene_association.sgd - annotation file for yeast):
SGD S000003916 AAD10 GO:0006081 SGD_REF:S000042151|PMID:10572264 ISS P aryl-alcohol dehydrogenase (putative) YJR155W gene taxon:4932 20020902 SGD
SGD S000005275 AAD14 GO:0008372 SGD_REF:S000069584 ND C aryl-alcohol dehydrogenase (putative) YNL331C gene taxon:4932 20010119 SGD
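In the sample above, empty columns have collapsed into single spaces. In the actual file, the columns are tab-separated (the same layout that Script 2 in the appendix reads by column index). The Python sketch below splits each line into named fields. The column order follows GO's Gene Association (GAF 1.0) format. The field names themselves are informal labels chosen for this sketch, not official identifiers.

```python
GAF_COLUMNS = (
    "db", "object_id", "symbol", "qualifier", "go_id", "reference",
    "evidence", "with_from", "aspect", "object_name", "synonym",
    "object_type", "taxon", "date", "assigned_by",
)

def parse_gaf(lines):
    """Yield one dict per annotation line; '!' lines are comments."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue
        yield dict(zip(GAF_COLUMNS, line.split("\t")))

# First sample line, with its empty qualifier and with/from columns restored
sample = "\t".join([
    "SGD", "S000003916", "AAD10", "", "GO:0006081",
    "SGD_REF:S000042151|PMID:10572264", "ISS", "", "P",
    "aryl-alcohol dehydrogenase (putative)", "YJR155W", "gene",
    "taxon:4932", "20020902", "SGD",
])
records = list(parse_gaf(["!gaf comment line", sample]))
print(records[0]["synonym"], records[0]["go_id"], records[0]["aspect"])
```

The aspect column uses P, F, and C for biological process, molecular function, and cellular component.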
Starting with Cytoscape 2.6.0, you can import various kinds of ID sets from BioMart (http://www.biomart.org/index.html). The BioMart web service client is available as a set of plugins; you can install the BioMartClient and BioMartUserInterface plugins from the Plugin Manager window.
Select: File → Import → Import attributes from Biomart...
Select a data source. For ID mapping, select one of the Ensembl Genes data sets. You need to choose the correct species for your network.
Select Attribute. If you want to import new ID sets matching current node IDs, select ID.
Select Data Type. This should be the type of ID set selected in Attribute list. For example, if you select ID for Attribute and your network uses Entrez Gene ID for its node ID, you need to select EntrezGene ID(s) for Data Type.
Select new ID sets from the list. Because the BioMart server does not accept queries that import many annotations at once, you can select only 3-5 attributes for each import.
Press Import.
Download name mapping files. Mapping files are available at: http://chianti.ucsd.edu/kono/genenamemapping.html. In this tutorial, we are going to use dictionary_no_prefix.zip, a file set in which the gene names carry no prefixes. Unzip the archive.
Load the sample network file. Open the network import dialog from File → Import → Network (multiple file types)... Then click the URL radio button and import Human Protein-Protein: Rual et al. (Subnetwork for tutorial).
Open the attribute table import dialog from File → Import → Attribute from Table.
Select human.dic_cyto.txt as the input file.
Check "Show Text File Import Options" and then check the "Transfer first line as attribute names" checkbox.
Uncheck "Show Text File Import Options" and check "Mapping Options".
Select EntrezGene as Primary Key.
Right-click on EntrezGene column name and set the type to String.
Do the same for HGNC.
Right-click on Other Aliases and select List as the data type.
Check Other Aliases as Alias (under "Alias?" checkboxes).
- Now the Table Import dialog looks like the following screenshot:
Press Import. The names in the text file are now attached to the network's nodes as attributes.
At this point, nodes have multiple names including HGNC, UniProt, and EntrezGene ID. You can import other attribute files using these keys. These imported names (IDs) are useful when you import GO Annotation.
Ontology DAGs have some attributes associated with the terms. All attributes associated with ontology terms have the prefix "ontology.", and every term has at least one attribute: ontology.name. For more detailed information about attributes for ontology DAGs, please read the official OBO specification document.
- Note: Cytoscape supports both OBO formats: version 1.0 and 1.2.
Suppose you have a small network:
node_1 pp node_2
node_3 pp node_1
node_2 pp node_3
and an annotation file that assigns an ontology term to each node:
node_1 OA_0000232
node_2 OA_0000441
node_3 OA_0000702
Some ontologies will be used to annotate edges or networks. For example, the Protein-protein interaction ontology is a controlled set of terms for annotating interactions between proteins, so ontology terms should be mapped onto edges (see example below).
node_1 (pp) node_2 MI:0445
node_3 (pp) node_1 MI:0046
node_2 (pp) node_3 MI:0346
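An edge annotation file is keyed by edge name. As in the example above, the edge name has the form `source (interaction) target`. The Python sketch below rebuilds those names from the SIF network and attaches each term to its edge, skipping lines that name an edge not in the network. The function names are hypothetical, and the sketch only illustrates the key matching, not Cytoscape's importer.

```python
def edge_key(source, interaction, target):
    """Edge name in the form used above: "node_1 (pp) node_2"."""
    return "%s (%s) %s" % (source, interaction, target)

def map_edge_annotations(sif_lines, annotation_lines):
    """Return {edge name: [terms]} for annotation lines that match an edge."""
    edges = set()
    for line in sif_lines:
        tokens = line.split()
        for target in tokens[2:]:
            edges.add(edge_key(tokens[0], tokens[1], target))
    annotations = {}
    for line in annotation_lines:
        parts = line.rsplit(None, 1)  # split off the trailing term ID
        if len(parts) != 2:
            continue  # blank or malformed line
        key, term = parts
        if key in edges:
            annotations.setdefault(key, []).append(term)
    return annotations

network = ["node_1 pp node_2", "node_3 pp node_1", "node_2 pp node_3"]
edge_terms = ["node_1 (pp) node_2 MI:0445",
              "node_3 (pp) node_1 MI:0046",
              "node_2 (pp) node_3 MI:0346"]
result = map_edge_annotations(network, edge_terms)
print(result)
```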
The basic operation of the Ontology and Annotation Import function is the same as that of the Attribute Table Import. The main difference is that you need to specify an additional key for mapping:
By selecting a column from the "Key Column in Annotation File" dropdown list, you can specify the key for mapping between ontology terms and the annotation file.
- Note: When you load Gene Association files, Cytoscape uses a special loader designed only for Gene Association files. This loader names all attributes automatically, converts ontology IDs into term names, and converts NCBI taxonomy IDs into species names. These conversions are not applied to custom annotation files; for those, all ontology terms are mapped as term IDs.
For example, the following entry:
nodelinkouturl.Model Organism DB.SGD (yeast)=http://db.yeastgenome.org/cgi-bin/locus.pl?locus=%ID%
places the SGD (yeast) link under the Model Organism DB submenu. This link will appear in Cytoscape as:
In a similar fashion, you can add new submenus.
http://db.yeastgenome.org/cgi-bin/locus.pl?locus\=YIM1
cytoscape.sh -P new_linout.props
cytoscape.sh -P nodelinkouturl.yeast.SGD=http://db.yeastgenome.org/cgi-bin/locus.pl?locus\=%ID%
Any links defined on the command line will supersede the default links.
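The LinkOut property keys can be read as menu paths: everything after `nodelinkouturl.` is split on dots, the last element is the link name, and the others are submenus. When the link is used, `%ID%` in the URL template is replaced with the node ID. The `\=` in the property-file and command-line examples is an escaped `=`; Java's properties loader drops the backslash. The Python sketch below models this behavior. It assumes that command-line definitions supersede defaults key by key, which is an assumption for illustration. The function names are hypothetical, and the sketch does not URL-encode IDs.

```python
LINKOUT_PREFIX = "nodelinkouturl."

def linkout_menu(*property_sets):
    """Merge property dicts (later ones win) and return {menu path: URL template}."""
    merged = {}
    for props in property_sets:
        merged.update(props)
    menu = {}
    for key, template in merged.items():
        if key.startswith(LINKOUT_PREFIX):
            # "Model Organism DB.SGD (yeast)" -> ("Model Organism DB", "SGD (yeast)")
            path = tuple(key[len(LINKOUT_PREFIX):].split("."))
            menu[path] = template
    return menu

def expand_url(template, node_id):
    """Substitute the node ID for the %ID% placeholder."""
    return template.replace("%ID%", node_id)

defaults = {"nodelinkouturl.Model Organism DB.SGD (yeast)":
            "http://db.yeastgenome.org/cgi-bin/locus.pl?locus=%ID%"}
menu = linkout_menu(defaults)
print(expand_url(menu[("Model Organism DB", "SGD (yeast)")], "YIM1"))
```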
Starting with Cytoscape 2.6.0, you can also use LinkOut from the Attribute Browser. The basic functionality is the same; the only difference is that the value passed to LinkOut is the value in the selected cell.
The Colt Distribution: Open Source Libraries for High Performance Scientific and Technical Computing in Java. Information is available at: http://hoschek.home.cern.ch/hoschek/colt/.
Graph INterface librarY a.k.a. GINY. Information is available at: http://csbi.sourceforge.net/.
JDOM. Information is available at: http://www.jdom.org.
JUnit. Information is available at: http://junit.org.
JGoodies Looks. Information is available at: http://www.jgoodies.com/freeware/looks/index.html.
Piccolo. Information is available at: http://www.cs.umd.edu/hcil/jazz/.
Type-Specific Collections Library, from Sosnoski Software Solutions, Inc. Information is available at: http://www.sosnoski.com/opensrc/tclib/.
Xerces Java XML parser. Information is available at: http://xml.apache.org/xerces-j/.
CLI command line parser. Information is available at: http://jakarta.apache.org/commons/cli/.
FreeHEP library. Information is available at: http://java.freehep.org.
This product includes software developed by the Apache Software Foundation (http://www.apache.org/).
This product includes software developed by the JDOM Project (http://www.jdom.org/).
One-step installation of the Cytoscape software is accomplished using the InstallAnywhere product from ZeroG Software, Inc. (http://zerog.com)
The flat file formats are explained below:
For example (the Gene Ontology, GO):
(curator=GO) (type=all)
0003673 = Gene_Ontology
0003674 = molecular_function [partof: 0003673 ]
0008435 = anticoagulant [isa: 0003674 ]
0016172 = antifreeze [isa: 0003674 ]
0016173 = ice nucleation inhibitor [isa: 0016172 ]
0016209 = antioxidant [isa: 0003674 ]
0045174 = glutathione dehydrogenase (ascorbate) [isa: 0009491 0015038 0016209 0016672 ]
0004362 = glutathione reductase (NADPH) [isa: 0015038 0015933 0016209 0016654 ]
0017019 = myosin phosphatase catalyst [partof: 0017018 ]
...
A second example (KEGG pathway ontology):
(curator=KEGG) (type=Metabolic Pathways)
90001 = Metabolism
80001 = Carbohydrate Metabolism [isa: 90001 ]
80003 = Lipid Metabolism [isa: 90001 ]
80002 = Energy Metabolism [isa: 90001 ]
80004 = Nucleotide Metabolism [isa: 90001 ]
80005 = Amino Acid Metabolism [isa: 90001 ]
80006 = Metabolism of Other Amino Acids [isa: 90001 ]
80007 = Metabolism of Complex Carbohydrates [isa: 90001 ]
...
For example (from the GO biological process annotation file):
(species=Saccharomyces cerevisiae) (type=Biological Process) (curator=GO)
YMR056C = 0006854
YBR085W = 0006854
YJR155W = 0006081
...
(species=Mycobacterium tuberculosis) (type=Metabolic Pathways) (curator=KEGG)
RV0761C = 10
RV0761C = 71
RV0761C = 120
RV0761C = 350
RV0761C = 561
RV1862 = 10
...
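Both flat file types share the same structure: a header line of `(key=value)` pairs followed by `name = value` lines. Ontology files may carry trailing `[isa: ...]` and `[partof: ...]` lists of term IDs. The Python sketch below reads both types. It is a reading aid based on the examples above, not Cytoscape's own loader, and the function names are hypothetical.

```python
import re

HEADER_FIELD = re.compile(r"\((\w+)=([^)]*)\)")
RELATION = re.compile(r"\[(isa|partof):([^\]]*)\]")

def parse_header(line):
    """'(curator=GO) (type=all)' -> {'curator': 'GO', 'type': 'all'}"""
    return dict(HEADER_FIELD.findall(line))

def parse_ontology(lines):
    """Return (header, {term id: {'name', 'isa', 'partof'}})."""
    header, terms = parse_header(lines[0]), {}
    for line in lines[1:]:
        if "=" not in line:
            continue  # skips blank lines and '...' placeholders
        term_id, rest = [part.strip() for part in line.split("=", 1)]
        relations = {"isa": [], "partof": []}
        for kind, ids in RELATION.findall(rest):
            relations[kind].extend(ids.split())
        terms[term_id] = {"name": rest.split("[", 1)[0].strip(),
                          "isa": relations["isa"],
                          "partof": relations["partof"]}
    return header, terms

def parse_annotations(lines):
    """Return (header, {gene/protein name: [term IDs]})."""
    header, assignments = parse_header(lines[0]), {}
    for line in lines[1:]:
        if "=" not in line:
            continue
        name, term_id = [part.strip() for part in line.split("=", 1)]
        assignments.setdefault(name, []).append(term_id)
    return header, assignments

header, terms = parse_ontology([
    "(curator=GO) (type=all)",
    "0003674 = molecular_function [partof: 0003673 ]",
    "0045174 = glutathione dehydrogenase (ascorbate) [isa: 0009491 0015038 0016209 0016672 ]",
])
print(header, terms["0045174"]["name"])
```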
Go to the GO XML FTP page (ftp://ftp.geneontology.org/pub/go/xml/) and download the latest go-YYYYMM-termdb.xml.gz file.
GO maintains a list of association files for many organisms; these files associate genes with GO terms. The next step is to get the file for the organism(s) you are interested in, and parse it into the form Cytoscape needs. A list of files may be seen at http://www.geneontology.org/GO.current.annotations.shtml. The rightmost column contains links to tab-delimited files of gene associations, by species. Choose the species you are interested in, and click 'Download'.
Let's use "GO Annotations @ EBI: Human" as an example. After you have downloaded and saved the file, look at the first few lines:
SPTR O00115 DRN2_HUMAN GO:0003677 PUBMED:9714827 TAS F Deoxyribonuclease II precursor IPI00010348 protein taxon:9606 SPTR
SPTR O00115 DRN2_HUMAN GO:0004519 GOA:spkw IEA F Deoxyribonuclease II precursor IPI00010348 protein taxon:9606 20020425 SPTR
SPTR O00115 DRN2_HUMAN GO:0004531 PUBMED:9714827 TAS F Deoxyribonuclease II precursor IPI00010348 protein taxon:9606 SPTR
...
Note that line wrapping has occurred here, so each line of the actual file is wrapped to two lines. The goal is to create from these lines the following lines:
(species=Homo sapiens) (type=Molecular Function) (curator=GO)
IPI00010348 = 0003677
IPI00010348 = 0004519
IPI00010348 = 0004531
...
or
(species=Homo sapiens) (type=Biological Process) (curator=GO)
NP_001366 = 0006259
NP_001366 = 0006915
NP_005289 = 0007186
NP_647593 = 0006899
...
The first sample contains molecular function annotations for proteins, and each protein is identified by its IPI number. IPI is the International Protein Index, which maintains cross references to the main databases for human, mouse and rat proteomes. The second sample contains biological process annotation, and each protein is identified by its NP (RefSeq) number. These two naming systems, IPI and RefSeq, are two of many that you can use to define canonical names when you run Cytoscape. For budding yeast, it is much easier: the yeast community always uses standard ORF names, and so Cytoscape uses these as canonical names. For human proteins and genes, there is no single standard.
The solution (for those working with human genes or proteins) is, once you have downloaded the annotations file, to:
- Decide which naming system you want to use.
- Download ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/xrefs.goa. This cross-reference file, when used strategically, allows you to create Cytoscape-compatible annotation files in which the canonical name is the one most meaningful to you.
- Examine xrefs.goa to figure out which column contains the names you wish to use.
- Make a slight modification to the Python script described below, and then run that script, supplying both xrefs.goa and the annotation file as arguments.
Here are a few sample lines from xrefs.goa:
SP O00115 IPI00010348 ENSP00000222219; NP_001366; BAA28623;AAC77366;AAC35751;AAC39852;BAB55598;AAB51172;AAH10419; 2960,DNASE2 1777,DNASE2
SP O00116 IPI00010349 ENSP00000324567;ENSP00000264167; NP_003650; CAA70591; 327,AGPS 8540,AGPS
SP O00124 IPI00010353 ENSP00000265616;ENSP00000322580; NP_005662; BAA18958;BAA18959;AAH20694; 7993,D8S2298E
...
Note that line wrapping has occurred here – each line in this example starts with the letters SP. See the README file for more information (ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/README).
Finally, run the supplied Python script to create your three annotation files for human proteins:
- bioproc.anno (GO biological process annotation)
- molfunc.anno (GO molecular function annotation)
- cellcomp.anno (GO cellular component annotation)
It may be necessary to modify this script slightly if RefSeq identifiers are not used as canonical names or if you are using a more recent version of Python.
python parseAssignmentsToFlatFileFromGoaProject.py gene_association.goa_human xrefs.goa
(See below for Python script listing)
These scripts, as described above, require Python version 2.2 or later.
Script 1 - parseGoTermsToFlatFile.py
# parseGoTermsToFlatFile.py: translate a GO XML ontology file into a simpler
# Cytoscape flat file
#-----------------------------------------------------------------------------------
# RCS: $Revision: 1.3 $ $Date: 2003/05/18 00:38:43 $
#-----------------------------------------------------------------------------------
# Output is written with sys.stdout.write / sys.stderr.write rather than print,
# so the script runs unchanged under Python 2 and Python 3.
import re, sys
#-----------------------------------------------------------------------------------
def flatFilePrint (termID, name, isaIDs, partofIDs):
    # Write one term as: <id> = <name> [isa: <ids> ] [partof: <ids> ]
    parts = ['%s = %s' % (termID, name)]
    if (len (isaIDs) > 0):
        parts.append ('[isa: ' + ' '.join (isaIDs) + ' ]')
    if (len (partofIDs) > 0):
        parts.append ('[partof: ' + ' '.join (partofIDs) + ' ]')
    result = ' '.join (parts).strip ()
    if (result in ('isa = isa', 'partof = partof')):
        # the relationship terms themselves carry no useful information
        sys.stderr.write ('meaningless term: %s\n' % result)
    else:
        sys.stdout.write (result + '\n')
#-----------------------------------------------------------------------------------
if (len (sys.argv) != 2):
    sys.stderr.write ('usage: %s <someFile.xml>\n' % sys.argv [0])
    sys.exit (1)
inputFilename = sys.argv [1]
sys.stderr.write ('reading %s...\n' % inputFilename)
text = open (inputFilename).read ()
sys.stderr.write ('read %d characters\n' % len (text))
# re.DOTALL lets '.' match newlines, since each <go:term> element spans many lines
cregex = re.compile (r'<go:term .*?>(.*?)</go:term>', re.DOTALL)
m = cregex.findall (text)
sys.stderr.write ('number of go terms: %d\n' % len (m))
cregex2 = re.compile (r'<go:accession>GO:(.*?)</go:accession>.*?<go:name>(.*?)</go:name>', re.DOTALL)
cregex3 = re.compile (r'<go:isa\s*rdf:resource="http://www.geneontology.org/go#GO:(.*?)"\s*/>', re.DOTALL)
cregex4 = re.compile (r'<go:part-of\s*rdf:resource="http://www.geneontology.org/go#GO:(.*?)"\s*/>', re.DOTALL)
goodElements = 0
badElements = 0
sys.stdout.write ('(curator=GO) (type=all)\n')
for term in m:
    m2 = cregex2.search (term)
    if (m2):
        goodElements += 1
        termID = m2.group (1)
        name = m2.group (2)
        isaIDs = cregex3.findall (term)       # parents via is-a links
        partofIDs = cregex4.findall (term)    # parents via part-of links
        flatFilePrint (termID, name, isaIDs, partofIDs)
    else:
        badElements += 1
        sys.stderr.write ('no match to m2...\n')
        sys.stderr.write ('---------------\n%s\n------------------\n' % term)
sys.stderr.write ('goodElements %d\n' % goodElements)
sys.stderr.write ('badElements %d\n' % badElements)
#--------------------------------------
Script 2 - parseAssignmentsToFlatFileFromGoaProject.py
import sys
#-----------------------------------------------------------------------------------
def fixCanonicalName (rawName):
    # for instance, trim 'YBR085W|ANC3' to 'YBR085W'
    bar = rawName.find ('|')
    if (bar < 0):
        return rawName
    return rawName [:bar]
#-----------------------------------------------------------------------------------
def fixGoID (rawID):
    # strip the 'GO:' prefix, e.g. 'GO:0003677' -> '0003677'
    colon = rawID.find (':') + 1
    return rawID [colon:]
#-----------------------------------------------------------------------------------
def readGoaXrefFile (filename):
    # Map each IPI accession (3rd column) to the first RefSeq NP accession
    # (6th column) listed for it in xrefs.goa.
    lines = open (filename).read ().split ('\n')
    result = {}
    for line in lines:
        if (len (line) < 10):
            continue
        tokens = line.split ('\t')
        if (len (tokens) < 6):
            continue
        ipi = tokens [2]
        np = tokens [5]
        semicolon = np.find (';')
        if (semicolon >= 0):
            np = np [:semicolon]
        if (len (ipi) > 0 and len (np) > 0):
            result [ipi] = np
    return result
#-----------------------------------------------------------------------------------
if (len (sys.argv) != 3):
    sys.stderr.write ('usage: %s <gene_associations file from GO> <goa xrefs file>\n' % sys.argv [0])
    sys.exit (1)
associationFilename = sys.argv [1]
xrefsFilename = sys.argv [2]
species = 'Homo sapiens'
ipiToNPHash = readGoaXrefFile (xrefsFilename)
# sanity check: this IPI ID is expected to map to NP_054861
tester = 'IPI00099416'
sys.stdout.write ('hash size: %d\n' % len (ipiToNPHash))
sys.stdout.write ('test map: %s -> NP_054861: %s\n' % (tester, ipiToNPHash.get (tester)))
bioproc = open ('bioproc.anno', 'w')
molfunc = open ('molfunc.anno', 'w')
cellcomp = open ('cellcomp.anno', 'w')
bioproc.write ('(species=%s) (type=Biological Process) (curator=GO)\n' % species)
molfunc.write ('(species=%s) (type=Molecular Function) (curator=GO)\n' % species)
cellcomp.write ('(species=%s) (type=Cellular Component) (curator=GO)\n' % species)
# GO aspect code (9th column) -> output file
outputFiles = {'P': bioproc, 'F': molfunc, 'C': cellcomp}
lines = open (associationFilename).read ().split ('\n')
sys.stderr.write ('found %d lines\n' % len (lines))
for line in lines:
    # skip '!' comment lines and blank lines
    if (line.startswith ('!') or len (line) < 2):
        continue
    tokens = line.split ('\t')
    if (len (tokens) < 11):
        continue
    goOntology = tokens [8]
    goID = fixGoID (tokens [4])
    ipiName = fixCanonicalName (tokens [10])
    if (len (ipiName) < 1 or ipiName not in ipiToNPHash):
        continue
    refseqName = ipiToNPHash [ipiName]
    printName = refseqName
    #printName = ipiName    # use this line instead to keep IPI IDs as canonical names
    if (ipiName == tester):
        sys.stdout.write ('%s (%s) has go term %s\n' % (tester, printName, goID))
    if (goOntology in outputFiles):
        outputFiles [goOntology].write ('%s = %s\n' % (printName, goID))
bioproc.close ()
molfunc.close ()
cellcomp.close ()
#-----------------------------------------------------------------------------------
GNU LESSER GENERAL PUBLIC LICENSE
GNU LESSER GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
from Notes on memory consumption, Cytoscape User Manual
Suggested Memory Size Without View
Table 29.
Number of Objects (nodes + edges) | Suggested Memory Size |
0 - 70,000 | 512M (default) |
70,000 - 150,000 | 800M |
Suggested Memory Size With View
Table 30.
Number of Objects (nodes + edges) | Suggested Memory Size |
0 - 20,000 | 512M (default) |
20,000 - 70,000 | 800M |
70,000 - 150,000 | 1G |
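The two tables can be read as a simple lookup on the total object count. The Python sketch below copies their values. Where ranges meet (for example, exactly 70,000 objects), it assumes the boundary belongs to the lower row. It returns None beyond 150,000 objects, since the tables give no suggestion there. The function name is hypothetical.

```python
# (upper bound on nodes + edges, suggested -Xmx value), copied from Tables 29 and 30
WITHOUT_VIEW = [(70000, "512M"), (150000, "800M")]
WITH_VIEW = [(20000, "512M"), (70000, "800M"), (150000, "1G")]

def suggested_heap(num_objects, with_view=True):
    """Return the suggested -Xmx value, or None beyond the tables' range."""
    table = WITH_VIEW if with_view else WITHOUT_VIEW
    for upper, size in table:
        if num_objects <= upper:
            return size
    return None

# e.g. 40,000 nodes plus 25,000 edges, displayed with a view
print("-Xmx" + suggested_heap(40000 + 25000))
```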
If you are opening Cytoscape from the command line using the command
then you can increase the value of -Xmx to the desired amount of memory. For example:
Option B: Using cytoscape.bat (Windows systems)
Option C: Using cytoscape.sh (UNIX, Linux, and Mac OS X systems)
- Open the file cytoscape.sh in a text editor (e.g., right-click and select Open With TextEdit).
- Increase the value of the -Xmx option (found in the last line of the file), as per Option A. Do not modify other parts of the file.
- Save and close the file.
- Open Cytoscape by running cytoscape.sh from the command line.
Option D: Using the Cytoscape icon (Mac OS X systems)
- In the Finder, right-click on the Cytoscape icon and select Show Package Contents.
- In the Property List Editor, expand the Root directory, then Java, and modify the VMOptions value (originally set as -Xmx512M) as per Option A. Do not modify other parts of the file.
- Save and close the file.
- Open Cytoscape by double-clicking on the icon.
Option E: Using the Cytoscape icon (Windows systems)
- Increase the numerical value (bytes of memory) of the heap size and stack size lines, shown below at 800M:
lax.nl.java.option.java.heap.size.max=838860800
lax.nl.java.option.native.stack.size.max=838860800
Do not modify other parts of the file, and be careful not to add any trailing spaces to these lines.
- Save and close the file.
- Open Cytoscape by double-clicking on the icon.
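The heap and stack values in Option E are plain byte counts, so a size such as 800M must be converted first: 800 × 1024 × 1024 = 838,860,800, which is the value shown above. The short Python sketch below performs this conversion and prints the two lines. The function name is hypothetical. It assumes the usual binary multipliers for K, M, and G, as in Java's -Xmx syntax.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def heap_size_bytes(spec):
    """Convert an -Xmx-style size such as '800M' or '1G' to bytes."""
    spec = spec.strip().upper()
    if spec[-1] in UNITS:
        return int(spec[:-1]) * UNITS[spec[-1]]
    return int(spec)  # already a plain byte count

size = heap_size_bytes("800M")
print("lax.nl.java.option.java.heap.size.max=%d" % size)
print("lax.nl.java.option.native.stack.size.max=%d" % size)
```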