This is still a work in progress. Please refer to the PDF document for other sections.

Returns a JSON object describing the data available on WebGestalt. The data type is appended to the URL path as `/api/summary/{type}`. It can be `idtype`, `geneset`, `referenceset`, or `network`, which list the supported ID mappings, gene sets, reference sets, and networks (for NTA, network topology-based analysis), respectively. If the `organism` parameter is omitted, data for all supported organisms are returned. Supported organisms are athaliana, btaurus, celegans, cfamiliaris, drerio, sscrofa, dmelanogaster, ggallus, hsapiens, mmusculus, rnorvegicus, and scerevisiae.

Name | Description | Default |
---|---|---|
organism | Limits the returned data to one organism | - |
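
Because the supported data types and organisms are listed above, a request can be checked locally before it is sent. The sketch below is a hypothetical helper, `summaryUrl()`, which is not part of WebGestalt or httr. It validates both arguments against the lists on this page and builds the summary URL, adding the `organism` query string when one is given. Those lists are copied by hand and may fall out of date as WebGestalt is updated.

```r
# Hypothetical helper (not part of WebGestalt or httr); lists copied from this page
webgestaltTypes <- c("idtype", "geneset", "referenceset", "network")
webgestaltOrganisms <- c("athaliana", "btaurus", "celegans", "cfamiliaris",
                         "drerio", "sscrofa", "dmelanogaster", "ggallus",
                         "hsapiens", "mmusculus", "rnorvegicus", "scerevisiae")

summaryUrl <- function(type, organism = NULL) {
  if (!type %in% webgestaltTypes) stop("unsupported data type: ", type)
  url <- file.path("http://www.webgestalt.org/api/summary", type)
  # Without organism, the summary covers all supported organisms
  if (!is.null(organism)) {
    if (!organism %in% webgestaltOrganisms) stop("unsupported organism: ", organism)
    # Organism codes are plain lowercase letters, so no URL escaping is needed
    url <- paste0(url, "?organism=", organism)
  }
  url
}

print(summaryUrl("geneset", "athaliana"))
```

You can pass the returned string directly to `GET()`. An unsupported type or organism fails before any network call is made.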
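
```r
library(httr)

type <- "geneset"       # one of "idtype", "geneset", "referenceset", "network"
organism <- "athaliana"
# The data type is part of the URL path; organism is passed as a query parameter
response <- GET(file.path("http://www.webgestalt.org/api/summary", type),
                query = list(organism = organism))
if (status_code(response) == 200) {
  jsonData <- content(response)  # JSON parsed into a nested R list
  print(jsonData)
}
```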

The response body (JSON):

```json
{
  "geneontology": [
    {
      "name": "Biological_Process",
      "description": "The gene ontology biological process database was downloaded from http://www.geneontology.org/.",
      "idtype": "entrezgene"
    },
    {
      "name": "Biological_Process_noRedundant",
      "description": "The gene ontology biological process database was downloaded from http://www.geneontology.org/. Then, we only contain the non-redundant categories by selecting the most general categories in each branch of the GO DAG structure from all categories with the number of annotated genes from 20 to 500.",
      "idtype": "entrezgene"
    },
    {
      "name": "Cellular_Component",
      "description": "The gene ontology cellular component database was downloaded from http://www.geneontology.org/.",
      "idtype": "entrezgene"
    },
    {
      "name": "Cellular_Component_noRedundant",
      "description": "The gene ontology cellular component database was downloaded from http://www.geneontology.org/. Then, we only contain the non-redundant categories by selecting the most general categories in each branch of the GO DAG structure from all categories with the number of annotated genes from 20 to 500.",
      "idtype": "entrezgene"
    },
    {
      "name": "Molecular_Function",
      "description": "The gene ontology molecular function database was downloaded from http://www.geneontology.org/.",
      "idtype": "entrezgene"
    },
    {
      "name": "Molecular_Function_noRedundant",
      "description": "The gene ontology molecular function database was downloaded from http://www.geneontology.org/. Then, we only contain the non-redundant categories by selecting the most general categories in each branch of the GO DAG structure from all categories with the number of annotated genes from 20 to 500.",
      "idtype": "entrezgene"
    }
  ],
  "pathway": [
    {
      "name": "KEGG",
      "description": "The KEGG pathway database was downloaded from http://www.kegg.jp/.",
      "idtype": "entrezgene"
    },
    {
      "name": "Wikipathway",
      "description": "The Wikipathway database was downloaded from http://www.wikipathway.org/.",
      "idtype": "entrezgene"
    }
  ],
  "network": [
    {
      "name": "PPI_BIOGRID",
      "description": "The protein-protein interaction (PPI) network was downloaded from BIOGRID (https://thebiogrid.org/). Then, we used the NetSAM R package (http://bioconductor.org/packages/release/bioc/html/NetSAM.html) to identify the hierarchical co-expression modules.",
      "idtype": "entrezgene"
    }
  ],
  "disease": [],
  "drug": [],
  "phenotype": [],
  "chromosomalLocation": [
    {
      "name": "CytogeneticBand",
      "description": "",
      "idtype": "entrezgene"
    }
  ],
  "community-contributed": [],
  "others": []
}
```
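
The summary can be turned into values for the `database` parameter described below, which takes the shorthand `dbType_dbName`. The hypothetical helper below walks the nested list that `content()` returns for a JSON response, assuming httr's default parsing. That list has one element per category, and each element is a list of entries with a `name` field. The helper joins each category and name with an underscore. The sample list is a hand-written excerpt of the response above.

```r
# Hypothetical helper: build "dbType_dbName" shorthands from a parsed summary
databaseShorthands <- function(summary) {
  shorthands <- character(0)
  for (dbType in names(summary)) {
    # Empty categories such as "disease": [] parse to list() and add nothing
    for (entry in summary[[dbType]]) {
      shorthands <- c(shorthands, paste(dbType, entry$name, sep = "_"))
    }
  }
  shorthands
}

# Hand-written excerpt of the response above, in the nested-list shape
summaryExcerpt <- list(
  geneontology = list(
    list(name = "Biological_Process", idtype = "entrezgene"),
    list(name = "Cellular_Component", idtype = "entrezgene")
  ),
  pathway = list(list(name = "KEGG", idtype = "entrezgene")),
  disease = list()
)
print(databaseShorthands(summaryExcerpt))
```

The KEGG entry yields `pathway_KEGG`, the value used in the gene set example below. This sketch assumes every listed combination is accepted by the gene set endpoint for the organism you queried.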

Returns gene set data files: the annotation in GMT format, the gene set description file, the DAG edge list, or the network.

Name | Description |
---|---|
organism | See above for supported organisms. |
dbType | Database type, e.g. `geneontology`, `pathway`. See the summary results for supported types. |
dbName | Database name, e.g. `Biological_Process`, `KEGG`. See the summary results. |
database | Shorthand for `dbType_dbName`, e.g. `pathway_KEGG`; can be given instead of `dbType` and `dbName`. |
fileType | One of `gmt`, `des`, or `dag`; see below for details. |

Values of `fileType`:

- `gmt`: the functional annotation in GMT format.
- `des`: a two-column file mapping each gene set ID in the GMT file to its description.
- `dag`: for databases with a hierarchy, such as GO, a DAG file whose two columns give the parent and child of each relationship.

Name | Description |
---|---|
ids | Only used with `fileType` `des`; restricts the returned descriptions to the given gene set IDs. |
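
The `des` and `dag` files are plain two-column tables. The sketch below assumes both are tab-separated with no header row; check a downloaded file before relying on this. `readDes()` and `readDag()` are hypothetical helpers that read the files into a named vector and an edge list. `descendants()` walks the DAG breadth-first to collect every gene set below a node, such as all GO terms under a given term. The IDs in the example are made up for illustration.

```r
# Hypothetical helpers, assuming tab-separated files without a header row
readDes <- function(text) {
  des <- read.delim(text = text, header = FALSE, quote = "",
                    col.names = c("id", "description"), colClasses = "character")
  setNames(des$description, des$id)  # named vector: gene set ID -> description
}

readDag <- function(text) {
  read.delim(text = text, header = FALSE, quote = "",
             col.names = c("parent", "child"), colClasses = "character")
}

# All descendants of a node, found by breadth-first traversal of the edge list
descendants <- function(dag, node) {
  found <- character(0)
  frontier <- node
  while (length(frontier) > 0) {
    children <- setdiff(dag$child[dag$parent %in% frontier], found)
    found <- c(found, children)
    frontier <- children
  }
  found
}

# Made-up IDs for illustration only
des <- readDes("setA\tFirst gene set\nsetB\tSecond gene set")
dag <- readDag("setA\tsetB\nsetA\tsetC\nsetB\tsetD")
print(des[["setB"]])
print(descendants(dag, "setA"))
```

For a downloaded file, `readLines()` or a file path (`read.delim("file.dag", ...)`) can replace the inline `text` argument.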
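
```r
library(httr)

organism <- "hsapiens"
database <- "pathway_KEGG"  # shorthand for dbType_dbName
fileType <- "gmt"
response <- GET("http://www.webgestalt.org/api/geneset",
                query = list(organism = organism, database = database, fileType = fileType))
if (status_code(response) == 200) {
  # Read the body as plain text rather than relying on content-type detection
  fileContent <- content(response, as = "text", encoding = "UTF-8")
  cat(fileContent, file = "geneset.gmt")  # write the body unchanged
  geneSetData <- unlist(strsplit(fileContent, "\n", fixed = TRUE))
  print(geneSetData[1:3])  # first three gene sets
}
```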

The first few lines of the GMT file. Each line holds a gene set ID, a link to the pathway, and the Entrez gene IDs of its members:

```
hsa00010 http://www.kegg.jp/kegg-bin/show_pathway?hsa00010 10327 124 125 126 127 128 130 130589 131 160287 1737 1738 2023 2026 2027 217 218 219 220 2203 221 222 223 224 226 229 230 2538 2597 26330 2645 2821 3098 3099 3101 387712 3939 3945 3948 441531 501 5105 5106 5160 5161 5162 5211 5213 5214 5223 5224 5230 5232 5236 5313 5315 55276 55902 57818 669 7167 80201 83440 84532 8789 92483 92579 9562
hsa00020 http://www.kegg.jp/kegg-bin/show_pathway?hsa00020 1431 1737 1738 1743 2271 3417 3418 3419 3420 3421 4190 4191 47 48 4967 50 5091 5105 5106 5160 5161 5162 55753 6389 6390 6391 6392 8801 8802 8803
hsa00030 http://www.kegg.jp/kegg-bin/show_pathway?hsa00030 132158 2203 221823 226 229 22934 230 2539 25796 2821 414328 51071 5211 5213 5214 5226 5236 55276 5631 5634 6120 64080 6888 7086 729020 8277 84076 8789 9104 9563
...
```
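
To work with the GMT file in R, each line can be split into its ID, link, and gene IDs. The sketch assumes the standard GMT layout of tab-separated fields; the spacing shown above may have been lost in formatting. `parseGmt()` is a hypothetical helper, and the two sample lines are shortened excerpts of the lines above.

```r
# Hypothetical helper: parse GMT lines into a named list of gene ID vectors
parseGmt <- function(lines) {
  lines <- lines[nzchar(lines)]                       # skip empty lines
  fields <- strsplit(lines, "\t", fixed = TRUE)       # GMT fields are tab-separated
  geneSets <- lapply(fields, function(f) f[-(1:2)])   # genes start in column 3
  names(geneSets) <- vapply(fields, function(f) f[1], character(1))
  geneSets
}

# Shortened excerpts of the first two lines above
gmtLines <- c(
  "hsa00010\thttp://www.kegg.jp/kegg-bin/show_pathway?hsa00010\t10327\t124\t125",
  "hsa00020\thttp://www.kegg.jp/kegg-bin/show_pathway?hsa00020\t1431\t1737"
)
geneSets <- parseGmt(gmtLines)
print(lengths(geneSets))
```

For the downloaded file, `parseGmt(readLines("geneset.gmt"))` gives one named element per gene set.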

Returns a reference set as a text file for use in ORA (over-representation analysis). The file contains one NCBI Entrez gene ID, or one phosphosite motif sequence, per line.

Name | Description |
---|---|
organism | See above for supported organisms. |
referenceSet | Name of the reference set, e.g. `genome_protein-coding`. See the summary results. |
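
```r
library(httr)

organism <- "hsapiens"
referenceSet <- "genome_protein-coding"
response <- GET("http://www.webgestalt.org/api/reference",
                query = list(organism = organism, referenceSet = referenceSet))
if (status_code(response) == 200) {
  # Read the body as plain text rather than relying on content-type detection
  fileContent <- content(response, as = "text", encoding = "UTF-8")
  cat(fileContent, file = "reference.txt")  # write the body unchanged
  genes <- unlist(strsplit(fileContent, "\n", fixed = TRUE))
  print(genes[1:3])  # first three reference IDs
}
```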

The first lines of the reference file:

```
1
2
9
...
```
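
The reference set is the background against which over-representation is tested. A common ORA statistic is the hypergeometric test. The sketch below uses it to compute a one-sided p-value for a single gene set. It first restricts both the gene list of interest and the gene set to genes present in the reference. This illustrates the role of the reference file and is not necessarily identical to WebGestalt's implementation; for example, multiple-testing correction is omitted. `oraPValue()` and the toy gene IDs are hypothetical.

```r
# Hypothetical sketch: hypergeometric over-representation p-value for one gene set
oraPValue <- function(interest, geneSet, reference) {
  interest <- intersect(interest, reference)  # genes outside the reference are ignored
  geneSet <- intersect(geneSet, reference)
  overlap <- length(intersect(interest, geneSet))
  # P(X >= overlap) when drawing length(interest) genes without replacement
  # from the reference, of which length(geneSet) belong to the gene set
  phyper(overlap - 1, length(geneSet), length(reference) - length(geneSet),
         length(interest), lower.tail = FALSE)
}

# Toy example: a 10-gene reference, a 3-gene set, and a 3-gene list of interest
reference <- as.character(1:10)
p <- oraPValue(interest = c("1", "2", "11"), geneSet = c("1", "2", "3"),
               reference = reference)
print(p)
```

Gene "11" is dropped because it is not in the reference. Two of the two remaining genes fall in the set, so p = 3/45 = 1/15. With real data, `reference <- readLines("reference.txt")` supplies the background.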