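TCGA (The Cancer Genome Atlas) Data Download and Processing (TCGA-Assembler)
TCGA mission: to improve scientific understanding of the molecular basis of cancer and to improve our ability to diagnose, treat, and prevent it.
TCGA goal: to build a complete "atlas" of the genomic alterations associated with all cancers.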

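Most TCGA data is publicly available, but collecting and preprocessing it efficiently is a headache. Today we explain how to turn TCGA data into a two-dimensional data matrix for a given cancer type (for example, genes as rows and samples as columns). Once we have this matrix, the rest is straightforward: we can run differential expression analysis, build co-expression networks, do survival analysis, and so on. This post focuses on how to download TCGA data. If you are interested in the downstream analysis, you can join the "生物信息培训+视频" group chat or search Taobao for "生物信息视频" to get in touch with us.
Let's get started. We can use the TCGA-Assembler software to download TCGA data: http://www.compgenome.org/TCGA-Assembler/ . TCGA-Assembler not only makes downloading easy, it can also do initial processing of the data. After downloading TCGA-Assembler, we first need to install some dependency packages in R with the following command: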
install.packages(c("HGNChelper", "RCurl", "httr", "stringr", "digest", "bitops"), dependencies=TRUE)
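If some of these packages are already present, reinstalling all of them is unnecessary. The sketch below shows one way to install only what is missing. The helper names `missing_packages` and `install_missing` are hypothetical and are not part of TCGA-Assembler; the package list is the one from the command above.

```r
# Packages named in the install command above.
required <- c("HGNChelper", "RCurl", "httr", "stringr", "digest", "bitops")

# Hypothetical helper: return the entries of `wanted` that are not installed.
# `installed` defaults to the packages visible in the current library paths.
missing_packages <- function(wanted, installed = rownames(installed.packages())) {
  setdiff(wanted, installed)
}

# Hypothetical helper: install only the missing packages and return their names.
install_missing <- function(wanted) {
  to_install <- missing_packages(wanted)
  if (length(to_install) > 0) {
    install.packages(to_install, dependencies = TRUE)
  }
  invisible(to_install)
}

# Usage (needs network access):
# install_missing(required)
```

Calling `missing_packages(required)` on its own is a quick way to check the environment without installing anything.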
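With the dependencies installed, go to the TCGA-Assembler directory you just downloaded and make it the working directory with setwd("C:/Users/cloud/Desktop/TCGA-Assembler") (note that the path must be quoted). Now we can download data. Pick the call that matches the data you need. The scripts are as follows: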
# Load module A functions.
source("Module_A.r");
# Download level-3 miRNA-seq data of six rectum adenocarcinoma (READ) samples
miRNASeqRawData = DownloadmiRNASeqData(traverseResultFile = "./DirectoryTraverseResult_Jul-08-2014.rda",
saveFolderName = "./QuickStartGuide_Results/RawData/", cancerType = "READ",
assayPlatform = "miRNASeq", inputPatientIDs = c("TCGA-EI-6884-01",
"TCGA-DC-5869-01", "TCGA-G5-6572-01", "TCGA-F5-6812-01", "TCGA-AF-2689-11", "TCGA-AF-2691-11"));
# Download level-3 DNA copy number data of six READ samples
CNARawData = DownloadCNAData(traverseResultFile = "./DirectoryTraverseResult_Jul-08-2014.rda",
saveFolderName = "./QuickStartGuide_Results/RawData/", cancerType = "READ",
assayPlatform = "genome_wide_snp_6", inputPatientIDs = c("TCGA-EI-6884-01",
"TCGA-DC-5869-01", "TCGA-G5-6572-01", "TCGA-F5-6812-01", "TCGA-AF-2692-10", "TCGA-AG-4021-10"));
# Download level-3 RNASeqV2 gene expression and exon expression data of six READ samples
RNASeqRawData = DownloadRNASeqData(traverseResultFile = "./DirectoryTraverseResult_Jul-08-2014.rda", saveFolderName =
"./QuickStartGuide_Results/RawData/", cancerType = "READ", assayPlatform = "RNASeqV2",
dataType = c("rsem.genes.normalized_results", "exon_quantification"), inputPatientIDs =
c("TCGA-EI-6884-01", "TCGA-DC-5869-01", "TCGA-G5-6572-01", "TCGA-F5-6812-01", "TCGA-AG-3732-11",
"TCGA-AG-3742-11"));
# Download level-3 HumanMethylation27 data of six READ samples
Methylation27RawData = DownloadMethylationData(traverseResultFile = "./DirectoryTraverseResult_Jul-08-2014.rda", saveFolderName =
"./QuickStartGuide_Results/RawData/", cancerType = "READ", assayPlatform = "humanmethylation27",
inputPatientIDs = c("TCGA-AG-3583-01", "TCGA-AG-A032-01", "TCGA-AF-2692-11", "TCGA-AG-4001-01",
"TCGA-AG-3608-01", "TCGA-AG-3574-01"));
# Download level-3 HumanMethylation450 data of six READ samples
Methylation450RawData = DownloadMethylationData(traverseResultFile = "./DirectoryTraverseResult_Jul-08-2014.rda", saveFolderName =
"./QuickStartGuide_Results/RawData", cancerType = "READ", assayPlatform = "humanmethylation450",
inputPatientIDs = c("TCGA-EI-6884-01", "TCGA-DC-5869-01", "TCGA-G5-6572-01", "TCGA-F5-6812-01",
"TCGA-AG-A01W-11", "TCGA-AG-3731-11"));
# Download level-3 RPPA protein expression data of six READ samples
RPPARawData = DownloadRPPAData(traverseResultFile = "./DirectoryTraverseResult_Jul-08-2014.rda", saveFolderName =
"./QuickStartGuide_Results/RawData", cancerType = "READ", assayPlatform = "mda_rppa_core",
inputPatientIDs = c("TCGA-EI-6884-01", "TCGA-DC-5869-01", "TCGA-G5-6572-01", "TCGA-F5-6812-01",
"TCGA-AG-3582-01", "TCGA-AG-4001-01"));
# Download de-identified clinical information of READ patients
DownloadClinicalData(traverseResultFile = "./DirectoryTraverseResult_Jul-08-2014.rda", saveFolderName =
"./QuickStartGuide_Results/RawData", cancerType = "READ", clinicalDataType = c("patient", "drug", "follow_up"));
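Running the download script above produces the files we want. For example, to download the miRNA data for rectum adenocarcinoma (READ), we can use the code below; inputPatientIDs is omitted here, so the request is not limited to the six samples used earlier. Once the download finishes, we have the READ matrix (genes, here miRNAs, as rows and samples as columns).
setwd("C:/Users/cloud/Desktop/TCGA-Assembler")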
source("Module_A.r");
miRNASeqRawData = DownloadmiRNASeqData(traverseResultFile = "./DirectoryTraverseResult_Jul-08-2014.rda",
saveFolderName = "./QuickStartGuide_Results/RawData/", cancerType = "READ",
assayPlatform = "miRNASeq");
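For downstream steps such as differential expression, it usually helps to separate tumor from normal samples. The sample IDs used in the calls above (for example "TCGA-EI-6884-01" and "TCGA-AF-2689-11") end in a two-digit sample-type code; in the TCGA barcode convention, 01 to 09 denote tumor samples, 10 to 19 normal samples, and 20 to 29 control samples. The following is a minimal sketch of grouping IDs by that code. The helper name `sample_group` is hypothetical and is not a TCGA-Assembler function.

```r
# Hypothetical helper: map the two-digit sample-type code at the end of a
# TCGA sample ID (e.g. "TCGA-EI-6884-01") to a coarse group label.
# IDs that do not end in "-NN" yield NA (with a coercion warning).
sample_group <- function(ids) {
  code <- as.integer(sub(".*-([0-9]{2})$", "\\1", ids))
  ifelse(code < 10, "tumor", ifelse(code < 20, "normal", "control"))
}

# Three of the IDs that appear in the quick-start calls above.
ids <- c("TCGA-EI-6884-01", "TCGA-AF-2689-11", "TCGA-AF-2692-10")

# Group the IDs by label; the result is a named list of character vectors.
groups <- split(ids, sample_group(ids))
print(groups)
```

The same labels can be applied to the column names of a downloaded matrix, assuming those columns carry sample IDs in this format.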
Last edited 2022-10-09