
GSEA: Principles, Running the Software, and Common Errors with Their Solutions

Medical student (other disciplines) · Posted 2018-07-31 · IP: Beijing

Part 1: GSEA Principles


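Goal: determine whether the members of a predefined gene set S are randomly distributed throughout the ranked gene list L, or concentrated at its top or bottom.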

1. Expression profiles: samples are divided into two classes, labeled 1 and 2

GSEA considers experiments with genomewide expression profiles from samples belonging to two classes, labeled 1 or 2.

2. Genes are ranked by the correlation between their expression and the class distinction

Genes are ranked based on the correlation between their expression and the class distinction by using any suitable metric.
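As an illustration of the ranking step, here is a minimal sketch that uses a plain signal-to-noise ratio (difference of class means divided by the sum of class standard deviations) as the metric. The function names `signal_to_noise` and `rank_genes` and the toy data are hypothetical, and the GSEA software applies additional adjustments to this metric that are not reproduced here.

```python
import numpy as np

def signal_to_noise(expr, labels):
    """Per-gene signal-to-noise ratio between class 1 and class 2.

    expr   : (n_genes, n_samples) array of expression values
    labels : length n_samples sequence containing 1s and 2s
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    a = expr[:, labels == 1]
    b = expr[:, labels == 2]
    mean_diff = a.mean(axis=1) - b.mean(axis=1)
    spread = a.std(axis=1, ddof=1) + b.std(axis=1, ddof=1)
    return mean_diff / spread

def rank_genes(gene_names, scores):
    """Sort genes from most class-1-correlated to most class-2-correlated."""
    scores = np.asarray(scores)
    order = np.argsort(-scores, kind="stable")
    return [gene_names[i] for i in order], scores[order]

# Toy data (made up for illustration): 3 genes, 3 samples per class.
expr = np.array([
    [10.0, 11.0, 12.0, 2.0, 3.0, 4.0],   # geneA: higher in class 1
    [5.0, 6.0, 7.0, 5.0, 6.0, 7.0],      # geneB: no difference
    [1.0, 2.0, 3.0, 9.0, 10.0, 11.0],    # geneC: higher in class 2
])
labels = [1, 1, 1, 2, 2, 2]
names = ["geneA", "geneB", "geneC"]

scores = signal_to_noise(expr, labels)
ranked_names, ranked_scores = rank_genes(names, scores)
print(ranked_names, ranked_scores)
```

The ranked list L used in the following steps is simply `ranked_names` together with `ranked_scores`.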

3. Calculate the enrichment score (ES)

Given an a priori defined set of genes S (e.g., genes encoding products in a metabolic pathway, located in the same cytogenetic band, or sharing the same GO category), the goal of GSEA is to determine whether the members of S are randomly distributed throughout L or primarily found at the top or bottom. We expect that sets related to the phenotypic distinction will tend to show the latter distribution.

Step 1: Calculation of an Enrichment Score.

We calculate an enrichment score (ES) that reflects the degree to which a set S is overrepresented at the extremes (top or bottom) of the entire ranked list L.

The score is calculated by walking down the list L, increasing a running-sum statistic when we encounter a gene in S and decreasing it when we encounter genes not in S.

The magnitude of the increment depends on the correlation of the gene with the phenotype. The enrichment score is the maximum deviation from zero encountered in the random walk; it corresponds to a weighted Kolmogorov–Smirnov-like statistic.
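For reference, the running sum can be written out as follows, following the definitions in the PNAS paper cited at the end of this post. Here r_j is the correlation of gene g_j with the phenotype, N is the number of genes in L, N_H is the number of genes in S, and p is a weighting exponent (with p = 0 the statistic reduces to the standard Kolmogorov–Smirnov statistic).

```latex
P_{\mathrm{hit}}(S, i) = \sum_{\substack{g_j \in S \\ j \le i}} \frac{|r_j|^{p}}{N_R},
\qquad N_R = \sum_{g_j \in S} |r_j|^{p}

P_{\mathrm{miss}}(S, i) = \sum_{\substack{g_j \notin S \\ j \le i}} \frac{1}{N - N_H}

ES(S) = P_{\mathrm{hit}}(S, i^{*}) - P_{\mathrm{miss}}(S, i^{*}),
\qquad i^{*} = \arg\max_{i} \left| P_{\mathrm{hit}}(S, i) - P_{\mathrm{miss}}(S, i) \right|
```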

[Figure: a running-sum statistic]
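The random walk described above can be sketched in a few lines of Python. This is a minimal illustration, not the GSEA software's implementation; the function name `enrichment_score`, the exponent argument `p`, and the toy ranked list are assumptions made for the example.

```python
import numpy as np

def enrichment_score(ranked_genes, ranked_scores, gene_set, p=1.0):
    """Weighted Kolmogorov-Smirnov-like enrichment score.

    ranked_genes  : gene names in ranked order (top of list first)
    ranked_scores : ranking metric for each gene, in the same order
    gene_set      : set of gene names (the gene set S)
    Returns (ES, running_sum).
    """
    ranked_scores = np.asarray(ranked_scores, dtype=float)
    in_set = np.array([g in gene_set for g in ranked_genes])
    n_genes, n_hits = len(ranked_genes), int(in_set.sum())
    if n_hits == 0 or n_hits == n_genes:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    # Step up at each gene in S, weighted by |score|^p and normalised to sum to 1.
    hit_steps = np.where(in_set, np.abs(ranked_scores) ** p, 0.0)
    if hit_steps.sum() == 0:
        raise ValueError("all genes in the set have a zero score")
    hit_steps = hit_steps / hit_steps.sum()

    # Step down by a constant amount at each gene not in S.
    miss_steps = np.where(in_set, 0.0, 1.0 / (n_genes - n_hits))

    running_sum = np.cumsum(hit_steps - miss_steps)
    es = running_sum[np.argmax(np.abs(running_sum))]  # maximum deviation from zero
    return es, running_sum

# Toy ranked list (made up): the set members sit at the top of the list.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
es, running = enrichment_score(genes, scores, {"g1", "g2"})
print(es, running)
```

Because the up-steps and the down-steps each sum to 1, the running sum always returns to zero at the end of the list; a set concentrated at the bottom of the list gives a negative ES instead.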

4. Assess the significance of the ES (p-value)

Significance is estimated by permutation; the number of permutations can be set to, for example, 1000 or 500.
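A rough sketch of the permutation test, assuming phenotype-label permutation as described in the cited paper: the class labels are shuffled, the genes are re-ranked, the ES is recomputed, and the observed ES is compared against the part of the null distribution that has the same sign. The helper names (`snr`, `es_from_scores`, `permutation_pvalue`) and the simulated data are hypothetical, and the signal-to-noise metric is a simplified stand-in for whichever ranking metric is actually used.

```python
import numpy as np

def snr(expr, labels):
    """Simplified signal-to-noise ratio per gene (class 1 vs class 2)."""
    a, b = expr[:, labels == 1], expr[:, labels == 2]
    return (a.mean(axis=1) - b.mean(axis=1)) / (a.std(axis=1, ddof=1) + b.std(axis=1, ddof=1))

def es_from_scores(scores, in_set, p=1.0):
    """Rank genes by score (descending) and return the enrichment score."""
    order = np.argsort(-scores, kind="stable")
    hits = in_set[order]
    weights = np.where(hits, np.abs(scores[order]) ** p, 0.0)
    steps = weights / weights.sum() - np.where(hits, 0.0, 1.0 / (~hits).sum())
    running = np.cumsum(steps)
    return running[np.argmax(np.abs(running))]

def permutation_pvalue(expr, labels, in_set, n_perm=1000, seed=0):
    """Nominal p-value from shuffled class labels (same-sign null only)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = es_from_scores(snr(expr, labels), in_set)
    null = np.array([
        es_from_scores(snr(expr, rng.permutation(labels)), in_set)
        for _ in range(n_perm)
    ])
    same_sign = null[null >= 0] if observed >= 0 else null[null < 0]
    if same_sign.size == 0:  # no null ES with the observed sign
        return observed, float("nan")
    return observed, float(np.mean(np.abs(same_sign) >= abs(observed)))

# Simulated data (made up): 50 genes, 6 samples per class;
# the first 5 genes form the gene set and are shifted upward in class 1.
rng = np.random.default_rng(42)
expr = rng.normal(size=(50, 12))
labels = np.array([1] * 6 + [2] * 6)
expr[:5, labels == 1] += 4.0
in_set = np.zeros(50, dtype=bool)
in_set[:5] = True

observed_es, p_value = permutation_pvalue(expr, labels, in_set, n_perm=200)
print(observed_es, p_value)
```

More permutations give a finer-grained p-value: with 1000 permutations the smallest nonzero value that can be estimated is on the order of 1/1000.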


5. Correct for multiple hypothesis testing (FDR)
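When many gene sets are tested at once, the cited paper first normalizes each ES by the mean of its same-signed null ES values (giving a normalized enrichment score, NES) and then estimates an FDR q-value by comparing the tail of the null NES distribution with the tail of the observed NES distribution. The sketch below is a simplified reading of that procedure; the function names and the simulated inputs are hypothetical, and it assumes every gene set has both positive and negative null ES values.

```python
import numpy as np

def normalize(es_obs, es_null):
    """Divide each ES by the mean same-signed null ES of its gene set.

    es_obs  : (n_sets,) observed enrichment scores
    es_null : (n_sets, n_perm) enrichment scores from permutations
    """
    pos_mean = np.array([row[row >= 0].mean() for row in es_null])
    neg_mean = np.array([np.abs(row[row < 0]).mean() for row in es_null])
    nes_obs = np.where(es_obs >= 0, es_obs / pos_mean, es_obs / neg_mean)
    nes_null = np.where(es_null >= 0,
                        es_null / pos_mean[:, None],
                        es_null / neg_mean[:, None])
    return nes_obs, nes_null

def fdr_q_values(es_obs, es_null):
    """Estimate an FDR q-value for each gene set from NES tails."""
    nes_obs, nes_null = normalize(es_obs, es_null)
    q = np.empty_like(nes_obs)
    for k, nes in enumerate(nes_obs):
        if nes >= 0:
            # Fraction of positive null NES at least as large, over the
            # fraction of positive observed NES at least as large.
            num = np.mean(nes_null[nes_null >= 0] >= nes)
            den = np.mean(nes_obs[nes_obs >= 0] >= nes)
        else:
            num = np.mean(nes_null[nes_null < 0] <= nes)
            den = np.mean(nes_obs[nes_obs < 0] <= nes)
        q[k] = min(1.0, num / den)
    return nes_obs, q

# Simulated inputs (made up): 20 gene sets, 500 permutations each,
# with the first set given a clearly extreme observed ES.
rng = np.random.default_rng(1)
es_null = rng.normal(0.0, 0.3, size=(20, 500))
es_obs = rng.normal(0.0, 0.3, size=20)
es_obs[0] = 0.95

nes, q_values = fdr_q_values(es_obs, es_null)
print(nes[0], q_values[0])
```

The normalization is what makes scores comparable across gene sets of different sizes before the tails are pooled.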


References:

Subramanian A, et al. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS 102(43):15545-15550.

http://www.pnas.org/content/102/43/15545

https://blog.csdn.net/qq_29300341/article/details/52956052

