The Importance of Statistics in Microarray Data Analysis
Title: DNA microarrays: Vital statistics
Full-text link:
http://www.nature.com/cgi-taf/DynaPage.taf?file=/nature/journal/v424/n6949/full/424610a_fs.html&content_filetype=pdf
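This Nature article (vol. 424, issue 6949) discusses why statistics matters when analysing microarray results. Below I have excerpted the passages readers are most likely to care about, for discussion.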
The problem is perverse: a typical microarray experiment provides both too much information, and too little. In most research projects, the idea is to study a small number of variables and repeat the measurements over and over again. Provided that you perform enough replicates, standard statistical tests can establish whether experimental results have real significance, or are more likely to be a consequence of random noise. Microarrays turn this approach on its head: there can be thousands of variables, corresponding to the number of individual genes being studied; but the high cost of the chips means that the number of repeated observations is usually very low.
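Summary: An experiment that studies only a few variables can control random error by repeating the measurements. A chip carries thousands of spots, however, and on a limited budget replication cannot be done on a scale that solves the problem.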
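The "thousands of variables" side of this problem can be made concrete. The article does not name a correction method; the sketch below uses the Benjamini-Hochberg false-discovery-rate procedure as one common option, applied to simulated p-values (every number here is hypothetical). If 10,000 unchanged genes are each tested at p < 0.05, roughly 500 are expected to pass by chance alone.

```python
import random

def benjamini_hochberg(pvalues, q=0.05):
    """Return indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    # Rank p-values from smallest to largest, remembering original positions.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    return sorted(order[:cutoff_rank])

# Hypothetical simulation: 10,000 unchanged genes, whose p-values are uniform
# on (0, 1) under the null hypothesis, plus 50 genes with a real change.
rng = random.Random(0)
null_p = [rng.random() for _ in range(10_000)]
true_p = [rng.uniform(0, 1e-5) for _ in range(50)]
pvalues = null_p + true_p

naive_hits = sum(p < 0.05 for p in pvalues)  # expected: ~500 false + 50 true
bh_hits = benjamini_hochberg(pvalues, q=0.05)
print("p < 0.05, no correction:", naive_hits)
print("Benjamini-Hochberg, q = 0.05:", len(bh_hits))
```

The uncorrected count is dominated by false positives, while the corrected list is close to the 50 simulated true changes.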
Today, many manuscripts submitted to journals still focus their conclusions on genes that show a change in activity of, say, more than twofold. But how do you know whether or not an apparent twofold change in gene expression is biologically meaningful? That's a difficult question to answer, because of the many sources of noise that can cloud the results.
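Summary: The article questions the common practice of treating a twofold change as the criterion for differential expression, because so many sources of noise make it hard to tell whether an apparent twofold change is biologically meaningful.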
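A minimal sketch of why a fold-change cutoff alone can mislead, assuming log2-scale intensities and three replicates per condition (the values are made up). Both hypothetical genes show exactly a twofold change, but Welch's t statistic, which divides the mean difference by its standard error, differs by a factor of ten because their replicates vary very differently. Only the statistic is computed here, not a p-value.

```python
import statistics

def fold_change(control, treated):
    """Fold change implied by the difference of mean log2 intensities."""
    return 2 ** (statistics.mean(treated) - statistics.mean(control))

def welch_t(control, treated):
    """Welch's t statistic: mean difference divided by its standard error."""
    se = (statistics.variance(control) / len(control)
          + statistics.variance(treated) / len(treated)) ** 0.5
    return (statistics.mean(treated) - statistics.mean(control)) / se

# Hypothetical log2 intensities, three replicates per condition.
gene_a = ([8.0, 8.1, 7.9], [9.0, 9.1, 8.9])    # tight replicates
gene_b = ([7.0, 9.0, 8.0], [10.0, 8.0, 9.0])   # noisy replicates

for name, (ctrl, trt) in [("A", gene_a), ("B", gene_b)]:
    print(name, round(fold_change(ctrl, trt), 3), round(welch_t(ctrl, trt), 2))
# A: fold 2.0, t about 12.2; B: fold 2.0, t about 1.2
```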
But noise creeps into microarray experiments at every stage, from the preparation of tissue samples to the extraction of data. Using different dyes can influence the results recorded by the lasers that measure the fluorescent signals, as can the location of the spots on the chip, or any unevenness or dust on the glass slide. Even using samples from the same piece of tissue, it is possible to get different profiles of gene expression using different microarray technologies.
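Summary: The article lists sources of error at every stage of a microarray experiment, from sample preparation to data extraction: dye effects, spot position, unevenness or dust on the slide, and differences between microarray platforms.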
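One of these error sources, dye bias on two-colour arrays, is commonly reduced by normalization. The sketch below shows only the simplest global version. It assumes that most genes do not change between the two channels, so the median log ratio should be zero. It does not correct the spatial or intensity-dependent effects mentioned in the quote, for which loess or print-tip normalization is typically used. The intensities are hypothetical.

```python
import math
import statistics

def median_normalize(red, green):
    """Log2 ratios M = log2(R/G), shifted so that their median is zero.

    Assumes most spots are unchanged, so a nonzero median reflects dye bias.
    """
    m_values = [math.log2(r / g) for r, g in zip(red, green)]
    center = statistics.median(m_values)
    return [m - center for m in m_values]

# Hypothetical background-corrected intensities for five spots.
red = [1200, 800, 1500, 600, 1000]
green = [1000, 700, 1200, 500, 400]
normalized = median_normalize(red, green)
print([round(m, 3) for m in normalized])
```

After centering, only the fifth spot keeps a large log ratio (about 1.06); the others sit near zero.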
Few researchers are in a position to repeat their experiments using various microarray systems. But it should, in theory, be possible to perform two other types of replication. First, each sample can be subdivided and the experiment repeated on several chips to assess fluctuation from array to array. You can also perform measurements on several different samples within each experimental condition. This replication is particularly important, as it is the only way to address fluctuations in gene expression between biological samples that have nothing to do with the issue under investigation.
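Summary: The author describes three kinds of replication for validating microarray results: repeating the experiment on different microarray platforms, technical replicates (the same sample split across several chips), and biological replicates (several independent samples per condition). Only biological replication captures variation between samples that is unrelated to the question under study. (I suspect most labs would find this hard to manage.)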
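Why biological replication matters can be shown with the variance of a condition's mean expression in a balanced nested design. This assumes independent biological and technical effects with standard deviations sigma_bio and sigma_tech (the values used are hypothetical): Var = sigma_bio^2 / n_bio + sigma_tech^2 / (n_bio * n_tech). Technical replicates shrink only the second term. With a single biological sample, the variance therefore never falls below sigma_bio^2, however many chips are used.

```python
def variance_of_condition_mean(sigma_bio, sigma_tech, n_bio, n_tech):
    """Variance of the mean log-expression for one condition in a balanced
    nested design: n_bio biological samples, each hybridized to n_tech chips."""
    return sigma_bio ** 2 / n_bio + sigma_tech ** 2 / (n_bio * n_tech)

# Hypothetical per-gene standard deviations on the log2 scale.
SIGMA_BIO, SIGMA_TECH = 0.5, 0.2

# Three ways to spend the same budget of six chips.
for n_bio, n_tech in [(1, 6), (3, 2), (6, 1)]:
    v = variance_of_condition_mean(SIGMA_BIO, SIGMA_TECH, n_bio, n_tech)
    print(f"{n_bio} biological x {n_tech} technical: variance = {v:.4f}")
# 0.2567, 0.0900, 0.0483
```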
Even if you decide to perform proper, non-pooled replicates, it is difficult to know how many to do. One paper investigating the topic, published in 2000, recommended at least three replicates [2]. But among statisticians who have considered the issue, there is no clear consensus. "The number of replicates really depends on the kind of accuracy one wants to achieve," says Ernst Wit, a statistical geneticist at the University of Glasgow, UK. "It is impossible to come up with a single recommendation."
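Summary: On the question everyone asks, how many replicates to run: one paper recommends at least three, while others argue that the number depends on the accuracy you want to achieve, so no single recommendation is possible.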
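Wit's point that the right number depends on the accuracy wanted can be illustrated with a textbook sample-size formula for comparing two group means. This sketch rests on several assumptions. It uses a normal approximation, which somewhat underestimates n for very small samples. It treats the per-gene standard deviation sigma as known and sets it to a hypothetical 0.5 on the log2 scale.

```python
import math
from statistics import NormalDist

def replicates_per_group(sigma, delta, alpha=0.05, power=0.8):
    """Approximate replicates per group for a two-sided, two-sample test of means
    (normal approximation): n = 2 * (z_{1-alpha/2} + z_power)^2 * sigma^2 / delta^2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2)

SIGMA = 0.5  # hypothetical per-gene SD of log2 expression
n_2fold = replicates_per_group(SIGMA, delta=1.0)                # twofold change
n_15fold = replicates_per_group(SIGMA, delta=math.log2(1.5))    # 1.5-fold change
n_strict = replicates_per_group(SIGMA, delta=1.0, alpha=0.05 / 10_000)  # Bonferroni, 10,000 genes
print(n_2fold, n_15fold, n_strict)
```

Under these assumptions, a twofold change needs about 4 replicates per group and a 1.5-fold change about 12. Requiring genome-wide (Bonferroni-style) significance pushes the twofold case higher still.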
Biologists are now trying to use replication and statistical analysis to separate meaningful changes in gene expression from background noise, often using software packages produced by academic researchers and available for free download, or marketed by chip manufacturers. But in many cases, say experts, the statistics aren't being used correctly. "The majority of microarray papers are analysed with substandard methods," claims David Allison, a biostatistician at the University of Alabama at Birmingham.
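Summary: Biologists increasingly use replication and statistical software, either free academic packages or tools sold by chip manufacturers, to separate real expression changes from noise. Experts say the statistics are often misapplied, and David Allison claims that most microarray papers are analysed with substandard methods.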
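One family of methods proposed for the few-replicate setting stabilizes the gene-wise t statistic. With three replicates, a gene can show a tiny variance by chance and earn a large t for a trivial change. The sketch adds a constant s0 to the denominator, following the idea of the "fudge factor" used by SAM. The data and the choice s0 = 0.1 are hypothetical, and real packages estimate such constants from the data.

```python
import statistics

def standard_error(control, treated):
    """Welch-style standard error of the difference in means."""
    return (statistics.variance(control) / len(control)
            + statistics.variance(treated) / len(treated)) ** 0.5

def t_stat(control, treated):
    """Ordinary (Welch) t statistic."""
    diff = statistics.mean(treated) - statistics.mean(control)
    return diff / standard_error(control, treated)

def moderated_stat(control, treated, s0):
    """Same ratio with a constant s0 added to the denominator (SAM-style idea;
    the s0 value passed in below is a hypothetical choice)."""
    diff = statistics.mean(treated) - statistics.mean(control)
    return diff / (standard_error(control, treated) + s0)

# Hypothetical log2 data: a ~3.5% change with unusually tight replicates,
# and a twofold change with ordinary replicate noise.
tiny = ([8.00, 8.01, 8.00], [8.05, 8.05, 8.06])
big = ([8.0, 8.1, 7.9], [9.0, 9.1, 8.9])

for name, (ctrl, trt) in [("tiny", tiny), ("twofold", big)]:
    print(name, round(t_stat(ctrl, trt), 2), round(moderated_stat(ctrl, trt, s0=0.1), 2))
```

The plain t statistic ranks both genes as strongly changed (about 10.6 and 12.2). With s0 added, the trivial change drops below 1 while the twofold change stays above 5.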
Some argue that the technique is best suited to determining relationships between a small number of variables, rather than deriving patterns involving thousands of genes across a huge data set. "Hierarchical trees are famously unreliable for good high-level clusters," explains Wit.
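Summary: Some argue that the technique is better suited to relationships among a small number of variables than to patterns across thousands of genes. Wit notes that hierarchical clustering trees are notoriously unreliable at the top levels.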
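The instability Wit describes can be seen even in a toy example. The sketch below implements agglomerative single-linkage clustering and cuts the tree into two top-level clusters. Single linkage was chosen only because it is the simplest linkage to follow, not because it is what the article has in mind. The four one-dimensional "expression profiles" are hypothetical: moving one value from 2.1 to 1.9 flips the top-level split.

```python
import math

def single_linkage(points, k):
    """Agglomerative single-linkage clustering: start with one cluster per point and
    repeatedly merge the two clusters whose closest members are nearest, until k remain.
    Returns clusters as sorted lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def gap(a, b):
        # Single-linkage distance: the closest pair of members across two clusters.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        x, y = min(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda pair: gap(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return sorted(clusters)

original = [(0.0,), (1.0,), (2.1,), (3.0,)]
perturbed = [(0.0,), (1.0,), (1.9,), (3.0,)]
print(single_linkage(original, 2))   # [[0, 1], [2, 3]]
print(single_linkage(perturbed, 2))  # [[0, 1, 2], [3]]
```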
Last edited 2022-10-09