
[Frontiers of Science] DNA Makes Us Rethink the Meaning of the Gene

Posted on 2007-9-14 15:54:22
DNA Makes Us Rethink the Meaning of the Gene

GENOMICS:
DNA Study Forces Rethink of What It Means to Be a Gene
Elizabeth Pennisi

Genes, move over. Ever since the early 1900s, biologists have thought about heredity primarily in terms of genes. Today, they often view genes as compact, information-laden gems hidden among billions of bases of junk DNA. But genes, it turns out, are neither compact nor uniquely important. According to a painstaking new analysis of 1% of the human genome, genes can be sprawling, with far-flung protein-coding and regulatory regions that overlap with other genes.

As part of the Encyclopedia of DNA Elements (ENCODE) project, 35 research teams have analyzed 44 regions of the human genome covering 30 million bases and figured out how each base contributes to overall genome function. The results, compiled in a paper in the 14 June issue of Nature and 28 papers in the June issue of Genome Research, provide a litany of new insights and drive home how complex our genetic code really is. For example, protein-coding DNA makes up barely 2% of the overall genome, yet 80% of the bases studied showed signs of being expressed, says Ewan Birney of the European Molecular Biology Laboratory's European Bioinformatics Institute in Hinxton, U.K., who led the ENCODE analysis.
Given the traditional gene-centric perspective, that finding "is going to be very disturbing to some people," says John Greally, a molecular biologist at Albert Einstein College of Medicine in New York City. On the other hand, says Francis Collins, director of the National Human Genome Research Institute (NHGRI) in Bethesda, Maryland, "we're beginning to understand the ground rules by which the genome functions."

Once the human genome sequence was in hand by 2003, NHGRI set up ENCODE to learn what those 3 billion or so bases were all about. The initial 4-year, $42 million effort, which tackled 1% of the human genome, brought new and existing experimental and computational approaches to bear, mapping not just genes but also regulatory DNA and other important features such as gene start sites. NHGRI now plans to spend about $23 million annually over the next 4 years to perform a similar analysis of the whole genome, expecting that the lessons learned from the pilot ENCODE and new sequencing technologies will greatly reduce the costs of this extended project (Science, 25 May, p. 1120). "The goal is to measure all the different kinds of features across the human genome and ask which features go together to understand the whole package," says George Weinstock, a geneticist at Baylor College of Medicine in Houston, Texas.

A key component of the pilot ENCODE is an analysis of the "transcriptome," the repertoire of RNA molecules that cells create by transcribing DNA. For protein-coding genes, most of their RNA transcripts--the messenger RNA (mRNA)--get translated into chains of amino acids by ribosomes. For other types of "genes," RNA is the end product: Ribosomal RNAs become the backbone of the ribosome, for example.
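
The DNA-to-mRNA-to-protein path described above can be illustrated with a toy sketch. This is not part of the article or of ENCODE's methods: the function names, the example sequence, and the deliberately partial codon table are all made up for illustration, and real transcription involves splicing and regulation that this ignores.

```python
# Toy illustration only: transcribe a DNA coding strand into mRNA,
# then translate the mRNA into amino acids, codon by codon.

# Partial codon table (standard genetic code), limited to the codons
# used in the example below. None marks a stop codon.
CODON_TABLE = {
    "AUG": "Met",  # also serves as the start codon
    "GCU": "Ala",
    "GGC": "Gly",
    "UUU": "Phe",
    "AAA": "Lys",
    "UAA": None,
    "UAG": None,
    "UGA": None,
}


def transcribe(coding_strand):
    """Return the mRNA matching a DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Translate from the first AUG until a stop codon or the end."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, so nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon not in CODON_TABLE:
            raise ValueError(f"codon {codon!r} is not in the partial table")
        residue = CODON_TABLE[codon]
        if residue is None:  # stop codon ends the chain
            break
        peptide.append(residue)
    return peptide


# Hypothetical sequence: two flanking bases, a start codon,
# four more codons, a stop codon, and two trailing bases.
dna = "CCATGGCTGGCTTTAAATAAGG"
mrna = transcribe(dna)
print(mrna)
print(translate(mrna))
```

For an RNA "gene" of the second kind the article mentions, only the `transcribe` step would apply, since the RNA itself is the end product.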

Researchers used to think very little RNA was produced beyond mRNA and a smattering of RNA end products. But about half the transcripts that molecular biologist Thomas Gingeras of Affymetrix Inc. in Santa Clara, California, discovered in his RNA survey 2 years ago didn't fit into these categories (Science, 20 May 2005, p. 1149), a finding ENCODE has now substantiated. The ENCODE researchers knew going in that the DNA they were studying produced about eight non-protein-coding RNAs, and they have now discovered thousands more. "A lot more of the DNA [is] turning up in RNA than most people would have predicted," says Collins.

ENCODE has produced few clues as to what these RNAs do--leaving some to wonder whether experimental artifacts inflated the percentage of DNA transcribed. Greally is satisfied that ENCODE used enough different techniques to show that the RNA transcripts are real, but he's not sure they're biologically important. "It's possible some of these transcripts are just the polymerase [enzyme] chugging along like an Energizer bunny" and transcribing extra DNA, he suggests.

But in the 8 June issue of Science (p. 1484), Gingeras and his colleagues reported that many of the mysterious RNA transcripts found as part of ENCODE harbor short sequences, conserved across mice and humans, that are likely important in gene regulation. That these transcripts are "so diverse and prevalent across the genome just opens up the complexity of this whole system," says Gingeras.

The mRNA produced from protein-coding genes also held surprises. When Alexandre Reymond, a medical geneticist at the University of Lausanne, Switzerland, and his colleagues took a close look at the 400 protein-coding genes contained in ENCODE's target DNA, they found additional exons--the regions that code for amino acids--for more than 80%. Many of these newfound exons were located thousands of bases away from the gene's previously known exons, sometimes hidden in another gene. Moreover, some mRNAs were derived from exons belonging to two genes, a finding, says Reymond, that "underscores that we have still not truly answered the question, 'What is a gene?' " In addition, further extending and blurring gene boundaries, ENCODE uncovered a slew of novel "start sites" for genes--the DNA sequences where transcription begins--many located hundreds of thousands of bases away from the known start sites.

Before ENCODE started, researchers knew of about 532 promoters, regulatory DNA that helps jump-start gene activity, in the human DNA chosen for analysis. Now they have 775 in hand, with more awaiting verification. Unexpectedly, about one-quarter of the promoters discovered were at the ends of the genes instead of at the beginning.

The distributions of exons, promoters, gene start sites, and other DNA features and the existence of widespread transcription suggest that a multidimensional network regulates gene expression. Gingeras contends that because of this complexity, researchers should look at RNA transcripts and not genes as the fundamental functional units of genomes. But Collins is more circumspect. The gene "is a concept that's not going out of fashion," he predicts. "It's just that we have to be more thoughtful about it."

DNA Makes Us Rethink the Meaning of the Gene

Elizabeth Pennisi

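Genes, make way. Since the early 1900s, biologists have thought about heredity mainly in terms of genes. Today, scientists often picture genes as compact, information-rich gems hidden among billions of bases of nonfunctional "junk" DNA. But the new research shows that genes are neither compact nor uniquely important. According to a painstaking new analysis of 1% of the human genome, genes can be sprawling, with widely scattered protein-coding and regulatory regions that overlap with other genes.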

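As part of the Encyclopedia of DNA Elements (ENCODE) project, 35 research teams analyzed 44 regions of the human genome, covering 30 million bases, and worked out how each base contributes to overall genome function. The results, reported in one paper in the 14 June issue of Nature and 28 papers in the June issue of Genome Research, offer a range of new insights and show just how complex our genetic code is. For example, protein-coding DNA makes up barely 2% of the genome, yet 80% of the bases studied showed signs of being expressed, says Ewan Birney of the European Molecular Biology Laboratory's European Bioinformatics Institute in Hinxton, U.K., who led the ENCODE analysis.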

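From the traditional gene-centric point of view, this finding "is going to be very disturbing to some people," says John Greally, a molecular biologist at Albert Einstein College of Medicine in New York City. On the other hand, says Francis Collins, director of the National Human Genome Research Institute (NHGRI) in Bethesda, Maryland, "we're beginning to understand the ground rules by which the genome functions."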

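Once the human genome sequence had been obtained by 2003, NHGRI set up ENCODE to find out what those 3 billion or so bases do. The initial 4-year, $42 million effort analyzed 1% of the human genome, applying new and existing experimental and computational methods to map not only genes but also regulatory DNA and other important features such as gene start sites. NHGRI now plans to spend about $23 million a year over the next 4 years on a similar analysis of the whole genome, expecting that the experience gained from the pilot ENCODE project, together with new sequencing technologies, will greatly reduce the cost of the expanded project (Science, 25 May, p. 1120). "The goal is to measure all the different kinds of features across the human genome and ask which features go together to understand the whole package," says George Weinstock, a geneticist at Baylor College of Medicine in Houston, Texas.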

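A key part of the pilot ENCODE project is an analysis of the "transcriptome," the full set of RNA molecules that cells produce by transcribing DNA. For protein-coding genes, most of the RNA transcripts, the messenger RNA (mRNA), are translated by ribosomes into chains of amino acids. For other kinds of "genes," RNA is the end product: ribosomal RNAs, for example, form the backbone of the ribosome.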

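Researchers used to think that very little RNA was produced apart from mRNA and a handful of RNA end products. But about half the transcripts that molecular biologist Thomas Gingeras of Affymetrix Inc. in Santa Clara, California, found in his RNA survey 2 years ago did not fit into these categories (Science, 20 May 2005, p. 1149), a finding that ENCODE has now confirmed. The ENCODE researchers knew beforehand that the DNA they were studying produced about eight non-protein-coding RNAs, and they have now discovered thousands more. "A lot more of the DNA [is] turning up in RNA than most people would have predicted," says Collins.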

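ENCODE has yielded few clues about what these RNAs do, leaving some to wonder whether experimental artifacts inflated the percentage of DNA that is transcribed. Greally is satisfied that ENCODE used enough different techniques to show that the RNA transcripts are real, but he is not sure they are biologically important. "It's possible some of these transcripts are just the polymerase [enzyme] chugging along like an Energizer bunny" and transcribing extra DNA, he suggests.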

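But in the 8 June issue of Science (p. 1484), Gingeras and his colleagues reported that many of the mysterious RNA transcripts found as part of ENCODE contain short sequences, conserved between mice and humans, that are likely to be important in gene regulation. That these transcripts are "so diverse and prevalent across the genome just opens up the complexity of this whole system," says Gingeras.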

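The mRNA produced from protein-coding genes also held surprises. When Alexandre Reymond, a medical geneticist at the University of Lausanne, Switzerland, and his colleagues looked closely at the 400 protein-coding genes in ENCODE's target DNA, they found additional exons, the regions that code for amino acids, for more than 80% of them. Many of these newly found exons lay thousands of bases away from the gene's previously known exons, sometimes hidden inside another gene. Moreover, some mRNAs were derived from exons belonging to two genes, a finding, says Reymond, that "underscores that we have still not truly answered the question, 'What is a gene?' " In addition, further extending and blurring gene boundaries, ENCODE uncovered a large number of novel "start sites" for genes, the DNA sequences where transcription begins, many located hundreds of thousands of bases away from the known start sites.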

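Before ENCODE began, researchers knew of about 532 promoters, regulatory DNA that helps jump-start gene activity, in the human DNA chosen for analysis. They now have 775 in hand, with more awaiting verification. Unexpectedly, about one-quarter of the promoters discovered were at the ends of genes rather than at the beginning.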

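The distributions of exons, promoters, gene start sites, and other DNA features, together with the existence of widespread transcription, suggest that a multidimensional network regulates gene expression. Gingeras argues that because of this complexity, researchers should treat RNA transcripts, not genes, as the fundamental functional units of genomes. But Collins is more cautious. The gene "is a concept that's not going out of fashion," he predicts. "It's just that we have to be more thoughtful about it."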