求助一perl程序
每组序列想经过如下处理:
1.翻译成protein
2.align protein序列
3.删除gap
4.将protein还原为DNA
处理的目的:用PAML的codeml找正选择位点
发现 http://bioperl.org/HOWTOs/html/PAML.html 上部分模块可是实现这个功能,如下:
use Bio::Tools::Run::Alignment::Clustalw;
# for projecting alignments from protein to R/DNA space
use Bio::Align::Utilities qw(aa_to_dna_aln);
# for input of the sequence data
use Bio::SeqIO;
use Bio::AlignIO;
my $aln_factory = new Bio::Tools::Run::Alignment::Clustalw();
my $seqdata = 'cdna.fa';
my $seqio = new Bio::SeqIO(-file => $seqdata,
-format => 'fasta');
my %seqs;
my @prots;
# process each sequence
while ( my $seq = $seqio->next_seq ) {
$seqs{$seq->display_id} = $seq;
# translate them into protein
my $protein = $seq->translate();
my $pseq = $protein->seq();
if( $pseq =~ /\*/ &&
$pseq !~ /\*$/ ) {
warn("provided a cDNA sequence with a stop codon, PAML will choke!");
exit(0);
}
# Tcoffee can't handle '*' even if it is trailing
$pseq =~ s/\*//g;
$protein->seq($pseq);
push @prots, $protein;
}
if( @prots < 2 ) {
warn("Need at least 2 cDNA sequences to proceed");
exit(0);
}
open(OUT, ">align_output.txt") ||
die("cannot open output $output for writing");
# Align the sequences with clustalw
my $aa_aln = $aln_factory->align(\@prots);
# project the protein alignment back to cDNA coordinates
my $dna_aln = &aa_to_dna_aln($aa_aln, \%seqs);
my @each = $dna_aln->each_seq();
......
运行到my $dna_aln = &aa_to_dna_aln($aa_aln, \%seqs);这一行老是报错(见图),请问这个问题怎么解决,而且怎么将删除gap后的DNA序列读到输出中?
