PyNAST 1.1 released: better alignments and a 15x speed increase!
The PyNAST 1.1 release went live today with several notable improvements. The first thing users need to know is that PyNAST now requires PyCogent 1.4.1 (get it here) and uclust 1.1.579 (get it here). BLAST is no longer required, but can optionally be used for the pairwise alignment step. Unless you have a specific reason to use BLAST, for example if you expect that one or both ends of your reads may be problematic, I recommend sticking with the default pairwise aligner, which is now uclust. uclust uses a speed-optimized Needleman-Wunsch alignment.
The list of features that were added differs somewhat from the slated features announced last week. Of the three major changes I mentioned, the only one that actually made it into this release was feature 1, the switch from BLAST to uclust for sequence searching and aligning. This gave some nice improvements in alignment quality over BLAST, so for the time being I decided to hold off on features 2 and 3.
A major enhancement in PyNAST 1.1 is the fix for an issue that led to poor alignments, and thereby to artificially long branches in phylogenetic trees. It appears that differences in BLAST parameter settings were causing rare differences between NAST and PyNAST 1.0 alignments. The two implementations handled large gaps in different ways: NAST would truncate the sequence to avoid a poor alignment, while PyNAST would try to align the region. By default, PyNAST 1.1 uses a global rather than local alignment, and as a consequence achieves better alignments in these cases. Interested users can check out this alignment. I recommend viewing it in a text editor (e.g., TextMate), turning off soft wrapping of lines, and jumping to around column 1637 to see where the implementations begin to differ. As you can see, PyNAST 1.1 achieves the best alignment of the three with respect to the template sequence. This is also evident when comparing FastTree phylogenies built from lanemasked variants of the full alignments. The long branch in Figure 1a corresponds to the sequence presented in the alignment.
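For readers unfamiliar with global alignment, here is a minimal sketch of the textbook Needleman-Wunsch algorithm in Python. It is not uclust's speed-optimized implementation and not PyNAST's code. The function name and the scoring values (match, mismatch, and a simple linear gap penalty) are assumptions chosen for illustration only.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b.

    Returns (score, aligned_a, aligned_b). The scoring values are
    illustrative defaults, not the parameters used by uclust or PyNAST.
    """
    n, m = len(a), len(b)

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against nothing but gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against nothing but gaps

    def substitution(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell extends the best of three neighbours.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + substitution(i, j),  # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,                     # gap in b
                score[i][j - 1] + gap,                     # gap in a
            )

    # Trace back from the bottom-right corner. Starting there (rather than
    # at the highest-scoring cell) is what makes the alignment global:
    # both sequences are aligned end to end.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("ACGT", "AGT")
    print(total)
    print(top)
    print(bottom)
```

Because the traceback always runs from the end of both sequences back to their start, every position must be accounted for, either paired with a base or with a gap. A local aligner instead reports only the best-scoring sub-region.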
In addition to better alignments, a major improvement in PyNAST 1.1 is an approximately 15-fold decrease in runtime compared to PyNAST 1.0, when using the default parameters in both cases. I attribute this primarily to the switch from BLAST to uclust. Thanks a lot to Robert Edgar, author of uclust, for all the help with making the switch. Our uclust wrappers now live in PyCogent (beginning with version 1.4.1), and are used extensively in PyNAST and QIIME. See Figure 2 for an illustration of the improvement in runtime when comparing PyNAST 1.1 to PyNAST 1.0 and the NAST command line application.
Finally, an important note for PyNAST's microbial ecologist and QIIME users. While the resulting alignments (and therefore the derived phylogenetic trees) are improved in PyNAST 1.1, the difference has little or no effect on beta diversity conclusions. Figure 3 illustrates this point by comparing PCoA plots which summarize unweighted UniFrac distances between samples in the QIIME tutorial data set, when alignments were constructed with PyNAST 1.0 (Figure 3a) and PyNAST 1.1 (Figure 3b). There is little difference between the two plots, and certainly no difference in the conclusions drawn from the figures.

Figure 3a: PCoA plots summarizing unweighted UniFrac distances between samples in the QIIME tutorial data set. Alignments were generated with PyNAST 1.0.

Figure 3b: PCoA plots summarizing unweighted UniFrac distances between samples in the QIIME tutorial data set. Alignments were generated with PyNAST 1.1.
Thanks to all users and collaborators for the helpful suggestions that went into PyNAST 1.1!
-Greg



