Changes to PyNAST in preparation for PyNAST 1.1

March 24, 2010

The PyNAST code is undergoing significant changes this week in preparation for PyNAST 1.1, which I’m hoping to release by 2 April 2010. This message is primarily important for users of the PyNAST svn code.

In this release we’re no longer aiming to match the results of the original NAST web server and command line application, but rather to focus on getting better alignments. Users who want to match those results will always be able to use PyNAST 1.0.

PyNAST’s command line interface will mostly stay the same in PyNAST 1.1. The major changes are improvements to the underlying PyNAST algorithm. While I’m making these changes over the next few days, the svn code is likely to be somewhat unstable. I recommend that svn users go with one of the following strategies:

  • stick with the svn version that you’re currently using until PyNAST 1.1 is released;
  • use PyNAST 1.0 until PyNAST 1.1 is released;
  • or, at least, be aware that PyNAST alignments may change from version to version over the next few days.

As always, the svn code is less stable than the release code. I do generally encourage users to work with the svn version if they can tolerate some instability. Your input helps the development team improve PyNAST by identifying issues prior to packing up releases.

Here’s a sneak peek at some of the new features:

Feature 1

The default for searching and aligning will switch from BLAST to uclust. Because uclust uses k-mer searching and performs global alignments, this addresses findings from Schloss 2009 (briefly, that k-mer searching is better, and that global alignment works at least as well as local alignment for this application). Additionally, this will effectively remove the dependence on BLAST, which some users have trouble with, and should speed PyNAST up (possibly by orders of magnitude).
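
To give a feel for what k-mer searching means here, the following is a minimal sketch that picks the template sharing the most k-mers with a candidate sequence. It is not uclust's actual algorithm; the function names (`kmers`, `best_template`), the value of `k`, and the toy sequences are all assumptions made for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_template(candidate, templates, k=4):
    """Return (template_id, shared_kmer_count) for the best-matching template.

    templates is a dict mapping a template id to its unaligned sequence.
    Ties are broken in favor of the template seen first.
    """
    candidate_kmers = kmers(candidate, k)
    best_id, best_shared = None, -1
    for template_id, template_seq in templates.items():
        shared = len(candidate_kmers & kmers(template_seq, k))
        if shared > best_shared:
            best_id, best_shared = template_id, shared
    return best_id, best_shared


# Hypothetical toy data, not real reference sequences.
templates = {"t1": "ACGTACGTAC", "t2": "TTTTGGGGCC"}
match = best_template("ACGTACGAAC", templates, k=4)
```

A real search would index the template k-mers once rather than recomputing them per candidate, which is where most of the speed advantage comes from.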

Feature 2
Support will be added for using a different pairwise aligner depending on the percent identity between the candidate and its best template match. By default, the uclust alignment will always be used, but this will leave the door open for (e.g.) aligning with structure-based aligners (like Infernal) for more distant matches. Depending on how long other changes take, I may not add support for this to the command line interface in PyNAST 1.1, but I hope to at least add it to the API so it’s accessible via QIIME.
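
One way this selection could look is sketched below. This is not the PyNAST API: `choose_aligner`, the placeholder aligner functions, and the 75% cutoff are all hypothetical, chosen only to show the idea of falling back to a different aligner for distant matches.

```python
def uclust_style_align(candidate, template):
    """Placeholder for the default pairwise aligner (hypothetical)."""
    return candidate, template


def structure_style_align(candidate, template):
    """Placeholder for a structure-based aligner (hypothetical)."""
    return candidate, template


def choose_aligner(percent_id, default_aligner, distant_rules=()):
    """Pick a pairwise aligner from the percent identity of the best match.

    distant_rules is a sequence of (cutoff, aligner) pairs. An aligner is
    used when percent_id is below its cutoff; the lowest matching cutoff
    wins. With no rules, the default aligner is always returned.
    """
    for cutoff, aligner in sorted(distant_rules, key=lambda rule: rule[0]):
        if percent_id < cutoff:
            return aligner
    return default_aligner


# Assumed cutoff of 75% identity, purely for illustration.
rules = [(75.0, structure_style_align)]
close_match = choose_aligner(92.5, uclust_style_align, rules)
distant_match = choose_aligner(68.0, uclust_style_align, rules)
```

Passing no rules reproduces the default behavior described above, where the uclust alignment is always used.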

Feature 3
Conservation data will be used to make a more informed choice of which gap positions to delete when decreasing the alignment length. We’ve recently noticed some long-branch issues in trees generated from PyNAST alignments. I think these occur when a bad choice is made for the gap to remove to reduce the length of the candidate sequence alignment.

My initial thought on how to achieve this is that we can allow users to (optionally) provide a lanemask file (like the Greengenes lanemask). If one is provided, rather than simply removing the nearest gap, PyNAST can count the number of conserved positions (i.e., lanemask=1) that would be disrupted by different choices of gap positions to remove. It would then remove the gap that disturbs the fewest conserved positions (and default to the closest gap in the event of a tie). I plan to do some experimentation with this, and see if it would address some of the long-branch results that people have shared with me recently.
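
The idea can be sketched roughly as follows. This is an illustration under stated assumptions, not PyNAST code: the function name `choose_gap_to_remove` is hypothetical, the lanemask is assumed to be a string of "0" and "1" characters the same length as the aligned candidate, and "disrupted" is taken to mean every column from the insertion point to the removed gap, inclusive, since those are the columns whose characters would shift.

```python
def choose_gap_to_remove(aligned, lanemask, insert_pos, gap_char="-"):
    """Return the index of the gap to remove, or None if there are no gaps.

    aligned    -- candidate sequence in alignment coordinates
    lanemask   -- string of '0'/'1', '1' marking a conserved column
    insert_pos -- column where the alignment must be shortened by one
    """
    if len(aligned) != len(lanemask):
        raise ValueError("aligned sequence and lanemask differ in length")

    best_pos, best_key = None, None
    for gap_pos, char in enumerate(aligned):
        if char != gap_char:
            continue
        low, high = sorted((insert_pos, gap_pos))
        # Conserved columns that would shift if this gap were removed.
        disrupted = lanemask[low:high + 1].count("1")
        # Fewest disrupted columns first; distance breaks ties.
        key = (disrupted, abs(gap_pos - insert_pos))
        if best_key is None or key < best_key:
            best_pos, best_key = gap_pos, key
    return best_pos


# Toy example: the nearest gap (column 3) sits in a conserved region,
# so the more distant gap (column 7) is preferred.
aligned = "ACG-TAC-G"
lanemask = "111110001"
chosen = choose_gap_to_remove(aligned, lanemask, insert_pos=4)
```

With an all-zero lanemask every gap ties on conservation, so the function falls back to the closest gap, which matches the current nearest-gap behavior.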

Thanks!

Greg
