Overview
Alignment view
In the alignment viewer, a user can browse an alignment of any set of uploaded nucleotide sequences (see example: alignment of four sequences). High-similarity regions detected by BLASTn or tBLASTx are color-coded in the alignment according to their reported %-identity, and dot plots summarizing these regions are shown beside the alignment. To make the alignment easier to read (e.g., for a quick grasp of genomic colinearity), the positions of the sequences are adjusted automatically, including adjustment for circular permutation. Positions can also be adjusted manually. For details of the genomic alignment view, see this section.
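Circular permutation, one of the position adjustments applied to the alignment, amounts to rotating a circular sequence so that a chosen position becomes position 1; flipping a genome to the other strand amounts to reverse complementation. The minimal Python sketch below illustrates both operations and shows where an original 1-based coordinate lands after each. The function names are hypothetical, and this is an illustration of the operations, not DiGAlign's implementation.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a nucleotide sequence (flip it to the other strand)."""
    return seq.translate(COMPLEMENT)[::-1]


def rotate(seq: str, new_start: int) -> str:
    """Circularly permute seq so that 1-based position new_start becomes position 1."""
    if not seq:
        return seq
    k = (new_start - 1) % len(seq)
    return seq[k:] + seq[:k]


def rotated_position(pos: int, new_start: int, length: int) -> int:
    """Where original 1-based position pos lands after rotate(seq, new_start)."""
    return (pos - new_start) % length + 1


def reversed_position(pos: int, length: int) -> int:
    """Where original 1-based position pos lands after reverse_complement(seq)."""
    return length - pos + 1


genome = "ACGTAC"
print(rotate(genome, 3))          # GTACAC
print(rotated_position(1, 3, 6))  # 5
print(reverse_complement(genome)) # GTACGT
```

A gene spanning positions s..e on the original strand spans reversed_position(e)..reversed_position(s) after reverse complementation, which is why gene labels must be remapped together with the sequence.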
Gene prediction and similarity search
Gene prediction and protein similarity search against GenomeNet nr-aa, a non-redundant protein sequence database merging sequences from RefSeq, SwissProt, TrEMBL, and GenPept, can be performed in parallel with the alignment computation. The results can be browsed through the web interface and downloaded as tables. In the alignment view, the predicted gene positions are drawn on each sequence, and each gene is labeled with its best hit against nr-aa.
Resource for computation
The DiGAlign server is part of the GenomeNet service. Computational resources are provided by the Supercomputer System of the Institute for Chemical Research, Kyoto University.
Sequence upload
Prepare sequences
From the upload page, a user can upload nucleotide sequences and choose computational options (i.e., the type of BLAST and whether to perform gene and function predictions). After the uploaded sequences are validated, a new session is created and the computation begins.
The current limits on the number of sequences and the constraints on sequence IDs are listed below.
Gene information
There are three options for gene finding: (1) predict genes with Prodigal, (2) upload a predefined gene position table in BED-like format, or (3) proceed without gene information.
If Prodigal is used for gene prediction, the translation table (genetic code) can be selected. An important note is that .....
To use predefined gene information, upload a gene table. The table should be a tab-separated, extended BED-like file with the following columns.
Take care with the 2nd column (start position): as in the standard BED format, it is 0-based, so a gene that starts at the first nucleotide has the value 0 (not 1). The 3rd column (end position) is not shifted: a gene that ends at the 90th nucleotide has the value 90.
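If gene positions come from a tool that writes GFF3 (for example, Prodigal run locally with GFF output), the coordinates are 1-based and inclusive, so only the start needs to be shifted. The Python sketch below illustrates this conversion. It assumes GFF3 input, assumes the first column of the gene table holds the sequence ID (as in standard BED), and produces only the first three columns; the function names are hypothetical, and the remaining columns must be filled in according to DiGAlign's column specification.

```python
from __future__ import annotations


def gff_to_bed_like(gff_text: str, feature_type: str = "CDS") -> list[tuple[str, int, int]]:
    """Return (sequence ID, 0-based start, end) for each feature of the given type.

    GFF3 coordinates are 1-based and inclusive, so only the start is shifted by
    one; the end position is kept as-is (BED-style half-open interval).
    """
    rows = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and ## directives
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != feature_type:
            continue  # not a feature line of the requested type
        seq_id, start, end = fields[0], int(fields[3]), int(fields[4])
        rows.append((seq_id, start - 1, end))
    return rows


def to_tsv(rows: list[tuple[str, int, int]]) -> str:
    """Join rows into tab-separated lines (first three columns only)."""
    return "".join("\t".join(map(str, row)) + "\n" for row in rows)


gff = (
    "##gff-version 3\n"
    "seq1\tProdigal_v2.6.3\tCDS\t1\t90\t.\t+\t0\tID=1_1\n"
)
print(to_tsv(gff_to_bed_like(gff)), end="")  # seq1<TAB>0<TAB>90
```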
Gene function prediction
DiGAlign can perform a similarity search against GenomeNet nr-aa, a non-redundant protein sequence database merging sequences from RefSeq, SwissProt, TrEMBL, and GenPept, using GHOSTX (E-value cutoff: 0.1).
This search takes a relatively long time compared with the other computations performed by DiGAlign, so users can choose to skip it.
If the similarity search was performed, users can browse the resulting table (Example).
Up to 100 top hits can be browsed, if that many exist. The GenomeNet nr-aa database is currently updated weekly by GenomeNet. Details of each hit sequence can be viewed via a hyperlink.
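The hit filtering described above (E-value cutoff 0.1, at most 100 hits shown) can be sketched in Python. This assumes the hits are already parsed into records carrying an E-value and a bit score. The `Hit` fields, the assumption that the 100-hit limit applies per predicted gene, the ranking by E-value with bit score as a tie-breaker, and treating the cutoff as inclusive are all assumptions, not documented DiGAlign behavior.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    query: str     # predicted gene ID (hypothetical field names)
    subject: str   # nr-aa sequence ID
    evalue: float
    bitscore: float


def top_hits(hits: list[Hit], evalue_cutoff: float = 0.1,
             max_hits: int = 100) -> dict[str, list[Hit]]:
    """Keep hits at or below the E-value cutoff and the best max_hits per gene."""
    by_query: dict[str, list[Hit]] = defaultdict(list)
    for hit in hits:
        if hit.evalue <= evalue_cutoff:
            by_query[hit.query].append(hit)
    # Rank by ascending E-value, breaking ties by descending bit score.
    return {
        query: sorted(group, key=lambda h: (h.evalue, -h.bitscore))[:max_hits]
        for query, group in by_query.items()
    }


hits = [
    Hit("gene1", "a", 1e-5, 50.0),
    Hit("gene1", "b", 0.5, 20.0),   # above the cutoff, dropped
    Hit("gene1", "c", 1e-10, 80.0),
]
print([h.subject for h in top_hits(hits)["gene1"]])  # ['c', 'a']
```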
Computation steps
Computation steps include BLAST execution, distance matrix generation, tree generation, gene finding, and gene similarity search. Clicking the submit button on the upload page redirects the user to a progress page like this. The page reloads automatically to report the progress of the calculation. When all calculation steps for the upload are finished, a "session" is activated, and a notification email is sent to the address provided at upload. The user can then browse all the results from the session main page, whose link is given on the progress page and in the email. An example of the session main page is shown (Example).
Computation time depends on various factors. One major factor is the total length of the uploaded sequences, because sequence length affects the computation time of the BLAST search. Note that the gene similarity search by GHOSTX against GenomeNet nr-aa requires a relatively long computation time. We tested various input sequences with various options; in most cases, calculations finished within a few minutes to a couple of hours (except when the computer system was fully occupied).
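Because total sequence length is a major factor, it can be useful to compute it locally before uploading. A minimal Python sketch, assuming plain FASTA text with `>` header lines and taking the first word of each header as the ID; it says nothing about DiGAlign's actual limits or run times, and the function name is hypothetical.

```python
from __future__ import annotations


def fasta_lengths(fasta_text: str) -> dict[str, int]:
    """Return the sequence length for each record, keyed by the first word of its header."""
    lengths: dict[str, int] = {}
    current = None
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            current = words[0] if words else ""
            lengths[current] = 0  # a repeated ID resets the earlier record's count
        elif current is not None:
            lengths[current] += len(line)  # sequence lines may be wrapped
    return lengths


example = ">phage_A complete genome\nACGTACGT\nACG\n>phage_B\nGGGCCC\n"
lengths = fasta_lengths(example)
print(lengths)                # {'phage_A': 11, 'phage_B': 6}
print(sum(lengths.values()))  # 17
```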
Browsing session
The session main page is a portal for investigating the results (Example). Basic information about the session is listed on the left of the screen (or at the top when the screen is narrow). Menus for all the viewers and file downloads are provided on the right of the screen (or at the top when the screen is narrow).
Contents of the basic information
Note about update recommendation
Contents of the menu
Genomic alignment
The alignment view page contains (1) a panel for configuration/download and (2) the alignment view.
In the alignment view, a user can browse an alignment of the genomes uploaded by the user. The alignment view visualizes homologous regions between genomes detected by BLASTn or tBLASTx (E-value < 1e-2). An example of the genomic alignment view is shown here.
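To illustrate the E-value threshold, the Python sketch below parses BLAST+ tabular output and keeps only HSPs with E-value < 1e-2. It assumes the default `-outfmt 6` column order (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); DiGAlign's internal data format is not described here, and `HSP` and `read_hsps` are hypothetical names.

```python
from __future__ import annotations

from typing import NamedTuple


class HSP(NamedTuple):
    qseqid: str
    sseqid: str
    pident: float
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float


def read_hsps(tabular_text: str, max_evalue: float = 1e-2) -> list[HSP]:
    """Parse BLAST -outfmt 6 lines and keep HSPs with E-value < max_evalue."""
    hsps = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and -outfmt 7 comment lines
        f = line.split("\t")
        hsp = HSP(f[0], f[1], float(f[2]), int(f[6]), int(f[7]),
                  int(f[8]), int(f[9]), float(f[10]))
        if hsp.evalue < max_evalue:
            hsps.append(hsp)
    return hsps


text = (
    "genomeA\tgenomeB\t98.50\t100\t1\t0\t1\t100\t201\t300\t1e-40\t180\n"
    "genomeA\tgenomeB\t75.00\t40\t10\t0\t500\t539\t900\t861\t0.5\t20.1\n"
)
print(len(read_hsps(text)))  # 1
```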
Caution: Microsoft Internet Explorer may take a long time to render genomic alignments. Please use another browser, such as Google Chrome (recommended), Edge, Firefox, or Safari.
This view also provides pairwise dot plots of the sequences included in the alignment. A color bar at the upper left of the alignment shows the %-identity scale used in both the alignment and the dot plots. The bar can be enlarged or shrunk by scrolling the mouse wheel over it.
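A pairwise dot plot of high-similarity regions can be drawn by treating each HSP as a line segment from (query start, subject start) to (query end, subject end). When one sequence's coordinates run backwards, the match lies on the opposite strand and the segment becomes an anti-diagonal. The Python sketch below assumes HSPs given as simple tuples; the linear mapping of %-identity onto [0, 1] for a color scale, including its 50% lower bound, is a hypothetical choice and not DiGAlign's actual palette.

```python
def dotplot_segments(hsps):
    """Turn HSP coordinates into line segments for a pairwise dot plot.

    Each HSP is a (qstart, qend, sstart, send, pident) tuple with 1-based
    coordinates. Opposite coordinate directions indicate a reverse-strand match.
    """
    segments = []
    for qstart, qend, sstart, send, pident in hsps:
        same_direction = (qend - qstart) * (send - sstart) >= 0
        segments.append({
            "start": (qstart, sstart),
            "end": (qend, send),
            "strand": "+" if same_direction else "-",
            "pident": pident,
        })
    return segments


def identity_to_unit(pident, low=50.0, high=100.0):
    """Map %-identity onto [0, 1] for a color scale (bounds are hypothetical)."""
    t = (pident - low) / (high - low)
    return min(1.0, max(0.0, t))


segs = dotplot_segments([(1, 100, 201, 300, 98.5), (500, 539, 900, 861, 75.0)])
print([s["strand"] for s in segs])  # ['+', '-']
print(identity_to_unit(75.0))       # 0.5
```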
Panel for configuration/download
A panel for configuration/download is shown above the alignment image. The panel contains three navigation tabs: "Basic parameters", "Customize sequences", and "Download". These tabs provide versatile functions for producing publication-ready visualizations of the alignment.
The "Basic parameters" tab provides parameters related to genome positioning, gene labels, size adjustment, etc.
The "Customize sequences" tab provides a table for adding, modifying, deleting, and reordering the sequences included in the alignment. Each genome in the alignment can be repositioned manually or automatically by circular permutation, strand reversal (reverse complement), and shifting of the start position.
The "Download" tab provides a download link for the displayed alignment image in SVG format.
Guide tree
The tree view page contains (1) a guide tree and (2) a panel for configuration/download.
Guide tree
The guide tree is shown on the right of the screen (or at the bottom when the screen is narrow). A user can choose between two types of tree view: "circular view" and "rectangular view". The circular view is designed for a comprehensive visualization of the tree, especially when it contains hundreds of sequences. The rectangular (i.e., linear) view, on the other hand, is suited to browsing detailed information; when the number of sequences is small enough, it can also give a comprehensive overview.
In the rectangular view, each inner node, drawn as a filled circle, is linked to an alignment of the genomes included in its subtree. These inner-node hyperlinks can also be shown in the circular view by enabling the parameter "show link to alignment" in the configuration panel. This panel provides various functions for configuring the tree as well as for downloading images and data. For details of the panel, see this section.
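The set of genomes an inner node links to follows directly from the tree: it is the set of leaves in that node's subtree. The Python sketch below shows this using a hypothetical nested-tuple representation (a leaf is a genome name, an inner node is a tuple of its children, branch lengths omitted), not any particular tree file format offered for download.

```python
from __future__ import annotations

from typing import Iterator, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of child subtrees


def leaves(node: Tree) -> list[str]:
    """Collect the leaf (genome) names below a node, left to right."""
    if isinstance(node, str):
        return [node]
    names: list[str] = []
    for child in node:
        names.extend(leaves(child))
    return names


def inner_nodes(node: Tree) -> Iterator[tuple]:
    """Yield every inner node in pre-order (root first)."""
    if isinstance(node, tuple):
        yield node
        for child in node:
            yield from inner_nodes(child)


tree = (("A", "B"), ("C", ("D", "E")))
for node in inner_nodes(tree):
    print(leaves(node))
# ['A', 'B', 'C', 'D', 'E']
# ['A', 'B']
# ['C', 'D', 'E']
# ['D', 'E']
```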
Note about the appearance of the tree
Panel for configuration / download
A panel for configuration/download is also shown on the left of the guide tree (or at the top when the screen is narrow). Users can switch between the circular view and the rectangular view by selecting the "Circular tree" or "Rectangular tree" tab at the top of the panel and clicking the "redraw tree" button. The "Download" tab provides download links for the visualization and tree files.
This panel also provides many visualization parameters for the guide tree.
The "Circular tree" and "Rectangular tree" tabs provide the functions listed below.
The "Download" tab provides the three download links listed below.
After downloading, SVG images can be edited and/or converted to other formats (e.g., PDF, PNG, and TIFF) with software such as Adobe Illustrator or Inkscape (the latter is freely available for Windows, macOS, and Linux).
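For batch conversion, Inkscape can also be run from the command line. The Python sketch below assumes Inkscape 1.0 or later, where `--export-filename` selects the output format from the file extension (older 0.92 releases used different flags). It covers PDF and PNG only, and the function names and file names are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path
from typing import Optional


def inkscape_export_command(svg_path: str, output_path: str,
                            dpi: Optional[int] = None) -> list[str]:
    """Build an Inkscape 1.x command; the output format follows the file extension."""
    cmd = ["inkscape", svg_path, f"--export-filename={output_path}"]
    if dpi is not None and Path(output_path).suffix.lower() == ".png":
        cmd.append(f"--export-dpi={dpi}")  # resolution only matters for bitmaps
    return cmd


def convert_svg(svg_path: str, output_path: str, dpi: Optional[int] = None) -> None:
    """Run the conversion; raises if Inkscape is missing or the export fails."""
    if shutil.which("inkscape") is None:
        raise FileNotFoundError("inkscape was not found on PATH")
    subprocess.run(inkscape_export_command(svg_path, output_path, dpi), check=True)


print(inkscape_export_command("alignment.svg", "alignment.png", dpi=300))
```

Building the command separately from running it keeps the argument list easy to inspect before Inkscape is invoked.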
Licensing
All data and download files in DiGAlign are freely available under a 'Creative Commons BY-NC-SA 4.0' license.