3 Basic functionalities
3.1 Sequence graph
DiProGB encodes the sequence information by dinucleotide properties. A sequence of length L is described by L-1 dinucleotides (L for a circular nucleotide sequence). Dinucleotide property values (see 1.3 Dinucleotide properties) are assigned to each of the L-1 dinucleotides, and these values are plotted as a graph with the sequence position on the x-axis and the corresponding property value on the y-axis. For some analyses it is useful to smooth the graph. This is done with a shifting window (SW) technique using a window of user-defined odd size. A window of size S is shifted through the sequence, starting at the first position and then proceeding with positions 2, 3, …. For each window the mean of all dinucleotide property values within it is calculated and assigned to the middle position of the window. For example, for S=101 this is position 51 for the window starting at sequence position 1. For linear genomes the SW size S is gradually decreased at the ends of the sequence (within the first and last S/2 positions). This ensures that, also for linear genomes, each of the L-1 positions (every base except the last nucleotide) receives an averaged property value, although the extent of smoothing is necessarily reduced at the ends. To indicate this reduced smoothing, the corresponding values are connected by dotted lines. If L>N (N being the number of pixels available, i.e. the width of the main window), DiProGB automatically calculates the smallest possible S: S = 2 * RoundDown[(L-1)/N] - 1. The corresponding values are then calculated for every (L-1)/N-th position in the genome. Hence, even though it is not possible to show information for each individual dinucleotide, the complete sequence information is covered by this averaging procedure. To get more detailed information one can zoom into the sequence.
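The encoding and shifting window smoothing described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not DiProGB's implementation: the table `TOY_PROPERTY` holds arbitrary placeholder values rather than a real dinucleotide property, and `encode` and `smooth` are hypothetical names. The manual does not specify exactly how the window shrinks at the ends of a linear sequence; here it is assumed to shrink symmetrically so that it stays centred on the current position. Wrapping the window around the origin for circular sequences is likewise an assumption.

```python
from itertools import product
from typing import Dict, List

# Hypothetical property table with placeholder values (AA=0, AC=1, ..., TT=15).
# Real values come from a dinucleotide property file, not from this sketch.
TOY_PROPERTY: Dict[str, float] = {
    a + b: float(i) for i, (a, b) in enumerate(product("ACGT", repeat=2))
}


def encode(seq: str, prop: Dict[str, float], circular: bool = False) -> List[float]:
    """Map a sequence of length L to L-1 (linear) or L (circular) property values."""
    seq = seq.upper()
    if circular:
        # The extra dinucleotide joins the last and the first nucleotide.
        seq = seq + seq[0]
    return [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]


def smooth(values: List[float], size: int, circular: bool = False) -> List[float]:
    """Shifting-window mean of odd size, assigned to the window's middle position."""
    if size < 1 or size % 2 == 0:
        raise ValueError("window size must be a positive odd number")
    n = len(values)
    half = size // 2
    out = []
    for i in range(n):
        if circular:
            # Assumption: wrap around the origin so every position gets a full window.
            window = [values[(i + k) % n] for k in range(-half, half + 1)]
        else:
            # Assumption: near the ends the window shrinks symmetrically,
            # so it stays centred on position i (less smoothing at the ends).
            h = min(half, i, n - 1 - i)
            window = values[i - h:i + h + 1]
        out.append(sum(window) / len(window))
    return out
```

For example, `smooth(encode("ACGTAC", TOY_PROPERTY), 3)` returns one smoothed value per dinucleotide; the first and last values are unsmoothed because the window has shrunk to size 1 there.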
In addition, one or more subsequence graphs, encoded by other dinucleotide properties for the same sequence, can be displayed above the main sequence graph. This allows several parameters to be visualized and compared at once. Subgraphs are selected with the check boxes in front of the dinucleotide properties in the DiPro list (cf. 2.2 Sequences, Dinucleotide properties). All subgraphs are calculated with the same algorithms as the main graph and can also be manipulated in real time.
3.2 Feature graph
Below the sequence graph there is a second graphical representation, which we call the feature graph. The feature graph displays all features defined in the sequence file (GenBank or feature file) as horizontal bars. When loading a genome into DiProGB, the user can select the features to be displayed; the selection can also be changed afterwards via Display->Color list->Restrict features/qualifiers. The color coding in the feature graph is the same as in the sequence graph. Because annotated features often overlap, they are shown as stacked bars with the shortest feature on top, so that no information is lost. The sequence graph, in contrast, can display only one color at a given position; there only the shortest feature is shown and the others are hidden. For instance, exons and introns are displayed instead of the complete gene. The feature graph thus provides a complete overview of all features. More information is available via the List features option in the right mouse button popup menu, which shows all annotated information for any given position. The feature graph can be disabled (or enabled) via Display->Graph features and Show feature graph.
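The "shortest feature on top" rule can be illustrated with a small sketch. It assumes features are plain intervals with 1-based inclusive coordinates; the `Feature` class and the functions `covering` and `color_feature` are hypothetical and do not reflect DiProGB's internal data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feature:
    kind: str   # e.g. "gene", "CDS", "exon" (hypothetical minimal record)
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def covering(features: List[Feature], pos: int) -> List[Feature]:
    """All features overlapping pos, shortest first (top of the feature-graph stack)."""
    hits = [f for f in features if f.start <= pos <= f.end]
    return sorted(hits, key=lambda f: f.length)


def color_feature(features: List[Feature], pos: int) -> Optional[Feature]:
    """The single feature whose color the sequence graph shows at pos (None if unannotated)."""
    hits = covering(features, pos)
    return hits[0] if hits else None


features = [Feature("gene", 1, 100), Feature("exon", 10, 40), Feature("intron", 41, 60)]
```

With these example features, position 20 is colored as the exon and position 80 as the gene, while `covering(features, 20)` still lists both the exon and the gene, as the feature graph does.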
3.3 Graph coloring
The graph can be colored according to annotated
features and qualifiers. For the following set of frequently used
features the colors are predefined:
- source - black
- CDS - orange
- gene - red
- rRNA - green
- tRNA - blue
For all other features (or qualifiers) colors are chosen so that they are clearly distinguishable. All colors (including the predefined ones) can be changed by the user. To color the features according to the content of a specific qualifier, the user can change the color scheme in the color list. After changing the color scheme to “/product”, for example, each gene product is colored differently in the sequence graph.
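The predefined colors above, combined with automatically chosen colors for everything else, can be modelled as a simple lookup. The manual does not say how DiProGB picks the "clearly distinguishable" colors, so this sketch substitutes evenly spaced HSV hues as a stand-in technique; `assign_colors` is a hypothetical name. The keys may be feature names or qualifier values (e.g. the contents of `/product`).

```python
import colorsys
from typing import Dict, Iterable, Optional, Tuple

RGB = Tuple[int, int, int]

# Predefined colors listed in section 3.3.
PREDEFINED: Dict[str, RGB] = {
    "source": (0, 0, 0),     # black
    "CDS": (255, 165, 0),    # orange
    "gene": (255, 0, 0),     # red
    "rRNA": (0, 128, 0),     # green
    "tRNA": (0, 0, 255),     # blue
}


def assign_colors(keys: Iterable[str], overrides: Optional[Dict[str, RGB]] = None) -> Dict[str, RGB]:
    """Color per feature or qualifier value; user overrides win over everything."""
    overrides = overrides or {}
    unique = list(dict.fromkeys(keys))  # drop repeats, keep first-seen order
    others = [k for k in unique if k not in PREDEFINED and k not in overrides]
    colors: Dict[str, RGB] = {}
    for i, key in enumerate(others):
        # Stand-in technique: evenly spaced hues (offset by half a step)
        # at full saturation and value are easy to tell apart.
        r, g, b = colorsys.hsv_to_rgb((i + 0.5) / len(others), 1.0, 1.0)
        colors[key] = (round(r * 255), round(g * 255), round(b * 255))
    for key in unique:
        if key in PREDEFINED:
            colors[key] = PREDEFINED[key]
        if key in overrides:  # user changes apply even to predefined colors
            colors[key] = overrides[key]
    return colors
```

Passing the distinct `/product` values as keys gives each gene product its own color, mirroring the "/product" color scheme described above.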
3.4 Letter-based sequence information
It is often useful not only to see the sequence graph but also to retrieve the underlying sequence itself. This can be done with the Get sequence option of the right mouse button popup menu. A region of interest can be selected, and the corresponding nucleotide sequence (together with information about its average A, T, G and C content) is shown in a separate window. The sequence can then be saved or used directly for a motif search. If one zooms in far enough, the sequence is shown directly below the graph.
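The base content reported for a selected region amounts to counting nucleotides. The sketch below assumes 1-based inclusive region coordinates and reports fractions; `region_composition` is a hypothetical helper, and DiProGB's actual output format may differ.

```python
from collections import Counter
from typing import Dict


def region_composition(seq: str, start: int, end: int) -> Dict[str, float]:
    """Fraction of A, T, G and C in seq[start..end] (1-based, inclusive)."""
    region = seq[start - 1:end].upper()
    counts = Counter(region)
    total = len(region)
    # An empty region yields zeros instead of dividing by zero.
    return {base: counts[base] / total if total else 0.0 for base in "ATGC"}
```

For instance, the region 1..4 of `AATTGGCCAA` is `AATT`, giving 0.5 for A and T and 0.0 for G and C.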
3.5 Manipulating the sequence graph
The three most important options to change the graph picture are:
- Zooming
- Changing the amplitude
- Smoothing
These options are available via either
the right mouse button popup menu or the Graphics controls button (2). All three graph manipulations
can also be performed by scrolling the mouse wheel.
1. The easiest way to zoom is to mark the corresponding part of
the graph with the left mouse button. A second way to display just a
specified interval is to enter the first, last or middle position of
the interval into the three corresponding text fields below the
sequence graph (12, 13, 14). The currently shown part of the whole graph
is indicated as a blue bar in the horizontal scroll bar (16). A blue bar
spanning the full length of the horizontal scroll bar indicates that the complete sequence is displayed. If just a sub-interval is selected,
it can be moved to the left or right either by clicking the "Horizontal
shift buttons" (11) or by clicking directly into the horizontal scroll
bar (16). It is also possible to zoom with the mouse wheel after selecting the right mouse button popup
menu option Zoom.
2. By default the maximum amplitude (difference
between the highest and lowest peak of the sequence graph) is set to
25% of the main window height. The amplitude can be controlled in two
ways: first, by entering the corresponding number into Graphics controls->Y-axis scaling,
and second, by selecting the right mouse button popup menu option
Y-scaling and then using the mouse wheel.
3. The graph can be smoothed using a shifting window (SW) of odd size (cf. 3.1). By default, the SW size is set to
display the maximum possible information (cf. 3.1). To smooth the graph further, the SW size can be changed in
Graphics controls.
If the requested size is too small, it is automatically increased to the
possible minimum (cf. 3.1). When zooming in or out, the SW size is by
default reset to the minimum possible size for the given zoom level. To
keep the SW size fixed, check the fix box in Graphics controls. The SW size can also be changed with the mouse wheel after
selecting the right mouse button popup menu option SW size. Note that the shifting window size is indicated as a grey line above
the mouse cursor (visible only if the SW size is large enough).
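The minimum SW size rule from 3.1 and the default 25% amplitude can be expressed as small helper functions. This is a sketch with assumptions labelled in the comments: L is taken to be the length of the currently visible interval (so the minimum changes with the zoom level), no averaging (S=1) is assumed when every dinucleotide fits on screen, even requests are rounded up to stay odd, and the amplitude mapping is assumed to be linear. All function names are hypothetical.

```python
from typing import List


def min_window_size(visible_len: int, pixels: int) -> int:
    """Smallest SW size from section 3.1: S = 2 * floor((L-1)/N) - 1 when L > N."""
    if visible_len <= pixels:
        return 1  # assumption: no averaging needed when every dinucleotide fits
    return 2 * ((visible_len - 1) // pixels) - 1


def effective_window(requested: int, visible_len: int, pixels: int) -> int:
    """Requested SW size, raised to the minimum if it is too small."""
    size = max(requested, min_window_size(visible_len, pixels))
    # Assumption: an even request is rounded up to keep the window odd.
    return size if size % 2 == 1 else size + 1


def scale_to_pixels(values: List[float], window_height: int, fraction: float = 0.25) -> List[float]:
    """Linear map so that (max - min) spans `fraction` of the window height."""
    lo, hi = min(values), max(values)
    span = hi - lo
    amplitude = fraction * window_height
    if span == 0:
        return [0.0 for _ in values]  # flat graph: nothing to scale
    return [(v - lo) / span * amplitude for v in values]
```

For a visible interval of 5,000,001 nucleotides on a 1000-pixel-wide window, the minimum is 2 * 5000 - 1 = 9999, so a requested size of 101 would be raised to 9999, whereas 20001 would be kept.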