This section visualizes key data along the way to a whole-genome map.

Figure: outline of the whole project.

Last update time: Fri Oct 9 22:26:58 2015

Original variant calling

Figure 1. Variant calling using FreeBayes.

On average there is one variant every 869,598,330 / 14,090,421 ≈ 61.7 bp, so short reads (100 bp and 150 bp) frequently span two or more variants and are therefore informative for haplotyping.
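The arithmetic behind this estimate can be sketched as follows. It treats 869,598,330 bp as the length over which the 14,090,421 variants were called and, as a simplification, assumes variants are spread uniformly; real variant spacing is clustered, so the per-read numbers are only rough averages. The function names are hypothetical.

```python
# Average spacing between variants, using the totals quoted in the text.
TOTAL_BP = 869_598_330    # length over which variants were counted
N_VARIANTS = 14_090_421   # number of variants called


def mean_spacing(total_bp: int, n_variants: int) -> float:
    """Average number of base pairs per variant."""
    return total_bp / n_variants


def expected_variants_per_read(read_len: int, spacing: float) -> float:
    """Expected number of variants covered by one read, assuming
    (simplistically) that variants are evenly spaced."""
    return read_len / spacing


spacing = mean_spacing(TOTAL_BP, N_VARIANTS)
print(f"one variant every {spacing:.1f} bp")  # → one variant every 61.7 bp
for read_len in (100, 150):
    n = expected_variants_per_read(read_len, spacing)
    print(f"{read_len} bp read: ~{n:.2f} variants")
```

A read needs to cover at least two heterozygous sites to link them into a haplotype, which is why an expected count above one per read matters here.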


Figure 2. Distribution of distances between adjacent SNPs.

Only 1,130,678 / 14,011,389 ≈ 8% of the observed distances are longer than 150 bp.
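A minimal sketch of how adjacent-SNP distances and the fraction above a read-length cutoff can be computed, assuming SNPs are available as (scaffold, position) pairs, for example from the CHROM and POS columns of a VCF. The function names and toy scaffolds are hypothetical; distances are only taken between SNPs on the same scaffold.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def adjacent_distances(snps: Iterable[Tuple[str, int]]) -> List[int]:
    """Distances between consecutive SNPs on the same scaffold.

    No distance is computed across a scaffold boundary.
    """
    by_scaffold: Dict[str, List[int]] = defaultdict(list)
    for scaffold, pos in snps:
        by_scaffold[scaffold].append(pos)
    distances: List[int] = []
    for positions in by_scaffold.values():
        positions.sort()
        distances.extend(b - a for a, b in zip(positions, positions[1:]))
    return distances


def fraction_longer_than(distances: List[int], cutoff: int = 150) -> float:
    """Fraction of distances strictly greater than cutoff (e.g. read length)."""
    if not distances:
        return 0.0
    return sum(d > cutoff for d in distances) / len(distances)


# Toy example with hypothetical scaffold names and positions.
toy = [("scfA", 10), ("scfA", 70), ("scfA", 300), ("scfB", 5), ("scfB", 405)]
d = adjacent_distances(toy)
print(d)  # → [60, 230, 400]
print(fraction_longer_than(d))
```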


The high correlation (0.975) between the number of SNPs and scaffold length makes it easy to identify and exclude scaffolds from the endophyte genome.
What happens to the CT and MT genomes?
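The text does not state how the endophyte scaffolds were identified. One way to act on the strong length/SNP-count relationship, sketched here under the assumption that contaminant scaffolds stand out with unusually low SNP density, is to compute the Pearson correlation and flag scaffolds far below the overall density. The `min_fraction` cutoff, function names, and scaffold names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def low_density_scaffolds(stats: Dict[str, Tuple[int, int]],
                          min_fraction: float = 0.2) -> List[str]:
    """Names of scaffolds whose SNP density (SNPs per bp) is below
    min_fraction times the density over all scaffolds combined.
    min_fraction = 0.2 is a hypothetical cutoff, not from the source."""
    total_snps = sum(n_snps for n_snps, _ in stats.values())
    total_len = sum(length for _, length in stats.values())
    overall = total_snps / total_len
    return [name for name, (n_snps, length) in stats.items()
            if n_snps / length < min_fraction * overall]


# Toy data: scaffold -> (number of SNPs, scaffold length in bp).
stats = {
    "scf1": (160, 10_000),
    "scf2": (330, 20_000),
    "scf3": (480, 30_000),
    "scfX": (2, 25_000),  # hypothetical low-SNP (e.g. contaminant) scaffold
}
snp_counts = [n_snps for n_snps, _ in stats.values()]
lengths = [length for _, length in stats.values()]
print(round(pearson(lengths, snp_counts), 3))
print(low_density_scaffolds(stats))  # → ['scfX']
```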


Figure 3. Number of SNPs in all scaffolds.


Figure 4. Sequencing depth comparison of Version 2 (gray) and Version 3 (blue).

The clear 1 kb boundary in Version 2 reflects the contribution of 454 reads.


Figure 5. Haplotyping percentages of Version 2 and Version 3 in single-end (SE) and paired-end (PE) mode.

Figure 6. Types finding and selecting algorithm.

Visualization of haplotyping segments, fragments, and blocks.

Haplotyping (SP3_scf10000, SNPs: 333, length: 15,767 bp).

Types merging and extension (SP3_scf1180, SNPs: 1,816, length: 128,556 bp).

Pair information extension haplotyping (SP3_scf11353, SNPs: 4,640, length: 334,103 bp).

The types finding algorithm (Figure 6) was applied to phase all scaffolds: 3,994,080 blocks were found, and 1,858,352,820 bp of haplotype sequence was extracted for the following step. Sequence counts and total lengths at each step:

- Step 0: exten.hap.fasta, 4,691,130 sequences (1,969,125,361 bp)
- Step 1: bridge.hap.fasta, 3,994,080 sequences (1,858,352,820 bp)
- Step 2: pair.hap.fasta, 2,604,786 sequences (2,811,927,477 bp)
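Each step above is summarized by a sequence count and a total length for one FASTA file. The stdlib-only sketch below shows one way such numbers can be computed; it is not necessarily how they were produced here (tools such as `seqkit stats` report the same quantities). The function names and toy records are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Sequence lengths from FASTA-formatted lines.

    Records may span multiple lines; blank lines are ignored.
    """
    lengths: List[int] = []
    current: Optional[int] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)  # close the previous record
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)  # close the last record
    return lengths


def fasta_summary(lines: Iterable[str]) -> Tuple[int, int]:
    """(number of sequences, total bp), the two figures reported per step."""
    lengths = fasta_lengths(lines)
    return len(lengths), sum(lengths)


toy = [">hap1", "ACGT", "AC", ">hap2", "GGGTTT"]
print(fasta_summary(toy))  # → (2, 12)

# With a real file, e.g. the Step 1 output:
# with open("bridge.hap.fasta") as fh:
#     print(fasta_summary(fh))
```

`fasta_lengths` also returns the per-sequence lengths, which is the kind of input a length distribution such as Figure 7 would be plotted from.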


Figure 7. Length distribution of haplotype sequences.

454 Evaluation