This section visualizes the key data produced on the way to a whole-genome map.
The outline of the whole project
Last update time:
## [1] "Fri Oct 9 22:26:58 2015"
Original variant calling
Figure 1. Variant calling using Freebayes.
On average there is one variant every 869598330/14090421 = 61.7 bp, so short reads (100 bp and 150 bp) usually span more than one variant and are informative for haplotyping.
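The arithmetic behind this estimate can be checked with a short sketch. The two counts are taken from the text; the variable and function names are illustrative, and the per-read figure assumes variants are spread evenly along the genome, which the report does not claim.

```python
# Counts quoted in the text; names are illustrative, not from the project.
TOTAL_BP = 869_598_330      # numerator used in the text (presumably total bases)
N_VARIANTS = 14_090_421     # number of called variants


def mean_variant_spacing(total_bp: int, n_variants: int) -> float:
    """Average number of bases per variant."""
    return total_bp / n_variants


def expected_variants_per_read(read_len: int, spacing: float) -> float:
    """Average variants covered by one read, ASSUMING evenly spread variants."""
    return read_len / spacing


spacing = mean_variant_spacing(TOTAL_BP, N_VARIANTS)
for read_len in (100, 150):
    per_read = expected_variants_per_read(read_len, spacing)
    print(f"{read_len} bp read: about {per_read:.2f} variants on average")
```

Under that even-spacing assumption both read lengths cover more than one variant on average, which is the condition a read needs to link neighbouring variants into a haplotype.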
## Warning: Removed 1130678 rows containing non-finite values (stat_density).
Figure 2. Distribution of distances between adjacent SNPs.
Only 1130678/14011389 = 8% of the observed distances are longer than 150 bp.
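A minimal sketch of how adjacent SNP distances and the share above a read-length threshold could be computed is given here. The input layout (pairs of scaffold name and position) and the toy values are assumptions for illustration; they are not the project's data or code.

```python
from collections import defaultdict


def adjacent_distances(snps):
    """Distances between neighbouring SNPs on the same scaffold.

    snps: iterable of (scaffold_name, position) pairs, in any order.
    """
    by_scaffold = defaultdict(list)
    for scaffold, pos in snps:
        by_scaffold[scaffold].append(pos)

    distances = []
    for positions in by_scaffold.values():
        positions.sort()
        # Pair each position with its successor on the same scaffold.
        distances.extend(b - a for a, b in zip(positions, positions[1:]))
    return distances


def fraction_longer_than(distances, threshold):
    """Share of distances strictly greater than threshold."""
    if not distances:
        return 0.0
    return sum(d > threshold for d in distances) / len(distances)


# Toy input, invented for illustration only.
toy_snps = [("scf1", 10), ("scf1", 60), ("scf1", 400), ("scf2", 5), ("scf2", 120)]
toy_distances = adjacent_distances(toy_snps)
toy_fraction = fraction_longer_than(toy_distances, 150)
print(toy_distances, toy_fraction)

# The share quoted in the text, from its own counts.
reported_fraction = 1_130_678 / 14_011_389
```

Distances are only taken within a scaffold, so a scaffold with n SNPs contributes n - 1 distances.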
## [1] "Fri Oct 9 22:27:40 2015"
The high correlation (0.975) between “Number of SNPs” and “Scaffold Length” makes it easy to exclude the endophyte genome.
What happens to CT and MT genomes?
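The correlation between SNP count and scaffold length, and a simple way to flag scaffolds that fall off the trend, can be sketched as follows. The assumption that scaffolds to exclude show up as unusually low SNP density is an illustrative reading, not something the report states; the `min_ratio` cutoff, the scaffold names, and the toy numbers are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def low_density_scaffolds(scaffolds, min_ratio=0.1):
    """Names of scaffolds whose SNP density is below min_ratio x overall density.

    scaffolds: dict mapping name -> (length_bp, n_snps).
    min_ratio is a hypothetical cutoff chosen for this sketch.
    """
    total_len = sum(length for length, _ in scaffolds.values())
    total_snps = sum(n_snps for _, n_snps in scaffolds.values())
    overall = total_snps / total_len
    return [
        name
        for name, (length, n_snps) in scaffolds.items()
        if n_snps / length < min_ratio * overall
    ]


# Toy scaffolds, invented for illustration only.
toy = {
    "scf_a": (10_000, 160),
    "scf_b": (50_000, 820),
    "scf_c": (120_000, 1_900),
    "scf_x": (80_000, 3),  # long scaffold with almost no SNPs
}
flagged = low_density_scaffolds(toy)
kept = {name: v for name, v in toy.items() if name not in flagged}
r_kept = pearson([v[0] for v in kept.values()], [v[1] for v in kept.values()])
print(flagged, round(r_kept, 3))
```

The same check could be applied to the CT and MT scaffolds to see on which side of the trend they fall.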
## Warning: Removed 1 rows containing missing values (geom_point).
Figure 3. Number of SNPs in all scaffolds.
## [1] "Fri Oct 9 22:27:58 2015"
Figure 4. Sequencing depth comparison of Version 2 (gray) and Version 3 (blue).
The clear 1 kb boundary in Version 2 is due to the 454 reads.
## Warning: Removed 2978 rows containing non-finite values (stat_density).
Figure 5. Haplotyping percentages of Version 2 and Version 3 in SE and PE modes.
Figure 6. Types finding and selecting algorithm
Haplotyping Seg Frag Block visualization.
Haplotyping (SP3_scf10000 SNPs: 333 Length: 15767)
![alt text](/Users//fenhong//Desktop//SP//haplotyping.png)
Types merging and extension (SP3_scf1180 SNPs: 1816 Length: 128556)
![alt text](/Users//fenhong//Desktop//SP//haplotyping2.png)
Pair information extension: Haplotyping (SP3_scf11353 SNPs: 4640 Length: 334103)
![alt text](/Users//fenhong//Desktop//SP//haplotyping3.png)
The types finding algorithm (Figure 6) was applied to phase all scaffolds; 3994080 blocks were found and 1858352820 bp of haplotype sequence were extracted for the following step.
Step 0: exten.hap.fasta: 4691130 (1969125361 bp)
Step 1: bridge.hap.fasta: 3994080 (1858352820 bp)
Step 2: pair.hap.fasta: 2604786 (2811927477 bp)
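Per-file totals like the ones listed for each step can be obtained by counting FASTA records and bases. The sketch below is one way to do it, not the project's own script; the function name and the toy records are illustrative. It also derives the mean sequence length per step from the reported counts.

```python
import io


def fasta_stats(lines):
    """Return (number of records, total bases) for FASTA-formatted lines."""
    n_records = 0
    total_bp = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            n_records += 1          # header line starts a new record
        else:
            total_bp += len(line)   # sequence line, possibly one of several
    return n_records, total_bp


# Counts reported in the text: file name -> (sequences, total bp).
REPORTED = {
    "exten.hap.fasta": (4_691_130, 1_969_125_361),
    "bridge.hap.fasta": (3_994_080, 1_858_352_820),
    "pair.hap.fasta": (2_604_786, 2_811_927_477),
}
mean_lengths = {name: bp / n for name, (n, bp) in REPORTED.items()}
for name, mean_len in mean_lengths.items():
    print(f"{name}: mean length {mean_len:.1f} bp")

# Toy FASTA content, invented for illustration only.
toy_stats = fasta_stats(io.StringIO(">hap1\nACGT\nAC\n>hap2\nGGG\n"))
print(toy_stats)
```

For a real file, the same function accepts an open file handle, for example `with open(path) as fh: fasta_stats(fh)`, where `path` points at one of the step outputs.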
## Warning: Removed 47944 rows containing non-finite values (stat_density).
## Warning: Removed 50119 rows containing non-finite values (stat_density).
## Warning: Removed 211388 rows containing non-finite values (stat_density).
## Warning: Removed 309451 rows containing non-finite values (stat_ydensity).
Figure 7. Length distribution of haplotype sequences
454 Evaluation