From the previous section, looking at the over-represented sequences, we can see that FastQC thinks several known contaminant sequences may be in our dataset.

What are these sequences though? Well, Illumina send these sequences to customers as a letter in PDF format. To get this letter, you must register and request it. The list that FastQC actually uses is here:

/usr/share/java/fastqc-0.10.1/Contaminants/contaminant_list.txt

However, I can give the sequences below:

>PCRPrimerIndex1
CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTC
CAAGCAGAAGACGGCATACGAGATCTTCGAGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA
CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT

As you can see, these sequences have a lot in common: they all start with the same 24 bases!

We must also remember to consider the reverse complement of these sequences, as FastQC will look for both!

>PCRPrimerIndex1_RC
GAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG
TGGAATTCTCGGGTGCCAAGGAACTCCAGTCACTCGAAGATCTCGTATGCCGTCTTCTGCTTG
AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG

Now, I'm going to let you into a little secret: the adapter that is actually present in this data set is:

>SmallRNA3pAdapter_1.5
ATCTCGTATGCCGTCTTCTGCTTG

And if we look again at the reverse complement of the adapters reported by FastQC, we can see that each one ends with exactly this adapter:

>PCRPrimerIndex1_RC
GAACTCCAGTCACATCACG ATCTCGTATGCCGTCTTCTGCTTG
TGGAATTCTCGGGTGCCAAGGAACTCCAGTCACTCGAAG ATCTCGTATGCCGTCTTCTGCTTG
AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCG ATCTCGTATGCCGTCTTCTGCTTG

So, there are lessons to be learned here:

- FastQC can tell us about adapter contamination, but it may not tell us the correct adapter.
- Detecting the adapters present in our data can take some detective work.
- One tip is to always ask the scientist who produced the data, but sometimes even they do not know.

Trimming this adapter from the reads gives the following line in the report:

Adapter 'ATCTCGTATGCCGTCTTCTGCTTG', length 24, was trimmed 1874017 times.

We are not finished yet, and this can get more complicated! We can search the first 20000 lines of the data for the adapter:

zcat | head -n 20000 | grep ATCTCGTATGCCGTCTTCTGCTTG

Your PuTTY window should now show the reads that contain the adapter.

Now, if you are sharp, you will notice that next to our adapter there is another repetitive sequence!

TACAGTCCGACG ATCTCGTATGCCGTCTTCTGCTTG

It turns out this is part of the Small RNA sequencing primer:

>SmallRNASequencingPrimer
CGACAGGTTCAGAGTTCTACAGTCCGACGATC

When we combine the two, we get a single, longer sequence in which the primer runs straight into the adapter.
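The sequence relationships described in this post can be checked with a short script. What follows is a minimal Python sketch, not part of the original tutorial: the helper names `reverse_complement` and `merge_on_overlap` are hypothetical, and joining the primer and adapter on their shared overlap is only an assumption about what "combine the two" means here. The sequences themselves are copied from the text.

```python
# Sequences copied from the post.
ADAPTER = "ATCTCGTATGCCGTCTTCTGCTTG"             # SmallRNA3pAdapter_1.5
SEQ_PRIMER = "CGACAGGTTCAGAGTTCTACAGTCCGACGATC"  # SmallRNASequencingPrimer
FASTQC_HITS = [
    "CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTC",  # PCRPrimerIndex1
    "CAAGCAGAAGACGGCATACGAGATCTTCGAGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA",
    "CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT",
]

# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_on_overlap(left, right):
    """Join left and right, collapsing the longest suffix of `left`
    that is also a prefix of `right` (hypothetical helper; the overlap
    join is an assumption, not something the post spells out)."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return left + right


if __name__ == "__main__":
    # Each reverse complement should end with the real adapter.
    for seq in FASTQC_HITS:
        rc = reverse_complement(seq)
        print(rc, "ends with adapter:", rc.endswith(ADAPTER))

    # Primer and adapter joined on their shared bases.
    print(merge_on_overlap(SEQ_PRIMER, ADAPTER))
```

Running it prints the three reverse complements along with whether each ends in the adapter, followed by the merged primer and adapter. The merged sequence could then be used as a search or trimming target, in the same way as the adapter on its own.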