Next-Generation Sequencing (NGS) data analysis for beginners

Earlier research in genetics, molecular biology, and biochemistry laid the biological foundation for genomics. Molecular biology explained the central dogma, which describes the flow of information from DNA to mRNA and from mRNA to protein; these proteins then take part in various pathways. In the 1980s, early genome sequencing was time-consuming and labor-intensive. The commercial launch of pyrosequencing in 2005 opened a new path in genomics: the Next-Generation Sequencing (NGS) era. Researchers can now sequence short DNA/RNA fragments quickly and accurately. High-throughput sequencing continues to evolve rapidly, driven by falling costs and the large volumes of data it generates. Third-generation sequencing, combined with computational biology, turns this large volume of biological data into information at the cellular level. Computational biology applies a three-step analysis (data filtering, data processing, and statistical error correction) using various servers.

The Human Genome Project provided the umbrella for reference-based genomics. This approach covers gene expression analysis and then follows genes of interest through to protein structure analysis. Gene expression analysis has applications in many fields, such as identifying genes for disease resistance or salt/drought tolerance in plants, medical sciences, and diagnostics. De novo genomics is the complementary approach, used to work on new species and to identify gene expression in particular tissues. Computational biology relies on various online servers, databases, and tools.

This training will introduce young researchers to the genomics and transcriptomics side of biology. Participants will learn to read high-throughput sequencing data and to access public repositories dedicated to sequencing data. Young researchers typically know the central dogma; this training will make them aware of the back-end processes behind it.

Course highlights

  • Basics of RNA sequencing via NGS approach
  • Biological databases for NGS, data types, and file types
  • Data retrieval from archives and public repositories
  • Basic information on file formats and conversion
  • Quality check and filtering
  • Read alignment to the reference genome
  • Annotations related to various diseases

Day - 1 Introduction to RNA Sequencing Techniques: Sanger sequencing, microarray, next-generation sequencing, and third-generation sequencing techniques. Sequencing data formats and databases: sequencing data generated as chromatograms and FASTQ files. Databases: SRA, TCGA, Ensembl, EMBL, GenBank, GEO.

Day - 2 Data Retrieval: Retrieval of various datasets such as filtered FASTQ, raw FASTQ, FASTA, and microarray files. Data type conversion.
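As an illustration of the kind of data type conversion covered on Day 2, the sketch below converts FASTQ records to FASTA in plain Python. It is a minimal example under stated assumptions, not the course's own tooling: it assumes each record occupies exactly four lines (no wrapped sequences), and the function names `read_fastq` and `fastq_to_fasta` and the sample reads are hypothetical.

```python
def read_fastq(lines):
    """Yield (read_id, sequence, quality) tuples from four-line FASTQ records."""
    # Drop blank lines and line endings before grouping lines into records.
    cleaned = [line.rstrip("\r\n") for line in lines if line.strip()]
    for start in range(0, len(cleaned) - 3, 4):
        header, sequence, separator, quality = cleaned[start:start + 4]
        # A FASTQ record starts with '@' and has a '+' line before the qualities.
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record number {start // 4 + 1}")
        yield header[1:], sequence, quality


def fastq_to_fasta(lines):
    """Return FASTA text: one '>' header line and one sequence line per read."""
    return "".join(
        f">{read_id}\n{sequence}\n" for read_id, sequence, _ in read_fastq(lines)
    )


# Hypothetical two-read example (made-up reads, for illustration only).
example_fastq = [
    "@read1 sample\n", "ACGT\n", "+\n", "IIII\n",
    "@read2\n", "GGCC\n", "+\n", "!!!!\n",
]
print(fastq_to_fasta(example_fastq), end="")
```

The conversion simply drops the quality line, which is why FASTA files cannot be converted back to FASTQ without inventing quality scores.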

Day - 3 Analysis Tools: Detailed discussion of tools and parameters related to the Galaxy server, FastQC, and Trimmomatic for quality checking of downloaded and newly sequenced data. Interpretation of quality check results.
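To give a feel for what a quality check measures, the sketch below decodes a FASTQ quality string into Phred scores and applies a simple mean-quality threshold. It illustrates the concept only and is not how FastQC or Trimmomatic work internally. It assumes the common Phred+33 encoding; the function names and the threshold of 20 are illustrative choices.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


def mean_quality(quality, offset=33):
    """Return the mean Phred score of a read, or 0.0 for an empty quality string."""
    scores = phred_scores(quality, offset)
    return sum(scores) / len(scores) if scores else 0.0


def passes_filter(quality, min_mean=20.0, offset=33):
    """Keep a read only if its mean Phred score reaches the chosen threshold."""
    return mean_quality(quality, offset) >= min_mean


# 'I' encodes Phred 40 and '!' encodes Phred 0 under Phred+33.
for quality_string in ("IIII", "II!!", "!!!!"):
    print(quality_string, mean_quality(quality_string), passes_filter(quality_string))
```

A Phred score Q corresponds to an estimated base-call error probability of 10^(-Q/10), so a score of 20 means roughly one error in 100 bases.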

Day - 4 Reference Mapping and Annotation: Reference mapping tools such as Bowtie and TopHat, run against reference genomes, with detailed parameters and interpretation of outputs. Gene ontology using reference mapping data through SAMtools.
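Read aligners commonly write their results in the SAM format, where the second tab-separated field of each alignment line is a bitwise FLAG and bit 0x4 marks an unmapped read. The sketch below counts mapped and unmapped alignment records from SAM text. It is a simplified illustration with made-up records and a hypothetical function name; it counts alignment lines rather than unique reads, so secondary or supplementary alignments would be counted separately.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def count_mapped(sam_lines):
    """Return (mapped, unmapped) counts of alignment records in SAM text lines."""
    mapped = unmapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # second column is the bitwise FLAG
        if flag & UNMAPPED_FLAG:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


# Made-up SAM records for illustration: two mapped reads and one unmapped read.
example_sam = [
    "@HD\tVN:1.6\tSO:unsorted\n",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tIIII\n",
    "r3\t16\tchr1\t200\t60\t4M\t*\t0\t0\tTTAA\tIIII\n",
]
mapped_count, unmapped_count = count_mapped(example_sam)
print(f"mapped={mapped_count} unmapped={unmapped_count}")
```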

Programme details

Duration of training : 4 days (15th - 23rd January 2022), Saturdays and Sundays

Timings : 03:00 PM to 06:00 PM

Eligibility criteria : B.Sc. / B.Tech. / M.Tech. / M.Sc. / Ph.D.

Fee : Rs.3500/- (Including GST)


Dr. Neha Goel

Dr. Shiv Bharadwaj


Participants who successfully complete the training program will receive a certificate.


The entire training will be conducted online and has the following prerequisites:

  • Participants must have access to a laptop/desktop with a stable internet connection.
  • Basic computer knowledge will be beneficial.