Free Population Genomics Training in Sweden November 2017!

  • Posted on: 29 June 2017
  • By: nblouin

An introduction to bioinformatic tools for population genomic data analysis

November 6-10 2017

Sven Lovén Centre for Marine Sciences on the island of Tjärnö outside of Strömstad on the Swedish West Coast

https://sites.google.com/view/bioinformaticpipelines2017

 

This course aims to give a detailed understanding of, and hands-on
experience with, state-of-the-art bioinformatics pipelines that
participants can apply to their own biological research questions. An
important aspect of the course is to show how genomic data can be applied
to address and answer research questions in the fields of genetics,
ecology, population biology, biodiversity monitoring and conservation.
The students will be trained in the latest bioinformatic methods to
analyze high-throughput sequencing data, which is present in many research
projects. The course will cover the basic computing tools required to run
command-line applications and to process high-throughput sequencing data
from whole genome / exome / restriction-site digested (RAD) DNA for
population genomic studies.

The first part of the course introduces general computing tools for
beginners, such as the Unix command line environment, bash commands, data
formatting using regular expressions and basic scripting in the Unix
shell, with a series of examples and exercises using a remote server.
The course introduces bioinformatics software for analysis of RAD data and
downstream population genetic analysis of genotype data.
The course also introduces basic and advanced concepts of
population genomics data analysis, such as genome/transcriptome assembly,
alignment/mapping, differential gene expression,
functional enrichment tests, SNP genotyping, PCA and outlier tests.
The course corresponds to 1 week of full-time studies and is composed
of lectures, demonstrations and computer labs.

2. Outcomes

1. Knowledge and understanding
1a. Demonstrate advanced knowledge of experimental strategies,
applications and bioinformatic tools for population genomics.
1b. Demonstrate advanced knowledge of the potential of genomics
approaches to answer ecosystem-wide questions, in particular for
biodiversity monitoring.

2. Skills and abilities
2a. Ability to use basic commands in the Unix command line environment
(reformatting data with regular expressions, basic scripting, running
Python scripts from the Unix shell).

2b. Ability to use different software tools to analyse sequence data
from restriction-site digested DNA (data cleaning steps, clustering of reads,
mapping to reference genomes, extracting and filtering genotype data).

2c. Ability to use population genomics software tools to assemble and
annotate a genome/transcriptome, and perform gene alignment/mapping,
differential gene expression, functional enrichment tests, SNP genotyping,
PCA and outlier tests.

3. Judgement and approach
3a. Formulate one's own research questions, identify data and tools needed
to answer these questions and critically evaluate and analyse the results.

4. Required reading

Part 1: General computing tools.
This will be the main textbook for the introduction to general computing
tools:
- Haddock and Dunn (2010). Practical Computing for Biologists. Sinauer
Associates.

Part 2: RAD data analysis.
- Wang et al. (2012). 2b-RAD: a simple and flexible method for
genome-wide genotyping. Nature Methods 9, 808-810.
- Davey et al. (2011). Genome-wide genetic marker discovery and genotyping
using next-generation sequencing. Nature Reviews Genetics 12, 499-510.

Part 3: Population transcriptomics
- De Wit et al. (2012). The simple fool's guide to population genomics
via RNA-seq: an introduction to high-throughput sequencing data
analysis. Molecular Ecology Resources 12, 1058-1067.

Online course material
- The simple fool's guide to population genomics via RNA-Seq: an
introduction to high-throughput sequencing data analysis. Details of
the pipeline can be found at http://sfg.stanford.edu
- Practical Computing for Biologists: http://practicalcomputing.org
- Github repositories:
https://github.com/z0on/2bRAD_GATK
https://github.com/DeWitP/Bioinformatic_Pipelines
https://github.com/The-Bioinformatics-Group/Learning_Unix

5. Assessment
Attendance is mandatory for a pass grade.

6. Grading scale
The grading scale comprises Fail (U), and Pass (G).

7. Course evaluation
The course evaluation will be carried out through an online
questionnaire.

Additional information
Language of instruction will be English, as international guest lecturers
will participate.

11. Preliminary course schedule
Course format: 2.5 hp course, full-time.
Lecturers: Pierre De Wit (PDW), Mats Töpel (MT), Hernan Morales (HM),
Tomas Larsson (TL) and Mikhail Matz (MM)

* Day 1: Introduction to general computing tools (PDW, MT, HM)
Format: 3 hours lecture and demo sessions, 3 hours computer labs,
assigned exercises

Day 1 of the course will be an introduction to general computing tools,
such as the Unix command line environment. We will go through bash
commands (less, nano, ls, ll, wc, |, tail, head, mkdir, cat, grep,
for loops), regular expressions, basic scripting, and running Python
scripts from the Unix shell with a series of examples. Exercises and
assignments will be based on Haddock and Dunn, "Practical Computing for
Biologists", and can be carried out independently. There will also be
a presentation of useful bioinformatics software.
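
As a flavour of the kind of exercise covered on Day 1, the sketch below
reformats sequence headers with a regular expression in Python. It is
only an illustration and not taken from the course material: the header
layout, the example record and the function name reformat_header are all
hypothetical.

```python
import re

# Hypothetical header layout (an assumption for this sketch):
#   >sample_<id>|<Genus species>|<locality>
HEADER = re.compile(r"^>sample_(\d+)\|([A-Za-z]+) ([a-z]+)\|(\w+)$")


def reformat_header(line):
    """Rewrite a matching header as '>Genus_species_locality_id'.

    Lines that do not match the header pattern (for example the
    sequence lines themselves) are returned unchanged.
    """
    match = HEADER.match(line)
    if match is None:
        return line
    sample_id, genus, species, locality = match.groups()
    return f">{genus}_{species}_{locality}_{sample_id}"


if __name__ == "__main__":
    # Toy record, invented for illustration only.
    records = [
        ">sample_12|Littorina saxatilis|Tjarno",
        "ACGTTGCA",
    ]
    for line in records:
        print(reformat_header(line))
```

Saved under a name such as reformat_headers.py (again hypothetical), the
script can be run from the Unix shell with "python reformat_headers.py".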

Part of day 1 will also be concerned with working on a remote server,
using the University of Gothenburg's Albiorix cluster as a training
tool. Students will be provided with guest accounts. This portion of
the course is to refresh the students' knowledge of the command line
environment and the shell, a tool for interacting with the computer
through typed instructions at the command line. Exercise sessions will
be carried out in pairs to encourage collaborative problem solving. All
lectures will include live demonstrations of the command line. Detailed
course material including commands and scripts will be available through
the course web page. This part of the course corresponds to learning
outcome 2a: "Ability to use basic commands in the Unix command line
environment (reformatting data with regular expressions, basic scripting,
running Python scripts from the Unix shell)".

 

* Days 2 and 3: Population Genomics pipeline using whole genome/exome
data (PDW, HM)

Format: Lectures and hands-on sessions (computer labs)

This part of the course will consist of an introductory lecture
and a practical run-through of "The simple fool's guide to
population genomics via RNA-Seq: an introduction to high-throughput
sequencing data analysis". Details of the pipeline can be found at
http://sfg.stanford.edu. There will also be a demonstration of tools
for whole genome assembly (from PacBio and Illumina data).

Lecture: Assembling genomes
Practical session: Genome assembly using Falcon (PacBio data) and SOAP
(Illumina data)

Lecture: Population genomics via RNA-seq: an introduction to
high-throughput sequencing data analysis (PDW)

Practical session: A hands-on practical session based on the Simple
Fool's Guide pipeline, which will cover:
Genome/transcriptome assembly, annotation (BLAST), alignment/mapping,
differential gene expression, functional enrichment tests, SNP genotyping,
PCA and outlier tests.

The practical session will encourage students to collaborate and work
in pairs to enhance communication and understanding.

Before the course: Students will need to bring their own computers,
with some software pre-installed. Guidelines and download locations for
all software used in the course will be available on the course web page.

This part of the course corresponds to learning outcome 2c: "Ability
to use population genomics software tools to assemble and annotate a
genome/transcriptome, and perform gene alignment/mapping, differential
gene expression, functional enrichment tests, SNP genotyping, PCA,
outlier tests."

* Days 4 and 5: Analysis of RAD-data and population genomic analyses
using AFS-based methods (Dadi / Moments) for genotype data.
Format: Lectures and live demonstrations of software

This part of the course corresponds to learning outcome 2b: "Ability to use
different software tools to analyse sequence data from restriction-site
digested DNA (data cleaning steps, clustering of reads, mapping to reference
genomes, extracting and filtering genotype data)".
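
To give an idea of what an allele frequency spectrum (AFS) is, the sketch
below filters a small genotype table and counts sites by their number of
derived alleles. It is a plain-Python illustration under stated
assumptions, not the Dadi or Moments interface: the genotype coding
(0, 1 or 2 derived copies per diploid individual, None for missing data),
the function names and the toy data are all hypothetical.

```python
from collections import Counter


def filter_complete_sites(genotypes):
    """Keep only sites where every individual has a called genotype."""
    return [site for site in genotypes if all(g is not None for g in site)]


def allele_frequency_spectrum(genotypes):
    """Count sites by their number of derived allele copies.

    Each site is a list with one entry per diploid individual, coded as
    0, 1 or 2 copies of the derived allele (None = missing call).
    Returns a list of length 2n + 1 for n individuals, where entry k is
    the number of complete sites carrying k derived copies.
    """
    sites = filter_complete_sites(genotypes)
    if not sites:
        return []
    n_chromosomes = 2 * len(sites[0])
    counts = Counter(sum(site) for site in sites)
    return [counts.get(k, 0) for k in range(n_chromosomes + 1)]


if __name__ == "__main__":
    # Toy genotype table (rows = sites, columns = individuals),
    # invented for illustration only.
    toy = [
        [0, 1, 0],
        [1, 1, 0],
        [2, 2, 2],
        [0, None, 1],  # dropped: one missing genotype
        [0, 0, 1],
    ]
    print(allele_frequency_spectrum(toy))
```

Sites with missing genotypes are dropped here for simplicity; real
pipelines often project the spectrum down to a smaller sample size
instead.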