Course in generalized linear modeling with biological applications -
Spring 2008
The course is approved as a PhD course at Aarhus University (10 ECTS points).
This page was last updated: Jun 18, 2008
News
The course is now booked up. (11. 01. 2007)
The course starts Monday, 25. February 2008, 10:00, at the Research Centre Foulum.
- Schedule:
- The course consists of 4 blocks; the first and third blocks last 3 days each, the second and fourth 2 days each.
The dates are
- 25.-27. February
- 10.-11. March
- 02.-04. April
- 14.-15. April
- Venue:
- The course takes place at the Research Centre Foulum, Mødelokale 1.
Visitors from outside should first go to the information desk at the
main entrance, where you will receive a guest identification card.
Then head toward the 'Auditorium'. Pass the auditorium, and you will find 'Mødelokale 1' on the left.
- Accommodation:
- The course is arranged in blocks of 2 to 3 days so that participants
from locations other than Foulum do not have to spend too much time on
transportation; the only additional expense is a few
nights in the Foulum area. Accommodation is available at Nørresøkollegiet in
Viborg, see http://www.nkvib.dk/. If participants come from far away, we can
start as late as 10am on the first day of a block.
Registration
The course is fully booked.
The maximum number of participants is 15 and the minimum is 5.
Course description
The fundamental focus in many experiments and studies is on relating a response
variable to one or several explanatory variables. A traditional way of
accomplishing this is through a multiple linear regression model (technically
speaking, analysis of variance is also a multiple linear regression).
Through practical experience with regression and analysis of variance, one may
have experienced situations where the model assumptions are questionable: Data
might not be normally distributed, for example because the data are counts
(0,1,2,3,4,5,...) or binary (sick/not sick or yes/no). It is not uncommon to
find that the variance of the response variable grows with the expected value,
or the response variable depends on the explanatory variables in a nonlinear
way. Starting from real data examples, the course shows how generalized linear models
(GLM) are used for handling such data. The course also describes how to analyze
such data when they are correlated, e.g. because several measurements are made on
the same experimental unit. This is achieved using the approach of generalized estimating equations (GEE).
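To make the idea concrete, here is a minimal sketch in R of the two most common cases mentioned above: a count response and a binary response. The data are simulated, and the variable names (dose, counts, sick) and coefficient values are illustrative assumptions, not taken from the course material. Correlated data would require a GEE implementation from an add-on package and are not shown here.

```r
# Minimal sketch with simulated data; names and coefficients are
# illustrative assumptions, not course material.
set.seed(2008)
n <- 200
dose <- runif(n, 0, 2)                      # hypothetical explanatory variable

# Count response: Poisson model with log link, so the variance
# grows with the expected value
counts <- rpois(n, lambda = exp(0.3 + 0.8 * dose))
fit_pois <- glm(counts ~ dose, family = poisson(link = "log"))

# Binary response (sick / not sick): logistic regression
sick <- rbinom(n, size = 1, prob = plogis(-1 + 1.5 * dose))
fit_logit <- glm(sick ~ dose, family = binomial(link = "logit"))

# Estimated coefficients are on the link scale (log and logit, respectively)
coef(fit_pois)
coef(fit_logit)
```

In both cases the model is fitted with the same function, glm(); only the family argument changes. This is the sense in which these models generalize the linear normal model.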
The course is planned such that practice and theory go hand in hand. This
means that the starting point for all topics will be practical examples
primarily, but not exclusively, taken from biological sciences. The necessary
statistical theory is then added as needed to solve the practical problems.
Topics: Linear normal models, logistic regression, analysis of count data,
analysis of data with non-constant variance (in particular data with constant
coefficient of variation), nonlinear relations between data and explanatory
variables, analysis of correlated data (GEE), the model concept,
statistical inference, model checking.
Every lecture will be followed by computer exercises.
The R program will be used for these computer labs. An introduction to R
will be given on the first two days of the course. Nevertheless, participants
are strongly advised to download and install R and start playing
around with it before the course starts.
Facilities
The lecture room is equipped with computers with internet access, which will be used during the
practicals.
Prerequisites
Working knowledge of basic mathematical and statistical tools and concepts:
Solving a simple equation, logarithmic and exponential function. Probability
distribution, random variable, mean, variance, normal distribution, confidence
interval, linear regression, analysis of variance, hypothesis testing. If you
are uncertain about whether you meet these requirements, please contact the
lecturer!
It may be advisable to brush up your statistical skills before the start of the
course. We suggest consulting, e.g.,
- Blæsild, P. and Granfeldt, J. (2003) Statistics with Applications in
Biology and Geology, Chapman and Hall/CRC: London.
- Zar, J. H. (1999) Biostatistical Analysis, Prentice Hall.
Additional information
- Language:
- The course language will be English.
- On the web:
- The course homepage is
http://genetics.agrsci.dk/statistics/courses/phd08
Homepage of the previous
course in 2007
- Form:
- The course will consist of a mixture of lectures, exercises, and computer
practicals.
- Credit:
- The course is approved as a PhD course at Aarhus University with 10 ECTS points.
- Workload:
- To complete this course you should expect to put about 7 weeks
of full time work into it.
- Compulsory homework:
- A very important part of the course is the take-home assignments. These are
larger assignments which must be handed in and approved. Participants can only attend the
exam if the take-home assignments have been approved.
- Exam:
- A project must be completed at the end of the course. The final (oral)
exam is based on that project. A participant can only attend the exam if the
take-home assignments have been approved.
- Price:
- The course is free for PhD students.
It is also free for students and employees affiliated with Aarhus University.
Other participants will have to pay 12.000 DKK for participation.
- Lecturer:
-
Course program and course material
The data sets used in the course can be installed in R by executing the
following command in R:
install.packages("dataRep",repos="http://gbi.agrsci.dk/statistics/software/r/packages")
In the software folder you can find the setup file for the editor Tinn-R (1.19.3.1).
The homepage for Tinn-R is http://www.sciviews.org/Tinn-R/.
The material for the present course is available for download (except for Day 10).
The following files have the same content but in different formats:
- LECTURE.pdf: The lecture notes in normal format.
- LECTURE-slide.pdf: The lecture notes in slide format as used in the lecture.
These are probably too large for printing.
- LECTURE-handout.pdf: The lecture notes as slides but in a more printer-friendly form.
- DAY
- Introduction to R: Introduction to the use of the
statistical programming environment R.
We download and install R and perform basic data-analytic and graphical tasks.
- DAY
- Linear normal models (LNM).
Regression modeling based on the normal distribution: we recap
what is assumed known, but put it in a different form.
- Practical exercises on LNM in R
- DAY
- DAY
- Introduction to binomial data
- Principles of inference
- DAY
- DAY
- Poisson regression
- Gamma distributed data
- DAY
- Generalized linear model
- Model selection
- DAY
- DAY
- Overdispersion
- Quasi-likelihood
- DAY
- Correlated data: Generalized Estimating Equations (GEE)
- Final exam
- Examination: the tasks and the dates.
- Homework
- Homework to be delivered by e-mail on Wednesday, 05. March, 26. March and 10. April.
- Extra material
- Some supplementary material
Homework:
- After day 3: Homework on linear normal models
- After day 5: Homework on logistic regression
- After day 8: Homework on Poisson regression
Literature
- Notes and slides prepared by the teachers.
- Dalgaard, P. (2002) Introductory Statistics with R, Springer
Verlag. (You are expected to acquire this book prior to the course start.)
- Faraway, J. J. (2006) Extending the Linear Model with R,
Chapman & Hall: London, ISBN 1-58488-424-X.
In addition we suggest consulting:
- Blæsild, P. and Granfeldt, J. (2003) Statistics with Applications in
Biology and Geology, Chapman and Hall/CRC: London (Chapters 9, 8 and 4 are especially
relevant for this course, and it is a very good book in general).
- Aitkin, M., Francis, B. and Hinde, J. (2004)
Statistical Modelling in GLIM4. 2nd edition, Oxford University Press: Oxford.
- Dobson, A. J. (2002) An Introduction to
Generalized Linear Models. 2nd edition, Chapman and Hall.
- Lindsey, J. K. (1997) Applying Generalized Linear Models,
Springer Verlag: Heidelberg.
- McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models.
2nd edition, Chapman and Hall: London.
- Myers, R. H., Montgomery, D. C. and Vining, G. G. (2004)
Generalized Linear Models: with Application in Engineering and Science.
John Wiley & Sons: New York.