Application of Biostatistics-Bioinformatics


Course Coordinator: G. Sakellaropoulos

The course aims to give postgraduate students a critical understanding of the fields of Biostatistics and Bioinformatics and of their applications. Students are expected to comprehend the basic concepts of probability theory and statistical inference. The course presents in detail the formulation of null hypotheses, the comparison of mean values from different samples, the types of errors, the power of a statistical test, contingency tables, and the chi-squared test. Emphasis is placed on applying theoretical principles to solve real-world problems.

The course also aims to familiarize students with web-based Bioinformatics applications. Specifically, it introduces biological sequence databases, the concepts and applications of next-generation sequencing, and software for analyzing nucleotide and amino acid sequences. This software includes tools for determining protein topology within the cell, analyzing the architectural structure of functional protein regions, identifying protein motifs, studying physicochemical parameters, post-translational modifications, transmembrane regions, and secondary and tertiary protein structures, comparing two or more amino acid sequences, and constructing the corresponding phylogenetic trees.

These objectives are pursued through the presentation of methodologies, discussion of which methods are appropriate for analyzing data related to specific research topics, and the use of software (SPSS, Excel, and web-based tools) to address real-world problems.


Course Description

The course consists of two modules: the first covers Biostatistics methodologies and the second covers Bioinformatics methodologies.

Module 1:

  • Descriptive Statistics (measures of central tendency and dispersion, presentation of data in tables and graphs).
  • Elements of Probability Theory (conditional probability, test sensitivity and specificity, Bayes’ theorem, predictive value, probability distributions).
  • Statistical Sampling (standard error of the mean, central limit theorem).
  • Statistical Inference (formulation of null hypotheses, comparison of mean values from different samples, types of errors, power of a statistical test, contingency tables, and the chi-squared test).
  • Linear Regression and Correlation (conceptual distinction between the two, use of the regression line for prediction, confidence interval for the regression line, linear correlation coefficient).
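
As an illustration of one Module 1 topic, the following minimal sketch computes the predictive values of a diagnostic test from its sensitivity, its specificity, and the disease prevalence using Bayes' theorem. Python is used here only for illustration (the course itself works with SPSS, Excel, and web-based tools); the function name and the numerical values are hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) of a diagnostic test via Bayes' theorem.

    All three arguments are probabilities between 0 and 1.
    """
    # Joint probabilities of each test outcome and disease state
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence

    # P(disease | positive test) and P(no disease | negative test)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test: 90% sensitivity, 95% specificity, 1% prevalence
ppv, npv = predictive_values(0.90, 0.95, 0.01)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")  # PPV = 0.154, NPV = 0.999
```

With these assumed values, only about 15% of positive results correspond to true disease even though the test itself is accurate, which shows how strongly the positive predictive value depends on prevalence.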

Module 2:

  • Biological Sequence Databases
  • Database Search Methods
  • Software for Analyzing Nucleotide and Amino Acid Sequences: determination of protein topology within the cell; analysis of the architectural structure of functional protein regions; identification of protein motifs; analysis of physicochemical parameters, post-translational modifications, transmembrane regions, and secondary and tertiary protein structures; comparison of two or more amino acid sequences and construction of the corresponding phylogenetic tree.
  • Microarrays and Microarray Data Analysis