Short Courses


CAC-2018 is pleased to host a wide variety of short courses taught by international experts, with a selection of subjects suitable for both beginners and more advanced researchers.

Courses will be offered on Monday, June 25, 2018 and are divided into half-day sessions.  Morning and afternoon sessions can be selected independently, although some afternoon courses are a continuation of the morning course.

Course fees are $250 CAD (regular) and $150 CAD (student) for each half-day.

A summary of the courses is given below, with a detailed description of each course following.

Short Course Schedule (Tentative)

Morning Session (9:00 am - 12:00 pm) | Afternoon Session (1:30 pm - 4:30 pm)
CR1A Validation of Classification Models: Is your model right? And is it the right model? (Beleites) | CR1P Tools for Reproducible Research: Advanced Working Techniques in R and RStudio (Beleites)
CR2A Advanced Preprocessing for Spectroscopic Applications (Wise, Gallagher) | CR2P Robust Methods (Wise, Gallagher)
CR3A Analysis of Hyperspectral Data: How to maximise spatial and spectral information (Oliveri, Malegori) | CR3P Advanced Approaches for One-class Modelling (Oliveri, Malegori)
CR4A Multivariate Curve Resolution: Basics (Tauler, de Juan, Jaumot) | CR4P Multivariate Curve Resolution: Advanced (Tauler, de Juan, Jaumot)
CR5A The Power of Penalties: Theory and Applications (Eilers) | CR5P The Power of Penalties: Implementation and Practical Use (Eilers)
CR6A Chemometrics and Chromatography: Part 1 (Rutan) | CR6P Chemometrics and Chromatography: Part 2 (Rutan)
CR7A Bayesian Statistics in Chemometrics: Why, when and how? (Part 1: Theory) (Vivó-Truyols) | CR7P Bayesian Statistics in Chemometrics: Why, when and how? (Part 2: Applications) (Vivó-Truyols)
(no morning session) | CR8P ChemomeTRICKS (Bro)

Short Course Descriptions

Detailed descriptions of short courses are given below.

Beleites

CR1A: Validation of Classification Models: Is your model right? And is it the right model?

In this course, you will learn how to characterize the performance of chemometric models: to ask not only whether a model is right (verification) but also whether it is the right model (validation). You will learn to choose suitable figures of merit and to design a plan for obtaining test samples, including calculating the required number of test cases and confidence intervals for your figures of merit. We will also discuss techniques for measuring model stability and ruggedness, as well as the special case of using performance estimates to tune model hyper-parameters such as complexity. Most of the course applies equally to regression/calibration and classification models, but validation of classifiers has some pitfalls that are less severe for regression/calibration.
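As a small taste of the kind of calculation the course covers, here is a minimal Python sketch (illustrative only, not course material) of a Wilson score confidence interval for classification accuracy measured on a finite test set; the function name and numbers are invented for the example:

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion,
    e.g. classification accuracy observed on n test cases."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# 90 correct predictions out of 100 independent test cases
lo, hi = wilson_interval(90, 100)
```

With 90 of 100 test cases correct, the 95% interval spans roughly 0.83 to 0.94, which illustrates how uncertain figures of merit remain at typical test-set sizes.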

Instructor: Claudia Beleites (Chemometrics Consulting & Chemometrix GmbH)

Session: Morning

Location: TBA

CR1P: Tools for Reproducible Research: Advanced Working Techniques in R and RStudio

R ("Matlab for Statisticians", www.r-project.org) together with the RStudio IDE (www.rstudio.com) offers a versatile environment for chemometric data analysis. This course will introduce you to working techniques that are very handy for efficient every-day chemometric data analysis:

  • knitr documents are "self-calculating" reports that allow you to formulate your decisions, conclusions and interpretations directly alongside the respective calculations and plots. No more worries whether the diagram in the manuscript is outdated or whether there are typos in the resulting numbers!
  • Piping operators allow elegant, readable and concise formulation of stepwise calculations.
  • Debugging: one particular advantage of R is that the R code of the functions is always available - and can even be adapted at runtime.

You'll learn how to find and access the code behind calculations, as well as debugging techniques that allow you to follow, understand and even adapt calculations. Finally, we'll briefly touch on version control and collaborative working techniques, and on ways to distribute and publish your code, including packaging and Shiny apps.

Instructor: Claudia Beleites (Chemometrics Consulting & Chemometrix GmbH)

Session: Afternoon

Location: TBA

Wise/Gallagher

CR2A: Advanced Preprocessing for Spectroscopic Applications

Preprocessing removes extraneous variance and anomalies from data and is often the critical step in the development of a successful multivariate calibration or classification model. Spectroscopic data poses unique problems, and opportunities, due to its highly structured nature. The objective of spectroscopic data preprocessing is to maximize signal-to-clutter (S/C), where clutter is defined as extraneous variance and data anomalies that can 'distract' model development. Maximizing S/C is a different paradigm than maximizing signal-to-noise, and a firm understanding of the preprocessing algorithms and their objectives can lead to more efficient and effective model development. Advanced Preprocessing for Spectroscopic Applications starts with a brief review of basic preprocessing methods to demonstrate how they work within the objective of maximizing S/C and how they can be misused. The course then delves into more advanced topics such as multiplicative scatter correction, extended multiplicative scatter correction and generalized least squares-like weighting. Examples will focus on spectroscopic applications, although many methods extend directly to other types of data. The mathematical principles behind the preprocessing methods will also be covered. The course includes hands-on computer time for participants to work example problems using PLS_Toolbox and MATLAB.
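The hands-on work uses PLS_Toolbox and MATLAB; purely as an illustration of what basic multiplicative scatter correction does, here is a minimal Python/NumPy sketch (the function and toy data are invented for this example):

```python
import numpy as np

def msc(spectra):
    """Multiplicative scatter correction: regress each spectrum on the
    mean spectrum and remove the fitted offset and slope."""
    ref = spectra.mean(axis=0)
    corrected = np.empty_like(spectra, dtype=float)
    for i, x in enumerate(spectra):
        b, a = np.polyfit(ref, x, 1)   # fit x ≈ a + b * ref
        corrected[i] = (x - a) / b     # undo the multiplicative/additive effects
    return corrected

# toy "spectra": the same underlying signal with different offsets and scales
base = np.sin(np.linspace(0, np.pi, 50))
spectra = np.vstack([1.0 * base + 0.0,
                     2.0 * base + 0.5,
                     0.5 * base - 0.2])
out = msc(spectra)
```

After correction, the three rows coincide: the scatter-like offset and scale differences have been removed while the common signal shape is preserved.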

Instructors: Barry Wise, Neal Gallagher (Eigenvector Research Inc.)

Session: Morning

Location: TBA

CR2P: Robust Methods

Outliers are a common problem in industrial data sets. In fact, the presence of outliers is more the norm than the exception. These unusual, often "erroneous" observations heavily affect the classical estimates of data mean, variance and covariance. Without proper treatment, the resulting data models are not an accurate representation of the bulk of the data. Alternatively, outlier samples are sometimes the most interesting samples in a data set, revealing unique properties or trends. If these samples are not identified, opportunities for discovery can be missed. Robust methods deal with the problem of outliers by determining which samples represent the "consensus" in the data and basing the models on those samples while ignoring the outliers. The course starts with methods for robust estimation of the mean and variance/covariance and goes on to methods for robust principal components analysis and partial least squares regression. The course includes hands-on computer time for participants to work example problems using PLS_Toolbox and MATLAB.
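As a flavour of robust estimation (an illustrative Python sketch, not part of the course software), the example below contrasts a median/MAD location and scale estimate with the classical mean on data containing one gross outlier:

```python
import numpy as np

def robust_location_scale(x):
    """Median and MAD-based scale: robust analogues of the mean and
    standard deviation, unaffected by a few extreme values."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return med, 1.4826 * mad   # 1.4826 makes the MAD consistent for Gaussian data

# 20 well-behaved values near 10, plus one gross outlier at 100
x = np.concatenate([np.linspace(9.0, 11.0, 20), [100.0]])
loc, scale = robust_location_scale(x)
```

The single outlier drags the classical mean above 14, while the robust location stays near 10 and the robust scale stays below 1, reflecting the bulk of the data.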

Instructors: Barry Wise, Neal Gallagher (Eigenvector Research Inc.)

Session: Afternoon

Location: TBA

Oliveri/Malegori

CR3A: Analysis of Hyperspectral Data: How to maximise spatial and spectral information

For complex data arrays such as hyperspectral images, it is fundamental to use methods that exploit the information embodied in the three-dimensional data structure, not only in terms of spectral features but also considering the spatial structures typical of imaging data. The application of unsupervised and supervised chemometric methods after unfolding of the data hypercube will be presented, critically discussing the advantages and limitations of this approach. More advanced approaches involving the study of spatial features and image texture will also be presented and analysed.
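The unfolding step mentioned above can be sketched in a few lines of Python/NumPy (an illustration with synthetic data, not course material): the hypercube is reshaped so that each pixel becomes one row, any two-way method is applied, and the scores are refolded into images for spatial interpretation:

```python
import numpy as np

rng = np.random.default_rng(0)
rows, cols, n_wl = 8, 10, 30
cube = rng.random((rows, cols, n_wl))        # hyperspectral cube: x, y, wavelength

# Unfold: each pixel becomes one row of a 2-D matrix (pixels x wavelengths)
unfolded = cube.reshape(rows * cols, n_wl)

# Any two-way method (here PCA via SVD) can now be applied ...
centred = unfolded - unfolded.mean(axis=0)
u, s, vt = np.linalg.svd(centred, full_matrices=False)
scores = u * s

# ... and the scores refolded into score images for spatial interpretation
score_img = scores[:, 0].reshape(rows, cols)
```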

Instructors: Paolo Oliveri, Cristina Malegori (University of Genova)

Session: Morning

Location: TBA

CR3P: Advanced Approaches for One-class Modelling

Qualitative data modelling embraces two main families: discriminant methods and class-modelling methods (also known as one-class classifiers). The first strategy is appropriate when at least two classes are meaningfully defined, while the second is the right choice when the focus is on a single class. Although most issues in analytical chemistry would be properly addressed by class-modelling strategies, the use of such techniques is rather limited and, in many cases, discriminant methods are forced onto one-class problems, introducing a bias in the outcomes. Key aspects related to the development, optimisation and validation of suitable one-class models will be presented and critically analysed.
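As an illustration of the one-class idea (a toy sketch, not an algorithm taught in the course), the example below builds a minimal distance-to-centroid model from samples of a single target class and accepts or rejects new objects against a threshold learned from that class alone:

```python
import numpy as np

rng = np.random.default_rng(3)
train = rng.normal(0, 1, size=(100, 5))   # samples of the single target class

# A minimal one-class model: distance to the class centroid, with a
# percentile-based acceptance threshold learned from the target class only
centroid = train.mean(axis=0)
dists = np.linalg.norm(train - centroid, axis=1)
threshold = np.quantile(dists, 0.95)

def accepted(x):
    """Accept x as a class member if it lies within the learned threshold."""
    return np.linalg.norm(x - centroid) <= threshold

inlier = accepted(np.zeros(5))        # near the class centre
outlier = accepted(np.full(5, 10.0))  # far from anything seen in training
```

Note that no second class is needed at any point, which is exactly what distinguishes class modelling from discriminant analysis.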

Instructors: Paolo Oliveri, Cristina Malegori (University of Genova)

Session: Afternoon

Location: TBA

de Juan/Tauler/Jaumot

CR4A: Multivariate Curve Resolution: Basics

The course will be a basic introduction to the topic of Multivariate Curve Resolution, combining theoretical concepts and hands-on work using the MCR-Alternating Least Squares (MCR-ALS) algorithm. There will be explanations of how to work with single data sets and multiset structures (formed by several data tables together). Recent variants of MCR incorporating hard-modeling information (e.g., kinetic laws) and calibration tasks (correlation constraint) will be briefly described.

Practical examples will include analytical data (chromatography, processes, …) and hyperspectral images. Since practical work will be done, bringing a laptop is necessary to follow the course adequately. MATLAB is recommended, but not compulsory: the graphical interface provided can run under the MATLAB environment or as a stand-alone in compiled form.
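The alternating least squares idea behind MCR-ALS can be previewed compactly. The Python/NumPy toy below (illustrative only; the course itself uses the MATLAB MCR-ALS interface) alternates between solving for spectra and concentrations under the bilinear model D ≈ C Sᵀ, with clipping to zero as a crude stand-in for the non-negativity constraint:

```python
import numpy as np

def mcr_als(D, C0, n_iter=200):
    """Minimal MCR-ALS sketch: alternately solve D ≈ C S^T for S and for C,
    clipping negative values as a crude non-negativity constraint."""
    C = C0.copy()
    for _ in range(n_iter):
        S = np.linalg.lstsq(C, D, rcond=None)[0].T    # spectra, given concentrations
        S = np.clip(S, 0, None)
        C = np.linalg.lstsq(S, D.T, rcond=None)[0].T  # concentrations, given spectra
        C = np.clip(C, 0, None)
    return C, S

# Simulated two-component data: D = C_true @ S_true.T (40 times x 25 channels)
t = np.linspace(0, 1, 40)
C_true = np.column_stack([np.exp(-3 * t), 1 - np.exp(-3 * t)])
S_true = np.column_stack([np.abs(np.sin(np.linspace(0, 3, 25))) + 0.1,
                          np.abs(np.cos(np.linspace(0, 3, 25))) + 0.1])
D = C_true @ S_true.T

C_hat, S_hat = mcr_als(D, C_true + 0.05)   # start from a perturbed guess
residual = np.linalg.norm(D - C_hat @ S_hat.T) / np.linalg.norm(D)
```

Real MCR-ALS implementations use proper constrained least squares and handle multiset structures; this sketch only shows the alternating skeleton.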

Instructors: Anna de Juan, Romà Tauler, Joaquim Jaumot (University of Barcelona, Spanish National Research Council)

Session: Morning

Location: TBA

CR4P: Multivariate Curve Resolution: Advanced

This course is addressed to people who have a basic knowledge of MCR or who have completed the morning MCR: Basics course. The focus of this course is on the use of more advanced constraints and on the application of MCR to more specialized fields, such as LC-MS for big omics data.

Advanced constraints, such as hard-modeling, will be applied in the process analysis context. Model constraints (e.g., trilinearity) will be introduced in connection with the study of EEM data and environmental examples.

Special attention and time will be devoted to practising a protocol for the LC-MS analysis of omics data that includes an initial ROI-based data compression followed by resolution by MCR.

Since practical work will be done, bringing a laptop is necessary to follow the course adequately. MATLAB is recommended, but not compulsory: the graphical interface provided can run under the MATLAB environment or as a stand-alone in compiled form.

Instructors: Anna de Juan, Romà Tauler, Joaquim Jaumot (University of Barcelona, Spanish National Research Council)

Session: Afternoon

Location: TBA

Eilers

CR5A: The Power of Penalties: Theory and Applications

In classical statistics, linear models minimize the sum of squares of the residuals, the differences between observed and fitted values. Unbiasedness of parameters used to be considered a great good. This may be all right for models with small numbers of explanatory variables, but every chemometrician knows that it fails completely for larger models, e.g. when spectra, time series or images are used as regressors.

In chemometrics, biased estimation, in the shape of ridge regression, was adopted early and successfully. It is a good example of the use of a penalty: a function of the regression coefficients that is added to the sum of squares of the residuals. In the case of ridge regression, it is the sum of squares of the coefficients, multiplied by a tunable parameter. We can call it a size penalty.

When observations or regression coefficients are ordered, as is often the case, it can be useful to impose smoothness. Examples are trends in time series, baselines in analytical signals, and frequency distributions (histograms). To achieve smoothness, roughness penalties are powerful tools. They use the sum of squares of (repeated) differences of neighboring coefficients. I will present a little theory, some visualizations and several applications.
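The roughness penalty described above is easy to write down. The Python/NumPy sketch below (an illustration, not the instructor's code) implements a Whittaker-style smoother that minimises |y − z|² + λ|Dz|², where D takes repeated differences of neighboring values and λ is the tunable penalty weight:

```python
import numpy as np

def whittaker_smooth(y, lam=100.0, d=2):
    """Whittaker smoother: minimise |y - z|^2 + lam * |D z|^2, where D is the
    d-th order difference matrix (a roughness penalty on neighboring values)."""
    n = len(y)
    D = np.diff(np.eye(n), n=d, axis=0)   # d-th order difference matrix
    A = np.eye(n) + lam * D.T @ D
    return np.linalg.solve(A, y)

# Noisy sine: the penalty trades fidelity to y against smoothness of z
x = np.linspace(0, 2 * np.pi, 200)
rng = np.random.default_rng(1)
y = np.sin(x) + 0.3 * rng.standard_normal(200)
z = whittaker_smooth(y, lam=200.0)
rmse = np.sqrt(np.mean((z - np.sin(x)) ** 2))
```

Setting λ = 0 returns the data unchanged, while very large λ flattens z toward a polynomial of degree d − 1; the single tunable parameter controls the whole fidelity/smoothness trade-off. Production implementations use sparse matrices for speed.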

Instructor: Paul Eilers (Erasmus University Medical Center)

Session: Morning

Location: TBA

CR5P: The Power of Penalties: Implementation and Practical Use

This is a continuation of the introduction to penalty functions that will emphasize the ease of implementation of these methods for practical applications.

Instructor: Paul Eilers (Erasmus University Medical Center)

Session: Afternoon

Location: TBA

Rutan

CR6A: Chemometrics and Chromatography: Part 1

Long analysis times and complex methods are often seen as necessary to achieve sufficient chromatographic resolution to enable quantitation of analytes in mixtures. Chemometric data analysis methods are powerful tools for the enhancement of chromatographic methods. In this course, the use of chemometrics for processing chromatographic data will be highlighted. In this first part of the course, the types of instrumental platforms that produce data suitable for chemometric analysis will be summarized, and the data structures resulting from these platforms will be described. Singular value decomposition will be used to help define these data structures. Methods for peak detection, peak purity assessment and chromatographic alignment will be described.
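The role of singular value decomposition in characterizing data structure can be illustrated in a few lines (a toy Python example with simulated data, not course material): for a bilinear LC-DAD-style matrix built from two co-eluting components plus noise, two singular values dominate and the remainder sit at the noise level:

```python
import numpy as np

# Simulated LC-DAD style matrix: 60 time points x 40 wavelengths,
# two co-eluting Gaussian peaks with distinct pure spectra
t = np.linspace(0, 1, 60)[:, None]
profiles = np.hstack([np.exp(-((t - 0.40) / 0.08) ** 2),
                      np.exp(-((t - 0.55) / 0.08) ** 2)])   # elution profiles
wl = np.linspace(0, 1, 40)[None, :]
spectra = np.vstack([np.exp(-((wl - 0.3) / 0.1) ** 2),
                     np.exp(-((wl - 0.7) / 0.1) ** 2)])     # pure spectra
D = profiles @ spectra + 1e-4 * np.random.default_rng(2).standard_normal((60, 40))

s = np.linalg.svd(D, compute_uv=False)
# Two singular values dominate -> two chemical components in the data
```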

Instructor: Sarah Rutan (Virginia Commonwealth University)

Session: Morning

Location: TBA

CR6P: Chemometrics and Chromatography: Part 2

This is a continuation of the first part of the course, beginning with methods for extracting pure variables from the data sets. Following this, the use of multivariate curve resolution-alternating least squares (MCR-ALS) and PARAFAC for curve resolution and quantitative analysis will be discussed, with a focus on liquid chromatography-diode array detection. Specifically, the Barcelona MCR-ALS and the Bro N-way toolboxes will be used for examples, so students can implement these tools for their own data after taking the course.

Instructor: Sarah Rutan (Virginia Commonwealth University)

Session: Afternoon

Location: TBA

Vivó-Truyols

CR7A: Bayesian Statistics in Chemometrics: Why, when and how? (Part 1: Theory)

Basically, all classical chemometric methodology is based on so-called frequentist statistics. P-values are obtained, for example, to decide whether the Mahalanobis distance for an object is too large and hence the object should be treated as an outlier in a PCA. Regression analysis (either univariate or multivariate) is built on the same foundation: for example, confidence intervals for regression are based on frequentist statistics. However, this foundation falls short in the world of big data, in which information should be treated as a probability distribution. Bayesian statistics offers an alternative to the frequentist approach, in which the information from data (and models) can be updated as the data flow of your system continuously expands. The first part of this course will teach the basics of Bayesian statistics and the main techniques to apply it, illustrated with some MATLAB code (e.g. marginalization techniques, sampling techniques like MCMC, and hierarchical modelling, among others).
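As a minimal taste of the sampling techniques mentioned (an illustrative Python sketch rather than the course's MATLAB code), the example below uses a Metropolis sampler to draw from the posterior of an unknown mean under a Gaussian likelihood with known unit variance and a flat prior; all data values are invented:

```python
import math
import random

random.seed(0)

# Data: noisy measurements of an unknown mean mu (known sigma = 1)
data = [4.8, 5.2, 5.1, 4.9, 5.3, 4.7, 5.0, 5.1]

def log_posterior(mu):
    """Gaussian likelihood with a flat prior on mu (log scale, up to a constant)."""
    return -0.5 * sum((x - mu) ** 2 for x in data)

# Metropolis sampling: propose a random move, accept with probability min(1, ratio)
mu, samples = 0.0, []
for _ in range(20000):
    prop = mu + random.gauss(0, 0.5)
    if math.log(random.random()) < log_posterior(prop) - log_posterior(mu):
        mu = prop
    samples.append(mu)

# Discard burn-in, then summarise the posterior by its sample mean
posterior_mean = sum(samples[5000:]) / len(samples[5000:])
```

The retained samples approximate the full posterior distribution of mu, so any summary (mean, credible interval, tail probability) can be read off directly, which is the practical appeal of the Bayesian approach described above.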

Instructor: Gabriel Vivó-Truyols (Tecnometrix; University of Amsterdam)

Session: Morning

Location: TBA

CR7P: Bayesian Statistics in Chemometrics: Why, when and how? (Part 2: Applications)

In this second part, some practical examples of the application of Bayesian statistics in real life will be discussed. Examples will cover applications in mass spectrometry, chromatography and spectroscopy in diverse areas (forensics, environmental monitoring, food degradation), with emphasis on the comparison (or connection) with classical (multivariate) techniques such as PLS, PCA, ridge regression and LDA, among others.

Instructor: Gabriel Vivó-Truyols (Tecnometrix; University of Amsterdam)

Session: Afternoon

Location: TBA

Bro

CR8P: ChemomeTRICKS

This course is intended for people with moderate experience in chemometrics. You will learn about the pitfalls that are common in practical multivariate data analysis. Most importantly, you will learn to avoid them in order to arrive more quickly at valid conclusions with your chemometric models. During the course, examples will be shown of how to build better calibration models, perform variable selection, avoid misinterpreting models, and many other practical aspects.

Instructor: Rasmus Bro (University of Copenhagen)

Session: Afternoon

Location: TBA