Short Courses
CAC-2018 is pleased to host a wide variety of short courses taught by international experts, with a selection of subjects suitable for both beginners and more advanced researchers.
Courses will be offered on Monday, June 25, 2018 and are divided into half-day sessions. Morning and afternoon sessions can be selected independently, although some afternoon courses are a continuation of the morning course.
Course fees are $250 CAD (regular) and $150 CAD (student) for each half-day.
A summary of the courses is given below, with a detailed description of each course following.
Short Course Schedule (Tentative)
| Code | Morning Session (9:00 am - 12:00 pm) | Code | Afternoon Session (1:30 pm - 4:30 pm) |
|---|---|---|---|
| CR1A | Validation of Classification Models: Is your model right? And is it the right model? (Beleites) | CR1P | Tools for Reproducible Research: Advanced Working Techniques in R and RStudio (Beleites) |
| CR2A | Advanced Preprocessing for Spectroscopic Applications (Wise, Gallagher) | CR2P | Robust Methods (Wise, Gallagher) |
| CR3A | Analysis of Hyperspectral Data: How to maximise spatial and spectral information (Oliveri, Malegori) | CR3P | Advanced Approaches for One-class Modelling (Oliveri, Malegori) |
| CR4A | Multivariate Curve Resolution: Basics (Tauler, de Juan, Jaumot) | CR4P | Multivariate Curve Resolution: Advanced (Tauler, de Juan, Jaumot) |
| CR5A | The Power of Penalties: Theory and Applications (Eilers) | CR5P | The Power of Penalties: Implementation and Practical Use (Eilers) |
| CR6A | Chemometrics and Chromatography: Part 1 (Rutan) | CR6P | Chemometrics and Chromatography: Part 2 (Rutan) |
| | | CR7P | ChemomeTRICKS (Bro) |
| | | CR8P | Bayesian Statistics in Chemometrics: Why, when and how? (Vivó-Truyols) |
Short Course Descriptions
Detailed descriptions of short courses are given below.
CR1A: Validation of Classification Models: Is your model right? And is it the right model?

In this course, you will learn how to characterize the performance of chemometric models: to ask whether a model is right (verification) but also whether it is the right model (validation). You will learn to choose suitable figures of merit and a plan for obtaining test samples, including calculation of required numbers of test cases and confidence intervals for your figures of merit. We will also discuss techniques to measure model stability and ruggedness, and the special case of using performance estimates to tune model hyper-parameters such as complexity. Most of the course applies equally to regression/calibration and classification models, but validation of classifiers has some pitfalls which are less severe for regression/calibration.

Instructor: Claudia Beleites (Chemometrics Consulting & Chemometrix GmbH)
Session: Morning
Location: TBA
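For a flavour of one such figure-of-merit calculation (a minimal Python sketch of our own, not course material), the Wilson score interval gives a confidence interval for a classifier's accuracy estimated from k correct predictions out of n test cases:

```python
import math

def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a proportion, e.g. classifier accuracy
    from k correct out of n test cases (z = 1.96 for ~95% confidence)."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# 90 correct out of 100 test cases: the interval is noticeably wide,
# which is why planning the number of test cases matters
lo, hi = wilson_interval(90, 100)
```

Note how even 100 test cases leave an interval several percentage points wide, illustrating why the required number of test cases must be planned up front.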
CR1P: Tools for Reproducible Research: Advanced Working Techniques in R and RStudio

R ("Matlab for Statisticians", www.r-project.org) together with the RStudio IDE (www.rstudio.com) offers a versatile environment for chemometric data analysis. This course will introduce you to working techniques that are very handy for efficient everyday chemometric data analysis. You'll learn how to find and access the code behind calculations, as well as debugging techniques that allow you to follow, understand and even adapt calculations. Finally, we'll briefly touch on version control and collaborative working techniques, and ways to distribute and publish your code, including packaging and Shiny apps.

Instructor: Claudia Beleites (Chemometrics Consulting & Chemometrix GmbH)
Session: Afternoon
Location: TBA
CR2A: Advanced Preprocessing for Spectroscopic Applications

Preprocessing is often the critical step in the development of a successful multivariate calibration or classification model. Spectroscopic data poses its own unique problems, and also opportunities, due to its highly structured nature. The objective of spectroscopic data preprocessing is to maximize signal-to-clutter (S/C), where clutter is defined as extraneous variance and data anomalies that can 'distract' model development. Maximizing S/C is a different paradigm than maximizing signal-to-noise, and a firm understanding of the preprocessing algorithms and objectives can lead to more efficient and effective model development. The course starts with a brief review of basic preprocessing methods to demonstrate how they work within the objective of maximizing S/C and how they can be misused. It then delves into more advanced topics such as multiplicative scatter correction, extended multiplicative scatter correction and generalized least squares-like weighting. Examples will focus on spectroscopic applications, although many methods are directly extensible to other types of data. The mathematical principles behind the preprocessing methods will also be covered. The course includes hands-on computer time for participants to work example problems using PLS_Toolbox and MATLAB.

Instructors: Barry Wise, Neal Gallagher (Eigenvector Research Inc.)
Session: Morning
Location: TBA
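As an informal illustration of one of the advanced topics (the course itself works in PLS_Toolbox and MATLAB), a minimal Python sketch of multiplicative scatter correction might look like this; the function name and toy data are ours:

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative scatter correction (MSC): model each spectrum as
    offset + slope * reference by least squares, then invert the fit."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, x in enumerate(spectra):
        slope, offset = np.polyfit(ref, x, 1)   # fit x ~ offset + slope * ref
        corrected[i] = (x - offset) / slope
    return corrected

# toy example: one Gaussian band seen through different offsets and scalings
wl = np.linspace(0, 1, 50)
band = np.exp(-((wl - 0.5) / 0.1) ** 2)
spectra = np.stack([0.2 + 1.5 * band, -0.1 + 0.7 * band])
corrected = msc(spectra, reference=band)
```

After correction both toy spectra coincide with the reference band: the additive and multiplicative "clutter" is removed while the chemical signal is preserved.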
CR2P: Robust Methods

Outliers are a common problem in industrial data sets; in fact, their presence is more the norm than the exception. These unusual, often "erroneous" observations heavily affect the classical estimates of the data mean, variance and covariance. Without proper treatment, the resulting data models are not an accurate representation of the bulk of the data. Conversely, outlier samples are sometimes the most interesting samples in a data set, revealing unique properties or trends; if these samples are not identified, opportunities for discovery can be missed. Robust methods deal with the problem of outliers by determining which samples represent the "consensus" in the data and basing the models on those samples while ignoring the outliers. The course starts with methods for robust estimation of the mean and variance/covariance and goes on to methods for robust Principal Components Analysis and Partial Least Squares regression. The course includes hands-on computer time for participants to work example problems using PLS_Toolbox and MATLAB.

Instructors: Barry Wise, Neal Gallagher (Eigenvector Research Inc.)
Session: Afternoon
Location: TBA
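To illustrate the basic idea (a minimal Python sketch of ours, not course material), compare the classical mean with the median/MAD pair on data containing one gross outlier:

```python
import numpy as np

def robust_location_scale(x):
    """Outlier-resistant location and scale: the median, and the median
    absolute deviation scaled by 1.4826 so it estimates sigma for
    Gaussian data."""
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))
    return med, mad

x = np.array([9.8, 10.1, 9.9, 10.2, 10.0, 55.0])  # one gross outlier
med, mad = robust_location_scale(x)
classical_mean = x.mean()  # pulled far away from the bulk of the data
```

The robust estimates stay near 10, where the bulk of the data lies, while the classical mean is dragged to 17.5 by a single bad observation.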
CR3A: Analysis of Hyperspectral Data: How to maximise spatial and spectral information

For complex data arrays such as hyperspectral images, it is fundamental to use methods that exploit the information embodied in the 3D data structures, not only in terms of spectral features but also considering the spatial structures typical of imaging data. The application of unsupervised and supervised chemometric methods after unfolding of the data hypercube will be presented, critically discussing the advantages and limitations of this approach. More advanced approaches involving the study of spatial features and image texture will also be presented and analysed.

Instructors: Paolo Oliveri, Cristina Malegori (University of Genova)
Session: Morning
Location: TBA
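The unfolding step mentioned above is conceptually simple; a minimal Python sketch with made-up dimensions is:

```python
import numpy as np

# a toy hyperspectral "cube": 64 x 64 pixels, 100 spectral channels
rng = np.random.default_rng(0)
cube = rng.random((64, 64, 100))

# unfold: every pixel spectrum becomes a row of an ordinary 2-D matrix,
# so standard chemometric methods (PCA, PLS-DA, ...) can be applied
unfolded = cube.reshape(-1, 100)

# after modelling, scores can be folded back onto the image plane,
# restoring the spatial arrangement of the pixels
refolded = unfolded.reshape(cube.shape)
```

The reshape is lossless, but note that the spatial neighbourhood information is invisible to the unfolded model, which is exactly the limitation the spatial/texture approaches in the course address.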
CR3P: Advanced Approaches for One-class Modelling

Qualitative data modelling embraces two main families: discriminant methods and class-modelling methods (one-class classifiers). The first strategy is appropriate when at least two classes are meaningfully defined, while the second is the right choice when the focus is on a single class. Although most issues in analytical chemistry would be properly addressed by class-modelling strategies, the use of such techniques is rather limited and, in many cases, discriminant methods are forced onto one-class problems, introducing a bias in the outcomes. Key aspects related to the development, optimisation and validation of suitable one-class models will be presented and critically analysed.

Instructors: Paolo Oliveri, Cristina Malegori (University of Genova)
Session: Afternoon
Location: TBA
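As a toy illustration of the class-modelling idea (a hypothetical sketch of ours, far simpler than the methods covered in the course), a sample can be accepted or rejected purely by its distance to the centroid of the single modelled class, with no second class needed:

```python
import numpy as np

class OneClassDistanceModel:
    """Minimal one-class model: accept a sample if its Euclidean distance
    to the training-class centroid is below a percentile-based threshold."""
    def fit(self, X, q=95.0):
        self.center_ = X.mean(axis=0)
        d = np.linalg.norm(X - self.center_, axis=1)
        self.threshold_ = np.percentile(d, q)
        return self

    def predict(self, X):
        d = np.linalg.norm(X - self.center_, axis=1)
        return d <= self.threshold_  # True = inside the class model

rng = np.random.default_rng(1)
target_class = rng.normal(0.0, 1.0, size=(200, 2))  # the single modelled class
model = OneClassDistanceModel().fit(target_class)
labels = model.predict(np.array([[0.1, -0.2], [8.0, 8.0]]))
```

Unlike a discriminant classifier, nothing here depends on samples from any other class, which is the defining property of class modelling.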
CR4A: Multivariate Curve Resolution: Basics

The course will be a basic introduction to Multivariate Curve Resolution, combining theoretical concepts and hands-on work using the MCR-Alternating Least Squares (MCR-ALS) algorithm. It will explain how to work with single data sets and with multiset structures (formed by several data tables together). Recent variants of MCR incorporating hard-modeling information (e.g., kinetic laws) and calibration tasks (correlation constraint) will be briefly described. Practical examples will include analytical data (chromatography, processes) and hyperspectral images. Since practical work will be done, participants should bring a laptop to follow the course adequately. MATLAB is recommended, but not compulsory: the GUI provided can run under the MATLAB environment or as a stand-alone in compiled form.

Instructors: Anna de Juan, Romà Tauler, Joaquim Jaumot (University of Barcelona, Spanish National Research Council)
Session: Morning
Location: TBA
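For orientation only (the course uses the MCR-ALS GUI under MATLAB), the alternating least squares idea behind MCR-ALS can be sketched in a few lines of Python; the clipping-based non-negativity used here is a simplification of the proper constrained least squares in real implementations:

```python
import numpy as np

def mcr_als(D, S_init, n_iter=200):
    """Bare-bones MCR-ALS sketch: alternately solve D ~ C @ S.T for the
    concentration profiles C and the spectra S by least squares, forcing
    non-negativity by clipping negative values to zero."""
    S = S_init.copy()
    for _ in range(n_iter):
        C = np.clip(D @ np.linalg.pinv(S.T), 0.0, None)
        S = np.clip((np.linalg.pinv(C) @ D).T, 0.0, None)
    return C, S

# simulated two-component bilinear data D = C_true @ S_true.T
t = np.linspace(0, 1, 80)
C_true = np.stack([np.exp(-((t - 0.3) / 0.08) ** 2),
                   np.exp(-((t - 0.6) / 0.10) ** 2)], axis=1)
w = np.linspace(0, 1, 60)
S_true = np.stack([np.exp(-((w - 0.4) / 0.15) ** 2),
                   np.exp(-((w - 0.7) / 0.12) ** 2)], axis=1)
D = C_true @ S_true.T
C, S = mcr_als(D, S_init=S_true + 0.05)  # slightly perturbed initial spectra
```

Starting from perturbed spectral estimates, the alternation recovers a non-negative bilinear model that reproduces the data; constraints like the ones covered in the course are what make the resolved profiles chemically meaningful.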
CR4P: Multivariate Curve Resolution: Advanced

This course is addressed to people who have a basic knowledge of MCR or who have completed the morning MCR: Basics course. The focus is on the use of more advanced constraints and on the application of MCR to more specialized fields, such as LC-MS for big -omics data. Advanced constraints, such as hard-modeling, will be applied in the process analysis context. Model constraints (e.g., trilinearity) will be introduced in connection with the study of EEM data and environmental examples. Special attention and time will be devoted to practicing a protocol for LC-MS analysis of -omics data that includes an initial ROI-based data compression followed by resolution with MCR. Since practical work will be done, participants should bring a laptop to follow the course adequately. MATLAB is recommended, but not compulsory: the GUI provided can run under the MATLAB environment or as a stand-alone in compiled form.

Instructors: Anna de Juan, Romà Tauler, Joaquim Jaumot (University of Barcelona, Spanish National Research Council)
Session: Afternoon
Location: TBA
CR5A: The Power of Penalties: Theory and Applications

In classical statistics, linear models minimize the sum of squares of the residuals, the differences between observed and fitted values. Unbiasedness of parameters used to be considered a great virtue. This may be all right for models with small numbers of explanatory variables, but every chemometrician knows that it fails completely for larger models, e.g. when spectra, time series or images are used as regressors. In chemometrics, biased estimation, in the shape of ridge regression, was adopted early and successfully. It is a good example of the use of a penalty: a function of the regression coefficients that is added to the sum of squares of the residuals. In the case of ridge regression, it is the sum of squares of the coefficients, multiplied by a tunable parameter; we can call it a size penalty. When observations or regression coefficients are ordered, as is often the case, it can be useful to impose smoothness. Examples are trends in time series, baselines in analytical signals, and frequency distributions (histograms). To achieve smoothness, roughness penalties are powerful tools: they use the sum of squares of (repeated) differences of neighboring coefficients. I will present a little theory, some visualizations and several applications.

Instructor: Paul Eilers (Erasmus University Medical Center)
Session: Morning
Location: TBA
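The roughness penalty described above can be sketched in a few lines (our own minimal Python version of a Whittaker-style smoother; the dense solve is for clarity, real code would use sparse matrices):

```python
import numpy as np

def whittaker_smooth(y, lam=1000.0, d=2):
    """Roughness-penalty smoothing: minimize |y - z|^2 + lam * |D z|^2,
    where D takes d-th order differences of neighboring elements of z.
    Setting the gradient to zero gives (I + lam * D'D) z = y."""
    m = len(y)
    D = np.diff(np.eye(m), n=d, axis=0)  # (m - d) x m difference operator
    return np.linalg.solve(np.eye(m) + lam * D.T @ D, y)

# a noisy sine as a stand-in for a trend or baseline
x = np.linspace(0, 2 * np.pi, 200)
rng = np.random.default_rng(0)
y = np.sin(x) + 0.2 * rng.standard_normal(x.size)
z = whittaker_smooth(y, lam=1000.0)
```

The tunable parameter `lam` plays the same role as the ridge parameter: larger values trade fidelity to the data for smoothness of the result.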
CR5P: The Power of Penalties: Implementation and Practical Use

This is a continuation of the introduction to penalty functions that will emphasize the ease of implementation of these methods for practical applications.

Instructor: Paul Eilers (Erasmus University Medical Center)
Session: Afternoon
Location: TBA
CR6A: Chemometrics and Chromatography: Part 1

Long analysis times and complex methods are often seen as necessary to achieve sufficient chromatographic resolution to enable quantitation of analytes in mixtures. Chemometric data analysis methods are powerful tools for the enhancement of chromatographic methods, and in this course their use for processing chromatographic data will be highlighted. This first part of the course summarizes the types of instrumental platforms that yield data suitable for chemometric analysis and describes the data structures resulting from these platforms. Singular value decomposition is used to help define these data structures. Methods for peak detection, peak purity assessment and chromatographic alignment will also be described.

Instructor: Sarah Rutan (Virginia Commonwealth University)
Session: Morning
Location: TBA
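As a small illustration of how singular value decomposition reveals the structure of such data (a simulated sketch of ours, not course material), the singular values of a two-component peak cluster separate cleanly from the noise floor:

```python
import numpy as np

# simulate a two-component LC-DAD peak cluster: rank-2 data plus noise
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 120)                        # elution time axis
C = np.stack([np.exp(-((t - 0.40) / 0.08) ** 2),  # overlapping elution profiles
              np.exp(-((t - 0.55) / 0.08) ** 2)], axis=1)
S = rng.random((40, 2))                           # two spectra, 40 channels
D = C @ S.T + 1e-3 * rng.standard_normal((120, 40))

# the large singular values count the chemical components,
# the remaining ones sit on the noise floor
sv = np.linalg.svd(D, compute_uv=False)
n_components = int(np.sum(sv > 10 * sv[-1]))      # crude noise-floor threshold
```

Even though the two peaks overlap too much to be resolved chromatographically, the rank of the data matrix still reports that two analytes are present.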
CR6P: Chemometrics and Chromatography: Part 2

This continuation of the first part of the course begins by exploring methods for extracting pure variables from data sets. Following this, the use of multivariate curve resolution-alternating least squares (MCR-ALS) and PARAFAC for curve resolution and quantitative analysis will be discussed, with a focus on liquid chromatography-diode array detection. The Barcelona MCR-ALS and Bro N-way toolboxes will be used for the examples, so students can apply these tools to their own data after taking the course.

Instructor: Sarah Rutan (Virginia Commonwealth University)
Session: Afternoon
Location: TBA
CR7P: ChemomeTRICKS

This course is intended for people with moderate experience in chemometrics. You will learn about the pitfalls that are common in practical multivariate data analysis and, most importantly, how to avoid them in order to arrive more quickly at valid conclusions with your chemometric models. During the course, examples will show how to build better calibration models, perform variable selection, avoid misinterpreting models, and many other practical aspects.

Instructor: Rasmus Bro (University of Copenhagen)
Session: Afternoon
Location: TBA
CR8P: Bayesian Statistics in Chemometrics: Why, when and how?

Essentially all classical chemometrics methodology is based on so-called frequentist statistics. P-values are obtained, for example, to decide whether the Mahalanobis distance for an object is too large, so that it should be treated as an outlier in a PCA. Regression analysis (either univariate or multivariate) is built on the same foundation; for example, confidence intervals for regression are based on frequentist statistics. However, this foundation falls short in the world of big data, in which information should be treated as a probability distribution. Bayesian statistics offers an alternative to the frequentist approach, in which the information from data (and models) can be updated as the data flow of your system continuously expands. This course will teach the basics of Bayesian statistics and the main techniques for applying it, with some Matlab code (e.g. marginalization techniques, sampling techniques such as MCMC, and hierarchical modelling, among others). This will be followed by some practical examples of its application in real life. Examples will cover applications in mass spectrometry, chromatography and spectroscopy in diverse areas (forensics, environmental monitoring, food degradation), with emphasis on the comparison (or connection) with classical (multivariate) techniques such as PLS, PCA, ridge regression and LDA.

Instructor: Gabriel Vivó-Truyols (Tecnometrix; University of Amsterdam)
Session: Afternoon
Location: TBA
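As a minimal taste of Bayesian updating (a toy Python sketch of ours; the course itself uses Matlab), the conjugate Beta-Binomial model shows how a belief about a proportion is revised as batches of data stream in:

```python
def beta_update(alpha, beta, successes, failures):
    """Conjugate Beta-Binomial update: the posterior is again a Beta
    distribution, so the belief can be revised each time a new batch
    of data arrives, using the previous posterior as the new prior."""
    return alpha + successes, beta + failures

# start from a flat prior Beta(1, 1) and stream in three batches
a, b = 1.0, 1.0
for successes, failures in [(7, 3), (6, 4), (8, 2)]:
    a, b = beta_update(a, b, successes, failures)

posterior_mean = a / (a + b)  # Bayesian point estimate after all batches
```

This "yesterday's posterior is today's prior" pattern is exactly the continuous-updating property that makes the Bayesian approach attractive for expanding data flows.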