Digging Deeper into Supervised Learning Workshop

Thursday, March 12 – 6:00PM-9:00PM

Location: Patterson Hall 224

 

Instructor: Hugh Chipman

Bio: Hugh is a Professor in Acadia's Department of Mathematics and Statistics.  He received his PhD from the University of Waterloo.  He held a Canada Research Chair in Mathematical Modelling at Acadia from 2004-14.  He served as Editor-in-Chief of Technometrics, the top international journal on industrial statistics.  He is a fellow of the American Statistical Association, a recipient of the CRM-SSC prize in statistics, and Associate Director (Atlantic) of the Canadian Statistical Sciences Institute.  His research includes Bayesian methods, statistical learning, decision trees and ensemble models, industrial applications and statistical computing.  He is director of the Acadia Centre for Mathematical Modelling and Computation and a member of AIDA.

Purpose: Research in statistical and machine learning has led to a variety of algorithms, models and methods for analyzing data. This session will dig deeper into "supervised learning" methods and give participants hands-on experience with the open-source R statistical computing environment. In supervised learning, data are used to train a model that predicts a response from input variables. The response can be numeric (e.g. predict income level, given education, gender, and age) or categorical (e.g. predict whether or not a person will click on an online advertisement, given their browsing history). A key element of virtually every supervised learning problem is effective control of model complexity: a model that is too simple misses real structure in the data, while one that is too complex fits noise. Techniques such as cross-validation will be introduced as a way of choosing complexity. Several important supervised learning tools, including linear regression, decision trees and random forests, will be presented and used in hands-on tutorials. Participants will be given instructions for installing R prior to the session.
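
As a rough illustration of choosing model complexity by cross-validation, the sketch below fits a k-nearest-neighbour average to a simulated one-dimensional problem and uses 5-fold cross-validation to pick k. It is not workshop material. The simulated data, the function names (make_data, knn_predict, cv_error) and the candidate values of k are all assumptions made for illustration, and the sketch is written in Python with only the standard library rather than in R, which is what the session itself uses.

```python
import math
import random


def make_data(n, seed=0):
    """Simulate a noisy 1-D problem (assumed for illustration): y = sin(2*pi*x) + noise."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.3) for x in xs]
    return xs, ys


def knn_predict(train_x, train_y, x0, k):
    """Predict at x0 by averaging the responses of the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x0))[:k]
    return sum(train_y[i] for i in nearest) / k


def cv_error(xs, ys, k, folds=5):
    """Estimate the mean squared prediction error of k-NN by K-fold cross-validation."""
    n = len(xs)
    total = 0.0
    for f in range(folds):
        # Fold f holds out every folds-th point, starting at index f.
        held_out = set(range(f, n, folds))
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        # Score each held-out point with a model that never saw it.
        for i in held_out:
            total += (ys[i] - knn_predict(train_x, train_y, xs[i], k)) ** 2
    return total / n


xs, ys = make_data(100)
candidates = [1, 3, 5, 10, 20, 40, 80]  # small k = complex fit, large k = simple fit
errors = {k: cv_error(xs, ys, k) for k in candidates}
best_k = min(errors, key=errors.get)

for k in candidates:
    print(f"k={k:2d}  CV mean squared error={errors[k]:.3f}")
print("chosen k:", best_k)
```

Here k plays the role of the complexity setting. With k = 80 every prediction is just the average of all 80 training points in the fold, which is the simplest possible model, while k = 1 follows every noisy point. Cross-validation scores each candidate on data the model did not see and keeps the one with the lowest estimated error.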

Audience: A wide cross-section: students, faculty, staff, and people from business and industry. All participants must bring their own laptop and should be comfortable using tools such as Excel and web browsers.

Resources/Materials:

- Classroom setting for up to 30 people

- Each participant (or pair of participants) must bring a laptop and be able to connect to the internet.

- All participants must have R and RStudio installed on their laptops BEFORE the session. Note: closer to the workshop date, participants will receive instructions for installing R on their machines.

- Coffee, tea, juice, cookies and fruit will be available from 5:30pm through to the 7:15pm break.

 

Schedule:

5:30 – Registration and refreshments

6:00 – Welcome and introductions

6:10 – Simple 1-dimensional example to focus on the bias/variance trade-off (i.e. simplicity vs. complexity)

6:30 – Regression models, including the lasso

7:00 – Tutorial 1 (contrasting KNN with linear and logistic regression)

7:20 – Break

7:40 – Decision tree models

8:00 – Random forests

8:30 – Random forests tutorial

8:50 – Wrap-up

9:00 – Close

 

This workshop costs $15 to cover food. Anyone interested must register by Tuesday, March 10, 2015 at 4:30PM.


Please register here.

 
