B8114-003: Applied Regression Analysis
T - B Term, 02:15PM to 05:30PM
Instructor: Peter Kolesar
This course is about the family of data analysis tools called regression analysis and is a logical successor to the core B6014 Managerial Statistics course. It is frequently taken by 2nd-year MBA students wishing to solidify and extend their quantitative and statistical data analysis skills.
Regression analysis is used to build statistical models of the relationships between variables that can be used for enhanced understanding of the causes of a phenomenon and, when it works, for prediction of future outcomes. In business the ultimate goal of regression analysis is often to support better decision making. In the contemporary world of ‘big data’, regression provides foundational methods and ideas for many of the techniques used in ‘data mining.’
Regressions have been used in financial analyses of investment opportunities, in marketing analyses of customer behavior, in human resources to test the fairness of employment policies, in operations to identify the determinants of product quality, and in strategic planning to create sales forecasts. Regression models are also widely used in many other fields in the sciences, economics and engineering.
Contemporary computing hardware and statistical software have made it extraordinarily easy to mechanically produce regression analyses. For example, Microsoft Excel has a powerful regression tool that is easy to use without knowledge of the underlying concepts or theory. Though it has become child’s play to “run a regression,” it is a challenge to create a regression model that is really useful and reliable. The explicit goal of this course is to learn how to create reliable, valid and useful regressions, and to be able to judge the validity and usefulness of regressions done by others. The course premise is that successful applications of regression require a sound understanding of both the practical problem situation and the underlying statistical theory. The course blends theory and applications, avoiding the extremes of presenting unneeded theory in isolation or of giving application tools without the foundation needed for practical understanding.
The course integrates three topics. The first and most basic is an approach to data and data analysis that is based on statistical theory, the scientific method and some pragmatic epistemology. The second is regression analysis mechanics and theory, including extensions of the basic linear regression model to logistic regressions, non-linear models and multivariate methods. The third is forecasting of time series from historical data. The title of our textbook is descriptive of our approach: Regression Analysis by Example. Concepts and procedures will generally be introduced by example. Moreover, we will emphasize applications in which the business context matters.
Computing: The course will be computationally hands-on from the very first lecture. Your laptop computer will be used for all data analysis. Much of the course work, at least at the outset, can be done in Excel, and we assume a basic familiarity with its data analysis tools and capabilities. However, there are advantages and conveniences to using a statistical software package. Several important regression procedures, such as stepwise regression and logistic regression, cannot be done in Excel, so we will supplement it with the Minitab statistical analysis system. Minitab gives us professional statistical analysis capabilities while being very easy to learn and use. Any version of Minitab that can do regression, stepwise regression and logistic regression will be adequate. Students who are already familiar with another software package that has these capabilities (e.g., STATA, BMDP, SAS, S4, JMPIN) are welcome to use it instead.
Conduct of the Course
Course Project: A major part of the course will be a project consisting of a significant data analysis in a real business context. I will provide a standard ‘default’ project. However, I strongly suggest that students who have particular interests propose their own project, as this can greatly increase the value they get out of the course. The term project will be an individual effort. Specifications for the final project report and timing will be provided in class.
Workload and Grading: Students are expected to attend class regularly and participate fully in class discussions. Since many of these discussions will be based on our analytic assignments (mini-cases), it is important that assigned work be done thoroughly and on time. The overall workload should be moderate, but as in any serious learning endeavor, you will benefit from the course in proportion to what you put in. Assignments can be done individually or in teams of two students. The final course grade will be composed of three components:
Attendance and class participation 1/3
Written Assignments 1/3
Term Project 1/3
Textbooks and Software
The course will follow the same general outline as the text by Chatterjee, Hadi and Price listed below. This book strikes a balance between providing a theoretical understanding and keeping a concrete focus on applications. In class we generally use different examples than those in the text, so it offers a second and complementary view on most issues and procedures. We recommend purchasing it, although it is possible to do very well in this course without owning the textbook. It is a good resource and reference, and it goes into greater depth than we will have time for in class. There are a number of excellent books on regression, and if you already own another, it may suffice.
In addition to Excel, we will use the Minitab statistical package, which comes with a helpful user’s manual.
Textbook: Samprit Chatterjee, Ali S. Hadi and Bertram Price, Regression Analysis by Example, 4th edition (Wiley, 2006), ISBN 978-0-471-74696-6