Modeling with tidymodels in R

Tidymodels is a powerful suite of R packages designed to streamline machine learning workflows. Learn to split datasets for cross-validation, preprocess data with tidymodels’ recipes package, and fine-tune machine learning algorithms. You’ll learn key concepts such as defining model objects and creating modeling workflows. Then, you’ll apply your skills to predict home prices and classify employees by their risk of leaving a company.

  1. Machine Learning with tidymodels

In this topic, you’ll explore the rich ecosystem of R packages that power tidymodels and learn how they can streamline your machine learning workflows. You’ll then put your tidymodels skills to the test by performing a prediction task.

  2. Classification Models

Learn how to predict categorical outcomes by training classification models. Using the skills you’ve gained so far, you’ll perform a prediction task.

  3. Feature Engineering

Find out how to bake feature engineering pipelines with the recipes package. You’ll prepare numeric and categorical data to help machine learning algorithms optimize your predictions.

  4. Workflows and Hyperparameter Tuning

Now it’s time to streamline the modeling process using workflows and fine-tune models with cross-validation and hyperparameter tuning. You’ll learn how to tune a decision tree classification model to perform a prediction task.

Analyzing Survey Data in R

Learn about common survey design structures, then visualize and analyze survey results.

  1. Introduction to survey data

Our exploration of survey data begins with survey weights. In this topic, we will learn what survey weights are and why they are so important in survey data analysis. Another unique feature of survey data is how it is collected: via clustering and stratification. We’ll practice specifying and exploring these sampling features for several survey datasets.

  2. Exploring categorical data

Now that we have a handle on survey weights, in this topic we will practice incorporating those weights into our analysis of categorical data. We’ll conduct descriptive inference by calculating summary statistics, building summary tables, and constructing bar graphs. For analytic inference, we will learn to run chi-squared tests.

  3. Exploring quantitative data

Of course, not all survey data are categorical, so in this topic we will explore analyzing quantitative survey data. We will learn to compute survey-weighted statistics, such as the mean and quantiles. For data visualization, we’ll construct bar graphs, histograms, and density plots. We will close out the topic by conducting analytic inference with survey-weighted t-tests.
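In R, packages such as survey and srvyr compute these statistics. As a language-agnostic sketch (plain Python with made-up respondents, not the course's actual code), a survey-weighted mean is simply a weighted average where each weight says how many people in the population a respondent represents:

```python
# Sketch of a survey-weighted mean. Hypothetical data: three
# respondents, the first of whom represents twice as many people.

def weighted_mean(values, weights):
    """Survey-weighted mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

incomes = [30000, 50000, 40000]
weights = [2.0, 1.0, 1.0]

print(weighted_mean(incomes, weights))  # (60000+50000+40000)/4 = 37500.0
```

Note how the weighted mean (37,500) differs from the naive unweighted mean (40,000): ignoring the weights would over-represent the higher-income respondents.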

  4. Modeling quantitative data

Modeling survey data also requires careful consideration of how the data were collected. We will start this topic by learning how to incorporate survey weights into scatter plots through aesthetics such as size, color, and transparency. We’ll then model the survey data with linear regression and explore how to incorporate categorical predictors and polynomial terms into our models.


Developing R Packages

In this course, you will learn the end-to-end process of creating an R package from scratch. You will start by creating the basic structure for your package and adding important details like functions and metadata. Once the basic components of your package are in place, you will learn how to document your package, and why documentation is important for creating quality packages that other people, as well as your future self, can use with ease. You will then learn how to verify that the components of your package work properly by writing tests, running checks, and building the package. By the end of this course, you can expect to have all the skills necessary to create and share your own R packages.

  1. The R Package Structure

In this topic, you will learn the basics of creating an R package. You will learn about the structure of R packages, set up a package, and write a function and include it in your package. You will also learn about the metadata stored in the DESCRIPTION and NAMESPACE files.

  2. Documenting Packages

In this topic, you will learn how to document your package. You will learn why documentation is important, and how to provide documentation for your package, its functions, and other components. You will also learn about what it means to export a function and how to implement this in your package.

  3. Checking and Building R Packages

In this topic, you will learn about how to run checks to ensure that your R package is correctly structured and can be installed. You will learn how to correct common problems and get your package ready to be built so it can be shared with others.

  4. Adding Unit Tests to R Packages

In the final topic, you will learn how to add tests to your package to ensure your code runs as expected as the package is updated or changed. You will look at how to test functions to ensure they produce expected values, and how to test other aspects of functionality, such as expected errors. Once you’ve written tests for your functions, you’ll learn how to run your tests and what to do in the case of a failing test.

Machine Learning with caret in R

Machine learning is the study and application of algorithms that learn from and make predictions on data. From search results to self-driving cars, it has manifested itself in all areas of our lives and is one of the most exciting and fastest-growing fields of research in data science. This course teaches the big ideas in machine learning: how to build and evaluate predictive models, how to tune them for optimal performance, how to preprocess data for better results, and much more. The popular caret R package, which provides a consistent interface to all of R’s most powerful machine learning facilities, is used throughout the course.

  1. Regression models: fitting them and evaluating their performance

In the first topic of this course, you’ll fit regression models with train() and evaluate their out-of-sample performance using cross-validation and root-mean-square error (RMSE).
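In R, caret's train() automates all of this. As a rough, language-agnostic sketch of the underlying idea (plain Python, hypothetical data, and a deliberately trivial "model" that just predicts the training mean), cross-validated RMSE works like this:

```python
import math

# Sketch of k-fold cross-validated RMSE, the idea behind what
# caret's train() automates. The "model" here is deliberately
# trivial (predict the training mean); data are hypothetical.

def rmse(actual, predicted):
    """Root-mean-square error between true and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def cv_rmse(y, k=3):
    folds = [y[i::k] for i in range(k)]            # simple interleaved folds
    scores = []
    for i in range(k):
        test = folds[i]                            # hold out one fold
        train = [v for j, f in enumerate(folds) if j != i for v in f]
        pred = sum(train) / len(train)             # fit the mean-only "model"
        scores.append(rmse(test, [pred] * len(test)))
    return sum(scores) / k                         # average out-of-sample RMSE

y = [3.0, 5.0, 4.0, 6.0, 2.0, 7.0]
print(round(cv_rmse(y), 3))
```

The key point is that each fold's error is measured on data the model never saw, which is what makes the RMSE an out-of-sample estimate.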

  2. Classification models: fitting them and evaluating their performance

In this topic, you’ll fit classification models with train() and evaluate their out-of-sample performance using cross-validation and area under the curve (AUC).

  3. Tuning model parameters to improve performance

In this topic, you will use the train() function to tweak model parameters through cross-validation and grid search.

  4. Preprocessing your data

In this topic, you will practice using train() to preprocess data before fitting models, improving your ability to make accurate predictions.

  5. Selecting models: a case study in churn prediction

In the final topic of this course, you’ll learn how to use resamples() to compare multiple models and select (or ensemble) the best one(s).

Advanced Level Data Analysis in R

This course is the third of three levels (essential, intermediate, and advanced), each taking the same amount of time. It builds on the essential and intermediate levels, completing a discovery journey that began with the basic structures of R code and key statistical concepts, and gives you quick and easy access to the advanced analytical options that unlock the power of this great open-source tool. The course strikes a balance between theory and practice by using the computer as a tool for learning statistical concepts, with the hope that you will gain a better understanding of both.
1. Resampling statistics and bootstrapping
2. Generalized linear models
3. Principal components and factor analysis
4. Advanced methods for missing data
5. Multilevel linear models
6. Electives: choice between
– Structural Equation Modeling and Mediation analysis
– Time Series and Forecasting
– Introduction to machine learning
7. Statistical report, publication, and thesis writing
8. Practical research project

The capstone project is one of the most lauded elements of our program. As a final step in the Data Analysis in R series, participants work on a real research and/or industrial project. The capstone project class allows students to create a usable, public data product that demonstrates their skills to potential employers. Projects are drawn from real-world problems and conducted with industry, government, and academic partners. Working in teams provides a good opportunity to develop and accelerate interdisciplinary collaboration. The project culminates in a submission to an internationally reputed academic journal or conference, or a seminar before a professional audience at an industrial, government, or academic institution.

Deep Learning in Python

In this course, you’ll expand your deep learning knowledge and take your machine learning skills to the next level. Working with Keras and PyTorch, you’ll learn about neural networks, deep learning model workflows, and how to optimize your models. You’ll then use TensorFlow to build linear regression models and neural networks. Throughout the course, you’ll use machine learning techniques to solve real-world challenges, such as predicting housing prices, building a neural network to recognize handwritten digits, and identifying forged banknotes. By the end of the course, you’ll be ready to use Keras to train and test complex, multi-output networks and dive deeper into deep learning.

Spatial Statistics in R

Everything happens somewhere, and increasingly the place where all these things happen is being recorded in a database. There is some truth behind the oft-repeated statement that 80% of data have a spatial component. So what can we do with this spatial data? Spatial statistics, of course! Location is an important explanatory variable in so many things – be it a disease outbreak, an animal’s choice of habitat, a traffic collision, or a vein of gold in the mountains – that we would be wise to include it whenever possible. This course will start you on your journey of spatial data analysis. You’ll learn what classes of statistical problems present themselves with spatial data, and the basic techniques for dealing with them. You’ll see how to look at a mess of dots on a map and bring out meaningful insights.


  1. Introduction

After a quick review of spatial statistics as a whole, you’ll go through some point-pattern analysis. You’ll learn how to recognize and test different types of spatial patterns.

  2. Point Pattern Analysis

Point Pattern Analysis answers questions about why things appear where they do. The things could be trees, disease cases, crimes, lightning strikes – anything with a point location.

  3. Areal Statistics

So much data is collected in administrative divisions that there are specialized techniques for analyzing it. This topic presents several methods for exploring areal data.

  4. Geostatistics

Originally developed for the mining industry, geostatistics covers the analysis of location-based measurement data. It enables model-based interpolation of measurements with uncertainty estimation.
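Kriging is the classic geostatistical interpolation method; as a simpler stand-in that illustrates the core idea, here is a sketch of inverse distance weighting (IDW), which estimates a value at an unmeasured location as a distance-weighted average of nearby measurements. The coordinates and values are hypothetical, and unlike kriging, IDW provides no uncertainty estimate:

```python
import math

# IDW interpolation sketch: nearby measurements count more than
# distant ones. Hypothetical sample locations and values.

samples = [((0.0, 0.0), 10.0), ((1.0, 0.0), 20.0), ((0.0, 1.0), 30.0)]

def idw(x, y, power=2):
    """Inverse-distance-weighted estimate at location (x, y)."""
    num = den = 0.0
    for (sx, sy), value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0:
            return value          # exact at a sampled point
        w = 1.0 / d ** power      # closer samples get larger weights
        num += w * value
        den += w
    return num / den

print(idw(0.0, 0.0))            # 10.0, exact at a sample location
print(round(idw(0.5, 0.5), 1))  # 20.0: all three samples are equidistant here
```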

Building Recommendation Engines in Python

We’ve come to expect personalized experiences in advertising, such as an online retailer suggesting items you might also like to purchase. But how are these suggestions generated? In this course, you’ll learn everything you need to know to create your own recommendation engine. Through hands-on exercises, you’ll get to grips with the two most common systems: collaborative filtering and content-based filtering. Next, you’ll learn how to measure similarities such as the Jaccard distance and cosine similarity, and how to evaluate the quality of recommendations on test data using the root mean square error (RMSE). By the end of this course, you’ll have built your own recommendation engine and be able to apply your Python skills to create these systems for any industry.
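The three measures named above are straightforward to compute by hand. A minimal sketch with toy data (not the course's dataset):

```python
import math

# The three measures used throughout the course, on toy data.

def jaccard(a, b):
    """Jaccard similarity between two sets: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)

def cosine(u, v):
    """Cosine similarity between two numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm

def rmse(actual, predicted):
    """Root mean square error between true and predicted ratings."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Two users' liked-item sets (hypothetical item IDs):
print(jaccard({"a", "b", "c"}, {"b", "c", "d"}))  # 2/4 = 0.5
print(cosine([1, 0, 1], [1, 1, 1]))               # 2/sqrt(6) ≈ 0.816
print(rmse([4, 3, 5], [3, 3, 4]))                 # sqrt(2/3) ≈ 0.816
```

Jaccard works on sets (e.g. which items two users both liked), cosine on numeric vectors (e.g. rating profiles), and RMSE compares predicted ratings against held-out true ratings.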

  1. Introduction to Recommendation Engines

What problems are recommendation engines designed to solve and what data are best suited for them? Discern what insightful recommendations can be made even with limited data, and learn how to create your own recommendations.

  2. Content-Based Recommendations

Discover how item attributes can be used to make recommendations. Create valuable comparisons between items with both categorical and text data. Generate profiles to recommend new items for users based on their past preferences.
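A minimal sketch of this idea, with hypothetical items and attributes (not the course's dataset): build a user profile as the average of the attribute vectors of liked items, then rank unseen items by cosine similarity to that profile.

```python
import math

# Content-based recommendation sketch. Hypothetical attribute
# vectors, e.g. [action, comedy, romance] for films.

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm

items = {
    "film_a": [1, 0, 0],
    "film_b": [1, 1, 0],
    "film_c": [0, 0, 1],
    "film_d": [0, 1, 1],
}
liked = ["film_a", "film_b"]

# User profile = mean of the liked items' attribute vectors.
profile = [sum(vals) / len(liked) for vals in zip(*(items[i] for i in liked))]

# Recommend the unseen item most similar to the profile.
unseen = [i for i in items if i not in liked]
best = max(unseen, key=lambda i: cosine(profile, items[i]))
print(best)  # film_d: it shares the comedy attribute with the profile
```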

  3. Collaborative Filtering

Discover new items to recommend to users by finding others with similar tastes. Learn to make user-based and item-based recommendations—and in what context they should be used. Use k-nearest neighbors models to leverage the wisdom of the crowd and predict how someone might rate an item they haven’t yet encountered.
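As a rough sketch of the user-based variant (hypothetical ratings, simplified similarity, not the course's implementation): predict a user's rating of an unseen item as the similarity-weighted average of the ratings given by the k most similar users.

```python
import math

# User-based collaborative filtering sketch. Hypothetical ratings;
# a missing key means "not rated".

ratings = {            # user -> {item: rating}
    "ann":  {"x": 5, "y": 4, "z": 1},
    "bob":  {"x": 4, "y": 5, "z": 2},
    "cara": {"x": 1, "y": 2, "z": 5},
}

def cosine_users(a, b):
    """Cosine similarity over the items both users rated (simplified)."""
    common = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def predict(user, item, k=2):
    """Similarity-weighted average rating from the k nearest neighbours."""
    others = [u for u in ratings if u != user and item in ratings[u]]
    neigh = sorted(others, key=lambda u: cosine_users(ratings[user], ratings[u]),
                   reverse=True)[:k]
    num = sum(cosine_users(ratings[user], ratings[u]) * ratings[u][item] for u in neigh)
    den = sum(cosine_users(ratings[user], ratings[u]) for u in neigh)
    return num / den

# Predict how "dan", whose tastes match ann and bob, would rate item z:
ratings["dan"] = {"x": 5, "y": 5}
print(round(predict("dan", "z"), 2))  # low, close to ann's and bob's ratings of z
```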

  4. Matrix Factorization and Validating Your Predictions

Understand how the sparsity of real-world datasets can impact your recommendations. Leverage the power of matrix factorization to deal with this sparsity. Explore the value of latent features and use them to better understand your data. Finally, put the models you’ve discovered to the test by learning how to validate each of the approaches you’ve learned.
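A minimal sketch of matrix factorization on a sparse ratings matrix (hypothetical data, plain gradient descent rather than any particular library): learn low-rank user and item factor vectors from the observed entries only, then use their dot products to predict the missing ones.

```python
import random

# Matrix factorization sketch: fit rank-2 user/item factors to the
# observed entries of a sparse 3x3 ratings matrix, then predict the
# missing entries. Hypothetical ratings; gradient descent with a
# small regularization term.

random.seed(0)
observed = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0,
            (1, 2): 2.0, (2, 1): 1.0, (2, 2): 5.0}
n_users, n_items, rank = 3, 3, 2

U = [[random.uniform(0, 0.5) for _ in range(rank)] for _ in range(n_users)]
V = [[random.uniform(0, 0.5) for _ in range(rank)] for _ in range(n_items)]

def pred(u, i):
    """Predicted rating = dot product of user and item factors."""
    return sum(U[u][f] * V[i][f] for f in range(rank))

lr, reg = 0.05, 0.01
for _ in range(2000):
    for (u, i), r in observed.items():
        err = r - pred(u, i)
        for f in range(rank):
            uf, vf = U[u][f], V[i][f]        # old values for both updates
            U[u][f] += lr * (err * vf - reg * uf)
            V[i][f] += lr * (err * uf - reg * vf)

# The factors now reproduce the observed ratings and fill in the gaps.
print(round(pred(0, 0), 1))   # should be near the observed 5.0
print(round(pred(0, 2), 1))   # a prediction for an unobserved entry
```

The latent factors (the columns of U and V) are what the topic calls latent features: learned dimensions of taste that compress the sparse matrix into something dense enough to predict from.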