Data Science Online Training

Data Science Course Content

Type 1: Data Science with R and Statistics

Type 2: Data Science with Python and Statistics

Introduction to Data Science

  • Introduction to Big Data
  • Roles played by a Data Scientist
  • Analyzing Big Data using Hadoop and R
  • Methodologies used for analysis
  • Architecture and methodologies used to solve Big Data problems

Basic Data Manipulation using R

  • Understanding vectors in R
  • Reading Data, Combining Data
  • Subsetting data
  • Sorting data and some basic data generation functions

Machine Learning Techniques Using R Part-1

  • Machine Learning Overview
  • ML Common Use Cases
  • Understanding Supervised and Unsupervised Learning Techniques
  • Clustering
  • Similarity Metrics
  • Distance Measure Types: Euclidean, Cosine Measures
  • Creating predictive models

Machine Learning Techniques Using R Part-2

  • Understanding K-Means Clustering
  • Understanding TF-IDF and Cosine Similarity and their application to Vector Space Model
  • Implementing Association rule mining in R

Machine Learning Techniques Using R Part-3

  • Understanding Process flow of Supervised Learning Techniques
  • Decision Tree Classifier
  • How to build Decision trees
  • Random Forest Classifier
  • What is a Random Forest
  • Features of Random Forest
  • Out-of-Bag Error Estimate and Variable Importance
  • Naive Bayes Classifier

Introduction to Hadoop Architecture

  • Hadoop Architecture
  • Common Hadoop commands
  • MapReduce and data loading techniques (directly in R, and in Hadoop using Sqoop, Flume, and other tools)
  • Removing anomalies from the data

Integrating R with Hadoop

  • Integrating R with Hadoop using RHadoop and the rmr package
  • Exploring RHIPE (R Hadoop Integrated Programming Environment)
  • Writing MapReduce Jobs in R and executing them on Hadoop

Mahout Introduction and Algorithm Implementation

  • Implementing Machine Learning Algorithms on larger Data Sets with Apache Mahout

Additional Mahout Algorithms and Parallel Processing using R

  • Implementation of different Mahout algorithms
  • Random Forest Classifier with parallel processing Library in R

Introduction to Statistics

  • Types of Statistics
  • Types of Data

Descriptive Statistics

  • Measures of Central Tendency
  • Measures of Central Tendency – Usage Chart
  • Measures of Dispersion / Variability
  • Measures of Shape
  • Application of Variance/Std Deviation

Hypothesis Testing

  • Applications of Hypothesis Testing (t-test and z-test)
  • Steps in Hypothesis Testing

ANOVA (Analysis of Variance)

  • What is ANOVA
  • ANOVA Steps
  • Simple One-Way ANOVA
  • Simple Two-Way ANOVA with Multiple Variables

Chi-Square Tests

  • What is Chi-Square
  • Applications of Chi-Square

Correlation

  • Types of Correlation
  • Properties of Correlation
  • Methods of Calculating Correlation
  • Steps to Calculate Correlation

Regression Analysis

  • What is Regression
  • Types of Regression Analysis
  • Properties of the Regression Line
  • Validating the Model
  • Regression Assumptions

Data Transformation for Regression

Dummy Variable Analysis

Variable Selection Procedure for Regression

  • Forward Selection Procedure
  • Backward Elimination Procedure
  • Stepwise Regression Method

Logistic Regression

  • Likelihood Profiling
  • Assumptions
  • Variable Selection Method: Weight of Evidence (WoE) and Information Value (IV)
  • Model Validation
  • Model Performance
  • Prediction

Cluster Analysis

  • What is a cluster
  • Application of clustering
  • Types of clustering
  • K-Means
  • Dendrogram
  • Validation of Cluster

Decision Tree

  • What is a decision tree
  • How a decision tree works
  • CART
  • Pruning
  • Overfitting
  • Underfitting
  • Model validation
  • Model performance

Market Basket Analysis

  • What is Market Basket Analysis (MBA)
  • Applications of MBA
  • Support
  • Confidence
  • Lift
  • Rules

Random Forest

  • What is a random forest
  • Application of random forest
  • Tune parameters
  • How to tune parameters
  • Model validation
  • Model performance

Support Vector Machine

  • What is a support vector machine
  • Why use SVM
  • Hyperplane
  • Kernel
  • Cost
  • Gamma
  • Model validation
  • Model performance

Naïve Bayes

  • What is Naïve Bayes
  • Bayes' theorem
  • Conditional probability
  • Prior probability
  • Posterior probability
  • Application of Naïve Bayes
  • Model validation
  • Model performance

Time Series

  • What is time series
  • What is ARIMA
  • Stationarity
  • Seasonality
  • Trend
  • What are p, d, q
  • How to find p, d, q
  • Find best model
  • Forecasting