# Data Science

## About Course

Data science is an interdisciplinary field that combines scientific methods, algorithms, and systems to extract knowledge and insights from structured and unstructured data. It involves the use of various techniques, including statistical analysis, machine learning, data visualization, and data mining, to discover patterns, make predictions, and solve complex problems.

Data scientists employ their expertise in mathematics, statistics, computer science, and domain knowledge to collect, clean, and analyze large datasets. They use programming languages like Python, R, or SQL, along with tools such as Jupyter Notebook and Apache Hadoop, to manipulate and process data efficiently.

Data science has applications in various industries, including finance, healthcare, marketing, and technology. It helps organizations make data-driven decisions, optimize processes, and gain a competitive edge. By uncovering valuable insights from data, data scientists enable businesses to understand customer behavior, identify trends, improve product development, and enhance operational efficiency.

In summary, data science is a powerful discipline that leverages advanced techniques and computational tools to extract meaningful information from data, driving innovation and informed decision-making across industries.

Recap of Demo

Introduction to Types of Analytics

Project life cycle

An introduction to our e-learning platform

Description: Learn about the other moments of business decision as part of Statistical Analysis. Learn more about visual data representation and graphical techniques. Learn about Python and R programming with respect to Data Science and Machine Learning. Understand how to work with different Python IDEs and Python programming examples.

Topics

Data Types

Measures of Central Tendency

Measures of Dispersion

Graphical Techniques

Skewness & Kurtosis

Box Plot

R

R Studio

Descriptive Stats in R

Python (Installation and basic commands) and Libraries

Jupyter Notebook

Set up GitHub

Descriptive Stats in Python

Pandas and Matplotlib / Seaborn

Topics

Random Variable

Probability

Probability Distribution

Normal Distribution

Standard Normal Distribution (SND)

Expected Value

Sampling Funnel

Sampling Variation

Central Limit Theorem (CLT)

Confidence interval

Assignments Session-1 (1 hr)

Introduction to Hypothesis Testing

Hypothesis Testing with examples

2 proportion test

2 sample t test

ANOVA and Chi-square case studies

Visualization

Data Cleaning

Imputation Techniques

Scatter Plot

Correlation analysis

Transformations

Normalization and Standardization
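
As a minimal sketch of the last topic, the snippet below contrasts min-max normalization with z-score standardization using only the Python standard library. The function names and the `ages` sample are hypothetical and chosen for illustration; the course material may use library helpers instead.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Convert values to z-scores using the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


ages = [20, 30, 40, 50, 60]  # hypothetical sample data
normalized = min_max_normalize(ages)
z_scores = standardize(ages)

print(normalized)
print([round(z, 3) for z in z_scores])
```

Normalization bounds the values to a fixed range, while standardization centres them on zero with unit spread.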

Description: Learn about Linear Regression and its components, viz. the regression line, the Linear Regression calculator and the Linear Regression equation. Get introduced to Linear Regression analysis, Multiple Linear Regression and Linear Regression examples.

Topics

Principles of Regression

Introduction to Simple Linear Regression

Multiple Linear Regression

Description: Learn about Multiple Logistic Regression and understand the regression analysis, probability measures and their interpretation. Know what a confusion matrix is and what its elements are. Get introduced to “Cut off value” estimation using the ROC curve. Work with gain charts and lift charts.

Topics

Multiple Logistic Regression

Confusion matrix

False Positive, False Negative

True Positive, True Negative

Sensitivity, Recall, Specificity, F1 score

Receiver operating characteristic curve (ROC curve)
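
The measures listed above all derive from the four cells of a confusion matrix. The sketch below shows one way to compute them; the function name and the counts are hypothetical, and sensitivity and recall are the same quantity under two names.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive common measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # also called recall or true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts for a binary classifier at one cut-off value.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Recomputing these counts at different cut-off values and plotting sensitivity against (1 - specificity) gives the points of an ROC curve.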

Description: Learn deployment using Streamlit in Python.

Topics

Streamlit

Description: As part of unsupervised data mining, get introduced to various clustering algorithms, learn about hierarchical clustering and K-means clustering through clustering examples, and know what clustering in machine learning is all about.

Topics

Supervised vs Unsupervised learning

Data Mining Process

Hierarchical Clustering / Agglomerative Clustering

Measure of distance

Numeric - Euclidean, Manhattan, Mahalanobis

Categorical - Binary Euclidean, Simple Matching Coefficient, Jaccard’s Coefficient

Mixed - Gower’s General Dissimilarity Coefficient

Types of Linkages

Single Linkage / Nearest Neighbour

Complete Linkage / Farthest Neighbour

Average Linkage

Centroid Linkage

Visualization of clustering algorithm using Dendrogram
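
To make the distance measures in this list concrete, here is a small sketch of two numeric distances and two coefficients for binary data. The function names and sample vectors are hypothetical. The simple matching and Jaccard coefficients are similarities, so one minus the coefficient is commonly used as the distance.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two numeric points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def simple_matching(a, b):
    """Share of positions where two binary vectors agree (0-0 and 1-1)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def jaccard(a, b):
    """1-1 matches divided by positions where at least one vector is 1.

    Assumes at least one position holds a 1 in either vector.
    """
    both = sum(x == 1 and y == 1 for x, y in zip(a, b))
    either = sum(x == 1 or y == 1 for x, y in zip(a, b))
    return both / either


p, q = (1, 2), (4, 6)                    # hypothetical numeric points
u, v = (1, 0, 1, 0, 0), (1, 1, 0, 0, 0)  # hypothetical binary records

print(euclidean(p, q), manhattan(p, q))
print(simple_matching(u, v), jaccard(u, v))
```

Unlike simple matching, the Jaccard coefficient ignores 0-0 agreements, which matters when most attributes are absent in most records.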

K-Means

Description: In this continuation lecture, learn about K-means clustering, the clustering ratio and various clustering metrics. Get introduced to methods of making optimum clusters.

Topics

Non-Hierarchical

Measurement metrics of clustering - Within Sum of Squares, Between Sum of Squares, Total Sum of Squares

Choosing the ideal K value using Scree plot / Elbow Curve
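
The three clustering metrics above are linked by the identity TSS = WSS + BSS. The sketch below checks this on a hypothetical one-dimensional example with a fixed cluster assignment; a real K-means run would produce the assignment, and the function name is an assumption made for illustration.

```python
from statistics import mean


def sum_of_squares(clusters):
    """Return (WSS, BSS, TSS) for a list of 1-D clusters."""
    all_points = [x for cluster in clusters for x in cluster]
    grand_mean = mean(all_points)
    # Within: squared distance of each point to its own cluster mean.
    wss = sum((x - mean(c)) ** 2 for c in clusters for x in c)
    # Between: squared distance of each cluster mean to the grand mean,
    # weighted by cluster size.
    bss = sum(len(c) * (mean(c) - grand_mean) ** 2 for c in clusters)
    # Total: squared distance of each point to the grand mean.
    tss = sum((x - grand_mean) ** 2 for x in all_points)
    return wss, bss, tss


clusters = [[1, 2, 3], [8, 9, 10]]  # hypothetical assignment with K = 2
wss, bss, tss = sum_of_squares(clusters)
print(wss, bss, tss)
```

Plotting WSS against increasing K and looking for the point where the drop flattens is the idea behind the elbow curve.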

DBSCAN

Description: Introduction to the density-based clustering method.

Topics

A general intuition for DBSCAN

Different parameters in DBSCAN

Metrics used to evaluate the performance of model

Pros and Cons of DBSCAN

Description: Learn to apply data reduction in data mining using dimensionality reduction techniques. Gain knowledge about the advantages of dimensionality reduction using PCA and t-SNE.

Topics

PCA and t-SNE

Why dimension reduction

Advantages of PCA

Calculation of PCA weights

2D Visualization using Principal components

Basics of Matrix algebra

Description: Learn one of the most important topics in data mining, association rules. Understand how the Apriori algorithm and the association rule mining algorithm work.

Topics

What is Market Basket / Affinity Analysis

Measure of association

Support

Confidence

Lift Ratio

Apriori Algorithm
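
The three measures of association can be sketched on a toy basket dataset as follows. The transactions and function names are hypothetical, and this computes the measures for a single rule directly rather than implementing the Apriori candidate search.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent)."""
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence divided by the baseline support of the consequent."""
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

rule_support = support({"bread", "milk"}, baskets)
rule_confidence = confidence({"bread"}, {"milk"}, baskets)
rule_lift = lift({"bread"}, {"milk"}, baskets)
print(rule_support, round(rule_confidence, 3), round(rule_lift, 3))
```

A lift ratio below 1, as in this toy data, suggests the two items appear together less often than independence would predict.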

Description: Learn how online recommendations are made. Get insights about online recommender systems, content-based recommender systems, content-based filtering and various recommendation engine algorithms. Get to know about people-to-people collaborative filtering and item-to-item collaborative filtering.

Topics

User-based collaborative filtering

Measure of distance / similarity between users

Driver for recommendation

Computation reduction techniques

Search based methods / Item to item collaborative filtering

Vulnerability of recommender systems

Workflow from data to deployment

Data nuances

Mindsets of modelling

Description: Decision Tree is one of the most powerful classifier algorithms today. In this tutorial, learn the math behind the decision tree algorithm with a case study.

Topics

Elements of Classification Tree - Root node, Child Node, Leaf Node, etc.

Greedy algorithm

Measure of Entropy

Attribute selection using Information Gain

Implementation of Decision tree using C5.0 and Sklearn libraries
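
The entropy and information gain topics above can be illustrated with a short standard-library sketch. The labels and the candidate split are hypothetical, and this shows the scoring of one split only, not a full tree builder such as C5.0 or the Sklearn implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(c) / total * entropy(c) for c in children)
    return entropy(parent) - weighted


# Hypothetical node with 5 "yes" and 5 "no" labels, split into two children.
parent = ["yes"] * 5 + ["no"] * 5
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]

gain = information_gain(parent, split)
print(round(entropy(parent), 3), round(gain, 3))
```

A greedy tree builder evaluates this gain for every candidate attribute at a node and splits on the one with the highest value.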

Description: Learn about how to handle categorical data using different methods

Topics

Encoding Methods

OHE

Label Encoders

Outlier detection - Isolation Forest

Predictive Power Score

Description: Learn techniques that help reduce overfitting and training time and improve accuracy.

Topics

Recursive Feature Elimination

PCA

Description: Here you are going to learn the ways to improve models in terms of accuracy and reduced overfitting (Bias vs Variance).

Topics

Splitting data into train and test

Methods of cross validation

Accuracy methods

Description: Rather than working on a single model, we can work on a diverse set of models; this can be achieved by using Ensemble learning.

Topics

Bagging

Boosting

Random Forest

XGBM

LGBM

Description: KNN and SVM: The KNN algorithm is by far one of the easiest algorithms to learn and interpret. SVM is another popular algorithm, and the best part is that it can be used for both classification and regression. Learn these two using simple case studies.

Topics

Deciding the K value

Building a KNN model by splitting the data

Understanding the various generalization and regularization techniques to avoid overfitting and underfitting

Kernel tricks

Lasso Regression

Ridge Regression

Description: Neural Networks: A neural network is a supervised machine learning algorithm loosely modelled on the human brain, and it is the foundation for Artificial Intelligence and Deep Learning. Here you learn the operation of neural networks using R and Python.

Topics

Artificial Neural Network

Biological Neuron vs Artificial Neuron

ANN structure

Activation function

Network Topology

Classification Hyperplanes

Best fit “boundary”

Gradient Descent

Stochastic Gradient Descent Intro

Back Propagation

Introduction to concepts of CNN

Description: Text mining, or text data mining, is one of a wide spectrum of tools for analyzing unstructured data. As part of this course, learn about text analytics, the various text mining techniques and their applications, text mining algorithms and sentiment analysis.

Topics

Sources of data

Bag of words

Pre-processing, corpus, Document-Term Matrix (DTM) and Term-Document Matrix (TDM)

Word Clouds

Corpus level word clouds

Sentiment Analysis

Positive Word clouds

Negative word clouds

Unigram, Bigram, Trigram

Vector space Modelling

Word embedding

Document Similarity using Cosine similarity
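
As a rough sketch of how the bag-of-words and cosine similarity topics fit together, the snippet below turns each document into word counts and compares the resulting vectors. The tokenization (lowercasing and splitting on whitespace) and the sample sentences are simplifying assumptions; real pipelines usually add the pre-processing steps listed above.

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    # Only words present in both documents contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)


# Hypothetical review snippets.
s1 = "the food was good"
s2 = "the food was bad"
s3 = "flight delayed again"

print(cosine_similarity(s1, s2))
print(cosine_similarity(s1, s3))
```

The first pair scores high despite opposite sentiment, which shows why document similarity and sentiment analysis are treated as separate tasks.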

Description: Learn how to extract data from social media and download user reviews from e-commerce and travel websites. Generate various visualizations using the downloaded data.

Topics

Extract Tweets from Twitter

Extract user reviews of the products from Amazon, Snapdeal and TripAdvisor

Description: Learn how to perform text analytics using Python and work with various libraries that aid in data extraction, text mining, sentiment analysis and

Topics

Install Libraries from Shell

Extraction and text analytics in Python

Description: Natural language processing applications are in great demand now, and various natural language processing projects are being taken up. As part of this tutorial, learn about natural language and ‘Natural language understanding’.

Topics

Sentiment Extraction

Lexicons and Emotion Mining

Description: In the Naive Bayes classifier tutorial, learn how classification modeling is done using Bayesian classification, and understand it through a Naive Bayes example. Learn about Naive Bayes through the example of text mining.

Topics

Probability – Recap

Bayes Rule

Naive Bayes Classifier

Text Classification using Naive Bayes

Description: Forecasting, or Time Series Analysis, is an important component of analytics. Here, get to know the various forecasting methods, forecasting techniques and business forecasting techniques. Get introduced to the time series components and the various time series analyses using time series examples.

Topics

Introduction to time series data

Steps of forecasting

Components of time series data

Scatter plot and Time Plot

Lag Plot

ACF - Auto-Correlation Function / Correlogram

Visualization principles

Naive forecast methods

Errors in forecast and its metrics

Model-Based approaches

Linear Model

Exponential Model

Quadratic Model

Additive Seasonality

Multiplicative Seasonality

Model-Based approaches

AR (Auto-Regressive) model for errors

Random walk

ARMA (Auto-Regressive Moving Average), Order p and q

ARIMA (Auto-Regressive Integrated Moving Average), Order p, d and q

Data-driven approach to forecasting

Smoothing techniques

Moving Average

Simple Exponential Smoothing

Holt’s / Double Exponential Smoothing

Winters / Holt-Winters

De-seasoning and de-trending

Forecasting using Python and R

Concept with a business case

Project Discussion using Python

Resume Preparation

Interview Support

Introduction: What is a Database

Difference between SQL and NoSQL DB

How to Install MySQL and Workbench

Connecting to DB

Creating a DB

What are the Languages inside SQL

How to Create Tables inside DB and Insert Records

Select statement and using Queries for seeing your data

Joining 2 tables

Where clause usage

Indexes and views

Different operations in SQL

How to Connect to MySQL from your applications, including R and Python

What is Data Visualization?

Why did Visualization come into the Picture?

Importance of Visualizing Data

Poor Visualizations Vs. Perfect Visualizations

Principles of Visualizations

Tufte’s Graphical Integrity Rule

Tufte’s Principles for Analytical Design

Visual Rhetoric

Goal of Data Visualization

Tableau – Data Visualization Tool

Introduction to Tableau

What is Tableau? Different Products and their functioning

Architecture Of Tableau

Pivot Tables

Split Tables

Hiding

Rename and Aliases

Data Interpretation

Tableau User Interface

Understanding Data Types and Visual Cues

Basic Chart types

Text Tables, Highlight Tables, Heat Map

Pie Chart, Tree Chart

Bar Charts, Circle Charts

Intermediate Chart

Time Series Charts

Time Series Hands-On

Dual Lines

Dual Combination

Advanced Charts

Bullet Chart

Scatter Plot

Introduction to Correlation Analysis

Introduction to Regression Analysis

Trendlines

Histograms

Bin Sizes in Tableau

Box Plot

Pareto Chart

Donut Chart, Word Cloud

Forecasting (Predictive Analysis)

Maps in Tableau

Types of Maps in Tableau

Polygon Maps

Connecting with WMS Server

Custom Geocoding

Data Layers

Radial & Lasso Selection

Adding Background Image

How to get Background Image and highlight the data on it

Creating Data Extracts

Filters and their working at different levels

Usage of Filters at Extract and Data Source level

Worksheet level filters

Context, Dimension and Measure Filters

Data Connectivity in-depth understanding

Joins

Unions

Data Blending

Cross Database Joins

Sets

Groups

Parameters

Creating Calculated Fields

Logical Functions

Case-If Function

ZN Function

Else-If Function

Ad-Hoc Calculations

Quick Table Calculations

Level of Detail (LoD)

Fixed LoD

Include LoD

Exclude LoD

Responsive Tool Tips

Dashboards

Actions at Sheet level and Dashboard level

Story

Introduction to Neural Network & Deep Learning

Topics

Introduction

Deep Learning Importance [Strength & Limitation]

SP | MLP

Neural Network Overview

Neural Network Representation

Activation Function

Loss Function

Importance of Non-linear Activation Function

Gradient Descent for Neural Network

Parameter & Hyperparameter

Topics

Train, Test & Validation Set

Vanishing & Exploding Gradient

Dropout

Regularization

Optimization algorithm

Learning Rate

Tuning

Softmax

CNN

Topics

CNN

Deep Convolution Model

Detection Algorithm

Face Recognition

RNN

Topics

RNN

LSTM

Bidirectional LSTM