Data Science

About Course
Data science is an interdisciplinary field that combines scientific methods, algorithms, and systems to extract knowledge and insights from structured and unstructured data. It involves the use of various techniques, including statistical analysis, machine learning, data visualization, and data mining, to discover patterns, make predictions, and solve complex problems.
Data scientists employ their expertise in mathematics, statistics, computer science, and domain knowledge to collect, clean, and analyze large datasets. They use programming languages like Python, R, or SQL, along with tools such as Jupyter Notebook and Apache Hadoop, to manipulate and process data efficiently.
Data science has applications in various industries, including finance, healthcare, marketing, and technology. It helps organizations make data-driven decisions, optimize processes, and gain a competitive edge. By uncovering valuable insights from data, data scientists enable businesses to understand customer behavior, identify trends, improve product development, and enhance operational efficiency.
In summary, data science is a powerful discipline that leverages advanced techniques and computational tools to extract meaningful information from data, driving innovation and informed decision-making across industries.
Recap of Demo
Introduction to Types of Analytics
Project life cycle
An introduction to our e-learning platform
Description: Learn about the moments of business decision as part of Statistical Analysis. Learn
more about visual data representation and graphical techniques. Learn about Python and R programming
with respect to Data Science and Machine Learning. Understand how to work with different Python
IDEs and Python programming examples.
Topics
Data Types
Measures of Central Tendency
Measures of Dispersion
Graphical Techniques
Skewness & Kurtosis
Box Plot
R
R Studio
Descriptive Stats in R
Python (Installation and basic commands) and Libraries
Jupyter Notebook
Set up GitHub
Descriptive Stats in Python
Pandas and Matplotlib / Seaborn
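As a small illustration of the descriptive statistics listed above, the sketch below computes measures of central tendency, dispersion, skewness and kurtosis using only the Python standard library. The function name `describe` and the sample data are made up for this example, and the population (divide-by-n) formulas are an assumption; Pandas offers equivalent methods with slightly different defaults.

```python
import statistics

def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    # Population standard deviation (divides by n, not n - 1).
    sd = statistics.pstdev(values)
    # Moment-based skewness and excess kurtosis (population formulas).
    skewness = sum((x - mean) ** 3 for x in values) / (n * sd ** 3)
    kurtosis = sum((x - mean) ** 4 for x in values) / (n * sd ** 4) - 3
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "variance": statistics.pvariance(values),
        "std_dev": sd,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
stats = describe(sample)
for name, value in stats.items():
    print(f"{name}: {value}")
```

A positive skewness value indicates a longer right tail, which is what a box plot of the same data would show as a longer upper whisker.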
Topics
Random Variable
Probability
Probability Distribution
Normal Distribution
Standard Normal Distribution (SND)
Expected Value
Sampling Funnel
Sampling Variation
Central Limit Theorem (CLT)
Confidence interval
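To make the confidence interval topic concrete, here is a minimal sketch of a confidence interval for a mean using the normal approximation that the Central Limit Theorem justifies for reasonably large samples. The function name and the sample values are hypothetical; for small samples a t-distribution based interval would usually be preferred.

```python
import math
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    mean = fmean(sample)
    # Standard error of the mean, using the sample standard deviation.
    standard_error = stdev(sample) / math.sqrt(n)
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return mean - margin, mean + margin

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```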
Assignments Session-1 (1 hr)
Introduction to Hypothesis Testing
Hypothesis Testing with examples
Two-proportion test
Two-sample t-test
ANOVA and Chi-square case studies
Visualization
Data Cleaning
Imputation Techniques
Scatter Plot
Correlation analysis
Transformations
Normalization and Standardization
Description: Learn about Linear Regression and its components, viz. the regression line, the Linear
Regression calculator and the Linear Regression equation. Get introduced to Linear Regression analysis,
Multiple Linear Regression and Linear Regression examples.
Topics
Principles of Regression
Introduction to Simple Linear Regression
Multiple Linear Regression
Description: Learn about Multiple Logistic Regression and understand regression analysis,
probability measures and their interpretation. Learn what a confusion matrix is and what its elements are. Get
introduced to “cut-off value” estimation using the ROC curve. Work with gain charts and lift charts.
Topics
Multiple Logistic Regression
Confusion matrix
False Positive, False Negative
True Positive, True Negative
Sensitivity (Recall), Specificity, F1 score
Receiver operating characteristics curve (ROC curve)
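The confusion matrix measures listed above follow directly from the four cell counts. The sketch below shows one way to compute them; the function name and the counts are invented for illustration, and precision is included only because the F1 score is defined in terms of it.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive common metrics from the four confusion matrix counts."""
    sensitivity = tp / (tp + fn)   # also called recall or true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # share of predicted positives that are correct
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }

# Hypothetical counts from a binary classifier at one cut-off value.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Recomputing sensitivity and 1 - specificity over a range of cut-off values gives the points of the ROC curve.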
Description: Learn deployment using Streamlit in Python.
Topics
Streamlit
Description: As part of unsupervised data mining, get introduced to various clustering algorithms. Learn
about Hierarchical clustering and K-means clustering using clustering examples, and know what clustering
in machine learning is all about.
Topics
Supervised vs Unsupervised learning
Data Mining Process
Hierarchical Clustering / Agglomerative Clustering
Measure of distance
Numeric - Euclidean, Manhattan, Mahalanobis
Categorical - Binary Euclidean, Simple Matching Coefficient, Jaccard’s Coefficient
Mixed - Gower’s General Dissimilarity Coefficient
Types of Linkages
Single Linkage / Nearest Neighbour
Complete Linkage / Farthest Neighbour
Average Linkage
Centroid Linkage
Visualization of clustering algorithm using Dendrogram
K-Means
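A few of the distance and similarity measures named above can be sketched in plain Python as follows. The function names are chosen for this example, the categorical measures assume binary (0/1) vectors of equal length, and Mahalanobis and Gower's coefficient are left out for brevity.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def simple_matching(a, b):
    """Share of positions where two binary vectors agree (0-0 and 1-1)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def jaccard(a, b):
    """Like simple matching, but 0-0 agreements are ignored."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return both / either if either else 1.0

print(euclidean((0, 0), (3, 4)))
print(manhattan((0, 0), (3, 4)))
print(simple_matching([1, 0, 0, 1], [1, 1, 0, 0]))
print(jaccard([1, 0, 0, 1], [1, 1, 0, 0]))
```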
Description: In this continuation lecture, learn about K-means clustering, the clustering ratio and various
clustering metrics. Get introduced to methods of choosing the optimum number of clusters.
Topics
Non-Hierarchical
Measurement metrics of clustering - Within Sum of Squares, Between Sum of Squares, Total Sum of Squares
Choosing the ideal K value using Scree plot / Elbow Curve
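The three sums of squares used as clustering metrics can be illustrated with a short sketch. It assumes cluster labels are already assigned (for example by K-means); the function name, points and labels are hypothetical.

```python
def sum_of_squares(points, labels):
    """Return (within, between, total) sums of squares for labelled points."""
    def centroid(pts):
        return [sum(coords) / len(pts) for coords in zip(*pts)]

    def squared_distance(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    overall = centroid(points)
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    within = 0.0
    between = 0.0
    for members in clusters.values():
        center = centroid(members)
        # Spread of each cluster around its own centroid.
        within += sum(squared_distance(p, center) for p in members)
        # Spread of cluster centroids around the overall centroid.
        between += len(members) * squared_distance(center, overall)
    total = sum(squared_distance(p, overall) for p in points)
    return within, between, total

points = [(1, 1), (1, 3), (5, 1), (5, 3)]  # hypothetical data
labels = [0, 0, 1, 1]
wss, bss, tss = sum_of_squares(points, labels)
print(wss, bss, tss)
```

Within and between sums of squares add up to the total. Plotting the within sum of squares against K gives the elbow curve: the "elbow" is where adding another cluster stops reducing it substantially.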
DBSCAN
Description: Introduction to the density-based clustering method
Topics
A general intuition for DBSCAN
Different parameters in DBSCAN
Metrics used to evaluate the performance of model
Pros and Cons of DBSCAN
Description: Learn to apply data reduction in data mining using dimensionality reduction techniques.
Gain knowledge about the advantages of dimensionality reduction using PCA and t-SNE.
Topics
PCA and t-SNE
Why dimension reduction
Advantages of PCA
Calculation of PCA weights
2D Visualization using Principal components
Basics of Matrix algebra
Description: Learn one of the most important topics in data mining, Association rules. Understand how
the Apriori algorithm works, and how association rules are mined.
Topics
What is Market Basket / Affinity Analysis
Measure of association
Support
Confidence
Lift Ratio
Apriori Algorithm
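The measures of association listed above can be computed directly from a list of transactions. The following is a minimal sketch for a single rule, not an implementation of the Apriori algorithm itself; the function name and the shopping baskets are made up.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    # Support: share of transactions containing the given items.
    support_a = sum(a <= t for t in transactions) / n
    support_c = sum(c <= t for t in transactions) / n
    support_rule = sum((a | c) <= t for t in transactions) / n
    # Confidence: how often the consequent appears when the antecedent does.
    confidence = support_rule / support_a
    # Lift: confidence relative to the consequent's baseline frequency.
    lift = confidence / support_c
    return support_rule, confidence, lift

baskets = [  # hypothetical transactions
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]
support, confidence, lift = rule_metrics(baskets, {"bread"}, {"milk"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.4f}")
```

A lift below 1, as in this toy data, suggests the two items appear together slightly less often than would be expected if they were independent.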
Description: Learn how online recommendations are made. Get insights about online Recommender
Systems, Content-Based Recommender Systems, Content-Based Filtering and various
recommendation engine algorithms. Get to know about people-to-people collaborative filtering and
item-to-item collaborative filtering.
Topics
User-based collaborative filtering
Measure of distance / similarity between users
Driver for recommendation
Computation reduction techniques
Search based methods / Item to item collaborative filtering
Vulnerability of recommender systems
Workflow from data to deployment
Data nuances
Mindsets of modelling
Description: The Decision Tree is one of the most powerful classification algorithms today. In this
tutorial, learn the math behind the decision tree algorithm with a case study.
Topics
Elements of Classification Tree - Root node, Child Node, Leaf Node, etc.
Greedy algorithm
Measure of Entropy
Attribute selection using Information Gain
Implementation of Decision tree using C5.0 and Sklearn libraries
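Entropy and information gain, the quantities a classification tree uses to choose its splits, can be sketched in a few lines. The function names and the toy labels are assumptions made for this example; libraries such as Sklearn compute the same quantities internally.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())

def information_gain(labels, feature_values):
    """Reduction in entropy from splitting the labels on one attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Entropy after the split, weighted by the size of each branch.
    weighted = sum(len(group) / n * entropy(group) for group in groups.values())
    return entropy(labels) - weighted

labels = ["yes", "yes", "no", "no"]   # hypothetical target
useful = ["a", "a", "b", "b"]         # separates the classes perfectly
useless = ["a", "b", "a", "b"]        # tells us nothing about the class
print(entropy(labels))
print(information_gain(labels, useful))
print(information_gain(labels, useless))
```

The greedy algorithm picks, at each node, the attribute with the highest information gain.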
Description: Learn about how to handle categorical data using different methods
Topics
Encoding Methods
One-Hot Encoding (OHE)
Label Encoders
Outlier detection - Isolation Forest
Predictive Power Score
Description: Feature selection and dimensionality reduction help reduce overfitting and training time, and improve accuracy.
Topics
Recursive Feature Elimination
PCA
Description: Here you will learn ways to improve models in terms of
accuracy and reduced overfitting (Bias vs Variance).
Topics
Splitting data into train and test
Methods of cross validation
Accuracy methods
Description: Rather than working with a single model, we can work with a diverse set of models. This can
be achieved by using Ensemble learning.
Topics
Bagging
Boosting
Random Forest
XGBoost (XGBM)
LightGBM (LGBM)
Description: KNN and SVM: The KNN algorithm is one of the easiest algorithms to learn and
interpret. SVM is another popular algorithm; it can be used for both classification
and regression. Learn these two using simple case studies.
Topics
Deciding the K value
Building a KNN model by splitting the data
Understanding the various generalization and regularization techniques to avoid overfitting and underfitting
Kernel tricks
Lasso Regression
Ridge Regression
Description: Neural Networks: A neural network is a supervised machine learning algorithm that mimics the
human brain, and it is the foundation for Artificial Intelligence and Deep Learning. Here you learn the
operation of neural networks using R and Python.
Topics
Artificial Neural Network
Biological Neuron vs Artificial Neuron
ANN structure
Activation function
Network Topology
Classification Hyperplanes
Best fit “boundary”
Gradient Descent
Stochastic Gradient Descent Intro
Backpropagation
Introduction to concepts of CNN
Description: Text mining, or text data mining, is one of a wide spectrum of tools for analyzing
unstructured data. As a part of this course, learn about text analytics, the various text mining
techniques and their applications, text mining algorithms and sentiment analysis.
Topics
Sources of data
Bag of words
Pre-processing, corpus, Document-Term Matrix (DTM) and Term-Document Matrix (TDM)
Word Clouds
Corpus level word clouds
Sentiment Analysis
Positive Word clouds
Negative word clouds
Unigram, Bigram, Trigram
Vector space Modelling
Word embedding
Document Similarity using Cosine similarity
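Bag of words and cosine similarity fit together naturally: each document becomes a vector of word counts, and similarity is the cosine of the angle between two such vectors. The sketch below assumes whitespace tokenization and lowercasing as the only pre-processing; the function name and example sentences are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between two documents using bag-of-words counts."""
    counts_a = Counter(doc_a.lower().split())
    counts_b = Counter(doc_b.lower().split())
    # Only words present in both documents contribute to the dot product.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)

print(cosine_similarity("the cat sat", "the cat ran"))
print(cosine_similarity("good movie", "bad film"))
```

Because bag of words ignores word meaning, "movie" and "film" count as unrelated here; word embeddings are one way to address that.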
Description: Learn how to extract data from Social Media, download user reviews from E-commerce
and Travel websites. Generate various visualizations using the downloaded data.
Topics
Extract Tweets from Twitter
Extract user reviews of the products from Amazon, Snapdeal and TripAdvisor
Description: Learn how to perform text analytics using Python and work with various libraries that
aid in data extraction, text mining, sentiment analysis and
Topics
Install Libraries from Shell
Extraction and text analytics in Python
Description: Natural language processing applications are in great demand now and various natural
language processing projects are being taken up. As part of this tutorial, learn about Natural
language and ‘Natural language understanding’.
Topics
Sentiment Extraction
Lexicons and Emotion Mining
Description: Under the Naive Bayes classifier tutorial, learn how the classification modeling is done
using Bayesian classification, understand the same using Naive Bayes example. Learn about Naive
Bayes through the example of text mining.
Topics
Probability – Recap
Bayes Rule
Naive Bayes Classifier
Text Classification using Naive Bayes
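Bayes Rule, which underlies the Naive Bayes classifier, can be shown with a single-word spam example. All probabilities below are invented for illustration, and the function name is an assumption; a full Naive Bayes text classifier multiplies such likelihoods across many words under an independence assumption.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(class | evidence) via Bayes Rule for a two-class problem."""
    # Total probability of seeing the evidence under either class.
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 20% of mail is spam, the word "offer" appears
# in 60% of spam messages and in 5% of non-spam messages.
p_spam_given_offer = posterior(prior=0.2, likelihood=0.6, likelihood_given_not=0.05)
print(f"P(spam | 'offer') = {p_spam_given_offer:.2f}")
```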
Description: Forecasting, or Time Series Analysis, is an important component in analytics. Here, get
to know the various forecasting methods, forecasting techniques and business forecasting
techniques. Get introduced to the time series components and various time series analyses
using time series examples.
Topics
Introduction to time series data
Steps of forecasting
Components of time series data
Scatter plot and Time Plot
Lag Plot
ACF - Auto-Correlation Function / Correlogram
Visualization principles
Naive forecast methods
Errors in forecast and its metrics
Model Based approaches
Linear Model
Exponential Model
Quadratic Model
Additive Seasonality
Multiplicative Seasonality
Model-Based approaches
AR (Auto-Regressive) model for errors
Random walk
ARMA (Auto-Regressive Moving Average), Order p and q
ARIMA (Auto-Regressive Integrated Moving Average), Order p, d and q
Data-driven approach to forecasting
Smoothing techniques
Moving Average
Simple Exponential Smoothing
Holt's / Double Exponential Smoothing
Winters / Holt-Winters
De-seasoning and de-trending
Forecasting using Python and R
Concept with a business case
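Two of the smoothing techniques from the data-driven approach can be sketched without any libraries. The function names, the series and the smoothing constant are hypothetical, and initializing the smoothed series with the first observation is one common convention among several.

```python
def moving_average(series, window):
    """Trailing moving average; the first window - 1 points have no value."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def simple_exponential_smoothing(series, alpha):
    """Each smoothed value blends the new observation with the previous level."""
    smoothed = [series[0]]  # assumption: start from the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

sales = [10, 20, 30, 40, 50]  # hypothetical series
print(moving_average(sales, window=3))
print(simple_exponential_smoothing(sales, alpha=0.5))
```

A larger `alpha` reacts faster to recent changes, while a smaller one smooths more heavily. Holt's and Winters' methods extend the same idea with trend and seasonality terms.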
Project Discussion using Python
Resume Preparation
Interview Support
Introduction to Databases
Difference between SQL and NoSQL databases
How to install MySQL and Workbench
Connecting to a DB
Creating a DB
What are the languages inside SQL
How to create tables inside a DB and insert records
Select statement and using Queries for seeing your data
Joining 2 tables
Where clause usage
Indexes and views
Different operations in SQL
How to connect to MySQL from your applications, including R and Python
What is Data Visualization?
Why Visualization came into Picture?
Importance of Visualizing Data
Poor Visualizations Vs. Perfect Visualizations
Principles of Visualizations
Tufte’s Graphical Integrity Rule
Tufte’s Principles for Analytical Design
Visual Rhetoric
Goal of Data Visualization
Tableau – Data Visualization Tool
Introduction to Tableau
What is Tableau? Different Products and their functioning
Architecture Of Tableau
Pivot Tables
Split Tables
Hiding
Rename and Aliases
Data Interpretation
Tableau User Interface
Understanding about Data Types and Visual Cues
Basic Chart types
Text Tables, Highlight Tables, Heat Map
Pie Chart, Tree Chart
Bar Charts, Circle Charts
Intermediate Chart
Time Series Charts
Time Series Hands-On
Dual Lines
Dual Combination
Advanced Charts
Bullet Chart
Scatter Plot
Introduction to Correlation Analysis
Introduction to Regression Analysis
Trendlines
Histograms
Bin Sizes in Tableau
Box Plot
Pareto Chart
Donut Chart, Word Cloud
Forecasting (Predictive Analysis)
Maps in Tableau
Types of Maps in Tableau
Polygon Maps
Connecting with WMS Server
Custom Geocoding
Data Layers
Radial & Lasso Selection
Adding Background Image
How to get Background Image and highlight the data on it
Creating Data Extracts
Filters and their working at different levels
Usage of Filters at Extract and Data Source level
Worksheet level filters
Context, Dimension and Measure Filters
Data Connectivity in-depth understanding
Joins
Unions
Data Blending
Cross Database Joins
Sets
Groups
Parameters
Creating Calculated Fields
Logical Functions
Case-If Function
ZN Function
Else-If Function
Ad-Hoc Calculations
Quick Table Calculations
Level of Detail (LoD)
Fixed LoD
Include LoD
Exclude LoD
Responsive Tool Tips
Dashboards
Actions at Sheet level and Dashboard level
Story
Introduction to Neural Network & Deep Learning
Topics
Introduction
Deep Learning Importance [Strength & Limitation]
Single-layer Perceptron (SP) | Multi-layer Perceptron (MLP)
Neural Network Overview
Neural Network Representation
Activation Function
Loss Function
Importance of Non-linear Activation Function
Gradient Descent for Neural Network
Parameters & Hyperparameters
Topics
Train, Test & Validation Set
Vanishing & Exploding Gradient
Dropout
Regularization
Optimization algorithm
Learning Rate
Tuning
Softmax
CNN
Topics
CNN
Deep Convolution Model
Detection Algorithm
Face Recognition
RNN
Topics
RNN
LSTM
Bidirectional LSTM