- Type: Training or Development Class
- Location: Kigali, Rwanda
- Date: 30-09-2024 to 04-10-2024
Education/Teaching/Training/Development
Research/Science
Business Development
Python is one of the premier open-source languages: it is flexible, powerful, easy to learn and easy to use, and it has powerful libraries for data manipulation and analysis. This training is a step-by-step guide to Python and statistical data analysis with extensive hands-on practice. The course is delivered with several activity problems, assignments and scenarios that help participants gain practical experience in data handling, analysis, interpretation and reporting. It starts by exploring basic statistics such as the mean, median and mode, and proceeds to more advanced exploratory techniques such as group comparisons, regression, tests of relationships, classification and clustering, to mention a few.
Learning outcomes
By the end of this course the participants will be able to:
• Easily read and write files of various types in a Python program.
• Identify and fix errors in datasets.
• Work with Python modules and use them for data analysis tasks.
• Use libraries such as pandas, NumPy, Matplotlib and scikit-learn, and master concepts such as machine learning in Python, scripts, and sequences.
• Gain high-level skills in interpreting statistical results and writing reports.
Who should enroll?
The course is useful for professionals who use data as part of their work and who need to make decisions based on data analysis. Those with a prior understanding of programming and statistics will find the course easier to follow.
Why train with us
Vital Extra Learning guarantees our clients:
• State-of-the-art facilities and training infrastructure
• A long tradition of hands-on support after the engagement
• Service delivery through highly seasoned industry experts
• Value for money
TOPICS TO BE COVERED:
Module 1: Introduction
Introduction to Statistical Data Analysis
• Introduction to statistical concepts
• Descriptive and inferential statistics
• Research design
• The research/survey process
Overview of Data Science
• Introduction to data science
• Different sectors using data science
• Purpose and components of Python
Data Analytics Overview
• Data analytics process
• Knowledge check
• Exploratory Data Analysis (EDA)
• EDA – Quantitative technique
• EDA – Graphical technique
• Data analytics conclusion or predictions
• Data analytics communication
• Data types and plotting considerations
Module 2: Statistical Analysis and Business Applications
Introduction to statistical data analysis
• Statistical analysis considerations
• Population and sample
• Statistical analysis process
• Descriptive statistics – measures of centre, distribution and dispersion
• Inferential statistics (correlation, regression, t-tests, chi-square, etc.)
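As an illustration of the descriptive measures listed above, here is a minimal sketch using only Python's standard `statistics` module. It is not taken from the course materials, and the `scores` values are made-up sample data.

```python
import statistics

# Hypothetical sample: test scores for nine participants (made-up data).
scores = [62, 70, 70, 75, 78, 81, 85, 90, 94]

# Measures of centre
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion
value_range = max(scores) - min(scores)
stdev = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"range={value_range} stdev={stdev:.2f}")
```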
Python Environment Setup and Essentials
• Anaconda
• Installation of Anaconda Python distribution
• Data types with Python
• Basic operators and functions
Mathematical Computing with Python (NUMPY)
• What is NumPy?
• NumPy vs list
• Installation
• NumPy arrays
• Built-in methods of NumPy (arange; zeros and ones; linspace; eye; random)
• Array attributes and methods (reshape; max, min, argmax, argmin; shape; dtype)
• NumPy indexing and selection
• Broadcasting
• Indexing a 2D array (matrices)
• Selection
• NumPy operations (arithmetic; universal array functions)
• Vectorization
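The NumPy topics above can be sketched in a few lines. This is an illustrative example rather than course material; the array values and variable names are arbitrary choices.

```python
import numpy as np

# Built-in constructors
a = np.arange(6)                  # integers 0..5
grid = np.linspace(0.0, 1.0, 5)   # 5 evenly spaced values from 0 to 1
identity = np.eye(3)              # 3x3 identity matrix

# Array attributes and methods
m = a.reshape(2, 3)               # view the same data as a 2x3 matrix
shape = m.shape                   # (2, 3)
largest_index = m.argmax()        # position of the maximum in the flattened array

# Indexing and selection
second_row = m[1]                 # row at index 1
element = m[1, 2]                 # row 1, column 2
evens = a[a % 2 == 0]             # boolean (conditional) selection

# Broadcasting: the 1-D row is added to every row of the 2-D array
shifted = m + np.array([10, 20, 30])

# Vectorization: operate on whole arrays without an explicit Python loop
squares = a ** 2
roots = np.sqrt(squares)          # universal array function

print(shifted)
```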
Module 3: Scientific Computing with Python (SCIPY)
• Introduction to SciPy
• SciPy sub-packages – integration and optimisation
• Calculating eigenvalues and eigenvectors
• Using SciPy to solve a linear algebra problem
• Using SciPy to define random variables and draw random values
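A short sketch of the SciPy topics above follows. It is an illustration only, not the course's own exercise; the functions, matrix and distribution parameters are arbitrary examples.

```python
import numpy as np
from scipy import integrate, linalg, optimize, stats

# Integration: area under x^2 between 0 and 1 (the exact value is 1/3)
area, abs_error = integrate.quad(lambda x: x ** 2, 0.0, 1.0)

# Optimisation: find the x that minimises (x - 2)^2
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)

# Linear algebra: solve the system 3x + y = 9, x + 2y = 8
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(A, b)

# Eigenvalues and eigenvectors (eigh is used because A is symmetric)
eigenvalues, eigenvectors = linalg.eigh(A)

# Random variables: a standard normal distribution
rv = stats.norm(loc=0.0, scale=1.0)
samples = rv.rvs(size=5, random_state=0)  # reproducible random draws
prob_below_zero = rv.cdf(0.0)

print(area, result.x, solution, eigenvalues, prob_below_zero)
```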
Data Manipulation with PANDAS
• Introduction to Pandas
• DataFrame in Pandas
• Viewing and opening data
• Dealing with missing values
• Data operations
• Reading and writing files
• SQL operations with Pandas
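The Pandas topics above might look like the following in practice. This is a minimal sketch with made-up data: the column names, the in-memory CSV text and the SQLite table name `sales` are assumptions for illustration, and filling missing values with the column mean is just one of several possible strategies.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical CSV content (made-up data); in practice this would be a file path.
csv_text = """name,region,sales
Alice,North,120
Bob,South,
Chantal,North,95
David,South,80
"""

# Reading a file into a DataFrame and viewing it
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())

# Dealing with missing values: count them, then fill with the column mean
missing = int(df["sales"].isna().sum())
df["sales"] = df["sales"].fillna(df["sales"].mean())

# Data operations: total sales per region
totals = df.groupby("region")["sales"].sum()

# Writing the cleaned data back out as CSV (to a buffer here)
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# SQL operations: store the DataFrame in an in-memory SQLite database and query it
conn = sqlite3.connect(":memory:")
df.to_sql("sales", conn, index=False)
north = pd.read_sql_query(
    "SELECT name, sales FROM sales WHERE region = 'North' ORDER BY name", conn
)
conn.close()

print(totals)
print(north)
```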
Machine Learning with SCIKIT-LEARN
• Introduction to machine learning
• Understanding data sets and extracting features
• Problem types and learning models
• How to train, test and optimise models
• Considerations for supervised learning models
• Scikit-Learn
• Supervised learning models – Linear regression, logistic regression
• Unsupervised learning models
• Pipeline
• Model persistence and evaluation
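As a rough sketch of the train, test, pipeline and persistence steps listed above, the example below fits a logistic regression on the iris data set that ships with scikit-learn. The data set, split ratio and pipeline step names are illustrative assumptions, not the course's own exercise.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in data set: features X and class labels y
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Pipeline: scale the features, then fit a supervised model
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

# Evaluation: accuracy on the held-out test set
accuracy = model.score(X_test, y_test)

# Persistence: serialise the fitted pipeline and restore it
restored = pickle.loads(pickle.dumps(model))
same_predictions = bool((restored.predict(X_test) == model.predict(X_test)).all())

print(f"test accuracy: {accuracy:.2f}")
```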
Module 4: Natural Language Processing with SCIKIT-LEARN
• Overview of Natural Language Processing
• Applications of Natural Language Processing
• Libraries – Scikit-Learn
• Extraction considerations
• Scikit-Learn – model training and grid search
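One way the text-feature extraction, model training and grid search topics above could fit together is sketched below. The tiny corpus and its labels are made up for illustration, and the choice of TF-IDF features with logistic regression is an assumption rather than the course's prescribed method; a corpus this small is only useful for showing the mechanics.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical feedback comments (made-up data): 1 = positive, 0 = negative
texts = [
    "the training was excellent and practical",
    "great trainer and very useful examples",
    "excellent course with clear explanations",
    "very useful and well organised sessions",
    "the sessions were boring and too long",
    "poor materials and confusing slides",
    "too long and not useful at all",
    "confusing explanations and poor pacing",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Feature extraction (TF-IDF) followed by a classifier
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Grid search: try each parameter combination with 2-fold cross-validation
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(texts, labels)

prediction = search.predict(["the trainer was excellent"])
print(search.best_params_, prediction)
```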
Data Visualisation in Python Using MATPLOTLIB
• Introduction to data visualisation
• Line properties
• (x, y) plot and subplots
• Types of plots
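The plotting topics above can be illustrated with a short sketch. The data values are arbitrary, and the non-interactive "Agg" backend is selected only so that the example runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

# One figure with two subplots side by side
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

# (x, y) line plot with explicit line properties
(line,) = axes[0].plot(
    x, y, color="tab:blue", linestyle="--", linewidth=2, marker="o", label="y = x^2"
)
axes[0].set_xlabel("x")
axes[0].set_ylabel("y")
axes[0].set_title("Line plot")
axes[0].legend()

# A different plot type on the second subplot
axes[1].bar(x, y, color="tab:orange")
axes[1].set_title("Bar plot")

fig.tight_layout()

# Save the figure as PNG into an in-memory buffer (a file name also works)
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```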
Module 5: Web Scraping with Beautiful Soup
• Web scraping and parsing
• Knowledge check
• Understanding and searching the tree
• Navigation options and modification options for a tree
• Parsing and printing documents
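The parsing, searching, navigating and modifying steps above are sketched below on an inline HTML snippet. The HTML is made up for illustration; a real scraping task would first download the page, which is outside the scope of this example.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML document (made-up content)
html = """<html><head><title>Course list</title></head>
<body>
<h1>Courses</h1>
<ul id="courses">
<li class="course"><a href="/python">Python</a></li>
<li class="course"><a href="/stats">Statistics</a></li>
</ul>
</body></html>"""

# Parsing: build the tree with Python's built-in HTML parser
soup = BeautifulSoup(html, "html.parser")

# Searching the tree
title = soup.title.string
links = [a["href"] for a in soup.find_all("a")]
names = [li.get_text(strip=True) for li in soup.find_all("li", class_="course")]

# Navigating the tree: parent and sibling of the first list item
first_item = soup.find("li")
parent_id = first_item.parent["id"]
next_item = first_item.find_next_sibling("li")

# Modifying the tree
soup.h1.string = "Available courses"

# Printing the document with indentation
print(soup.prettify())
```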
Integration with Hadoop MapReduce and Spark
• Big data solutions in Python
• Big Data and Hadoop
• Hadoop core components
• Python integration with HDFS using Hadoop streaming
• Using Hadoop streaming for calculating word count
• Python integration with Spark using PySpark
• Using PySpark to determine word count
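The word count exercise follows the map, sort, reduce pattern. The sketch below imitates that flow in plain Python so it can run without a cluster: under Hadoop streaming the mapper and reducer would be separate scripts reading from standard input, and Hadoop itself would perform the sort between them. The function names and sample text are assumptions for illustration, and PySpark is not used here.

```python
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated 'word<TAB>1' record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts of consecutive records that share the same word."""
    parsed = (record.split("\t") for record in sorted_records)
    for word, group in groupby(parsed, key=lambda pair: pair[0]):
        yield word, sum(int(count) for _, count in group)


# Hypothetical input lines (made-up text)
text = ["big data with Python", "Python and Hadoop", "big data"]

# Hadoop sorts mapper output by key before the reduce phase; sorted() stands in for that
shuffled = sorted(mapper(text))
counts = dict(reducer(shuffled))

print(counts)
```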