Ramp up your essential knowledge into market-ready skills with this Big Data Engineering Booster course. Version control is one of the most important skills for any Data Engineer, and this course will make you an expert in managing GitHub repositories. Get ready to learn how to design Databases, Data Lakes and Data Marts, and to work with HDFS file types, compressions, Airflow and, most excitingly, the Spark (PySpark) processing framework.
Have you ever thought of getting hands-on by completing three real-world projects? Yes, you heard it right. These projects cover all the combinations of tech stacks – Non-Hadoop, Big Data and Hybrid (a Big Data and Non-Hadoop mix) – so you will gain experience putting your knowledge into action. If your system resources are not sufficient, you can still use the AWS Cloud server setup from the Big Data Engineering – Essentials course for your hands-on work.

Pre-requisites (Free)
Big Data Engineering – Essentials
SQL Foundation
Python Fundamentals
Shell/Bash Scripting for Beginners
System Requirements
CPU: Quad-core i5 or better / Apple M1
Memory: 16GB
OS: Windows/MacOS
Not to worry if you do not have enough capacity on your system; towards the end of this course you will be guided through procuring an AWS Cloud server for your practice.
Mode of Training
Online Interactive Sessions
Recorded Video Sessions – From the latest Online batch
Resources
Approximate number of sessions: 52 (varies across batches)
Lifetime access to the recorded videos will be provided, along with all supporting documents, logs, references and software, if any.
Placements
Chapter 1: VERSION CONTROL SYSTEM
- GitHub Introduction
- Setup
- Repo
- Branches
- Forks
- Code Issues
- Commits
- Pull Requests
- Squash & Merges
- Conflicts
- Code Reviews & Testing
- Responsibilities
- Real-world Hands-on
- Command line: Single Contributor, smooth life cycle (sketched after this chapter outline)
- Command line: Single Contributor, linking issues
- Command line: Single Contributor, PR reviews
- GitHub Desktop: Repeat exercises of command line
- GitHub Web: User access control
- GitHub Web: Create an Organization
- Command line: Dual Contributors, conflicts
- Command line: How to avoid conflicts
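As a taste of the hands-on work, here is a minimal sketch of the single-contributor life cycle, assuming a repository named example-repo under your own account; the repo, branch, and file names are illustrative.

```bash
# Clone the repo and work on a feature branch (names are illustrative)
git clone https://github.com/<your-username>/example-repo.git
cd example-repo
git checkout -b feature/readme-notes

# Make a change, stage it, and commit with a descriptive message
echo "Setup notes" >> README.md
git add README.md
git commit -m "Add setup notes to README"

# Publish the branch, then open a pull request on GitHub;
# squash & merge once the review passes
git push -u origin feature/readme-notes
```

The conflict exercises repeat this cycle with two contributors editing the same lines, which is where pull requests and reviews earn their keep.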
Chapter 2: DATA WAREHOUSE
- Definition
- Types
- Advantages
- Data Mart
- Hadoop Warehouse
- Data Lake
- Architectures
- RDBMS
- NoSQL
- In-Memory
- Clear the Clutter
Chapter 3: DATA MODELLING
- Schemas
- Types
- Facts & Dimensions
- Data Models
- Normalization
- Star Schema
- Snowflake Schema
- OLTP & OLAP
- SCD tables
- Summarize
Chapter 4: SPARK PROCESSING
- Local Setup
- PyCharm Integration
- Real-world Setup
- Recap
- Zeppelin
- Transformations & Actions
- PySpark
- First Code (sketched after this chapter outline)
- Spark SQL
- Spark Dataframe
- Applications
- Hands-on (Part 1)
- Hands-on (Part 2)
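To set expectations for the "First Code" session, here is a minimal PySpark sketch covering a DataFrame, a lazy transformation, Spark SQL, and the actions that trigger execution. It assumes a local `pip install pyspark` and uses illustrative inline data.

```python
# Minimal PySpark example: build a DataFrame, run a transformation,
# register it for Spark SQL, and trigger actions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("first-code").master("local[*]").getOrCreate()

# Illustrative in-memory data
orders = spark.createDataFrame(
    [(1, "laptop", 900.0), (2, "mouse", 25.0), (3, "monitor", 180.0)],
    ["order_id", "product", "amount"],
)

# Transformation (lazy): filter rows; nothing executes yet
big_orders = orders.filter(orders.amount > 100)

# Spark SQL over the same data
orders.createOrReplaceTempView("orders")
totals = spark.sql("SELECT COUNT(*) AS n, SUM(amount) AS total FROM orders")

# Actions: these trigger execution
big_orders.show()
totals.show()

spark.stop()
```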
Chapter 5: HADOOP DATAFILES
- File Formats
- Text
- Avro
- Parquet & RCFile
- ORC
- SequenceFile
- Compressions
- Choose the best (comparison sketched below)
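A minimal sketch of the "Choose the best" comparison, writing the same DataFrame in several formats and compression codecs so the on-disk sizes can be compared; the paths are illustrative, and the Avro writer needs the external spark-avro package.

```python
# Sketch: write one DataFrame as compressed CSV (text), Parquet, and ORC,
# then compare the sizes of /tmp/events_* on disk. Paths are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("file-formats").master("local[*]").getOrCreate()
df = spark.range(0, 1_000_000).withColumnRenamed("id", "event_id")

df.write.mode("overwrite").option("compression", "gzip").csv("/tmp/events_csv")
df.write.mode("overwrite").option("compression", "snappy").parquet("/tmp/events_parquet")
df.write.mode("overwrite").option("compression", "zlib").orc("/tmp/events_orc")
# df.write.mode("overwrite").format("avro").save("/tmp/events_avro")  # requires spark-avro

spark.stop()
```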
Chapter 6: SCHEDULING PIPELINES
- Airflow Intro
- Installation
- Example DAGs
- Pipelines & Dependencies
- Importing Modules
- Default Arguments
- Tasks
- Setting up Dependencies
- Testing Pipeline
- Schedule
- Presets
- Catchup
- Backfill
- Passing Parameters when triggering DAGs
- Hands-on (a minimal DAG sketch follows this outline)
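Here is a minimal Airflow DAG sketch in the Airflow 2.x style, tying together default arguments, tasks, dependencies, a schedule preset, and catchup; the dag_id and task commands are illustrative.

```python
# Minimal Airflow 2.x DAG: default arguments, two tasks, a dependency,
# a daily schedule preset, and catchup disabled.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data_engineer",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="example_pipeline",
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # preset; cron strings also work
    catchup=False,               # do not backfill past runs automatically
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = BashOperator(task_id="load", bash_command="echo loading")

    extract >> load  # load runs only after extract succeeds
```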
Chapter 7: REAL WORLD PROJECT – LOCAL
- Project Requirements
- Data Source Schema (ER Diagram)
- Data Mart Modeling
- Design Jobs
- Infra Setup
- Initial Data Load
- Development
- Incremental Loads (watermark pattern sketched after this outline)
- Testing
- Airflow Pipeline
- Logging
- Deployment & Scheduling
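A minimal sketch of the high-watermark pattern behind incremental loads, shown with standard-library SQLite; the table and column names are illustrative, not the project's actual schema.

```python
# Sketch of a high-watermark incremental load using stdlib SQLite.
import sqlite3

# Source system with an updated_at column we can watermark on
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, amount REAL, updated_at TEXT)")
src.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01"), (2, 20.0, "2024-01-02"), (3, 30.0, "2024-01-03")],
)

# Target mart plus a watermark table recording how far we have loaded
tgt = sqlite3.connect(":memory:")
tgt.execute("CREATE TABLE orders_mart (id INTEGER, amount REAL, updated_at TEXT)")
tgt.execute("CREATE TABLE load_watermark (last_loaded TEXT)")
tgt.execute("INSERT INTO load_watermark VALUES ('2024-01-01')")  # set by the initial load

# Incremental load: pull only rows newer than the stored watermark
(watermark,) = tgt.execute("SELECT MAX(last_loaded) FROM load_watermark").fetchone()
new_rows = src.execute(
    "SELECT id, amount, updated_at FROM orders WHERE updated_at > ?", (watermark,)
).fetchall()
tgt.executemany("INSERT INTO orders_mart VALUES (?, ?, ?)", new_rows)

# Advance the watermark so the next run skips what was just loaded
if new_rows:
    tgt.execute("INSERT INTO load_watermark VALUES (?)", (max(r[2] for r in new_rows),))
tgt.commit()
print(f"Loaded {len(new_rows)} new row(s)")
```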
Chapter 8: REAL WORLD PROJECT – BIGDATA
- Project Requirements
- Data Source Schema
- Data Mart Modeling
- Design Tasks
- Infra Setup
- Initial Data Load
- Development
- Incremental Loads
- Testing
- Airflow Pipeline
- Logging
- Deployment & Scheduling
Chapter 9: REAL WORLD PROJECT – HYBRID
- APIs
- API Source & Endpoints
- UNIX/EPOCH timestamps (conversion sketched after this outline)
- Project Requirements
- Lake
- Marts
- Database Design
- Pipelines
- Infra Setup
- Development
- Code
- Logging
- Debugging
- Deployment & Scheduling
- Exercise
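A minimal sketch of the UNIX/epoch timestamp handling an API source typically requires, using only the standard library; the payload and field names are hypothetical, not the course's actual API.

```python
# Sketch: convert epoch timestamps from a (hypothetical) API payload to UTC.
# In the project this payload would come from an HTTP GET against the source API.
import json
from datetime import datetime, timezone

payload = '[{"id": 1, "created_at": 1704067200}, {"id": 2, "created_at": 1704153600000}]'

def epoch_to_utc(value):
    # Heuristic: epoch milliseconds are ~1000x larger than epoch seconds
    seconds = value / 1000 if value > 10**12 else value
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

for event in json.loads(payload):
    print(event["id"], epoch_to_utc(event["created_at"]).isoformat())
```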
Chapter 10: REWIND & RECAP
- Summary
- Q & A
Chapter 11: PROFILE BUILDING
Chapter 12: MOCK INTERVIEW
Frequently Asked Questions (FAQs)
There are two modes of training: Online Instructor-Led or Recorded Video Sessions. While you can purchase the latter anytime, look out for the schedule on this page to join the former.
This is the second-level course on the path to becoming a Data Engineer; on completing it you will be market-ready for junior positions.
Basic SQL, Python and shell scripting skills, along with the Big Data Engineering – Essentials course, are the prerequisites.
You will be part of a professional community, and there will be assistance for your blockers.
You will be assisted and guided in profile building and mock interviews.