
Big Data Engineering Essentials

You are all set to begin your learning path with this foundational Essentials course, the first stepping stone. In this course you are mentored through finely tailored content that introduces the most popular Big Data tech stack, Hadoop. Learning the roles and responsibilities of a Data Engineer is equally important, and it is part of the curriculum. The course gives a clear introduction to Data Pipelines, Data Processing, HDFS, Resource Management, Data Access, and Pipeline Automation.

You will develop a real-world, hands-on Data Engineering pipeline using a publicly available dataset, together with its business context. For those who do not have enough resources to run the installations on their own system, a hands-on session on setting up a server on AWS Cloud is included as an add-on.

Pre-requisites (Free)

SQL Foundation

Shell/Bash Scripting for Beginners

System Requirements

CPU: Quad-core Intel i5 or better, or Apple M1

Memory: 16GB

OS: Windows or macOS

Do not worry if your system does not have enough capacity: towards the end of this course you will be guided on procuring an AWS Cloud server for your practice.

Modes of Training

Online Interactive Sessions

Recorded Video Sessions – From the latest Online batch

Resources

Approximate number of sessions: 25 (Varies across the batches)

Lifetime access to the recorded videos is provided, along with all supporting documents, logs, references, and software, if any.

Placements

This course alone does not make you ready for the job market. Complete the next course, Booster, to become eligible for our placement support.

Chapter 1: INTRODUCTION

  1. Responsibilities
  2. ETL/ELT
  3. Data Sources
  4. Batch Processing
  5. Stream Processing
  6. Data Lake
  7. Data Warehouse
  8. Data Marts
  9. Data Staging
  10. Data Integration
  11. Administration
  12. Data Optimizations
  13. Required Skills

Chapter 2: DATA PIPELINES

  1. Pipelines
  2. Automation & Scheduling
  3. Handling Exceptions
  4. Logging

Chapter 3: INSTALLING HADOOP

  1. Hadoop vs RDBMS
  2. System requirements
  3. Installation Modes
  4. Pre-requisites
  5. Installation
  6. Real-world Installations
  7. Questions & Answers

Chapter 4: HADOOP INTRODUCTION

  1. Hadoop Ecosystem
  2. Hadoop Distributions
  3. Evolution
  4. Storage
  5. Resources
  6. Processing
  7. Data Access
  8. Applications

Chapter 5: HDFS STORAGE

  1. HDFS Intro & Architecture
  2. Nodes & File System
  3. Data blocks
  4. Racks & Replications
  5. High Availability
  6. Space Reclamation
  7. Story in Short
  8. Hands-on
  9. Questions & Answers

Chapter 6: RESOURCE MANAGEMENT

  1. YARN Intro
  2. Architecture
  3. Resource Manager
  4. Resource Manager HA
  5. Node Manager
  6. Application Master & Containers
  7. Workflow
  8. Zookeeper
  9. Story in Short
  10. Hands-on

Chapter 7: DATA PROCESSING

  1. Processing Engines
  2. MapReduce Intro
  3. MapReduce Architecture
  4. Mappers
  5. Reducers
  6. Spark Intro
  7. Spark Architecture
  8. Spark Workflow
  9. Spark Terms
  10. Spark vs MapReduce
  11. Story in Short

Chapter 8: DATA ACCESS

  1. Hive Intro
  2. Hive Architecture
  3. Hive Hands-on
  4. Pig Intro
  5. Pig Architecture
  6. Pig Latin
  7. Pig Hands-on

Chapter 9: SCHEDULING JOBS

  1. Oozie Intro
  2. Architecture
  3. Scheduling
  4. Hands-on

Chapter 10: ESSENTIAL REALTIME PROJECT

  1. Business & Dataset
  2. Data Dictionary
  3. Dump Data
  4. Design Pipeline
  5. Pipeline Development
  6. Oozie Workflow
  7. Conclusion

Chapter 11: AWS CLOUD SETUP

  1. AWS Account
  2. EC2 Instance
  3. Setup & Login
  4. Port Forwarding
  5. Docker & Verify

Chapter 12: QUESTIONS & ANSWERS

Frequently Asked Questions (FAQs)

There are two modes of training: Online Instructor-Led and Recorded Video Sessions. You can purchase the latter anytime; for the former, look out for the schedule on this page.

This is the foundational course towards becoming a Data Engineer. You also need to complete the Booster course to be market ready.

Basic SQL, Python, and shell scripting skills are the pre-requisites. Do not worry, we have free courses you can enroll in.

You will be part of the professional community, where you can get assistance with your blockers.

After this foundational course, you will have to complete the next level to be market ready. You will be assisted and guided in profile building and mock interviews.
