
Big Data Engineering Booster

Ramp up your essential knowledge into market-ready skills with this Big Data Engineering Booster course. Version control is one of the most important skills for any Data Engineer, and this course will make you an expert in managing GitHub repositories. Get ready to learn database design, Data Lakes, Data Marts, HDFS file types, compressions, Airflow and, most excitingly, the Spark (PySpark) processing framework.

Have you ever thought of getting hands-on by completing 3 real-world projects? Yes, you heard it right. These projects cover all the combinations of tech stacks – Non-Hadoop, Big Data and Hybrid (a Big Data and Non-Hadoop mix) – so you get the experience of putting your knowledge into action. If your system resources are not sufficient, you can still use the AWS Cloud server setup from the Big Data Engineering – Essentials course for your hands-on work.

Pre-requisites (Free)

Big Data Engineering – Essentials

SQL Foundation

Python Fundamentals

Shell/Bash Scripting for Beginners

System Requirements

CPU: Quad-core i5 or better, or Apple M1

Memory: 16GB

OS: Windows/macOS

Not to worry if your system does not have enough capacity: towards the end of this course you will be guided through procuring an AWS Cloud server for your practice.

Mode of Training

Online Interactive Sessions

Recorded Video Sessions – From the latest Online batch


Resources

Approximate number of sessions: 52 (varies across batches)

Lifetime access to the recorded videos will be given, along with all supporting documents, logs, references and software, if any.

Placements

Are you ready with your profile? Share it with us and our placement partners will help you find the right opportunity. You can also find one on our Opportunities page anytime.

Chapter 1: VERSION CONTROL SYSTEM

  1. GitHub Introduction
  2. Setup
  3. Repo
  4. Branches
  5. Forks
  6. Code Issues
  7. Commits
  8. Pull Requests
  9. Squash & Merges
  10. Conflicts
  11. Code Reviews & Testing
  12. Responsibilities
  13. Real-world Hands-on
  14. Command line: Single contributor, smooth life cycle (see the sketch after this list)
  15. Command line: Single Contributor, linking issues
  16. Command line: Single Contributor, PR reviews
  17. GitHub Desktop: Repeat exercises of command line
  18. GitHub Web: User access control
  19. GitHub Web: Create an Organization
  20. Command line: Dual Contributors, conflicts
  21. Command line: How to avoid conflicts
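
As a taste of the command-line exercises (items 14-16, 20 and 21), here is a minimal sketch of the single-contributor life cycle. The repository URL and branch name are hypothetical; the commands themselves are standard Git usage.

```bash
# Minimal single-contributor life cycle (hypothetical repo and branch names)
git clone https://github.com/your-org/demo-repo.git   # get a local copy
cd demo-repo
git checkout -b feature/add-readme     # work on an isolated branch
echo "# Demo" > README.md
git add README.md
git commit -m "Add README"             # record the change locally
git push -u origin feature/add-readme  # publish the branch

# Open a pull request on GitHub, get it reviewed, then squash & merge.
# Afterwards, sync your local main branch:
git checkout main
git pull origin main
```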

Chapter 2: DATA WAREHOUSE

  1. Definition
  2. Types
  3. Advantages
  4. Data Mart
  5. Hadoop Warehouse
  6. Data Lake
  7. Architectures
  8. RDBMS
  9. NoSQL
  10. InMemory
  11. Clear the Clutter

Chapter 3: DATA MODELLING

  1. Schemas
  2. Types
  3. Facts & Dimensions
  4. Data Models
  5. Normalization
  6. Star Schema (see the sketch after this list)
  7. Snowflake Schema
  8. OLTP & OLAP
  9. SCD tables
  10. Summarize
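
To make the star schema concrete (item 6), here is a small PySpark sketch joining a fact table to a dimension table. The tables, keys and figures are invented for illustration.

```python
# Star schema in miniature: one fact table joined to one dimension (made-up data)
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("StarSchema").getOrCreate()

# Dimension: descriptive attributes keyed by a surrogate key
dim_product = spark.createDataFrame(
    [(1, "laptop", "electronics"), (2, "mouse", "accessories")],
    ["product_key", "product_name", "category"],
)

# Fact: measures plus foreign keys into the dimensions
fact_sales = spark.createDataFrame(
    [(1, "2024-01-01", 1200.0), (2, "2024-01-01", 25.0), (1, "2024-01-02", 1150.0)],
    ["product_key", "sale_date", "amount"],
)

# Typical star-schema query: aggregate facts grouped by dimension attributes
(fact_sales.join(dim_product, "product_key")
    .groupBy("category")
    .sum("amount")
    .show())

spark.stop()
```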

Chapter 4: SPARK PROCESSING

  1. Local Setup
  2. PyCharm Integration
  3. Real-world Setup
  4. Recap
  5. Zeppelin
  6. Transformations & Actions
  7. PySpark
  8. First Code (see the sketch after this list)
  9. Spark SQL
  10. Spark Dataframe
  11. Applications
  12. Hands-on (Part 1)
  13. Hands-on (Part 2)
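
As a preview of the "First Code" session (item 8), here is a minimal PySpark sketch that builds a DataFrame and runs the same query through the DataFrame API and Spark SQL. The data and column names are invented for illustration.

```python
# Minimal PySpark sketch: DataFrame API + Spark SQL (sample data is made up)
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("FirstCode").getOrCreate()

# Build a small DataFrame from in-memory rows
orders = spark.createDataFrame(
    [(1, "laptop", 1200.0), (2, "mouse", 25.0), (3, "monitor", 300.0)],
    ["order_id", "product", "amount"],
)

# A transformation (filter) is lazy; the action (show) triggers execution
orders.filter(orders.amount > 100).show()

# The same query via Spark SQL
orders.createOrReplaceTempView("orders")
spark.sql("SELECT product, amount FROM orders WHERE amount > 100").show()

spark.stop()
```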

Chapter 5: HADOOP DATAFILES

  1. File Formats
  2. Text
  3. Avro
  4. Parquet & RCFile
  5. ORC
  6. SequenceFile
  7. Compressions (see the Parquet sketch after this list)
  8. Choose the best
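
As a sketch of how file formats and compressions combine in practice (items 4 and 7), the snippet below writes and reads a Parquet file with Snappy compression in PySpark. The path and data are hypothetical.

```python
# Hypothetical sketch: writing/reading Parquet with Snappy compression in PySpark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("FileFormats").getOrCreate()

df = spark.createDataFrame(
    [("2024-01-01", 100), ("2024-01-02", 150)],
    ["event_date", "clicks"],
)

# Parquet is columnar; Snappy trades compression ratio for speed
df.write.mode("overwrite").option("compression", "snappy").parquet("/tmp/events_parquet")

# Reading it back preserves the schema
spark.read.parquet("/tmp/events_parquet").show()

spark.stop()
```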

Chapter 6: SCHEDULING PIPELINES

  1. Airflow Intro
  2. Installation
  3. Example DAGs (see the sketch after this list)
  4. Pipelines & Dependencies
  5. Importing Modules
  6. Default Arguments
  7. Tasks
  8. Setting up Dependencies
  9. Testing Pipeline
  10. Schedule
  11. Presets
  12. Catchup
  13. Backfill
  14. Passing Parameters when Triggering DAGs
  15. Hands-on
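
To preview what items 3-10 build towards, here is a minimal Airflow DAG sketch using default arguments, a schedule preset and catchup disabled. The DAG id, owner and tasks are hypothetical.

```python
# Minimal Airflow DAG sketch (DAG id, owner and tasks are hypothetical)
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data_eng",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="example_pipeline",
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",   # a preset; cron strings also work
    catchup=False,                # skip backfilling missed runs
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = BashOperator(task_id="load", bash_command="echo loading")

    extract >> load  # set the dependency: extract runs before load
```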

Chapter 7: REAL WORLD PROJECT – LOCAL

  1. Project Requirements
  2. Data Source Schema (ER Diagram)
  3. Data Mart Modeling
  4. Design Jobs
  5. Infra Setup
  6. Initial Data Load
  7. Development
  8. Incremental Loads
  9. Testing
  10. Airflow Pipeline
  11. Logging
  12. Deployment & Scheduling

Chapter 8: REAL WORLD PROJECT – BIGDATA

  1. Project Requirements
  2. Data Source Schema
  3. Data Mart Modeling
  4. Design Tasks
  5. Infra Setup
  6. Initial Data Load
  7. Development
  8. Incremental Loads
  9. Testing
  10. Airflow Pipeline
  11. Logging
  12. Deployment & Scheduling

Chapter 9: REAL WORLD PROJECT – HYBRID

  1. APIs
  2. API Source & Endpoints
  3. UNIX/EPOCH timestamps (see the sketch after this list)
  4. Project Requirements
  5. Lake
  6. Marts
  7. Database Design
  8. Pipelines
  9. Infra Setup
  10. Development
  11. Code
  12. Logging
  13. Debugging
  14. Deployment & Scheduling
  15. Exercise
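
As a sketch of items 1-3, here is how a REST endpoint might be called and its UNIX/EPOCH timestamps converted in Python. The endpoint URL and response fields are placeholders, not the project's real source.

```python
# Hypothetical sketch: fetching from a REST API and handling epoch timestamps
from datetime import datetime, timezone

import requests  # assumes the requests package is installed

# Placeholder endpoint; the real project source is introduced in the course
response = requests.get("https://api.example.com/v1/events", timeout=10)
response.raise_for_status()

for event in response.json().get("events", []):
    # API payloads often carry UNIX/EPOCH seconds; convert to UTC datetimes
    ts = datetime.fromtimestamp(event["created_at"], tz=timezone.utc)
    print(event["id"], ts.isoformat())
```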

Chapter 10: REWIND & RECAP

  1. Summary
  2. Q & A

Chapter 11: PROFILE BUILDING

Chapter 12: MOCK INTERVIEW

Frequently Asked Questions (FAQs)

There are two modes of training: Online Instructor-Led and Recorded Video Sessions. While you can purchase the latter anytime, look out for the schedule on this page to join the former.

This is the second-level course towards becoming a Data Engineer; on completing it you will be market-ready for junior positions.

Basic SQL, Python & shell scripting skills, along with the Big Data Engineering – Essentials course, are the pre-requisites.

You will be part of our professional community, and there will be assistance for your blockers.

You will be assisted and guided in profile building and mock interviews.
