Call Us: +1 4169908385 (Mon - Saturday)

Mail us for help: contact@famaaconsultancy.com

Scarborough, Ontario, Canada

Data Science

Session-1

HDFS, MapReduce, Hive

Introduction to Big Data and the Hadoop Ecosystem
Why the industry needs Big Data: advantages of Big Data over traditional RDBMS
Introduction to Big Data ecosystems
Understanding data in various formats and transformation techniques
HDFS and YARN architecture
MapReduce
Understanding Hadoop and Hive
HDFS and Hive
Hive and data types
Hive advanced features for performance
Uses of Hive in real-life projects
Project-1

Session-2

Impala, Oozie, Shell Scripting, Linux

Uses of shell scripting in Big Data projects
Shell and Hive exercises
Introduction to Impala
Architecture of Impala
Uses of Hive and Impala in real-life projects
Understanding Oozie as a scheduler
- Oozie Coordinator
- Set up an Oozie job
Introduction to Sqoop
Understanding the capabilities of Sqoop and the underlying MapReduce
Using Sqoop to ingest data from a traditional database into HDFS/Hive
Project-2

Session-3

Spark, Scala

Introduction to the Scala programming language
Scala from a functional perspective
Scala features for Big Data transformations
Spark, a fast in-memory data processing engine
Spark architecture
Deep dive into Spark's data transformation capabilities
Spark SQL with HDFS, Hive, and Impala
Dealing with various data formats: JSON, XML, CSV, Parquet, text
Project-3

Session-4

Spark Streaming, Flume

Introduction to streaming, a new era of data analytics
Introduction to Kafka
A deep dive into Kafka architecture
Setting up Kafka for message generation
Spark Streaming with Kafka
Kafka performance tuning considerations
Considerations for zero-data-loss streaming pipelines
Dealing with small file issues and compaction
Flume architecture
Using Flume to set up a streaming pipeline
Exercises on Flume agent setup
Kafka and Flume
Project-4

Session-5

Big Data on Cloud

Understanding Big Data technologies in the AWS cloud
Using Kinesis Data Streams and Firehose
Using DynamoDB
Using Lambda, Hive, Glue
Understanding Elastic MapReduce (EMR)
Spark on Cloud
Project-5

Session-6

Big Data on Cloud, Python, Introduction to Data Science

Flume on Cloud
Hue, Splunk on Cloud
Introduction to Python for data science: scikit-learn, pandas, NumPy
Using Jupyter Notebook with Python
Understanding data science and its real-life use cases
Going over various data science algorithms: regression and classification

Session-7

Data Science

Using PySpark and Spark MLlib for data science
Using Spark-Scala and MLlib for data science
Understanding features and training models
Data preparation for model training
Machine learning on the cloud: SageMaker
Project-6

Session-8

Docker

Understanding microservices
Introduction to Docker and its uses
Docker installation and configuration
Understanding and working with containers
Inter-container communication and exposing services through ports
Understanding the Dockerfile
Container-based deployment
Docker Compose
Introduction to Kubernetes
Introduction to Helm charts
Using Kubernetes
Deployment of Docker images to Kubernetes using Helm charts
Managing Pods
