Famaa Consultancy
contact@famaaconsultancy.com

Data Science

Session-1

HDFS, MapReduce, Hive

  • Introduction to Big Data and the Hadoop ecosystem
  • Why industry needs Big Data; advantages of Big Data over traditional RDBMS
  • Introduction to Big Data ecosystems
  • Understanding data in various formats and transformation techniques
  • HDFS and YARN architecture
  • MapReduce
  • Understanding Hadoop and Hive
  • HDFS and Hive
  • Hive data types
  • Hive advanced features for performance
  • Uses of Hive in real-life projects
  • Project-1

Session-2

Impala, Oozie, Shell Scripting, Linux

  • Uses of shell scripting in Big Data projects
  • Shell and Hive exercises
  • Introduction to Impala
  • Architecture of Impala
  • Uses of Hive and Impala in real-life projects
  • Understanding Oozie as a scheduler
    • Oozie Coordinator
    • Setting up an Oozie job
  • Introduction to Sqoop
  • Understanding capabilities of Sqoop and underlying MapReduce
  • Using Sqoop to ingest data from traditional databases into HDFS/Hive
  • Project-2

Session-3

Spark, Scala

  • Introduction to the Scala programming language
  • Scala from a functional perspective
  • Scala features for Bigdata transformations
  • Spark, a fast in-memory data processing engine
  • Spark architecture
  • Deep dive into Spark's data transformation capabilities
  • Spark SQL with HDFS, Hive, and Impala
  • Dealing with various data formats: JSON, XML, CSV, Parquet, text
  • Project-3

Session-4

Spark Streaming, Flume

  • Introduction to streaming, a new era of data analytics
  • Introduction to Kafka
  • A deep dive into Kafka architecture
  • Setting up Kafka for message generation
  • Spark Streaming with Kafka
  • Kafka performance tuning considerations
  • Considerations for zero-data-loss streaming pipelines
  • Dealing with small file issues and compaction
  • Flume architecture
  • Using Flume to set up a streaming pipeline
  • Exercises on Flume Agent setup
  • Kafka and Flume
  • Project-4

Session-5

Big Data on Cloud

  • Understanding Big Data technologies on the AWS cloud
  • Using Kinesis Data Streams and Firehose
  • Using DynamoDB
  • Using Lambda, Hive, Glue
  • Understanding Elastic MapReduce (EMR)
  • Spark on Cloud
  • Project-5

Session-6

Big Data on Cloud, Python, Introduction to Data Science

  • Flume on Cloud
  • Hue, Splunk on Cloud
  • Introduction to Python for data science: scikit-learn, pandas, NumPy
  • Using Jupyter notebook with Python
  • Understanding data science and its uses in real-life use cases
  • Going over various data science algorithms: regression and classification

Session-7

Data Science

  • Using PySpark and Spark MLlib for data science
  • Using Spark-Scala and MLlib for data science
  • Understanding Features, and training models
  • Data preparation for training model
  • Machine learning on the cloud with SageMaker
  • Project-6

Session-8

Docker

  • Understanding microservices
  • Introduction to Docker and its usages
  • Docker installation, configuration
  • Understanding and working with containers
  • Inter-container communication; exposing services through ports
  • Understanding the Dockerfile
  • Container-based deployment
  • Docker Compose
  • Introduction to Kubernetes
  • Introduction to Helm charts
  • Using Kubernetes
  • Deployment of Docker images to Kubernetes using Helm charts
  • Managing Pods

Project-7: Create a data science environment using microservices

Final Project

Familiarity with: 

Core Java, SQL, Linux
