Data Science

Session-1 HDFS, MapReduce, Hive
  Introduction to Big Data and the Hadoop ecosystem
  Why industry needs Big Data
  Advantages of Big Data over traditional RDBMS
  Introduction to Big Data ecosystems
  Understanding data in various formats, transformation techniques
  HDFS, YARN architecture
  MapReduce
  Understanding Hadoop and Hive
  HDFS and Hive
  Hive and data types
  Hive advanced features for performance
  Uses of Hive in real-life projects
  Project-1

Session-2 Impala, Oozie, Shell Scripting, Linux
  Uses of shell scripting in Big Data projects
  Shell and Hive exercises
  Introduction to Impala
  Architecture of Impala
  Uses of Hive and Impala in real-life projects
  Understanding Oozie as a scheduler
  Oozie Coordinator
  Setting up an Oozie job
  Introduction to Sqoop
  Understanding the capabilities of Sqoop and its underlying MapReduce
  Using Sqoop to ingest data from a traditional database into HDFS/Hive
  Project-2

Session-3 Spark, Scala
  Introduction to the Scala programming language
  Scala from a functional perspective
  Scala features for Big Data transformations
  Spark, a fast large-scale data processing engine
  Spark architecture
  Deep dive into Spark's data transformation capabilities
  Spark SQL with HDFS, Hive, and Impala
  Dealing with various data formats: JSON, XML, CSV, Parquet, text
  Project-3

Session-4 Spark Streaming, Flume
  Introduction to streaming, a new era of data analytics
  Introduction to Kafka
  A deep dive into Kafka architecture
  Setting up Kafka for message generation
  Spark Streaming with Kafka
  Kafka performance tuning considerations
  Considerations for zero-data-loss streaming pipelines
  Dealing with small-file issues and compaction
  Flume architecture
  Using Flume to set up a streaming pipeline
  Exercises on Flume agent setup
  Kafka and Flume
  Project-4

Session-5 Big Data on Cloud
  Understanding Big Data technologies on the AWS cloud
  Using Kinesis, Firehose, Data Streams
  Using DynamoDB
  Using Lambda, Hive, Glue
  Understanding Elastic MapReduce (EMR)
  Spark on Cloud
  Project-5

Session-6 Big Data on Cloud, Python, Introduction to Data Science
  Flume on Cloud
  Hue, Splunk on Cloud
  Introduction to Python for data science: scikit-learn, Pandas, NumPy
  Using Jupyter Notebook with Python
  Understanding data science and its real-life use cases
  Going over various data science algorithms: regression and classification

Session-7 Data Science
  Using PySpark and Spark MLlib for data science
  Using Spark-Scala and MLlib for data science
  Understanding features and training models
  Data preparation for model training
  Machine learning on cloud: SageMaker
  Project-6

Session-8 Docker
  Understanding microservices
  Introduction to Docker and its uses
  Docker installation, configuration
  Understanding and working with containers
  Inter-container communication, exposing services through ports
  Understanding the Dockerfile
  Container-based deployment
  Docker Compose
  Introduction to Kubernetes
  Introduction to Helm charts
  Using Kubernetes
  Deployment of Docker images to Kubernetes using Helm charts
  Managing Pods
  Project-7: Create a data science environment using microservices
  Final Project

Familiarity with: Core Java, SQL, Linux