Infobit Technologies
Ashram Road
+ 91 79 27545170
Satellite
+ 91 79 26765170
Ghatlodiya
+ 91 90671 77676
Need Help ?
ash@infobittechnologies.com
BigData-Hadoop Developer Certification

SYLLABUS for BigData-Hadoop Developer CLOUDERA International Certification

Duration : 90 Days

Chapter – 1

1.      Course Introduction
• About This Course
• About Apache Hadoop

2.      The Motivation for Hadoop
• Problems with traditional large-scale systems
• Requirements for a New Approach
• Hadoop!
• Hadoop-able problems

3.      Hadoop Basic Concepts
• What is Hadoop?
• The Hadoop Distributed File System (HDFS)
• How MapReduce Works

4.      Hadoop Solutions
• Some Common Hadoop Applications
• Other Interesting Hadoop Use Cases

Chapter – 2

1.      Review

2.      The Hadoop Ecosystem
• Introduction
• Data Storage: HBase
• Data Integration: Flume and Sqoop
• Data Processing: Spark
• Data Analysis: Hive, Pig and Impala
• Workflow Engine: Oozie
• Machine Learning: Mahout

3.      Managing Your Hadoop Solution
• Hadoop in the Data Center
• Cluster Hardware

4.      Introduction to MapReduce
• MapReduce Overview
• Example: WordCount
• Mappers
• Reducers

5.      Hadoop Clusters
• Hadoop Cluster Overview
• Hadoop Jobs and Tasks

Chapter – 3

1.      Review

2.      Writing a MapReduce Program in Java
• Basic MapReduce API Concepts
• Writing a MapReduce Program in Java
• Speeding up Hadoop Development by Using Eclipse
• Differences Between the Old and New MapReduce APIs

3.      Writing a MapReduce Program Using Streaming
• Writing Mappers and Reducers with the Streaming API

4.      Unit Testing MapReduce Programs
• Unit Testing
• The JUnit and MRUnit testing frameworks
• Writing Unit Tests with MRUnit
• Running Unit Tests

Chapter – 4

1.      Review

2.      Delving Deeper into the Hadoop API
• Using the ToolRunner Class
• Setting Up and Tearing Down Mappers and Reducers
• Decreasing the Amount of Intermediate Data with Combiners
• Accessing HDFS Programmatically
• Using the Distributed Cache
• Using the Hadoop API’s Library of Mappers, Reducers and Partitioners

Chapter – 5

1.      Review

2.      Practical Development Tips and Techniques
• Strategies for Debugging MapReduce Code
• Testing MapReduce Code Locally Using LocalJobRunner
• Writing and Viewing Log Files
• Retrieving Job Information with Counters
• Reusing Objects
• Creating Map-only MapReduce Jobs

Chapter – 6

1.      Review

2.      Partitioners and Reducers
• How Partitioners and Reducers Work Together
• Determining the Optimal Number of Reducers for a Job
• Writing Custom Partitioners

3.      Data Input and Output
• Custom Writable and WritableComparable Implementations
• Saving Binary Data Using SequenceFiles and Avro Data Files
• Issues to Consider When Using File Compression

Chapter – 7

1.      Review

2.      Common MapReduce Algorithms
• Sorting and Searching Large Data Sets
• Indexing Data
• Computing Term Frequency – Inverse Document Frequency (TF-IDF)
• Calculating Word Co-Occurrence
• Performing a Secondary Sort

3.      Joining Data Sets in MapReduce Jobs
• Writing a Map-Side Join
• Writing a Reduce-Side Join

Chapter – 8

1.      Review

2.      Hadoop Tools for Data Acquisition
• Loading Data from an RDBMS into HDFS by Using Sqoop
• Managing Real-Time Data Using Flume

3.      Creating Workflows with Oozie
• Introduction to Oozie
• Creating Oozie Workflows

4.      Introduction to Pig
• What is Pig?
• Pig’s Features
• Pig Use Cases
• Interacting with Pig

Chapter – 9

1.      A Brief Review
• Hadoop Review
• Pig Review

2.      Basic Data Analysis with Pig
• Pig Latin Syntax
• Loading Data
• Simple Data Types
• Field Definitions
• Data Output
• Viewing the Schema
• Filtering and Sorting Data
• Commonly-used Functions

3.      Processing Complex Data with Pig
• Storage Formats
• Complex/Nested Data Types
• Grouping
• Built-in Functions for Complex Data
• Iterating Grouped Data

Chapter – 10

1.      Review

2.      Multi-Dataset Operations with Pig
• Techniques for Combining Data Sets
• Joining Data Sets in Pig
• Set Operations
• Splitting Data Sets

3.      Extending Pig
• Adding Flexibility with Parameters
• Macros and Imports
• UDFs
• Contributed Functions
• Using Other Languages to Process Data with Pig

4.      Pig Troubleshooting and Optimization
• Troubleshooting Pig
• Logging
• Using Hadoop’s Web UI
• Data Sampling and Debugging
• Performance Overview
• Understanding the Execution Plan
• Tips for Improving the Performance of your Pig Jobs

Chapter – 11

1.      Review

2.      Introduction to Hive
• What Is Hive?
• Hive Schema and Data Storage
• Comparing Hive to Traditional Databases
• Hive Use Cases
• Interacting with Hive

3.      Relational Data Analysis with Hive
• Hive Databases and Tables
• Basic HiveQL Syntax
• Data Types
• Joining Datasets
• Common Built-in Functions

4.      Hive Data Management
• Hive Data Formats
• Creating Databases and Hive-Managed Tables
• Loading Data into Hive
• Altering Databases and Tables
• Self-Managed Tables
• Simplifying Queries with Views
• Storing Query Results
• Controlling Access to Data

Chapter – 12

1.      Review

2.      Text Processing with Hive
• Overview of Text Processing
• Important String Functions
• Using Regular Expressions in Hive
• Sentiment Analysis and n-grams

3.      Hive Optimization
• Understanding Query Performance
• Controlling Job Execution
• Partitioning
• Bucketing
• Indexing Data

4.      Extending Hive
• SerDes
• Data Transformation with Custom Scripts
• User-Defined Functions
• Parameterized Queries

Chapter – 13

1.      Review

2.      Introduction to Impala
• What is Impala?
• How Impala Differs from Hive and Pig
• How Impala Differs from Relational Databases
• Limitations and Future Directions
• Using the Impala Shell

3.      Analyzing Data with Impala
• Basic Syntax
• Data Types
• Filtering, Sorting and Limiting Results
• Joining and Grouping Data
• User-Defined Functions
• Improving Impala Performance

4.      Conclusion: Choosing the Best Tool for the Job
• Comparing MapReduce, Pig, Hive, Impala, and Relational Databases
• Which to Choose?

http://www.androidtrainingahmedabad.com © 2017
We are one of the best IT training and certification centers in Ahmedabad. We offer Cisco CCNA and CCNP training, MCSA (Microsoft) training for server administrators, Oracle OCP certification, Java certification training, Android development, BigData and Hadoop certification, VMware certification, AWS (Amazon) certification, the Certified Professional Ethical Hacker certificate from Mile2, and college project training, all with quality training at no extra cost. We are also an authorized training center for A+, Microsoft, RedHat Linux, and many more certifications in the field of hardware and networking.