Big Data/Hadoop - Corporate Training

Drawing on our wealth of experience in Big Data technologies, we provide comprehensive technical training for motivated individuals and corporations. We offer courses in Scala, Python, and Hadoop, with hands-on programming and implementation practice on popular cloud platforms such as AWS. Along the way, students receive assistance in preparing for the job search.

  • Courses include:
    • Introduction to Python and Scala
    • Big Data with Hadoop
    • Big Data with Spark
  • We offer training for a variety of certifications, including:
    • “HDP Certified Administrator (HDPCA)” certification
    • “HDP Certified Developer (HDPCD)” certification
    • “HDP Certified Spark Developer (HDPCD:SPARK)” certification

Hadoop Training

Curriculum (80 Hours)

  • Hadoop Architecture
  • Build a production-like Hadoop cluster in the Amazon EC2 cloud – focus on the Apache distribution
  • YARN
  • HIVE
  • SQOOP
  • PIG
  • Excel and HIVE Integration
  • HBASE
  • PIG and HIVE integration with HBASE
  • Hands-on training on real projects
  • Preparation for certification exam (HDPCD)
  • SPARK SQL – Building analytics on HDFS, HIVE and HBASE

What is Apache Hadoop?

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware.
It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
This course enables participants to build complete, unified Big Data applications combining batch and interactive analytics on the Hadoop platform.
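
Production MapReduce jobs are usually written in Java, but to give a flavor of Hadoop's "simple programming models", here is a minimal WordCount sketch in Scala against the standard org.apache.hadoop.mapreduce API. This is a hedged illustration, not course material: Scala 2.13 is assumed for CollectionConverters, and the input and output HDFS paths come from the command line.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{IntWritable, Text}
    import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
    import scala.jdk.CollectionConverters._

    // Map phase: emit (word, 1) for every word in the input split.
    class TokenizerMapper extends Mapper[Object, Text, Text, IntWritable] {
      private val one  = new IntWritable(1)
      private val word = new Text()
      override def map(key: Object, value: Text,
                       ctx: Mapper[Object, Text, Text, IntWritable]#Context): Unit =
        value.toString.split("\\s+").filter(_.nonEmpty).foreach { w =>
          word.set(w)
          ctx.write(word, one)
        }
    }

    // Reduce phase: sum the counts emitted for each word.
    class IntSumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
      override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                          ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit =
        ctx.write(key, new IntWritable(values.asScala.map(_.get).sum))
    }

    object WordCount {
      def main(args: Array[String]): Unit = {
        val job = Job.getInstance(new Configuration(), "word count")
        job.setJarByClass(classOf[TokenizerMapper])
        job.setMapperClass(classOf[TokenizerMapper])
        job.setReducerClass(classOf[IntSumReducer])
        job.setOutputKeyClass(classOf[Text])
        job.setOutputValueClass(classOf[IntWritable])
        FileInputFormat.addInputPath(job, new Path(args(0)))    // e.g. an HDFS input dir
        FileOutputFormat.setOutputPath(job, new Path(args(1)))  // must not already exist
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }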

Hadoop Advanced Training

Curriculum (40 Hours)

  • End-to-end Integration
    • Build a production-like Hadoop cluster in the Amazon EC2 cloud – focus on the Apache distribution
    • YARN
    • HIVE
    • SQOOP
    • PIG
  • Design and build project
  • Prerequisites:
    • Strong Java or Scala skills
    • ambariCloud Hadoop Solution Architecture Class
    • MapReduce
    • Eclipse IDE

What is Lambda Architecture?

Batch workflows are too slow; the data in a batch view is stale and cannot be used right now. These are common concerns among architects who have implemented Hadoop.
Organizations want (near) real-time data so they can make accurate decisions on massive amounts of data and gain a competitive advantage. The era of batch-only processing is over!
The Lambda Architecture is an approach to building stream processing applications on top of Hadoop using Spark, Kafka and NiFi.
In this advanced course you will take a deep dive into real-time data ingestion tools such as Flume, Spark, Kafka, and HBase, and into the end-to-end integration of these tools.
  • Learn about tool selection
  • Learn about integration
  • Design and build an end-to-end project
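
To give a flavor of how the speed and batch layers meet, here is a minimal, hedged sketch in Scala using Spark Structured Streaming: counts arriving on a Kafka topic are folded into a pre-computed batch view on each micro-batch. The broker address, topic, and table names are hypothetical, and the spark-sql-kafka connector is assumed to be on the classpath.

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions._

    object LambdaSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("lambda-sketch").enableHiveSupport().getOrCreate()
        import spark.implicits._

        // Batch layer: a pre-computed view, assumed to be a Hive table
        // named events_daily_agg with columns (event, cnt). Hypothetical.
        val batchView = spark.table("events_daily_agg")

        // Speed layer: fresh events arriving on a Kafka topic.
        val fresh = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "events")
          .load()
          .select($"value".cast("string").as("event"))

        // Serving layer: on each micro-batch, merge real-time counts into the batch view.
        val query = fresh.writeStream
          .option("checkpointLocation", "/tmp/lambda-ckpt")
          .foreachBatch { (micro: DataFrame, _: Long) =>
            val realtime = micro.groupBy("event").agg(count("*").as("cnt"))
            batchView.unionByName(realtime)
              .groupBy("event").agg(sum("cnt").as("cnt"))
              .write.mode("overwrite").saveAsTable("events_merged_view")
          }
          .start()

        query.awaitTermination()
      }
    }

In a full Lambda deployment, NiFi or Flume would feed the Kafka topic and the merged view would be served from a low-latency store such as HBase; the course covers that end-to-end integration.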

Spark Training

Curriculum (50 Hours)

  • Spark Architecture
  • Spark SQL, Zeppelin
  • Spark Streaming
    • Kafka
  • Unified data view from HBASE, MYSQL, Oracle, HDFS, HIVE
  • Spark Data Frame, Dataset
  • Intro to NIFI
  • Preparation for certification exam
  • AWS AMI
    • AMI setup
    • Spark, Kafka, HBase, Hadoop, NiFi, Zeppelin
  • PyCharm setup
    • Python basics
    • How to build an application and run it in the AMI
  • Intro to Spark
    • Why Spark
  • Scala programming – Spark
    • Employ functional programming practices
    • Explain the purpose and function of RDDs
    • Perform Spark transformations and actions
      • map, flatMap
      • DataFrame
      • DataSet
  • Explore and manipulate data using a Spark REPL
    • Spark Shell
  • Explore and manipulate data using Zeppelin
    • Interpreters
    • Run SQLs and build reports
  • Hands on Practice
    • Read: local and HDFS files
      • JSON
      • CSV
      • Parquet files
      • XML
    • DataFrame
      • API
      • Window functions
      • Pivoting
    • DataFrame from HIVE tables
    • Data Frame from JDBC
      • Join Hive, JDBC tables and HDFS files
    • Store data
      • Parquet, JSON, JDBC, HBASE, etc.
  • HBase
    • Read and store HBase data from Spark
      • Join multiple HBase tables in Spark
      • Join tables from many sources (HBase, Hive, HDFS, MySQL)
  • Spark infrastructure architecture
    • Node architecture – driver and workers
  • Kafka and Spark streaming
    • Kafka architecture
    • NiFi to load data into Kafka
    • Read data from Kafka into Spark
      • Store stream data to HBase, Hadoop, MySQL
    • Lambda architecture
      • Combine batch and stream data
    • Streaming data from Raspberry Pi to HBase, Elasticsearch, and MySQL via NiFi, Kafka, and Spark
  • Integration with Elastic Search (ES)
    • Streaming data from NiFi => Kafka => Spark => ES
  • Introduction to Spark Machine Learning (a minimal pipeline sketch follows this list)
    • Spark's new ML library – the Pipeline model
      • Label, features
    • Basic stats
      • Mean, max, mode, correlation
    • Linear regression
    • K-means
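
Here is a minimal sketch of the spark.ml Pipeline model referenced above, assuming nothing beyond a local Spark installation: a VectorAssembler builds the "features" vector and a LinearRegression stage fits it, with toy data invented purely for illustration.

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.ml.regression.LinearRegression
    import org.apache.spark.sql.SparkSession

    object PipelineSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("lr-pipeline").master("local[*]").getOrCreate()
        import spark.implicits._

        // Toy training data: two features and a label, invented for illustration.
        val training = Seq(
          (1.0, 2.0, 5.0),
          (2.0, 1.0, 4.0),
          (3.0, 4.0, 11.0),
          (4.0, 3.0, 10.0)
        ).toDF("x1", "x2", "label")

        // Assemble the raw columns into the single "features" vector spark.ml expects.
        val assembler = new VectorAssembler()
          .setInputCols(Array("x1", "x2"))
          .setOutputCol("features")

        val lr = new LinearRegression()

        // Pipeline model: feature preparation and estimator fitted as one unit.
        val model = new Pipeline().setStages(Array(assembler, lr)).fit(training)
        model.transform(training).select("features", "label", "prediction").show()

        spark.stop()
      }
    }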

What is Apache Spark?

Apache Spark™ is a lightning-fast, open-source cluster computing engine that processes huge amounts of data in a short time. Spark is optimized for speed, ease of use, and advanced analytics.
The Spark framework supports streaming data processing and complex, iterative algorithms, enabling applications to run up to 100x faster than traditional “Hadoop MapReduce” programs.
This course enables participants to build complete, unified Big Data applications combining batch, streaming, and interactive analytics on the Spark platform.
With Spark, developers can write sophisticated applications for faster business decisions and better user outcomes, applied to a wide variety of use cases, architectures, and industries.
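
As a small, hedged illustration of the interactive side, the sketch below loads a JSON file into a DataFrame and queries it with Spark SQL. The file name and columns (orders.json, customer_id, amount) are invented for the example; the same query could just as well be run from a Zeppelin notebook.

    import org.apache.spark.sql.SparkSession

    object SqlSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("sql-sketch").master("local[*]").getOrCreate()

        // Load semi-structured data into a DataFrame (hypothetical input file).
        val orders = spark.read.json("orders.json")

        // Register a temp view so plain SQL works against it.
        orders.createOrReplaceTempView("orders")
        spark.sql(
          """SELECT customer_id, SUM(amount) AS total
            |FROM orders
            |GROUP BY customer_id
            |ORDER BY total DESC
            |LIMIT 10""".stripMargin).show()

        spark.stop()
      }
    }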

Falcon and Oozie Training

Curriculum (15 Hours)

  • Falcon
  • Oozie
  • Need experience in:
    • ambariCloud Hadoop Solution Architecture Class
    • PIG
    • Hive
    • XML

What is Falcon & Oozie?

Apache Falcon is a data governance engine that defines, schedules, and monitors data management policies. Falcon allows Hadoop administrators to centrally define their data pipelines, and then Falcon uses those definitions to auto-generate workflows in Apache Oozie.
As the demand for big data continues to grow, big data governance has become a critical issue for most organizations.
Big data often deals with sensitive personal information and confidential enterprise records, and big data governance to ensure the security of this information is paramount.
Superior governance can enable organizations to avoid the costs associated with low quality data re-work, and to provide big data reporting in compliance with government regulations like Sarbanes-Oxley, HIPAA, and Basel II/Basel III.

In this training, developers learn about data replication, job scheduling, and more.

Spark and Data Scientist Type B

Curriculum

  • Python
    • Python modules, Classes, Functional Programming
  • Advanced Spark
    • Spark SQL, Zeppelin
    • Spark Streaming
    • Spark Data Frame
  • Machine Learning
    • Linear Regression
    • K-Means
    • Classification
    • Recommendation
  • Prerequisites
    • Basic Python
    • Hive, SQL, Linux

Spark and Data Scientist Type B

Spark Engineer: Spark™ is a lightning-fast, open-source cluster computing engine that processes huge amounts of data in a short time. Spark is optimized for speed, ease of use, and advanced analytics. The Spark framework supports streaming data processing and complex, iterative algorithms, enabling applications to run up to 100x faster than traditional “Hadoop MapReduce” programs.
Type B Data Scientist: The B in Type B Data Scientist refers to building models. Type B Data Scientists predict the unknown by asking questions from different perspectives of the business, writing complex algorithms, and developing statistical models on structured/unstructured data in the Big Data domain.
Some of the important skills and tools for Type B Data Scientists include expertise in Python/Scala, Hadoop, Data Analysis, NoSQL, Machine Learning, and Software Development.
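
As a taste of that model-building work, here is a hedged K-means sketch using Spark's ML library. The 2-D points are invented for illustration; a real project would load structured/unstructured data from HDFS or Hive instead.

    import org.apache.spark.ml.clustering.KMeans
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.sql.SparkSession

    object KMeansSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("kmeans-sketch").master("local[*]").getOrCreate()
        import spark.implicits._

        // Two obvious clusters of 2-D points, invented for illustration.
        val points = Seq((0.0, 0.1), (0.2, 0.0), (9.0, 9.2), (9.1, 8.9)).toDF("x", "y")

        // KMeans expects a single vector column named "features" by default.
        val features = new VectorAssembler()
          .setInputCols(Array("x", "y"))
          .setOutputCol("features")
          .transform(points)

        // Fit k = 2 clusters and show which cluster each point lands in.
        val model = new KMeans().setK(2).setSeed(1L).fit(features)
        model.transform(features).show()

        spark.stop()
      }
    }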

NiFi Training

Curriculum (16 Hours):

  • NiFi Architecture
  • Develop data migration Processors
  • Convert migration process to Templates
  • IoT – MiNiFi
  • Prerequisites:
    • ambariCloud Hadoop Solution Architecture Class

What is Apache NiFi?

Apache NiFi is a dataflow system based on the concepts of flow-based programming. It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. NiFi has a web-based user interface for design, control, feedback, and monitoring of dataflow.

The NiFi data-flow orchestration tool, drafted in as part of the NSA's duty "to respond to foreign-intelligence requirements", now finds itself on the front line of “Internet of Things” technology, according to Hortonworks CTO Scott Gnau.

"Instead of a one-way traditional streaming or data flow, it's bidirectional and point to point. That's a really big difference technologically and from a requirements perspective."

In this training, developers learn simplified batch/real-time data ingestion into Hadoop from internal sources such as Oracle or HANA, and from external sources such as Twitter.

Kerberos Training

Curriculum (16 Hours):

  • Hadoop Security
  • Kerberos 101
  • KDC Server/Client Installation and Configuration
  • Kerberos Encryption types
  • Kerberos Operations
  • Kerberos Troubleshooting
  • Kerberos setup in Hadoop
  • Kerberos configuration in the Hadoop ecosystem (Hive/Pig/Oozie)
  • Prerequisites:
    • ambariCloud Hadoop Solution Architecture Class

Kerberos for Hadoop

Security has become a real concern for critical enterprise Hadoop IT services.
Kerberos is the de facto standard for Hadoop authentication, and strong Kerberos skills are required to manage secured Hadoop projects.
Through instructor-led discussion and interactive, hands-on exercises, participants will navigate Hadoop authentication security with Kerberos.
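
As one concrete touchpoint, a client application authenticates to a Kerberized cluster through Hadoop's UserGroupInformation API. The sketch below is a minimal Scala example; the principal and keytab path are hypothetical and would come from your realm's administrator.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.security.UserGroupInformation

    object KerberosLoginSketch {
      def main(args: Array[String]): Unit = {
        // Tell the Hadoop client libraries that the cluster requires Kerberos.
        val conf = new Configuration()
        conf.set("hadoop.security.authentication", "kerberos")
        UserGroupInformation.setConfiguration(conf)

        // Principal and keytab path are hypothetical; substitute your realm's values.
        UserGroupInformation.loginUserFromKeytab(
          "etl-svc@EXAMPLE.COM",
          "/etc/security/keytabs/etl-svc.keytab")

        // HDFS/Hive/HBase client calls made after this point carry the ticket.
        println(s"Logged in as: ${UserGroupInformation.getCurrentUser}")
      }
    }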