Big Data Training Series
Practical Guide to Big Data Analytics with Pig Latin, Hive and Scilab
Start your Big Data journey by example using Open Source tools!
“A must-attend training. You will acquire the necessary skills to harness large data sets for valuable information.”
Course Synopsis
Big data is a term for data sets that are so large or complex that conventional data processing applications are unable to handle them. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying and information privacy/security. The term "big data" often refers simply to the use of predictive analytics, user behaviour analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set.
Data analytics is now a priority for any organisation that wants to identify market opportunities for its services and products. Reportedly, more than 77% of top organisations consider data analytics a critical component of business performance.
Who Should Attend
This training is suitable for engineers, scientists, data analysts, academics, researchers and anyone else who would like to understand how to work with big data using the available open source tools.
Human Resource Development Fund (HRDF)
Our courses may be submitted to HRDF for SBL claims. Kindly check with your Human Resource Department or Training Unit. Alternatively, we could also assist you in your application. Call us now to enquire!
Course Outline
Apache Hadoop
Apache Hadoop is an open-source software framework used for distributed storage and processing of very large data sets. Apache Ambari aims to make Hadoop management simpler through the Ambari Viewer. In this section, you will familiarize yourself with the Ambari Viewer interface.
- Introduction to Apache Hadoop
- Getting familiar with Ambari Viewer
Learning Pig Latin Language
Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. It uses the Pig Latin language. In this section, you will learn the fundamentals of the Pig Latin language. This includes learning how to load data from HDFS, write commands to perform analysis and store the results.
- Introduction to Apache Pig
- Learning the basics of Pig Latin: loading data from HDFS, processing it and storing the results back to HDFS
- Running a script inside Pig View
Learning Hive Query Language
Apache Hive is a data warehouse infrastructure built on top of Hadoop. It uses Hive Query Language (HiveQL), a SQL-like language, to access the data in Hadoop. In this section, you will learn the fundamentals of HiveQL. This includes learning how to create and load tables, send queries and perform simple visualization in Hive Viewer.
- Introduction to Apache Hive
- Learning the basics of HiveQL: creating and loading a Hive table, and sending queries to Hadoop
- Simple visualization inside Hive Viewer
WebHDFS, HCatalog, WebHCat
WebHDFS is a built-in component of HDFS that allows you to access HDFS via a REST API. HCatalog is a table and storage management layer for Hadoop. It allows other Hadoop tools such as Pig to access the Hive metadata. WebHCat is the REST API component of HCatalog. It allows you to send Pig and Hive jobs through a REST API. In this section, you will learn how to access HDFS via the REST API, how to access Hive tables in Pig, and how to send Pig or Hive jobs via the REST API.
- Introduction
- Accessing HDFS via HTTP
- Accessing Hive Tables via HCatalog
- Sending Pig/Hive jobs via WebHCat
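To give a feel for what accessing HDFS and sending jobs over REST looks like, here is a minimal Python sketch that only builds the requests and does not contact a cluster. The host names, ports, user name, paths and the `sales` table are hypothetical placeholders, and the helper functions `webhdfs_url` and `webhcat_job` are illustrative names, not part of any Hadoop library. The URL layout (`/webhdfs/v1/<path>?op=...` for WebHDFS, `/templeton/v1/pig` or `/templeton/v1/hive` for WebHCat) follows the documented REST conventions, but the exact ports and the way `user.name` is passed can differ between versions, so check your own installation.

```python
from urllib.parse import urlencode, quote

# Hypothetical addresses; replace with those of your own cluster.
# 50070 and 50111 are assumed here as commonly used ports for the
# NameNode web interface and WebHCat; your installation may differ.
NAMENODE = "http://namenode.example.com:50070"
WEBHCAT = "http://webhcat.example.com:50111"


def webhdfs_url(path, op, user, **params):
    """Build a WebHDFS REST URL for an HDFS path and an operation."""
    query = urlencode({"op": op, "user.name": user, **params})
    return f"{NAMENODE}/webhdfs/v1{quote(path)}?{query}"


def webhcat_job(kind, script, user, statusdir):
    """Return (url, form_fields) for submitting a Pig or Hive job."""
    if kind not in ("pig", "hive"):
        raise ValueError("kind must be 'pig' or 'hive'")
    # user.name is passed in the query string here; some versions
    # also accept it as a form field.
    url = f"{WEBHCAT}/templeton/v1/{kind}?{urlencode({'user.name': user})}"
    # 'execute' carries the script text, 'statusdir' the HDFS
    # directory where the job status and output are written.
    fields = {"execute": script, "statusdir": statusdir}
    return url, fields


# List the contents of a (hypothetical) HDFS directory.
list_url = webhdfs_url("/user/demo", "LISTSTATUS", "demo")
print(list_url)

# Prepare a Hive job; 'sales' is a made-up table name.
job_url, job_fields = webhcat_job(
    "hive", "SELECT COUNT(*) FROM sales;", "demo", "/user/demo/out"
)
print(job_url)
print(job_fields)
```

The first URL would be requested with an HTTP GET, while the job URL and its form fields would be sent with an HTTP POST using any HTTP client.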
Scilab
Scilab is a high-level, open-source numerical computation software package. In this section, you will learn how to access Hadoop and send Pig and Hive jobs from Scilab. You will also learn to perform data analysis and visualization inside Scilab.
- Introduction to Scilab
- Scilab Basics
- Accessing Hadoop from Scilab
- Sending Pig/Hive Jobs from Scilab
- Data Analysis using Scilab
- Visualization inside Scilab