Another technology used in big data analytics is Apache HBase, a column-oriented key-value data store. HBase is built to run on top of HDFS, the Hadoop Distributed File System, which is therefore its prerequisite. The key difference between HBase and other kinds of data stores lies in the storage model: HBase is a column-oriented key-value store, whereas relational databases are row-oriented.
Storage Mechanism in HBase:
How does the storage mechanism work in HBase? HBase is a column-oriented database in which the data is sorted by row key. A table schema defines the column families and the key-value pairs they contain. HBase supports multiple column families per table, and each column family can hold many columns. Every cell is stored as a key-value pair, and the values belonging to a column family are stored contiguously on disk.
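As a minimal sketch of this model (assuming the HBase 2.x Java client; the table name, column families, and values below are purely illustrative), the following creates a table with two column families and writes a few key-value cells:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class CreateAndPut {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {

            // The table schema only declares column families, not individual columns.
            TableName tableName = TableName.valueOf("users");   // illustrative name
            TableDescriptor desc = TableDescriptorBuilder.newBuilder(tableName)
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("profile"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("activity"))
                    .build();
            admin.createTable(desc);

            // Each cell is a key-value pair addressed by (row key, column family, qualifier).
            try (Table table = connection.getTable(tableName)) {
                Put put = new Put(Bytes.toBytes("user#1001"));
                put.addColumn(Bytes.toBytes("profile"), Bytes.toBytes("name"), Bytes.toBytes("Asha"));
                put.addColumn(Bytes.toBytes("activity"), Bytes.toBytes("lastLogin"), Bytes.toBytes("2021-06-01"));
                table.put(put);
            }
        }
    }
}
```

Note that only the column families are declared up front; individual column qualifiers such as name or lastLogin never appear in the table definition.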
Hadoop and HBase Architecture:
Comparison of HBase and RDBMS:
Comparing big data analytics tools, let us look at the differences between HBase and a traditional RDBMS. HBase is a schema-less database: it has no fixed schema of the kind an RDBMS has. In an RDBMS, that is, a relational database, the schema defines the whole structure of the database and the columns are fixed. In HBase there is no concept of a fixed column schema; the columns are not fixed, which is why it is called schema-less. So one of the major differences is that HBase has a schema-less architecture, while the whole architecture of an RDBMS is based on its schema. HBase is built for wide tables, whereas an RDBMS is thin and built for small tables. HBase is horizontally scalable, but an RDBMS is not.
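To make the schema-less point concrete, here is a hedged sketch reusing the illustrative users table from above: a brand-new column qualifier can be written at any time, without any ALTER TABLE step, as long as its column family already exists.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class SchemaLessWrite {
    public static void main(String[] args) throws Exception {
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = connection.getTable(TableName.valueOf("users"))) {

            // "twitterHandle" was never declared anywhere and no other row needs to have it;
            // the only requirement is that the "profile" column family exists.
            Put put = new Put(Bytes.toBytes("user#1002"));
            put.addColumn(Bytes.toBytes("profile"), Bytes.toBytes("name"), Bytes.toBytes("Ravi"));
            put.addColumn(Bytes.toBytes("profile"), Bytes.toBytes("twitterHandle"), Bytes.toBytes("@ravi"));
            table.put(put);
        }
    }
}
```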
Horizontal Scalability:
What is horizontal scalability? Because HBase runs on top of the Hadoop ecosystem, it uses commodity hardware. For example, suppose a 4 GB dataset is currently processed by a single machine and you now need to process a much larger dataset. With Hadoop, you simply add a new machine to handle the extra data. The capacity to store and process large datasets grows seamlessly, and the user is not even aware of it; this is called horizontal scalability. As the storage or processing requirement goes up, the underlying infrastructure expands along with it.
Vertical Scalability:
In the case of vertical scalability, if you need better processing capability you have to upgrade your existing system, or buy a new system altogether. Of course, you have to stop execution, upgrade the system, and then process the data again. With HBase, all of this is taken care of: when your dataset and its processing needs grow, the storage and processing capacity grows horizontally, and you are not even aware that one more machine has been added.
Advantages of Hadoop and HBase:
Another difference is the kind of data being handled. The data in an RDBMS is generally transactional, whereas HBase typically works on data that has already been stored. A further advantage is that because HBase is not based on a fixed schema, there is no need to normalize the data; if you store your data in a traditional relational database management system, you have to store it in a normalized form. One more key point is that an RDBMS is good for structured datasets, because it was developed for structured data only, while HBase can handle structured as well as unstructured datasets.
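As a hypothetical illustration of skipping normalization (the orders table, its column families, and the field names are assumptions, not anything prescribed by HBase), the sketch below duplicates customer details into each order row instead of keeping them in a separate, joined table:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class DenormalizedOrder {
    public static void main(String[] args) throws Exception {
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             // Assumes an "orders" table with "order" and "customer" column families already exists.
             Table table = connection.getTable(TableName.valueOf("orders"))) {

            // Customer details are stored inline with the order instead of in a
            // separate normalized table that would be joined at query time.
            Put put = new Put(Bytes.toBytes("order#2021-06-01#7781"));
            put.addColumn(Bytes.toBytes("order"), Bytes.toBytes("item"), Bytes.toBytes("keyboard"));
            put.addColumn(Bytes.toBytes("order"), Bytes.toBytes("quantity"), Bytes.toBytes("2"));
            put.addColumn(Bytes.toBytes("customer"), Bytes.toBytes("name"), Bytes.toBytes("Asha"));
            put.addColumn(Bytes.toBytes("customer"), Bytes.toBytes("city"), Bytes.toBytes("Pune"));
            table.put(put);
        }
    }
}
```

In an RDBMS the customer details would live in their own normalized table and be joined in at query time; here the duplication is accepted in exchange for reading everything about an order in a single row lookup.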
Data Handling:
HBase is not restricted to handling only structured data; because there is no fixed schema, it can handle any kind of data, whether structured or unstructured. The architecture also includes ZooKeeper, to which the cluster nodes connect as clients, and the HMaster. ZooKeeper keeps track of the registration and location of the different nodes used in the HBase cluster. HBase takes care of in-memory storage itself, and storage is organized on the basis of regions: the tables are divided into regions that are distributed across the nodes, with a root catalog pointing to the META table that records where each region lives. The latency is very low, because HBase provides random read and write access on top of HDFS.
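The low-latency random access mentioned above is exposed through simple point lookups by row key. A minimal sketch, again assuming the HBase 2.x Java client and the illustrative users table from earlier:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomRead {
    public static void main(String[] args) throws Exception {
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = connection.getTable(TableName.valueOf("users"))) {

            // A point lookup by row key is routed to the region server that owns the
            // region containing "user#1001"; no batch job is involved.
            Get get = new Get(Bytes.toBytes("user#1001"));
            get.addColumn(Bytes.toBytes("profile"), Bytes.toBytes("name"));
            Result result = table.get(get);
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("profile"), Bytes.toBytes("name"))));
        }
    }
}
```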
The tables used in the HBase architecture are dynamically distributed by the system. Whenever a region of a table becomes too large to handle, HBase splits it, and the resulting regions are handed over to region servers according to how the regions are distributed across the cluster. This, in brief, is the architecture of HBase.
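Region splitting itself happens automatically, but the same idea can be seen from the client side when a table is pre-split into regions at creation time. The sketch below is only an illustration under assumptions (the events table, the d column family, and the chosen split keys are all made up), not the split mechanism itself:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
    public static void main(String[] args) throws Exception {
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = connection.getAdmin()) {

            TableDescriptor desc = TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("events"))            // illustrative name
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("d"))
                    .build();

            // Split keys carve the row-key space into regions up front; as any region later
            // grows past the configured maximum size, HBase will split it again on its own.
            byte[][] splitKeys = { Bytes.toBytes("g"), Bytes.toBytes("n"), Bytes.toBytes("t") };
            admin.createTable(desc, splitKeys);
        }
    }
}
```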