Danubius International Conferences, 16th International Conference on European Integration - Realities and Perspectives
Data Block Saving Policy in the Hadoop V1 Architecture
Last modified: 2021-05-12
Abstract
The most common technology for implementing an On-Premises Data Lake architecture is the Apache Hadoop open-source framework, which is addressed in Chapter 2 of this research paper. Some Cloud providers have also adopted the Hadoop framework to offer a Data Lake storage service. This article presents the Hadoop storage environment for implementing an On-Premises Data Lake architecture using the Hadoop V1 framework. The Hadoop V1 architecture consists of two levels: HDFS and MapReduce. The HDFS level contains the following components: Name Node, Data Nodes, and Secondary Name Node. The MapReduce level, which is also based on a master/slave architecture, comprises the Job Tracker and the Task Trackers. The Hadoop storage system is analyzed in order to highlight its advantages and disadvantages and to examine in more depth some technical aspects of this technology for storing and analyzing large volumes of data in their raw format.
Acknowledgments: This work is supported by the project “ANTREPRENORDOC”, in the framework of Human Resources Development Operational Programme 2014-2020, financed from the European Social Fund under the contract number 36355/23.05.2019 HRD OP /380/6/13 – SMIS Code: 123847.