Framework for ETL with Hadoop MapReduce

Jaswender Malik, Kavita



Keywords: ETL, Handler, Usage, Conclusion



Abstract: Every organization that serves a large number of users has to deal with Big Data. Efficiently fetching, transferring, storing, cleaning, sanitizing, querying, and extracting information from Big Data is a daunting task because a single machine running traditional algorithms cannot process this volume of data tractably. Moreover, not all data arrives in a form that automated programs can process directly. Before the data is fed into large-scale data processing systems [1], the raw data must be converted into a consistent format. This is done using data cleaning, sanitization, and transformation operations. In this paper we present a framework for data cleaning and transformation operations that can be integrated into existing MapReduce (Hadoop) infrastructures. The framework can be standardized and adopted by corporations for their Big Data processing tasks.



