Each mapper then creates a connection to the database using JDBC, fetches the part of the data assigned to it by Sqoop, and writes it into HDFS, Hive, or HBase, depending on the arguments provided on the CLI.
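To make the idea of "the part of data assigned by Sqoop" concrete, here is a minimal sketch of how a numeric key range could be divided among mappers, with each mapper issuing its own bounded query. It is a simplified illustration, not Sqoop's actual splitting code; the function names, the table `employees`, and the column `id` are assumptions made up for this example.

```python
def split_ranges(min_id: int, max_id: int, num_mappers: int) -> list[tuple[int, int]]:
    """Divide the inclusive key range [min_id, max_id] into contiguous chunks,
    one per mapper. Simplified illustration, not Sqoop's real splitter."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_id < min_id:
        raise ValueError("max_id must not be smaller than min_id")

    total = max_id - min_id + 1
    # Never create more chunks than there are key values.
    chunks = min(num_mappers, total)
    base, extra = divmod(total, chunks)

    ranges = []
    start = min_id
    for i in range(chunks):
        # The first `extra` chunks take one additional key each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def mapper_query(table: str, column: str, low: int, high: int) -> str:
    """Build the bounded SELECT a single mapper would run over JDBC
    (hypothetical table and column names)."""
    return f"SELECT * FROM {table} WHERE {column} >= {low} AND {column} <= {high}"


if __name__ == "__main__":
    for low, high in split_ranges(1, 100, 4):
        print(mapper_query("employees", "id", low, high))
```

With keys 1 to 100 and four mappers, each mapper reads 25 rows' worth of key space, so the four database connections work on disjoint slices in parallel.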

You can control the number of mappers independently of the number of files present in the directory.
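As a rough sketch of how the mapper count is set, the helper below assembles a `sqoop export` command line with the `--num-mappers` option. The helper name, the JDBC URL, the table name, and the HDFS path are placeholders assumed for illustration; the script only prints the command and does not run Sqoop.

```python
import shlex


def build_export_command(connect: str, table: str, export_dir: str,
                         num_mappers: int = 4) -> list[str]:
    """Assemble a `sqoop export` argument list (hypothetical helper).
    The mapper count is chosen explicitly, whatever the number of
    files sitting in export_dir."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    return [
        "sqoop", "export",
        "--connect", connect,        # JDBC URL of the target database
        "--table", table,            # destination table
        "--export-dir", export_dir,  # HDFS directory holding the data
        "--num-mappers", str(num_mappers),
    ]


if __name__ == "__main__":
    # Placeholder values, assumed for this example only.
    cmd = build_export_command(
        "jdbc:mysql://db.example.com/sales", "orders", "/user/hadoop/orders", 8
    )
    print(shlex.join(cmd))
```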

As we know, Apache Flume is a data ingestion tool for unstructured sources, but organisations store their operational data in relational databases.

So, there was a need for a tool that can import and export data from relational databases. Sqoop integrates easily with Hadoop and dumps structured data from relational databases onto HDFS, complementing the power of Hadoop.

Here, Apache Sqoop plays an important role in transferring data between HDFS (Hadoop storage) and relational database servers such as MySQL, Oracle RDB, SQLite, Teradata, Netezza, Postgres, etc.

Apache Sqoop imports data from relational databases to HDFS, and exports data from HDFS to relational databases.
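For the import direction, a command line might be put together as in the sketch below. The helper and all connection details (JDBC URL, user name, table, target directory) are hypothetical; the export direction would use `sqoop export` with `--export-dir` in place of `--target-dir`. The script only prints the command and does not invoke Sqoop.

```python
import shlex
from typing import Optional


def build_import_command(connect: str, username: str, table: str,
                         target_dir: Optional[str] = None,
                         hive_import: bool = False) -> list[str]:
    """Assemble a `sqoop import` argument list (hypothetical helper)."""
    cmd = [
        "sqoop", "import",
        "--connect", connect,    # JDBC URL of the source database
        "--username", username,
        "-P",                    # prompt for the password on the console
        "--table", table,        # source table to import
    ]
    if target_dir is not None:
        cmd += ["--target-dir", target_dir]  # HDFS destination directory
    if hive_import:
        cmd.append("--hive-import")          # load the data into Hive
    return cmd


if __name__ == "__main__":
    # Placeholder values, assumed for this example only.
    cmd = build_import_command(
        "jdbc:mysql://db.example.com/sales", "etl_user", "orders",
        target_dir="/user/hadoop/orders",
    )
    print(shlex.join(cmd))
```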

This is where Apache Sqoop comes to the rescue and removes their pain.

These chunks are exported to a structured data destination.

It efficiently transfers bulk data between Hadoop and external datastores such as enterprise data warehouses, relational databases, etc.

Code for importing and exporting data between a relational database and HDFS is uninteresting and tedious to write.

