Thursday, August 13, 2015

Use sqoop to import mysql into hadoop on Mac

reference guide.

1. prerequisites:
1.1 Hadoop, Sqoop version.
$ hadoop version
Hadoop 2.5.2

Sqoop:  sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz

download mysql-connector-java-5.1.36.jar and put in sqoop/lib folder :  http://mvnrepository.com/artifact/mysql/mysql-connector-java/5.1.36

1.2 edit hadoop/etc/hadoop/yarn.site.xml
<configuration>
  <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
    </property>

    <property>
       <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
       <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>

    <property>
       <name>yarn.application.classpath</name>
       <value>
           %HADOOP_HOME%/etc/hadoop,
           %HADOOP_HOME%/share/hadoop/common/*,
           %HADOOP_HOME%/share/hadoop/common/lib/*,
           %HADOOP_HOME%/share/hadoop/hdfs/*,
           %HADOOP_HOME%/share/hadoop/hdfs/lib/*,
           %HADOOP_HOME%/share/hadoop/mapreduce/*,
           %HADOOP_HOME%/share/hadoop/mapreduce/lib/*,
           %HADOOP_HOME%/share/hadoop/yarn/*,
           %HADOOP_HOME%/share/hadoop/yarn/lib/*
       </value>
    </property>

</configuration>

1.3 link missed executable binary:
sudo ln -s /usr/bin/java /bin/java
sudo ln -s /usr/local/bin/mysqldump /usr/bin/mysqldump

1.4 run import command under Sqoop folder:
./bin/sqoop import --connect jdbc:mysql://127.0.0.1:3306/sample --username root --table employers --direct -m 1 --target-dir /user/YOUR_NAME/employers

1.4.1 or use import config file:
./bin/sqoop --options-file /usr/local/sqoop/import.txt --table employers -m 1 --target-dir /user/YOUR_NAME/sample/employers --check-column id --incremental append 
import.txt:
----------------
import
--connect
jdbc:mysql://127.0.0.1/db
--username
root

1.4.2   --fields-terminated-by '|'
check "Output line formatting arguments" for more format usage

1.5 incremental import based on column value or modify date.

./bin/sqoop import --connect jdbc:mysql://127.0.0.1:3306/sample --username root --table employers --direct -m 1 --target-dir /user/YOUR_NAME/employers --check-column id --incremental append --last-value 118996

No comments:

Post a Comment