Installing Hadoop on MacBook Apple Silicon (M1 or M2)
In this article, I'll show you how to install Hadoop on your MacBook M1 or M2.
Please confirm your MacBook meets the following requirements:
- A MacBook M1 or M2 (Apple Silicon) running macOS Ventura 13.2.1 or later
- Minimum 8 GB of RAM
Installation Steps
- Check and install Java JDK
- Enable SSH on the MacBook in system settings
- Download Hadoop from the official website
- Configure Hadoop (JAVA_HOME variable, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml files)
- Format the HDFS NameNode
- Start Hadoop services
- Use the Hadoop web UI
Prerequisites: a terminal and a text or code editor (VS Code, Sublime Text, Xcode, IntelliJ IDEA).
Step 1: Check and Install Java JDK
Run the following command to check if Java is installed:
java -version
If Java is not installed, download Java SE Development Kit 8 or 11 from the official Oracle website. Version 11 is recommended for Hadoop.
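If you have an incompatible version (JDK 17, 19, 20...), list the installed JDKs and remove the matching folder (jdk-19.1.jdk below is only an example name; use the one that ls prints):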
cd /Library/Java/JavaVirtualMachines && ls
sudo rm -rf /Library/Java/JavaVirtualMachines/jdk-19.1.jdk
Verify again after installation:
java -version
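The check above can also be scripted. The following is a minimal sketch, assuming the version string printed by java -version looks like "1.8.0_392" or "11.0.21"; the function names java_major, is_supported_java and check_java are hypothetical helpers, not part of Java or Hadoop.

```shell
# Print the major version for strings like "1.8.0_392" (-> 8) or "11.0.21" (-> 11).
java_major() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}"; echo "${v%%.*}" ;;   # legacy 1.x numbering used by Java 8
    *)   echo "${v%%[.+-]*}" ;;            # modern numbering: keep text before first . + or -
  esac
}

# Succeed only for the versions this article recommends (8 or 11).
is_supported_java() {
  case "$(java_major "$1")" in
    8|11) return 0 ;;
    *)    return 1 ;;
  esac
}

# Read the installed version and report whether it is usable for Hadoop.
check_java() {
  local ver
  ver="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
  if [ -z "$ver" ]; then
    echo "Java not found"
    return 1
  elif is_supported_java "$ver"; then
    echo "Java $ver is supported"
  else
    echo "Java $ver is not supported, install JDK 8 or 11"
    return 1
  fi
}
```

Paste the functions into a terminal and run check_java to get a one-line verdict.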
Step 2: Enable SSH on the MacBook
Search for Sharing in Spotlight to open the Sharing settings, then enable Remote Login and allow full disk access for remote users.
Then generate an RSA SSH key pair:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Add the public key to the authorized_keys file:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Restrict the permissions on the authorized_keys file:
chmod 0600 ~/.ssh/authorized_keys
Test the SSH connection to localhost:
ssh localhost
You should see a message like Last login: Tuesday 12 September…, which confirms that SSH is working. Close the connection with Ctrl + D.
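Running the cat ... >> command more than once appends the same key again each time. A small sketch of a repeatable version follows; authorize_key is a hypothetical helper name, and the two arguments are the public key file and the authorized_keys file.

```shell
# Append a public key to an authorized_keys file only if it is not already there,
# then tighten permissions on the directory and the file.
authorize_key() {
  local pub="$1" auth="$2"
  mkdir -p "$(dirname "$auth")"
  touch "$auth"
  # -x: match the whole line, -F: treat the key as a fixed string, not a regex
  if ! grep -qxF -- "$(cat "$pub")" "$auth"; then
    cat "$pub" >> "$auth"
  fi
  chmod 700 "$(dirname "$auth")"
  chmod 600 "$auth"
}
```

With the key pair from this step, the call would be authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys.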
Step 3: Download Hadoop
Go to the official Apache website and download the .tar.gz file for the latest stable version of Hadoop.
Extract the archive. The resulting folder is named hadoop-<version> (hadoop-3.3.6 in the rest of this article). Move it to your home directory (~).
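If you prefer the terminal to the browser, the download and extraction can be sketched as below. The URL layout is an assumption based on the Apache archive site, and hadoop_tarball_url and fetch_hadoop are hypothetical helper names; check the official download page for the current mirror and for checksums.

```shell
# Build the download URL for a given Hadoop version.
# Assumption: the release is published under archive.apache.org with this layout.
hadoop_tarball_url() {
  printf 'https://archive.apache.org/dist/hadoop/common/hadoop-%s/hadoop-%s.tar.gz\n' "$1" "$1"
}

# Download the archive and extract it into the destination (home directory by default).
fetch_hadoop() {
  local version="$1" dest="${2:-$HOME}" archive
  archive="${TMPDIR:-/tmp}/hadoop-$version.tar.gz"
  curl -fL -o "$archive" "$(hadoop_tarball_url "$version")" &&
    tar -xzf "$archive" -C "$dest"
}
```

For example, fetch_hadoop 3.3.6 would leave a hadoop-3.3.6 folder in your home directory.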
Step 4: Configure Hadoop
Environment Variables
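First find your JDK path. The first command prints the full JAVA_HOME path; the second prints the JDK folder, to which you append /Contents/Home: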
/usr/libexec/java_home
cd /Library/Java/JavaVirtualMachines/jdk-11.jdk && pwd
Copy the displayed path, then open your .zprofile or .zshrc file:
nano ~/.zprofile
Add the following lines (replace dimitri with your username, which you can find with cd ~ && pwd):
export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home"
export HADOOP_HOME=/Users/dimitri/hadoop-3.3.6
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
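export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Save the file, then reload it so the variables take effect in the current terminal:
source ~/.zprofile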
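A quick way to confirm that the variables point at real folders is sketched below. check_hadoop_env is a hypothetical helper name; it only inspects JAVA_HOME and HADOOP_HOME, since the other variables are derived from HADOOP_HOME.

```shell
# Report whether JAVA_HOME and HADOOP_HOME are set and point at existing directories.
# Returns non-zero if any of them is missing.
check_hadoop_env() {
  local name value rc=0
  for name in JAVA_HOME HADOOP_HOME; do
    eval "value=\${$name:-}"          # indirect lookup that works in bash and zsh
    if [ -z "$value" ]; then
      echo "$name is not set"
      rc=1
    elif [ ! -d "$value" ]; then
      echo "$name points to a missing directory: $value"
      rc=1
    else
      echo "$name=$value"
    fi
  done
  return "$rc"
}
```

After reloading your profile, check_hadoop_env && hadoop version should print both paths followed by the Hadoop version banner.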
hadoop-env.sh
Open the Hadoop environment configuration file:
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Find the commented line that sets JAVA_HOME (around line 54), uncomment it, and set it to your JDK path:
export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home"
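As an alternative to editing the file by hand, the same change can be scripted. This is only a sketch: set_java_home is a hypothetical helper that appends the export at the end of the file instead of uncommenting the existing line, which has the same effect when the file is sourced.

```shell
# Ensure the given env file contains exactly one active JAVA_HOME export.
set_java_home() {
  local env_file="$1" java_home="$2" tmp
  tmp="$(mktemp)" || return 1
  # Keep every line except an active JAVA_HOME export (commented lines are preserved).
  grep -v '^export JAVA_HOME=' "$env_file" > "$tmp" || true
  printf 'export JAVA_HOME="%s"\n' "$java_home" >> "$tmp"
  cat "$tmp" > "$env_file"   # overwrite in place so the file keeps its permissions
  rm -f "$tmp"
}
```

A typical call would be set_java_home "$HADOOP_HOME/etc/hadoop/hadoop-env.sh" "$(/usr/libexec/java_home)".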
core-site.xml
nano $HADOOP_HOME/etc/hadoop/core-site.xml
Between the <configuration> and </configuration> tags, add (replace dimitri):
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/dimitri/hdfs/tmp/</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
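To avoid typing your username into the file, the whole core-site.xml can be generated with $HOME filled in by the shell. This is a sketch under two assumptions: write_core_site is a hypothetical helper, and it uses fs.defaultFS, the current name of the deprecated fs.default.name property. It overwrites the existing file, including the license comment at the top.

```shell
# Write a single-node core-site.xml into the given configuration directory.
# $HOME is expanded by the shell when the file is written.
write_core_site() {
  local conf_dir="$1"
  cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$HOME/hdfs/tmp/</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://127.0.0.1:9000</value>
  </property>
</configuration>
EOF
}
```

Call it as write_core_site "$HADOOP_HOME/etc/hadoop" once HADOOP_HOME is set.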
hdfs-site.xml
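nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
As before, add the following between the <configuration> and </configuration> tags (replace dimitri):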
<property>
<name>dfs.namenode.name.dir</name>
<value>/Users/dimitri/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/Users/dimitri/hdfs/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
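The configuration refers to three folders under ~/hdfs. Hadoop can usually create them itself, but creating them up front makes permission problems visible early. A minimal sketch, where make_hdfs_dirs is a hypothetical helper name:

```shell
# Create the tmp, namenode and datanode folders under a base directory
# (defaults to ~/hdfs, the location used in this article).
make_hdfs_dirs() {
  local base="${1:-$HOME/hdfs}" sub
  for sub in tmp namenode datanode; do
    mkdir -p "$base/$sub" || return 1
  done
}
```

Running make_hdfs_dirs with no argument creates ~/hdfs/tmp, ~/hdfs/namenode and ~/hdfs/datanode.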
mapred-site.xml
nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
yarn-site.xml
nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>127.0.0.1</value>
</property>
<property>
<name>yarn.acl.enable</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
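When copying property blocks between files it is easy to repeat a property name, in which case one of the two values is typically ignored without any warning. The sketch below lists names that occur more than once in a site file. It assumes each <name> element sits on its own line, as in the snippets above, and the function names are hypothetical.

```shell
# Print every property name found in a Hadoop *-site.xml file, one per line.
list_property_names() {
  sed -n 's:.*<name>\(.*\)</name>.*:\1:p' "$1"
}

# Print only the names that appear more than once in the file.
duplicate_property_names() {
  list_property_names "$1" | sort | uniq -d
}
```

For example, duplicate_property_names "$HADOOP_HOME/etc/hadoop/hdfs-site.xml" prints nothing when every name is unique.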
Step 5: Format the HDFS NameNode
hdfs namenode -format
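Run this only when setting up a new installation, because formatting erases any existing HDFS metadata. If the command returns an error because Hadoop is already running, stop it first with stop-all.sh, then run it again.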
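Since formatting wipes the NameNode metadata, a guard that skips the format when one has already been done can be useful. This sketch assumes the NameNode directory is ~/hdfs/namenode and relies on the current/VERSION file that a format creates there; both function names are hypothetical.

```shell
# Succeed if the NameNode directory already holds a formatted filesystem.
# Assumption: the directory defaults to ~/hdfs/namenode, as in this article.
namenode_is_formatted() {
  [ -f "${1:-$HOME/hdfs/namenode}/current/VERSION" ]
}

# Format the NameNode only if it has not been formatted before.
format_namenode_once() {
  if namenode_is_formatted "$1"; then
    echo "NameNode already formatted, skipping"
  else
    hdfs namenode -format
  fi
}
```

Calling format_namenode_once with no argument checks the default directory before deciding whether to format.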
Step 6: Start Hadoop Services
start-all.sh
Verify that all daemons are running:
jps
You should see NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager listed with their process IDs, plus the Jps process itself.
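Reading the jps list by eye gets tedious, so here is a sketch that prints only the daemons that are missing. missing_daemons is a hypothetical helper that reads jps output from standard input; the expected list includes SecondaryNameNode, which start-all.sh also launches.

```shell
# Read jps output ("<pid> <name>" per line) on stdin and print each expected
# Hadoop daemon that is not present. Prints nothing when all are running.
missing_daemons() {
  local output daemon
  output="$(cat)"
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # Compare the second column exactly so NameNode does not match SecondaryNameNode.
    printf '%s\n' "$output" |
      awk -v d="$daemon" '$2 == d {found=1} END {exit !found}' ||
      echo "$daemon"
  done
}
```

Use it as jps | missing_daemons; an empty result means every daemon is up.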
Step 7: Hadoop Web UI
Open your browser and navigate to:
http://localhost:9870
You can monitor the state of your HDFS cluster from this interface. The YARN ResourceManager has its own interface at http://localhost:8088 by default.
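Beyond the web UI, a short round trip through HDFS confirms that reads and writes work. The sketch below is an assumption about how you might test it, not part of the original setup: hdfs_smoke_test is a hypothetical helper and /user/<your username>/smoke-test is an arbitrary test path.

```shell
# Write a small file into HDFS and read it back.
# Returns non-zero if any HDFS command fails.
hdfs_smoke_test() {
  local dir sample rc=0
  dir="/user/$(whoami)/smoke-test"
  sample="$(mktemp)" || return 1
  echo "hello hadoop" > "$sample"
  {
    hdfs dfs -mkdir -p "$dir" &&
    hdfs dfs -put -f "$sample" "$dir/hello.txt" &&   # -f overwrites an earlier copy
    hdfs dfs -cat "$dir/hello.txt"
  } || rc=$?
  rm -f "$sample"
  return "$rc"
}
```

On a working installation, hdfs_smoke_test prints hello hadoop, and the file then also appears in the file browser of the web UI.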
To stop Hadoop:
stop-all.sh
Congratulations, Hadoop is now running on your MacBook M1/M2. You can use this single-node setup to experiment with HDFS, YARN and MapReduce jobs locally.