Installing Hadoop on MacBook Apple Silicon (M1 or M2)

In this article, I'll show you how to install Hadoop on your MacBook M1 or M2.

Please confirm your MacBook meets the following requirements:

A MacBook M1 or M2 (Apple Silicon) running macOS Ventura 13.2.1 or later
Minimum 8 GB of RAM

Installation Steps

Check and install Java JDK
Enable SSH on the MacBook in system settings
Download Hadoop from the official website
Configure Hadoop (JAVA_HOME variable, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml files)
Format the HDFS NameNode
Start Hadoop services
Use the Hadoop web UI

Prerequisites: a terminal and a text or code editor (VS Code, Sublime Text, Xcode, IntelliJ IDEA).

Step 1: Check and Install Java JDK

Run the following command to check if Java is installed:

java -version

If Java is not installed, download Java SE Development Kit 8 or 11 from the official Oracle website, choosing the macOS Arm 64 build where one is offered. Hadoop 3.3.x supports Java 8 and Java 11 at runtime; this guide uses version 11.

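If an incompatible version is installed (JDK 17, 19, 20...), list the installed JDKs and remove it, replacing jdk-19.1.jdk with the folder name that ls shows on your machine: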

cd /Library/Java/JavaVirtualMachines && ls
sudo rm -rf /Library/Java/JavaVirtualMachines/jdk-19.1.jdk

Verify again after installation:

java -version
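The version check above can also be scripted. The following is a minimal sketch, not part of the original steps: the helper names java_major and java_supported are hypothetical, and the accepted versions (8 and 11) are simply the ones this article works with.

```shell
# Hypothetical helpers: neither function ships with Java or Hadoop.

# Print the major version for a Java version string,
# e.g. "11.0.20" gives 11 and the legacy form "1.8.0_381" gives 8.
java_major() {
  jm_v="$1"
  case "$jm_v" in
    1.*) jm_v="${jm_v#1.}" ;;   # legacy scheme: 1.8.x means Java 8
  esac
  # Keep only the leading digits.
  printf '%s\n' "$jm_v" | sed 's/[^0-9].*$//'
}

# Succeed only for the versions this article uses (8 or 11).
java_supported() {
  case "$(java_major "$1")" in
    8|11) return 0 ;;
    *)    return 1 ;;
  esac
}

# `java -version` writes to stderr; the version is the quoted string.
if command -v java >/dev/null 2>&1; then
  found=$(java -version 2>&1 | awk -F'"' '/version/ { print $2 }' | sed -n '1p')
  if java_supported "$found"; then
    echo "Java $found looks compatible"
  else
    echo "Java '$found' is missing or is not version 8 or 11"
  fi
fi
```

If the script reports an unsupported version, install JDK 8 or 11 as described above and run it again.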

Step 2: Enable SSH on the MacBook

Open System Settings > General > Sharing (or search for Sharing in Spotlight), turn on Remote Login, and enable the option that allows full disk access for remote users.

Then generate an RSA SSH key pair:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

Add the public key to the authorized_keys file:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Restrict the permissions on the authorized_keys file so that only your user can read and write it:

chmod 0600 ~/.ssh/authorized_keys

Test the SSH connection to localhost:

ssh localhost

If you see a message like Last login: Tuesday 12 September…, SSH is working. Close the connection with Ctrl + D.
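Running the cat ... >> command more than once adds the same key several times. As a sketch of a repeatable alternative, the hypothetical helper add_key_once below appends the key only when that exact line is missing, then restricts the file's permissions. The function name is an assumption for illustration.

```shell
# Hypothetical helper: append a public key to an authorized_keys file
# only if that exact line is not already there, then restrict permissions.
add_key_once() {
  ak_pub="$1"
  ak_auth="$2"
  [ -f "$ak_pub" ] || return 1
  ak_key=$(cat "$ak_pub")
  touch "$ak_auth"
  # -x: match the whole line, -F: treat the key as a fixed string
  if ! grep -qxF -e "$ak_key" "$ak_auth"; then
    printf '%s\n' "$ak_key" >> "$ak_auth"
  fi
  chmod 600 "$ak_auth"
}

# Apply it to the key pair generated above, if it exists.
if [ -f "$HOME/.ssh/id_rsa.pub" ]; then
  add_key_once "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
fi
```

Calling the function a second time leaves authorized_keys unchanged.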

Step 3: Download Hadoop

Go to the official Apache website and download the .tar.gz file for the latest stable version of Hadoop.

Extract the archive; the resulting folder is named hadoop-<version> (hadoop-3.3.6 in the examples that follow). Move it to your home directory (~).
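If you prefer the terminal to Finder, extraction can be done with tar. This is a sketch under assumptions: unpack_hadoop is a hypothetical helper, and the ~/Downloads location and the 3.3.6 file name in the example call depend on where and what you downloaded.

```shell
# Hypothetical helper: unpack a Hadoop .tar.gz into a destination directory
# and print the name of the top-level folder it contains.
unpack_hadoop() {
  uh_tarball="$1"
  uh_dest="$2"
  mkdir -p "$uh_dest" || return 1
  tar -xzf "$uh_tarball" -C "$uh_dest" || return 1
  # The first archive entry is normally the hadoop-<version>/ folder itself.
  tar -tzf "$uh_tarball" | awk -F/ 'NR == 1 { print $1 }'
}

# Example call (commented out; adjust the path to your download):
# unpack_hadoop "$HOME/Downloads/hadoop-3.3.6.tar.gz" "$HOME"
```

Extracting straight into the home directory makes the separate move step unnecessary.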

Step 4: Configure Hadoop

Environment Variables

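First find your JDK path. Either command works: the first prints the home directory of the default JDK, and the second shows where the JDK folder lives (its home is the Contents/Home folder inside it):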

/usr/libexec/java_home
cd /Library/Java/JavaVirtualMachines/jdk-11.jdk && pwd

Copy the displayed path, then open your .zprofile or .zshrc file:

nano ~/.zprofile

Add the following lines (replace dimitri with your username, which you can find with whoami), then save the file and run source ~/.zprofile so the current terminal picks them up:

export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home"
export HADOOP_HOME=/Users/dimitri/hadoop-3.3.6
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
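Before moving on, it can help to confirm that HADOOP_HOME points at the extracted folder. The check below is a sketch: looks_like_hadoop_home is a hypothetical helper, and it only tests for a few files that a Hadoop 3 distribution is expected to contain.

```shell
# Hypothetical helper: confirm that a directory looks like an unpacked Hadoop
# distribution by checking for the launcher scripts and the config folder.
looks_like_hadoop_home() {
  [ -f "$1/bin/hadoop" ] &&
  [ -f "$1/sbin/start-all.sh" ] &&
  [ -d "$1/etc/hadoop" ]
}

# Run this in a new terminal, or after reloading your shell profile.
if looks_like_hadoop_home "${HADOOP_HOME:-}"; then
  echo "HADOOP_HOME points at a Hadoop folder: $HADOOP_HOME"
else
  echo "HADOOP_HOME is unset or does not point at a Hadoop folder"
fi
```

If the second message appears, recheck the path and the username in the export lines.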

hadoop-env.sh

Open the Hadoop environment configuration file:

nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh

Find the commented line containing JAVA_HOME (around line 54), remove the leading # and set it to your JDK path:

export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home"

core-site.xml

nano $HADOOP_HOME/etc/hadoop/core-site.xml

Between the <configuration> and </configuration> tags, add (replace dimitri):

<property>
  <name>hadoop.tmp.dir</name>
  <value>/Users/dimitri/hdfs/tmp/</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://127.0.0.1:9000</value>
</property>
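To double-check what you typed, you can read a value back from the file. The sketch below uses a hypothetical helper, xml_prop, built on awk. It is not an XML parser: it assumes each <name> and <value> element sits on a single line, and it does not skip commented-out properties.

```shell
# Hypothetical helper: print the <value> of the property whose <name>
# equals $2 in the Hadoop config file $1. Prints nothing if it is absent.
xml_prop() {
  awk -v want="$2" '
    /<name>/  { n = $0; sub(/.*<name>/, "", n);  sub(/<\/name>.*/, "", n) }
    /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                if (n == want) { print v; exit } }
  ' "$1"
}

# Example: show the temporary directory configured in core-site.xml.
core_site="${HADOOP_HOME:-}/etc/hadoop/core-site.xml"
if [ -f "$core_site" ]; then
  xml_prop "$core_site" hadoop.tmp.dir
fi
```

Hadoop also has its own lookup command, hdfs getconf -confKey followed by the property name, which reads the configuration that Hadoop actually loads.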

hdfs-site.xml

nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml

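As in core-site.xml, add the properties between the <configuration> and </configuration> tags (replace dimitri). The same applies to mapred-site.xml and yarn-site.xml in the following sections.

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/Users/dimitri/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/Users/dimitri/hdfs/datanode</value>
</property>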
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
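The configuration refers to three local folders: tmp, namenode and datanode. Hadoop can usually create them itself, but creating them up front makes permission problems visible early. A minimal sketch, where make_hdfs_dirs is a hypothetical helper:

```shell
# Hypothetical helper: create the local folders used by this setup
# under a base directory (for the paths in this article, the base is ~/hdfs).
make_hdfs_dirs() {
  for md_sub in tmp namenode datanode; do
    mkdir -p "$1/$md_sub" || return 1
  done
}

# Example call (commented out so nothing is created by accident):
# make_hdfs_dirs "$HOME/hdfs"
```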

mapred-site.xml

nano $HADOOP_HOME/etc/hadoop/mapred-site.xml

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

yarn-site.xml

nano $HADOOP_HOME/etc/hadoop/yarn-site.xml

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>127.0.0.1</value>
</property>
<property>
  <name>yarn.acl.enable</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.env-whitelist</name>
  <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
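After hand-editing four XML files, a missing closing tag is an easy mistake to make. The sketch below is a rough sanity check, not a real XML validator: tags_balanced is a hypothetical helper that only compares how many lines contain each opening tag and its closing tag.

```shell
# Hypothetical helper: succeed if, for each tag used in the *-site.xml files,
# the number of lines with the opening tag equals the number with the closing tag.
tags_balanced() {
  [ -f "$1" ] || return 1
  for tb_tag in configuration property name value; do
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    tb_open=$(grep -c "<$tb_tag>" "$1" || true)
    tb_close=$(grep -c "</$tb_tag>" "$1" || true)
    [ "$tb_open" = "$tb_close" ] || return 1
  done
}

conf_dir="${HADOOP_HOME:-}/etc/hadoop"
for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
  if [ -f "$conf_dir/$f" ]; then
    if tags_balanced "$conf_dir/$f"; then
      echo "$f: tags look balanced"
    else
      echo "$f: check for a missing or extra tag"
    fi
  fi
done
```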

Step 5: Format the HDFS NameNode

hdfs namenode -format

If this command returns an error because Hadoop is already running, stop it first with stop-all.sh, then run it again. Format only once, on first setup: formatting again erases the existing HDFS metadata.
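A small guard can tell you whether the NameNode folder has been formatted before. This is a sketch under two assumptions: that the NameNode folder is ~/hdfs/namenode as configured in hdfs-site.xml, and that a formatted NameNode folder contains a current/VERSION file. The helper name namenode_formatted is hypothetical.

```shell
# Hypothetical helper: report whether a NameNode folder already holds metadata.
# Assumption: a formatted NameNode directory contains current/VERSION.
namenode_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Assumed location: the NameNode folder configured in hdfs-site.xml.
nn_dir="$HOME/hdfs/namenode"
if namenode_formatted "$nn_dir"; then
  echo "NameNode at $nn_dir is already formatted; no need to format again"
else
  echo "No metadata found in $nn_dir; run: hdfs namenode -format"
fi
```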

Step 6: Start Hadoop Services

start-all.sh

Verify that all daemons are running:

jps

You should see NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager (plus Jps itself), each with its process ID.
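Instead of reading the jps output by eye, you can filter it. The sketch below uses a hypothetical helper, missing_daemons, which reads jps output and prints any of the four main daemons that is not listed.

```shell
# Hypothetical helper: read `jps` output on stdin and print each expected
# daemon that is not listed. Prints nothing when all four are present.
missing_daemons() {
  awk '
    { seen[$2] = 1 }                     # jps prints "<pid> <name>" per line
    END {
      n = split("NameNode DataNode ResourceManager NodeManager", want, " ")
      for (i = 1; i <= n; i++)
        if (!(want[i] in seen)) print want[i]
    }
  '
}

if command -v jps >/dev/null 2>&1; then
  jps | missing_daemons
fi
```

If a daemon is reported as missing, the log files under $HADOOP_HOME/logs are a good place to start looking for the cause.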

Step 7: Hadoop Web UI

Open your browser and navigate to:

http://localhost:9870

This is the NameNode web UI, where you can monitor the state of your HDFS cluster. The YARN ResourceManager has its own UI at http://localhost:8088.
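The daemons can take a few seconds to start answering, so the page may not load right away. As a sketch, the hypothetical helper wait_for_ui below polls a URL with curl until it answers with a 2xx or 3xx status. The retry count and the one-second pause are arbitrary choices.

```shell
# Hypothetical helper: poll URL $1 until it answers with a 2xx or 3xx status,
# trying up to $2 times with a one-second pause between attempts.
wait_for_ui() {
  wu_try=0
  while [ "$wu_try" -lt "$2" ]; do
    # -w prints only the status code; curl prints 000 if it cannot connect.
    wu_code=$(curl -s -o /dev/null -w '%{http_code}' "$1" || true)
    case "$wu_code" in
      2??|3??) return 0 ;;
    esac
    wu_try=$((wu_try + 1))
    sleep 1
  done
  return 1
}

# Example call (commented out; it needs the Hadoop daemons to be running):
# wait_for_ui http://localhost:9870 30 && echo "NameNode UI is up"
```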

To stop Hadoop:

stop-all.sh

Congratulations, Hadoop is now running on your MacBook M1/M2 as a single-node cluster, ready for data processing and analysis.