
Installing Hadoop on MacBook Apple Silicon (M1 or M2)

8 min read

In this article, I'll show you how to install Hadoop on your MacBook M1 or M2.

Please confirm your MacBook meets the following requirements:

  • A MacBook M1 or M2 (Apple Silicon) running macOS Ventura 13.2.1 or later
  • Minimum 8 GB of RAM

Installation Steps

  1. Check and install Java JDK
  2. Enable SSH on the MacBook in system settings
  3. Download Hadoop from the official website
  4. Configure Hadoop (JAVA_HOME variable, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml files)
  5. Format the HDFS NameNode
  6. Start Hadoop services
  7. Use the Hadoop web UI

Prerequisites: a terminal and a text or code editor (VS Code, Sublime Text, Xcode, IntelliJ IDEA).


Step 1: Check and Install Java JDK

Run the following command to check if Java is installed:

java -version

If Java is not installed, download Java SE Development Kit 8 or 11 from the official Oracle website. Version 11 is recommended for Hadoop.

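If you have an incompatible version (JDK 17, 19, 20...), list the installed JDKs and remove the offending folder. The folder name jdk-19.1.jdk below is only an example; use the name that ls prints on your machine: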

cd /Library/Java/JavaVirtualMachines && ls
sudo rm -rf /Library/Java/JavaVirtualMachines/jdk-19.1.jdk

Verify again after installation:

java -version
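If you would rather not read the version by eye, the check can be scripted. The following is a minimal sketch: java_major and java_supported are hypothetical helper names, and the accepted versions (8 and 11) simply mirror the recommendation in this step.

```shell
# Extract the major Java version from a version string such as
# "1.8.0_381" (Java 8) or "11.0.20" (Java 11).
java_major() {
  case "$1" in
    1.*) v=${1#1.}; echo "${v%%.*}" ;;   # legacy scheme: 1.8.0_381 -> 8
    *)   echo "${1%%[.+-]*}" ;;          # modern scheme: 11.0.20 -> 11
  esac
}

# Succeed only for the versions this guide uses (8 or 11).
java_supported() {
  case "$(java_major "$1")" in
    8|11) return 0 ;;
    *)    return 1 ;;
  esac
}

# Usage (java -version prints to stderr, hence 2>&1):
#   ver=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
#   java_supported "$ver" && echo "Java $ver is fine" || echo "Install JDK 8 or 11"
```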

Step 2: Enable SSH on the MacBook

Open System Settings > General > Sharing (or search for Sharing in Spotlight), enable Remote Login, and allow full disk access for remote users.

Then generate an RSA SSH key pair:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

Add the public key to the authorized_keys file:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Restrict the permissions on the authorized_keys file so that only you can read and write it:

chmod 0600 ~/.ssh/authorized_keys

Test the SSH connection to localhost:

ssh localhost

You should see a message like Last login: Tuesday 12 September…. SSH is working. Close the connection with Ctrl + D.
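If you run the key setup more than once, the same public key gets appended again each time. A possible way to avoid that is sketched below; add_authorized_key is a hypothetical helper, not part of OpenSSH, and it takes both file paths as arguments.

```shell
# Append a public key to an authorized_keys file only if that exact line
# is not already there, then restrict the file to its owner.
add_authorized_key() {
  pub=$1
  auth=$2
  mkdir -p "$(dirname "$auth")"
  touch "$auth"
  # -x: whole-line match, -F: fixed string (no regex), -q: quiet
  if ! grep -qxF -e "$(cat "$pub")" "$auth"; then
    cat "$pub" >> "$auth"
  fi
  chmod 0600 "$auth"
}

# Usage:
#   add_authorized_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```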


Step 3: Download Hadoop

Go to the official Apache website and download the .tar.gz file for the latest stable version of Hadoop.

Extract the archive — the folder will be named hadoop-<version>. Move it to your home directory (~).
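The same download can be done from the terminal. This is only a sketch: hadoop_url and fetch_hadoop are hypothetical helpers, and the dlcdn.apache.org URL layout is an assumption on my part. Older releases are moved to the Apache archive, so fall back to the official download page if the request fails.

```shell
# Build the download URL for a given Hadoop release.
# Assumption: the release is still served from dlcdn.apache.org.
hadoop_url() {
  echo "https://dlcdn.apache.org/hadoop/common/hadoop-$1/hadoop-$1.tar.gz"
}

# Download the archive into the current directory and unpack it into ~.
# Nothing is downloaded until you call this function.
fetch_hadoop() {
  curl -fLO "$(hadoop_url "$1")" &&
    tar -xzf "hadoop-$1.tar.gz" -C "$HOME"
}

# Usage:
#   fetch_hadoop 3.3.6    # creates ~/hadoop-3.3.6
```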


Step 4: Configure Hadoop

Environment Variables

First find your JDK path:

/usr/libexec/java_home
cd /Library/Java/JavaVirtualMachines/jdk-11.jdk && pwd

Copy the displayed path, then open your .zprofile or .zshrc file:

nano ~/.zprofile

Add the following lines (replace dimitri with your username, which you can find with cd ~ && pwd):

export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home"
export HADOOP_HOME=/Users/dimitri/hadoop-3.3.6
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
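After saving the file, reload it with source ~/.zprofile so the variables exist in your current terminal. To confirm that HADOOP_HOME points at a real Hadoop folder, a small check like the one below may help. check_hadoop_home is a hypothetical helper; it only looks for the launcher scripts used later in this article.

```shell
# Return success if the given directory looks like an unpacked Hadoop
# distribution, i.e. it contains the launcher scripts this guide relies on.
check_hadoop_home() {
  [ -x "$1/bin/hadoop" ] && [ -x "$1/bin/hdfs" ] && [ -f "$1/sbin/start-all.sh" ]
}

# Usage:
#   source ~/.zprofile
#   check_hadoop_home "$HADOOP_HOME" && hadoop version
```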

hadoop-env.sh

Open the Hadoop environment configuration file:

nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh

Find the commented line containing JAVA_HOME (around line 54), uncomment it, and set it to your JDK path:

export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home"

core-site.xml

nano $HADOOP_HOME/etc/hadoop/core-site.xml

Between the <configuration> and </configuration> tags, add (replace dimitri):

<property>
  <name>hadoop.tmp.dir</name>
  <value>/Users/dimitri/hdfs/tmp/</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://127.0.0.1:9000</value>
</property>

hdfs-site.xml

nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml

Between the <configuration> and </configuration> tags, add (replace dimitri):

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/Users/dimitri/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/Users/dimitri/hdfs/datanode</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
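The configuration points at three local folders under ~/hdfs (tmp, namenode, datanode). Hadoop normally creates them on its own, so this step is optional, but creating them up front makes permission problems show up early. A minimal sketch, where make_hdfs_dirs is a hypothetical helper and the base folder is passed in explicitly:

```shell
# Create the local directories referenced by hadoop.tmp.dir and by the
# NameNode/DataNode settings, under the base directory given as $1.
make_hdfs_dirs() {
  base=$1
  mkdir -p "$base/tmp" "$base/namenode" "$base/datanode"
}

# Usage:
#   make_hdfs_dirs "$HOME/hdfs"
```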

mapred-site.xml

nano $HADOOP_HOME/etc/hadoop/mapred-site.xml

Between the <configuration> and </configuration> tags, add:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

yarn-site.xml

nano $HADOOP_HOME/etc/hadoop/yarn-site.xml

Between the <configuration> and </configuration> tags, add:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>127.0.0.1</value>
</property>
<property>
  <name>yarn.acl.enable</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.env-whitelist</name>
  <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>

Step 5: Format the HDFS NameNode

hdfs namenode -format

If this command returns an error because Hadoop is already running, stop it first with stop-all.sh, then run it again.
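Formatting again later wipes the existing HDFS metadata, so it can be worth guarding the command. The sketch below assumes that a formatted NameNode directory contains a current/VERSION file. namenode_formatted and format_once are hypothetical helpers, and the directory you pass in should be the NameNode directory from your own configuration.

```shell
# A formatted NameNode directory contains current/VERSION.
namenode_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Format only when the directory has not been formatted yet, because
# reformatting discards the existing HDFS metadata.
format_once() {
  if namenode_formatted "$1"; then
    echo "already formatted: $1"
  else
    hdfs namenode -format
  fi
}

# Usage:
#   format_once "$HOME/hdfs/namenode"
```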


Step 6: Start Hadoop Services

start-all.sh

Verify that all daemons are running:

jps

You should see NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager, each with its process ID, plus the Jps process itself.
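To check the list without reading it line by line, you can pipe the jps output through a small filter. This is a sketch, not an official Hadoop tool: check_daemons is a hypothetical helper that looks for the four core daemon names as whole words.

```shell
# Read `jps` output on stdin and report any expected daemon that is missing.
# Returns non-zero if at least one daemon is absent.
check_daemons() {
  out=$(cat)
  missing=0
  for d in NameNode DataNode ResourceManager NodeManager; do
    # -w matches whole words, so SecondaryNameNode does not count as NameNode
    if ! printf '%s\n' "$out" | grep -qw "$d"; then
      echo "missing: $d"
      missing=1
    fi
  done
  return $missing
}

# Usage:
#   jps | check_daemons && echo "all daemons are up"
```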


Step 7: Hadoop Web UI

Open your browser and navigate to:

http://localhost:9870

You can monitor the state of your HDFS cluster from this interface. The YARN ResourceManager has its own interface, by default at http://localhost:8088.

To stop Hadoop:

stop-all.sh

Congratulations, Hadoop is now running in single-node mode on your MacBook M1/M2, and you can start experimenting with HDFS and MapReduce jobs locally.
