I recently needed to step through some handwritten assembly on my MacBook and found the setup extremely quirky. There’s little documentation, so I had to read through endless man pages and Stack Overflow discussions to make things work. I’m documenting the caveats for my future self and for anyone who wants to hack on assembly on their Mac.
NOTE: This guide was tested on macOS 14.1.1 running on an Intel CPU. The procedure (as well as the assembly itself) will be entirely different on Macs with Apple silicon.
TL;DR See the article repository
Installing the Necessary Tools
You’ll have to install the Xcode command-line tools by running xcode-select --install
in your terminal. It’s entirely possible that this package is already
installed on your machine, since lots of programs like Homebrew require it to
work.
Writing macOS Assembly
I’ll use this ‘hello-world’ program to demonstrate some of macOS’s quirks:
.intel_syntax noprefix
.section __DATA,__data # .data
message: .asciz "Hello, world!\n"
.set message_size, . - message
.section __TEXT,__text # .text
.global entry
entry:
# write(1, message, message_size)
mov rax, 0x02000004
mov rdi, 1
lea rsi, message[rip]
mov rdx, offset message_size
syscall
# exit(0)
mov rax, 0x02000001
mov rdi, 0
syscall
.global _main
_main:
call entry
First things first, if you prefer AT&T syntax, then I suggest contacting a certified psychiatrist in your area. For everyone else, the correct syntax flavor can be selected using the .intel_syntax directive. The noprefix argument allows using registers without the % prefix.
Mach-O expects different section names than Linux ELF: .text is __TEXT,__text and .data is __DATA,__data. The different section names are documented here. You could also just use the .text and .data directives provided by the assembler if you don’t want to specify full section names.
Recent macOS versions dropped support for 32-bit executables, so you’ll need to write position-independent code. Note the lea rsi, message[rip] instruction, which computes the address of message relative to the instruction pointer. If you instead try loading the message address using mov, you’ll get an error saying that 32-bit absolute addressing is no longer supported.
Program execution will begin at _main. This is defined either by the macOS SDK or the macOS libc, but do your own research. I’ll also address the entry subroutine in a minute.
Building
Here are the necessary build commands in a sample Makefile:
.PHONY: all clean
all: main
clean:
	rm -rf main.o main main.dSYM
main.o: main.s
	as -g -o $@ $<
main: main.o
	ld -L /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib -lSystem -o $@ $<
main.dSYM: main
	dsymutil $<
I used the -g assembler flag to generate debug symbols in the object file.
The most annoying part is linking. Apparently, someone at Apple thought that it was not a good idea to put the directory containing the most core OS library (-lSystem) into the linker’s default search path, so you’ll need to add it manually using the -L flag.
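If you’d rather not hardcode the SDK location, it can be looked up at build time. Here is a minimal sketch, assuming xcrun (which ships with the command-line tools) is on your PATH. The helper names sdk_libdir and link_main are made up for illustration and are not part of any Apple tooling.

```shell
# Hypothetical helper: print the library directory of the active macOS SDK.
sdk_libdir() {
    # xcrun prints the SDK root, e.g. .../SDKs/MacOSX.sdk
    sdk_path=$(xcrun --sdk macosx --show-sdk-path) || return 1
    printf '%s/usr/lib\n' "$sdk_path"
}

# Hypothetical helper: link one object file against libSystem.
# Usage: link_main main.o main
link_main() {
    ld -L "$(sdk_libdir)" -lSystem -o "$2" "$1"
}
```

After sourcing these functions, link_main main.o main should do the same job as the ld line in the Makefile, just without the fixed path.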
macOS made a peculiar design choice for debugging symbols. Instead of being stored directly in the executable, they are put into a dSYM companion bundle. This bundle can be generated using the dsymutil tool.
Debugging
LLDB is the system debugger on macOS. It’s very similar to GDB, but its commands are better structured. It’s very well documented here.
Once again, if you want to use the correct assembly syntax, you’ll need to run settings set target.x86-disassembly-flavor intel. You can either do that interactively, or you could write it in .lldbinit.
You can either save this file in the current working directory and load it using --local-lldbinit, or save it as ~/.lldbinit, and it will get loaded automatically when you enter the debugger.
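To avoid editing the file by hand, something like the following should work. It’s only a sketch: the function name add_intel_flavor is my own invention, and it assumes that an existing init file ends with a newline.

```shell
# Hypothetical helper: add the Intel-flavor setting to an lldbinit file,
# but only if that exact line is not already present.
# Usage: add_intel_flavor [path]   (defaults to ~/.lldbinit)
add_intel_flavor() {
    init_file="${1:-$HOME/.lldbinit}"
    line='settings set target.x86-disassembly-flavor intel'
    touch "$init_file"
    # -x matches whole lines only, -F treats the pattern as a fixed string
    grep -qxF "$line" "$init_file" || printf '%s\n' "$line" >> "$init_file"
}
```

Calling add_intel_flavor ./.lldbinit creates a project-local file that you can then pick up with lldb --local-lldbinit. Calling it with no argument targets the one in your home directory.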
I’ve noticed that if you set your breakpoint at _main, then LLDB will skip over it entirely. I’m not sure why this happens, but I assume that LLDB is configured to skip over system library code by default. The obvious workaround is to create a separate subroutine and call it from _main. Also, I tried to use main as the subroutine name, but LLDB complained that the breakpoint was ambiguous. I presume this symbol is defined by libc or some other core library.
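For a quick non-interactive check that the breakpoint on entry is actually hit, LLDB’s batch mode can run a few commands and quit. This is a rough sketch rather than a recommended workflow. The wrapper name debug_entry is hypothetical, and it assumes the binary was built with the Makefile above.

```shell
# Hypothetical helper: start a binary under LLDB, stop at `entry`,
# disassemble the current function, then exit.
# Usage: debug_entry ./main
debug_entry() {
    lldb --batch \
        -o 'breakpoint set --name entry' \
        -o 'run' \
        -o 'disassemble --frame' \
        "$1"
}
```

Each -o command runs in order after the file is loaded, so the disassembly is printed once the process has stopped at the breakpoint.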