Kernel Safari #4: The Source Tree of the Knowledge of Good and Evil
Note:
Treat this post as a map. Don’t bother reading or memorizing all of this at once. Just skim through and find something interesting - the goal is to make things more approachable, not to force them down your throat.
Hiya! It’s really good to be back. I took a small hiatus from the blog for a couple of weeks. I spent that time primarily learning about some of the cool stuff that happens outside the kernel realm. I believe that too much specialization can do to a brain what an undiversified diet does to a body. But now I’m back on track!
Today we’re going to talk about the kernel source directory and how to find your way around it. We’ll cover the basic purpose of every top-level directory and talk more in-depth about some of them. Behold!
-rw-r--r-- 1 drozdziak1 drozdziak1 18693 2016-09-07 COPYING
-rw-r--r-- 1 drozdziak1 drozdziak1 98465 07-27 07:41 CREDITS
drwxr-xr-x 121 drozdziak1 drozdziak1 12288 07-27 07:41 Documentation
-rw-r--r-- 1 drozdziak1 drozdziak1 2258 07-27 07:41 Kbuild
-rw-r--r-- 1 drozdziak1 drozdziak1 252 2016-09-07 Kconfig
-rw-r--r-- 1 drozdziak1 drozdziak1 420362 07-27 07:41 MAINTAINERS
-rw-r--r-- 1 drozdziak1 drozdziak1 60210 07-27 07:41 Makefile
-rw-r--r-- 1 drozdziak1 drozdziak1 722 02-27 10:30 README
drwxr-xr-x 32 drozdziak1 drozdziak1 4096 07-27 09:16 arch
drwxr-xr-x 3 drozdziak1 drozdziak1 4096 07-27 09:16 block
drwxr-xr-x 2 drozdziak1 drozdziak1 4096 07-27 09:16 certs
drwxr-xr-x 4 drozdziak1 drozdziak1 12288 07-27 09:16 crypto
drwxr-xr-x 132 drozdziak1 drozdziak1 4096 07-27 09:16 drivers
drwxr-xr-x 36 drozdziak1 drozdziak1 4096 07-27 09:16 firmware
drwxr-xr-x 74 drozdziak1 drozdziak1 12288 07-27 09:16 fs
drwxr-xr-x 28 drozdziak1 drozdziak1 4096 07-27 09:16 include
drwxr-xr-x 2 drozdziak1 drozdziak1 4096 07-27 09:16 init
drwxr-xr-x 2 drozdziak1 drozdziak1 4096 07-27 09:16 ipc
drwxr-xr-x 17 drozdziak1 drozdziak1 12288 07-27 09:16 kernel
drwxr-xr-x 12 drozdziak1 drozdziak1 20480 07-27 09:16 lib
drwxr-xr-x 3 drozdziak1 drozdziak1 12288 07-27 09:16 mm
drwxr-xr-x 69 drozdziak1 drozdziak1 4096 07-27 09:16 net
drwxr-xr-x 27 drozdziak1 drozdziak1 4096 07-27 07:41 samples
drwxr-xr-x 14 drozdziak1 drozdziak1 4096 07-27 09:16 scripts
drwxr-xr-x 10 drozdziak1 drozdziak1 4096 07-27 09:16 security
drwxr-xr-x 24 drozdziak1 drozdziak1 4096 07-27 09:16 sound
drwxr-xr-x 31 drozdziak1 drozdziak1 4096 07-27 07:41 tools
drwxr-xr-x 2 drozdziak1 drozdziak1 4096 07-27 09:16 usr
drwxr-xr-x 4 drozdziak1 drozdziak1 4096 07-27 09:16 virt
Documentation/
It’s hard to miss this one when going through the sources: Documentation/ houses the better part of the Linux docs. Its topics span from development environment tips, the kernel development process and patch exchange rules, all the way to the intricacies of how the actual code works. As far as docs viewing goes, the modern approach is to use the Sphinx-generated documentation available under the *docs Make targets (see make help | grep docs for more details) or one of the hosted instances like https://www.kernel.org/doc/html/latest. To view a freshly compiled batch, see the Documentation/output directory in your source tree.
arch/
arch/ is responsible for all things architecture-specific. Also, whenever you build a kernel, it’s where you’ll most likely find the final kernel image resulting from your build, e.g. arch/x86/boot/bzImage for a typical x86 defconfig build.
block/
This is the home of the Linux block layer and the related generic implementations of block manipulation, I/O handling, scheduling, prioritization, the relevant ioctl() requests etc.
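To make the "scheduling" part a bit more tangible, here is a minimal sketch that asks the block layer which I/O scheduler each disk is using. It assumes the usual sysfs layout, /sys/block/<device>/queue/scheduler, where the active scheduler is wrapped in square brackets; the function name active_io_schedulers is made up for this example.

```python
import glob
import os


def active_io_schedulers(sys_block="/sys/block"):
    """Map each block device to (active scheduler, all available schedulers)."""
    result = {}
    pattern = os.path.join(sys_block, "*", "queue", "scheduler")
    for path in sorted(glob.glob(pattern)):
        # Path looks like <sys_block>/<device>/queue/scheduler
        device = path.split(os.sep)[-3]
        try:
            with open(path) as f:
                names = f.read().split()
        except OSError:
            continue  # the device vanished or the file is unreadable
        # The kernel marks the active scheduler with brackets, e.g. "[cfq]"
        active = next((n.strip("[]") for n in names if n.startswith("[")), None)
        result[device] = (active, [n.strip("[]") for n in names])
    return result


if __name__ == "__main__":
    for device, (active, available) in active_io_schedulers().items():
        print(f"{device}: active={active}, available={', '.join(available)}")
```

On a machine without /sys/block (or inside some containers) the function simply returns an empty dict.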
certs/
certs/ holds the code responsible for module signing - a safety feature that lets the kernel verify the authenticity of modules.
crypto/
crypto/ is the home of the kernel’s cryptographic API, which consists mostly of different cipher implementations. Hardware-accelerated solutions can also use crypto/’s common interface.
Note:
The kernel crypto API also features algorithms other than ciphers, e.g. compression algorithms.
drivers/
Probably the most significant directory in the whole project, and certainly the biggest (nearly half of the whole codebase). This is where all the hardware/software chat goes on, with lower-half drivers talking to their upper-half counterparts, and where the magic of hardware abstraction happens.
firmware/
firmware/ is just a big bag of firmware blobs, one of the few places where there is no human-readable source code in the kernel codebase. If you understand what loadable firmware is about, feel free to scroll to the next dir. If not, prepare to learn a thing or two.
Imagine you have a USB stick - be it an LTE modem, a WiFi card or a DVB tuner. Your device has a couple of on-board chips, among which there’s a write-protected flash die for storing the device’s firmware. Nothing unusual about that, huh?
But! What if your developers make a mistake or discover a security hole and thousands of devices are to become vulnerable, with no other way out than discarding them?
But! What if a wireless technology which your device works with grows very fast and receives frequent updates? What if those would normally force you to release new versions of your hardware more often than you can afford?
But! What if you’re using a protocol that has its frequencies and signal strength regulated in different countries? Would all of those regulations stay the same forever? Could you afford a flash so big that all the configs fit on a single device?
What if you could design your hardware so that the OS can load firmware on demand? To answer the questions above, many manufacturers go even further: they make the device’s memory volatile and leave it to the OS to find and load the correct firmware image onto the device, while the hardware itself only provides the basic mechanisms for firmware loading.
But why does firmware have its source closed?
Open-source software is all nice and dandy, but things can look different in hardware manufacturing - source code can reveal how your hardware works, which sooner or later will give you cheap knock-offs from the competition and unsolicited reverse engineering of your product.
Given all that, Linux doesn’t accept new firmware blobs into the source tree anymore. If you have to use one, you’ll usually have to supply it yourself and either keep it in userspace or build it into Linux at compile time.
fs/
In fs/ you’ll find the implementation of all the different filesystems that Linux supports. Some of them are not tied to real drives (they’re called pseudofilesystems, e.g. devtmpfs used for /dev/ or sysfs used for /sys/) and exist purely in RAM. But pseudo- or not, their common denominator is the interface known as VFS (Virtual File System) - the part of Linux that lets you browse files on different partitions as if they were a part of a single hierarchy.
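A quick way to see that single hierarchy for yourself is to list which filesystem type backs each mount point. The sketch below assumes the standard /proc/mounts format (device, mount point, filesystem type, options, ...); mounted_filesystems is just a name picked for this example.

```python
def mounted_filesystems(mounts_path="/proc/mounts"):
    """Return (mount point, filesystem type) pairs from the mount table."""
    mounts = []
    try:
        with open(mounts_path) as f:
            for line in f:
                fields = line.split()
                if len(fields) >= 3:
                    # Fields: device, mount point, filesystem type, options, ...
                    mounts.append((fields[1], fields[2]))
    except FileNotFoundError:
        pass  # not on Linux, or /proc is not mounted
    return mounts


if __name__ == "__main__":
    for mount_point, fs_type in mounted_filesystems():
        print(f"{fs_type:<12} {mount_point}")
```

Real filesystems (say, ext4) and pseudofilesystems (proc, sysfs, devtmpfs) show up side by side in the same list, which is VFS doing its job.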
include/
Headers - and lots of them. If you’ve seen enough C/C++ projects, you should roughly know what to look for in here. The public headers expose the APIs for interaction with the kernel, both from inside the kernel and from userspace.
init/
Generic kernel startup code (the platform-specific stuff lies in arch/<your_architecture>). This is where the kernel startup routines live, like start_kernel() - the kernel function that takes over after your machine is done with architecture-specific provisioning.
ipc/
ipc/ is where inter-process communication code lives (what a twist!). System V shared memory, semaphores and message queues, plus POSIX message queues, are what you’ll find there. Pipes, perhaps surprisingly, live in fs/.
kernel/
This is the home of all things that make Linux a real kernel:
- process scheduling, prioritization
- cgroups
- synchronization primitives
- high-level power management
- timekeeping
- debugging and profiling (of both the kernel and user programs)
- logging
- error handling
- module loading
- user and group permission management
- stuff specific to multicore machines
…and more!
kernel/ has the potential to give you a great deal of insight about how things are done under the hood.
lib/
Helper functions - kernels in general can’t rely on the C standard library. Linux is no different here and it had to develop some functions of its own. If you’re looking for a generic implementation of a common operation, lib/ is the place to go. Things like string manipulation functions, hash function implementations or compression algorithms can be found inside. Some ciphers and compression algorithms hooked up to the crypto API (crypto/) have their logic implemented here.
An interesting example from lib/ is the red-black tree, a common data structure used in different process schedulers, kmemleak and more!
mm/
Memory management - once you understand the abbreviation, mm/’s contents are no longer a mystery. This directory holds the code for different memory allocators, paging implementation, swap implementation, memory sharing mechanisms, memory compression, talking to backing devices, DMA etc.
Note:
mm/ is also where the Dirty COW vulnerability was discovered.
net/
Networking - every network protocol supported by Linux is kept here. But apart from that, there’s also the firewall infrastructure, the UNIX sockets implementation, DNS cache, network statistics etc.
samples/
Various code samples useful for testing different kernel APIs, like:
- HID
- Hardware breakpoints
- The kernel debugger
- Video
- The packet filter format
- The watchdog
- Various system calls
scripts/
Helper scripts for making the work around the project a little easier on the developer. Prominent examples include:
- checkpatch.pl - a nagging friend of every Linux kernel developer. Its purpose is to find obvious bloopers, like coding style violations or a malformed commit message, in patch files or in changes to the repo tracked by git
- get_maintainer.pl - a helper script for finding the maintainers and mailing lists connected to the specified file(s). When you fix a bug or add code which you’d like to send a patch for, it’ll tell you where to send it and who to CC
- coccicheck - a script that runs Coccinelle semantic analysis on the kernel sources
security/
Security modules - this directory contains the different means for OS hardening that Linux supports, including SELinux, AppArmor and others.
sound/
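The home of the sound systems supported by Linux, including ALSA and OSS. (JACK, which often gets mentioned in the same breath, is a userspace sound server and doesn’t live in the kernel tree.)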
tools/
Userland helper tools and test programs - this directory is very similar to samples/ but its contents are more focused on usable solutions than examples of how the underlying kernel code works.
usr/
You may remember this directory from the time when we generated a kernel image for KVM. For some reason, it was decided that only the tool for generating initramfs cpio archives (gen_init_cpio) should live here.
virt/
Host-side KVM implementation.
Where them syscalls/namespaces at?
Some parts of Linux don’t have a centralized location in the codebase, with system calls and namespaces being two notable examples.
You probably know what system calls are, but in case you don’t, they’re basically a collection of special procedures that userspace programs use to make each and every request to the operating system. There are about 300 of them, and to call one, your Linux program sets a designated register to the system call number and then specifies the call’s arguments according to the architecture’s system call calling convention (see the “Architecture calling conventions” section of man 2 syscall for more information). Finally, the process requests a switch to kernelspace, e.g. on 32-bit x86 by invoking software interrupt No. 128:
mov eax, 4 ; Choose the write() system call number (4 on 32-bit x86)
mov ebx, 1 ; Choose file descriptor 1 a.k.a. stdout
mov ecx, 0xdeadbeef ; Choose a pointer to a string at address 0xdeadbeef
mov edx, 12 ; Specify the string's length as 12 bytes
int 0x80 ; Invoke interrupt 128, the kernel takes over now
The C equivalent: write(1, "some message", 12)
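If assembly isn’t your thing, the same "put the number in, pass the arguments, trap into the kernel" dance can be sketched from Python by going through libc’s syscall() wrapper with ctypes. Treat this as an illustration under assumptions: the SYS_WRITE table and the raw_write helper are made up for this example, only 64-bit x86 and arm64 Linux are covered, and anything else falls back to os.write().

```python
import ctypes
import os
import platform
import sys

# write() syscall numbers on 64-bit Linux (assumed table for this sketch;
# on 32-bit x86 the number would be 4, as in the assembly version)
SYS_WRITE = {"x86_64": 1, "aarch64": 64}


def raw_write(fd, data):
    """Call write() by syscall number; fall back to os.write() elsewhere."""
    number = SYS_WRITE.get(platform.machine())
    is_64bit = ctypes.sizeof(ctypes.c_void_p) == 8
    if number is None or not is_64bit or not sys.platform.startswith("linux"):
        return os.write(fd, data)  # unknown platform: use the regular wrapper
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    written = libc.syscall(
        ctypes.c_long(number),       # which system call
        ctypes.c_long(fd),           # arg 1: file descriptor
        ctypes.c_char_p(data),       # arg 2: pointer to the buffer
        ctypes.c_size_t(len(data)),  # arg 3: number of bytes
    )
    if written < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return written


written = raw_write(1, b"some message")
```

The libc wrapper takes care of loading the registers and issuing the trap instruction, so the Python side only has to supply the number and the arguments.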
Because they touch many different subsystems, syscalls are defined all over the kernel sources using a special set of macros (the SYSCALL_DEFINE family), while per-architecture syscall tables pair each one up with a syscall number.
Note:
If you’re curious about the system calls used by your program, check out strace - a tool for dumping all syscalls used by your application as they’re called. The program is especially great for reverse engineering of closed-source applications and low-level debugging.
Namespaces together with cgroups form a foundation for Linux OS-level virtualization (containers). Namespaces work by exposing only the selected resources to a process that we want to isolate. To date, the namespace-ified subsystems include:
Namespace  Constant         Isolates
Cgroup     CLONE_NEWCGROUP  Cgroup root directory
IPC        CLONE_NEWIPC     System V IPC, POSIX message queues
Network    CLONE_NEWNET     Network devices, stacks, ports, etc.
Mount      CLONE_NEWNS      Mount points
PID        CLONE_NEWPID     Process IDs
User       CLONE_NEWUSER    User and group IDs
UTS        CLONE_NEWUTS     Hostname and NIS domain name
See man 7 namespaces for more details.
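You can peek at the namespaces your own process lives in without any special privileges. The sketch below assumes the /proc/self/ns directory described in man 7 namespaces, where each entry is a symlink whose target looks like uts:[4026531838]; current_namespaces is a hypothetical helper name.

```python
import os


def current_namespaces(ns_dir="/proc/self/ns"):
    """Map each namespace entry (uts, pid, net, ...) to its identifier."""
    if not os.path.isdir(ns_dir):
        return {}  # not on Linux, or /proc is not mounted
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            # The link target encodes the namespace type and inode number
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            result[name] = None  # entry exists but cannot be read
    return result


namespaces = current_namespaces()
for name, ident in namespaces.items():
    print(f"{name:<20} {ident}")
```

Two processes that print the same identifier for, say, net share that namespace; a containerized process will typically show different identifiers than one running directly on the host.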
Conclusion
This post required a lot of effort, but here it is! Working on this was real fun and it also gave me a reason to learn more about the source tree myself. For the next post, I’m planning to do some bug hunting in the kernel and write up a little report.