XKCD #1200: Authorization

Everybody knows that allowing different applications unlimited access to each other’s data is not exactly optimal from a security point of view. While servers have enjoyed containers to isolate applications from each other, we lack a good solution for the desktop. Or do we?

There is, obviously, flatpak. Unfortunately, flatpak presents itself as a “Linux application sandboxing and distribution framework”. This will not do. I already have a distribution. I’m pretty happy with it. I want to run my distribution’s applications in an isolated manner.

Thankfully, the sandboxing part of flatpak is actually a separate, lesser-known project: bubblewrap. Let’s try to use it to secure our desktop.

Let’s get started with one of the easiest things to sandbox, a shell:

$ bwrap zsh
bwrap: execvp zsh: No such file or directory

Uh… what?

Let’s go back to what bubblewrap is doing: it’s actually creating a new, empty filesystem namespace (a mount namespace, in kernel terms). The keyword here is “empty”. There’s no zsh executable in it. Let’s fix this:

$ bwrap --ro-bind /usr /usr /usr/bin/zsh
bwrap: execvp /usr/bin/zsh: No such file or directory

Weird. But the following command will tell you what failed:

$ ldd /usr/bin/zsh
	linux-vdso.so.1 (0x00007fff5d189000)
	libcap.so.2 => /usr/lib/libcap.so.2 (0x00007ff55abe2000)
	libncursesw.so.6 => /usr/lib/libncursesw.so.6 (0x00007ff55ab6b000)
	libm.so.6 => /usr/lib/libm.so.6 (0x00007ff55aa7e000)
	libc.so.6 => /usr/lib/libc.so.6 (0x00007ff55a89c000)
	/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007ff55ad24000)

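Yes, shared libraries, and more precisely the dynamic loader: the libraries themselves resolve to /usr/lib, which we bound, but the loader is looked up at /lib64/ld-linux-x86-64.so.2. So we need /lib64 too. To be sure, let’s also include /bin, /lib and /sbin, although they are just symlinks on my system and should not be needed. Let’s also add /etc for things like /etc/profile.d or /etc/localtime: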

$ bwrap --ro-bind /usr /usr --ro-bind /bin /bin --ro-bind /lib /lib --ro-bind /lib64 /lib64 --ro-bind /sbin /sbin --ro-bind /etc /etc /usr/bin/zsh
/usr/share/zsh/scripts/newuser:5: no such file or directory: /dev/null
zsh-newuser-install:23: no such file or directory: /dev/null
zsh-newuser-install:24: no such file or directory: /dev/null
$

Yeah, /dev/null is kinda important. Many applications will want it. We could bind it (using --dev-bind), but then there’s also /dev/zero, /dev/urandom, and probably others. We could bind all of /dev, but that means sandboxed applications would have access to every device, which does not sound like a good idea. Thankfully, bubblewrap has our back and provides a --dev option that sets up a minimal /dev (and a --proc option for similar woes). We also have --tmpfs for /tmp. Let’s use them:

$ bwrap --ro-bind /usr /usr --ro-bind /bin /bin --ro-bind /lib /lib --ro-bind /lib64 /lib64 --ro-bind /sbin /sbin --ro-bind /etc /etc --proc /proc --dev /dev --tmpfs /tmp /usr/bin/zsh
$ ls /
bin  dev  etc  lib  lib64  proc  sbin  tmp  usr

Notice the absence of /home: we didn’t bind it, so it is not accessible. Any compromised program that is run in this shell session will be unable to access our personal data (absent a privilege escalation exploit giving it root access).
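
The list of directories above was found by trial and error. As a rough sketch, the ldd output shown earlier can also be turned into that list mechanically. The lib_dirs function below is a hypothetical helper (it is not part of bubblewrap): it reads ldd output on standard input and prints every directory that holds a library or the dynamic loader.

```sh
# Hypothetical helper, not part of bubblewrap: reads `ldd` output on stdin
# and prints the unique directories that contain a library or the loader.
lib_dirs() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ "^/") {              # keep only absolute paths
                d = $i
                sub("/[^/]*$", "", d)     # strip the file name
                print (d == "" ? "/" : d)
            }
        }
    }' | LC_ALL=C sort -u
}
```

Fed the ldd output above (ldd /usr/bin/zsh | lib_dirs), it prints /lib64, /usr/lib and /usr/lib64, one per line. On another distribution the list may well differ, which is the point of computing it rather than guessing.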

Good! We’re sandboxed! We’re safe!

Or are we?

$ ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0  21592 12736 ?        Ss   10:37   0:01 /sbin/init verbose
root           2  0.0  0.0      0     0 ?        S    10:37   0:00 [kthreadd]
root           3  0.0  0.0      0     0 ?        S    10:37   0:00 [pool_workqueue_release]
root           4  0.0  0.0      0     0 ?        I<   10:37   0:00 [kworker/R-rcu_g]
root           5  0.0  0.0      0     0 ?        I<   10:37   0:00 [kworker/R-rcu_p]
root           6  0.0  0.0      0     0 ?        I<   10:37   0:00 [kworker/R-slub_]
root           7  0.0  0.0      0     0 ?        I<   10:37   0:00 [kworker/R-netns]
root          12  0.0  0.0      0     0 ?        I<   10:37   0:00 [kworker/R-mm_pe]
...
systemd+     426  0.0  0.0  91220  8468 ?        Ssl  10:37   0:00 /usr/lib/systemd/systemd-timesyncd
avahi        438  0.0  0.0   8932  4776 ?        Ss   10:37   0:00 avahi-daemon: running [desk.local]
dbus         439  0.0  0.0   9752  4976 ?        Ss   10:37   0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root         441  0.0  0.0  11016  7284 ?        Ss   10:37   0:00 sshd: /usr/bin/sshd -D [listener] 0 of 10-100 startups
...
sloonz       1427  0.0  0.4 513896 70208 ?        Sl   10:38   0:13 /usr/lib/firefox/firefox -contentproc -parentBuildID 20231130105227 -prefsLen 44628 -prefMapSize 241694 -appDir /usr/lib/firefox/browser {5508672c-0163-4fa1-adeb-7f40773b136b} 3 true rdd
...

Uh oh…

$ env
...
SHELL=/bin/zsh
WORDCHARS=*?_-.[]~=&;!#$%^(){}<>
HISTSIZE=50000
I3SOCK=/run/user/1000/sway-ipc.1000.548.sock
SSH_AUTH_SOCK=/run/user/1000/ssh-agent.socket
CREDENTIALS_DIRECTORY=/run/credentials/getty@tty1.service
MEMORY_PRESSURE_WRITE=c29tZSAyMDAwMDAgMjAwMDAwMAA=
XCURSOR_SIZE=24
...
AWS_SECRET_ACCESS_KEY=[redacted]

Oh noes. Let’s sandbox harder.

$ bwrap --help
...
    --unshare-all                Unshare every namespace we support by default
    --share-net                  Retain the network namespace (can only combine with --unshare-all)
    --unshare-user               Create new user namespace (may be automatically implied if not setuid)
    --unshare-user-try           Create new user namespace if possible else continue by skipping it
    --unshare-ipc                Create new ipc namespace
    --unshare-pid                Create new pid namespace
    --unshare-net                Create new network namespace
    --unshare-uts                Create new uts namespace
    --unshare-cgroup             Create new cgroup namespace
    --unshare-cgroup-try         Create new cgroup namespace if possible else continue by skipping it
...
    --clearenv                   Unset all environment variables
...

What do we want to unshare here?

Unsharing the network namespace is a terrible idea, unless you want to prevent an application from accessing any network at all (including localhost).

Unsharing the PID namespace (processes) seems a clear win, so does clearing the environment.

The IPC namespace is probably fine to unshare most of the time (important things like FIFOs, pipes and Unix sockets live in the filesystem namespace, except abstract Unix sockets, which live in the network namespace), but it’s also hard to see the point: the compromised process would have to find an exploitable program running in the non-sandboxed environment whose attack vector is POSIX message queues or SysV IPC, which in practice are very rarely used by desktop applications. We will see later that sandboxing graphical applications comes with some complications, and unsharing the IPC namespace might add some really tricky bugs on top of those. We will already have enough on our plate when we try to sandbox desktop apps, so let’s not unshare this.

I don’t see the point of unsharing the UTS namespace (it’s about isolating the hostname), and the same goes for cgroup (unless perhaps you want to apply limits to the newly created cgroup later, but I never tried). I don’t see any big deal in unsharing them either. Toss a coin to decide.

The whole sharing or unsharing of the user namespace is a (small but annoying) can of worms that I won’t try to cover extensively here (if ever). To make things short: it’s affected by the way your distribution has installed bubblewrap (setuid or not) and will have minimal effects (the biggest one being that unsharing means files belonging to root will appear to belong to nobody in the sandbox). Let’s be satisfied with the default on your system (whichever it is).
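
To see what a given --unshare-* flag actually changes, one option is to compare namespace identifiers inside and outside the sandbox. The kernel exposes them as symlinks under /proc/self/ns. The sketch below assumes /proc is mounted (which is what --proc /proc gives us in the sandbox); ns_ids is a hypothetical helper name.

```sh
# Hypothetical helper: print one "<namespace> <identifier>" line per
# namespace the current shell lives in, read from /proc/self/ns.
ns_ids() {
    for ns in cgroup ipc mnt net pid user uts; do
        # The identifier is left empty if the kernel does not expose this entry.
        printf '%s %s\n' "$ns" "$(readlink "/proc/self/ns/$ns" 2>/dev/null)"
    done
}
```

Paste the function into a regular shell and into a sandboxed one, run it in both, and compare: a namespace whose identifier differs between the two has been unshared.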

So let’s add --clearenv and --unshare-pid to our baseline bubblewrap arguments. If you’re feeling extra-paranoid, you can add --unshare-uts, --unshare-user and --unshare-ipc:

$ bwrap --ro-bind /usr /usr --ro-bind /bin /bin --ro-bind /lib /lib --ro-bind /lib64 /lib64 --ro-bind /sbin /sbin --ro-bind /etc /etc --proc /proc --dev /dev --tmpfs /tmp --clearenv --unshare-pid /usr/bin/zsh
$ ls /
bin  dev  etc  lib  lib64  proc  sbin  tmp  usr
$ ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
sloonz          1  0.0  0.0   2720  1152 ?        S    15:39   0:00 bwrap --ro-bind /usr /usr --ro-bind /bin /bin --ro-bind /lib /lib --ro-bind /lib64 /lib64 --ro-bind /sbin /sbin --ro-bind /etc /etc --proc /proc --dev /dev --tmpfs /tmp --clearenv --unshare-pid /usr/bin/zsh
sloonz          3  0.0  0.0   6084  4436 ?        S    15:39   0:00 /usr/bin/zsh
sloonz          6  100  0.0   8024  3988 ?        R+   15:39   0:00 ps aux
$ killall firefox
firefox: no process found
$ env
PWD=/
HOME=/home/sloonz
LOGNAME=sloonz
SHLVL=1
OLDPWD=/
_=/bin/env

It looks good for a stateless application (for example if you want to sandbox curl https://ipinfo.io to get your IP). What if you want to keep files between sessions? Well, let’s use a dedicated home directory:

$ mkdir -p ~/sandboxes/my-node-project
$ bwrap --ro-bind /usr /usr --ro-bind /bin /bin --ro-bind /lib /lib --ro-bind /lib64 /lib64 --ro-bind /sbin /sbin --ro-bind /etc /etc --proc /proc --dev /dev --tmpfs /tmp --clearenv --unshare-pid --bind ~/sandboxes/my-node-project ~ --chdir ~ /usr/bin/zsh
$ npm install whatever

That way, node_modules will be installed in ~ as seen from within the sandbox, which is ~/sandboxes/my-node-project in the non-sandboxed environment. If you happen to install a compromised node library, it won’t be able to compromise your real home directory.
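
Typing that command for every project gets old quickly. Here is a minimal sketch of a wrapper, assuming one directory per sandbox under ~/sandboxes. The sandbox function name and the BWRAP override variable (handy for a dry run with echo) are inventions for this example, not bubblewrap features.

```sh
# Hypothetical wrapper: sandbox NAME COMMAND [ARGS...]
# Runs COMMAND with ~/sandboxes/NAME mounted as the home directory.
# The body is a subshell "( ... )" so its variables do not leak out.
sandbox() (
    name=$1
    shift
    dir="$HOME/sandboxes/$name"
    mkdir -p "$dir" || exit 1
    # BWRAP is an assumption of this sketch: set BWRAP=echo to print the
    # command line instead of running it.
    ${BWRAP:-bwrap} \
        --ro-bind /usr /usr --ro-bind /bin /bin --ro-bind /lib /lib \
        --ro-bind /lib64 /lib64 --ro-bind /sbin /sbin --ro-bind /etc /etc \
        --proc /proc --dev /dev --tmpfs /tmp \
        --clearenv --unshare-pid \
        --bind "$dir" "$HOME" --chdir "$HOME" \
        "$@"
)
```

With this in place, sandbox my-node-project /usr/bin/zsh should behave like the long command above, and BWRAP=echo sandbox my-node-project /usr/bin/zsh shows what would be run without running it.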

You may want to bind some common configuration files, like ~/.zshrc (unless you have an AWS_SECRET_ACCESS_KEY environment variable in it) or ~/.config/nvim. Remember to bind them read-only (--ro-bind instead of --bind); otherwise a compromised process may write a malicious payload to them and gain access the next time those files are read (and executed) in your non-sandboxed environment.
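
As a sketch of that advice: bubblewrap applies its mount options in the order they are given, so the read-only binds have to come after the --bind that sets up the home directory. The function name, the BWRAP dry-run variable and the list of files below are assumptions chosen for illustration.

```sh
# Hypothetical variant of the command above:
#   sandbox_with_config DIR COMMAND [ARGS...]
# DIR becomes the sandboxed home; selected config files are added read-only.
sandbox_with_config() (
    dir=$1
    shift
    # Prepend a read-only bind for each config file that exists, so the
    # binds end up between the home --bind and the command to run.
    for rel in .zshrc .config/nvim; do
        if [ -e "$HOME/$rel" ]; then
            set -- --ro-bind "$HOME/$rel" "$HOME/$rel" "$@"
        fi
    done
    # BWRAP is an assumption of this sketch (BWRAP=echo gives a dry run).
    ${BWRAP:-bwrap} \
        --ro-bind /usr /usr --ro-bind /bin /bin --ro-bind /lib /lib \
        --ro-bind /lib64 /lib64 --ro-bind /sbin /sbin --ro-bind /etc /etc \
        --proc /proc --dev /dev --tmpfs /tmp \
        --clearenv --unshare-pid \
        --bind "$dir" "$HOME" --chdir "$HOME" \
        "$@"
)
```

For example, sandbox_with_config ~/sandboxes/my-node-project /usr/bin/zsh would give the sandboxed shell your usual ~/.zshrc without letting anything inside the sandbox modify it.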

Next time, we’ll see the basics of running sandboxed graphical applications like your IDE or your browser.