2026-05-24

RISC-V Linux Kernel Modules in Rust - Part 2: Misc. Device Boogaloo

In part 2 of my RISC-V + Linux + Rust series, we'll be diving into how to work with Miscellaneous Device drivers, and building a user-space program in C to communicate with our device.

Introduction

In part 1 of this blog post series, we covered a ton of ground: We got our local environment set up to allow us to cross-compile the Linux kernel to RISC-V CPUs, we set up our Rust-for-Linux toolchain, we set up our QEMU environment for running the RISC-V kernel, and we set up a nice workflow for writing, building, and executing our own kernel modules. We also discussed using LLMs as learning tools, and the general motivations behind this blog post series - I highly recommend giving it a read if you haven't already!

This time, we'll be diving a bit deeper into how the Linux kernel works, and trying to register a miscellaneous device on our emulated RISC-V Linux kernel. Once we get that wired up, we'll whip up a small user-space program in C, which we'll use to communicate with our kernel device, by defining an ABI (Application Binary Interface)! So we have a ton to do once again - let's get right into it.

Note: This blog post presumes that you have all of our environment configurations, tools, toolchains, and Makefiles from part 1 up and running, and will not be re-explaining any of them. All the code, and some very rough notes on the flow, can be found on my GitHub repository for the project: https://github.com/grammeaway/linux-rust-riscv-project

Explaining miscdevice

So, first things first: let's talk about what miscdevice even is. We'll be getting a bit more technical and low-level than in part 1, so allow yourself the luxury of not fully understanding things right out of the gate - it'll get clearer as we press on, and it's a lot of fun!

In Unix-like operating systems, like good ol' Linux, there are two types of "devices": One is called a block device, and one is called a character device. Regardless of the type, a device is represented as a special type of file.

Block devices handle data which comes in fixed-size chunks, such as hard drives, SSDs, and floppy disks.

Character devices are stream-oriented, and handle byte-by-byte data flows, in a sequential manner.

Character devices are the type you as a user will most commonly interact with directly, and also the type we'll be working on today.

Every single device on a Linux system is identified with a number pair: A major and a minor.

You can validate that on your QEMU setup from part 1, by launching it, and running:

~ # ls -l /dev | grep -E "null|zero|random"
crw-rw-rw-    1 0        0           1,   3 Jan  1  1970 null
crw-rw-rw-    1 0        0           1,   8 Jan  1  1970 random
crw-rw-rw-    1 0        0           1,   9 Jan  1  1970 urandom
crw-rw-rw-    1 0        0           1,   5 Jan  1  1970 zero

If you omit the grep portion, you'll get a ton of output, so I opted to zero in on a few common ones. Now there's a bit to unpack here: You'll notice that the filetype is being denoted with a c. This denotes a character device. The numbers are the previously mentioned major/minor pair that all devices get assigned. So e.g. /dev/null (a special device acting as a "black hole" data sink / guaranteed EOF-provider when read from), has the major/minor combo 1,3.

For the major/minor pairs, the major identifies the associated driver, and the minor identifies the specific device handled by that driver. The kernel maintains an internal table mapping these numbers to drivers, which it consults whenever a device file is opened. This is how syscalls get directed to their correct receiver.
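To make the number pair a bit more concrete, here is a minimal userspace Rust sketch of how a major and a minor can be packed into one device number and split apart again. It assumes the kernel-internal layout from include/linux/kdev_t.h (the minor in the low 20 bits, the major above it); the function names are my own, and the device number you see from userspace through stat() is encoded differently.

```rust
// Assumption: kernel-internal layout, MINORBITS = 20 (include/linux/kdev_t.h).
const MINOR_BITS: u32 = 20;
const MINOR_MASK: u32 = (1 << MINOR_BITS) - 1;

/// Pack a major/minor pair into one number, like the kernel's MKDEV().
fn mkdev(major: u32, minor: u32) -> u32 {
    (major << MINOR_BITS) | (minor & MINOR_MASK)
}

/// Extract the major (which driver) from a packed device number.
fn major(dev: u32) -> u32 {
    dev >> MINOR_BITS
}

/// Extract the minor (which device of that driver) from a packed device number.
fn minor(dev: u32) -> u32 {
    dev & MINOR_MASK
}

fn main() {
    let null = mkdev(1, 3); // /dev/null from the listing above
    println!("dev={:#x} major={} minor={}", null, major(null), minor(null));
}
```

The point is only that "1, 3" in the ls output is one identifier with two fields: the first picks the driver, the second picks the device.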

Now, if we were to allocate a real-deal, all the bells and whistles character device, the steps would roughly look like this:

  1. Allocate a major number, or request a specific major.
  2. Allocate a range of minor numbers under the major.
  3. Create a cdev structure with all file_operations callbacks.
  4. Use cdev_add to register it.
  5. Create a device class, so udev, mdev, or devtmpfs can know about the device.
  6. Call device_create, to trigger the creation of a /dev/yournewdevice node.

And the reverse of all of these steps for any clean-up flows.
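The "reverse order on clean-up" rule is something Rust can model nicely, since local values are dropped in the reverse order of their declaration. The following is a toy userspace sketch, not kernel code: Step is a hypothetical stand-in for one registration step, and the step names are just labels.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for one registration step; logs when it is undone.
struct Step {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Step {
    fn new(name: &'static str, log: &Rc<RefCell<Vec<String>>>) -> Step {
        log.borrow_mut().push(format!("setup {}", name));
        Step { name, log: Rc::clone(log) }
    }
}

impl Drop for Step {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("teardown {}", self.name));
    }
}

/// Run all steps, then leave the scope and collect the order of events.
fn register_and_unregister() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _region = Step::new("chrdev region", &log);
        let _cdev = Step::new("cdev", &log);
        let _class = Step::new("class", &log);
        let _node = Step::new("device node", &log);
        // Leaving this scope drops the locals in reverse declaration order.
    }
    let events = log.borrow().clone();
    events
}

fn main() {
    for line in register_and_unregister() {
        println!("{}", line);
    }
}
```

Tying clean-up to Drop like this is one reason registration types in Rust tend to need no hand-written teardown function.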

This is a lot of wiring. If you were implementing something more heavy-duty, such as a GPU driver exposing multiple individual devices with its own sysfs hierarchy, this is the appropriate, best-practice way of going about it.

But: For the vast majority of devices, this is significant overkill. For a whole lot of devices, the needs boil down to a single file in /dev, offering a single piece of functionality. And that is the use case for miscdevice.

The kernel offers us a convenient shortcut, through the misc subsystem (short for "miscellaneous"). misc is a pre-existing character device driver, that owns the major number 10, and is fully open for drivers to register themselves as minors under.

To see the current miscdevices running in our emulated system, run another ls with a grep against /dev:

~ # ls -l /dev | grep " 10,"
crw-r--r--    1 0        0          10, 235 Jan  1  1970 autofs
crw-------    1 0        0          10, 257 May 22 11:23 cpu_dma_latency
crw-------    1 0        0          10, 183 Jan  1  1970 hwrng
crw-------    1 0        0          10, 237 Jan  1  1970 loop-control
crw-------    1 0        0          10, 256 Jan  1  1970 vga_arbiter

Just 5 nodes are registered in our aggressively simple emulated system, but all of them sit under major number 10, and each one is a small driver exposing a single /dev/foo node, on which it handles open/read/write/ioctl (input/output control) calls.

When registering a miscdevice, you provide the kernel with a name (which becomes /dev/<name>), an optional specific minor number, and your driver's file_operations (i.e., the callbacks for read/write/open/ioctl). The kernel takes care of things from there, and wires up all the steps described earlier - much easier to handle for us as kernel developers.

This is why misc exists - there are tons of reasons for needing a driver in the kernel, and many of them are quite small in scale. For these use cases, the entire device+driver registration and clean-up flow was just a bit too labor-intensive. So you can think of misc as a very convenient helper function from the kernel, to let you focus on the fun parts.

Some Neat Real-World Examples

As mentioned, there's an abundance of miscdevice examples out there, but our very simple emulated system doesn't ship with all that many of them. But a few fun ones to know from a more fully-fledged Linux system would be:

  • /dev/kvm: The entry-point to KVM (Kernel-based Virtual Machine). Open it and ioctl on it to create Virtual Machines (how cool is that?).
  • /dev/fuse: Userspace filesystem driver entrypoint.
  • /dev/uinput: Userspace input device emulation.

The list goes on, but they all share the property of simply needing the kernel to expose a single control file for userspace to interact with. Typically, the real interface of these devices is ioctl, with read/write/open playing secondary roles, or no role at all.

What We'll Be Building

With all of that covered, let's get into what we'll be building. Our goal for today is to build the miscdevice driver /dev/rvcpu. rvcpu will only support read operations, and will offer access to some of the data that we read from the RISC-V registers in part 1. This isn't exactly the most true-to-real-life use case for misc, but it'll get us through some core concepts, and do so without things getting too hairy (I remind you that I'm actively learning these things as I write the posts, so things are kept simple and slow, mainly for my own sake).

Once we have our miscdevice driver wired up, we'll build a small userspace client for it in C, and define an ABI (Application Binary Interface) for how to interact with the driver.

Sounds good? Then let's get into the meat and potatoes of it all.

Step by Step

Just like in part 1, we'll be tackling this in small steps at a time, evaluating and learning as we go along.

Step 1: The Rust miscdevice sample code

In your local clone of the linux codebase, go have a look at samples/rust/rust_misc_device.rs. As a start, just have a quick glance at the code - it's much more feature-complete than what we'll end up having at the end of this post. This makes it both a good learning tool, and also a bit overwhelming.

One thing worth noticing is the portion of the code where the author implements the MiscDevice trait for the RustMiscDevice struct:

#[vtable]
impl MiscDevice for RustMiscDevice {
    type Ptr = Pin<KBox<Self>>;

    fn open(_file: &File, misc: &MiscDeviceRegistration<Self>) -> Result<Pin<KBox<Self>>> {
        // ... body omitted ...
    }

    fn read_iter(mut kiocb: Kiocb<'_, Self::Ptr>, iov: &mut IovIterDest<'_>) -> Result<usize> {
        // ... body omitted ...
    }

    fn write_iter(mut kiocb: Kiocb<'_, Self::Ptr>, iov: &mut IovIterSource<'_>) -> Result<usize> {
        // ... body omitted ...
    }

    fn ioctl(me: Pin<&RustMiscDevice>, _file: &File, cmd: u32, arg: usize) -> Result<isize> {
        // ... body omitted ...
    }
}

Ignoring the actual function implementations, you'll notice all of the pieces of functionality we talked about earlier: Open, read, write, ioctl. No matter how complicated or simple the driver, these remain the basic building-blocks of a miscdevice.
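If traits are new to you, the following toy userspace sketch may help with the general shape: a table of operations where each operation has a default, and an implementor overrides only what it needs. ToyDevice, OpError, and ReadOnly are hypothetical names made up for this illustration. They are not the kernel's MiscDevice API, and the defaults here don't reflect what the kernel does for missing operations.

```rust
/// Toy error type for this sketch (not a kernel errno).
#[derive(Debug, PartialEq)]
enum OpError {
    Unsupported,
}

/// Hypothetical userspace analog of a device's operations table.
trait ToyDevice {
    fn open(&mut self) -> Result<(), OpError> {
        Ok(())
    }
    fn read(&mut self, _dst: &mut [u8]) -> Result<usize, OpError> {
        Err(OpError::Unsupported)
    }
    fn write(&mut self, _src: &[u8]) -> Result<usize, OpError> {
        Err(OpError::Unsupported)
    }
    fn ioctl(&mut self, _cmd: u32, _arg: usize) -> Result<isize, OpError> {
        Err(OpError::Unsupported)
    }
}

/// A read-only device: it overrides `read` and inherits everything else.
struct ReadOnly {
    data: Vec<u8>,
}

impl ToyDevice for ReadOnly {
    fn read(&mut self, dst: &mut [u8]) -> Result<usize, OpError> {
        let n = dst.len().min(self.data.len());
        dst[..n].copy_from_slice(&self.data[..n]);
        Ok(n)
    }
}

fn main() {
    let mut dev = ReadOnly { data: b"hi".to_vec() };
    let mut buf = [0u8; 8];
    println!("open  -> {:?}", dev.open());
    println!("read  -> {:?}", dev.read(&mut buf));
    println!("write -> {:?}", dev.write(b"x"));
    println!("ioctl -> {:?}", dev.ioctl(0, 0));
}
```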

Step 2: Setting up our first miscdevice module

For our use case and learning purposes, we can settle for a lot less. For a start, create a directory named my-misc-device-module in your project directory from part 1. This will need the same Kbuild and Makefile as the previous modules (note that we're still building this as an out-of-tree kernel module, which registers a miscdevice when loaded).

Now copy the Rust miscdevice sample into your directory structure:

~ # cp /path/to/torvalds/linux/samples/rust/rust_misc_device.rs /path/to/your/project/my-misc-device-module/my_misc_device_module.rs

Now, we'll be drastically stripping down the example - for a start, we'll shave off everything pertaining to write_iter, ioctl, set_value, get_value and hello. So we'll just be leaving the implementations of open and read_iter in the codebase.

This little exercise is pretty dull (sorry), but when combined with our Makefile setup from part 1, it will give you a neat chance to see first-hand some of the strengths of the Rust compiler - as you chop away and re-build, the Rust compiler will keep letting you know where things are broken, and warn you about imports that are now obsolete. Keep going at it, until you have something that compiles - refer to the GitHub repo as needed.

Note: Make sure that you find the MiscDeviceOptions config, and rename the driver to rvcpu!

Step 3: Loading our device

Once you have everything compiling without complaints from the Rust compiler, go ahead and run the top-level Makefile, and launch your QEMU environment.

Just like last time, use insmod to load your new module - note that the name of the module, and the driver are separate entities, defined in separate portions of the codebase:

~ # insmod lib/modules/my_misc_device_module.ko

This should give you zero output, unless you made a pr_info! macro call in your module init code.

To validate the existence of the device, go ahead and run an ls command against /sys/class/misc/:

~ # ls -l /sys/class/misc/
total 0
lrwxrwxrwx    1 0        0               0 May 22 12:35 autofs -> ../../devices/virtual/misc/autofs
lrwxrwxrwx    1 0        0               0 May 22 12:35 cpu_dma_latency -> ../../devices/virtual/misc/cpu_dma_latency
lrwxrwxrwx    1 0        0               0 May 22 12:35 hw_random -> ../../devices/virtual/misc/hw_random
lrwxrwxrwx    1 0        0               0 May 22 12:35 loop-control -> ../../devices/virtual/misc/loop-control
lrwxrwxrwx    1 0        0               0 May 22 12:35 rvcpu -> ../../devices/virtual/misc/rvcpu
lrwxrwxrwx    1 0        0               0 May 22 12:35 vga_arbiter -> ../../devices/virtual/misc/vga_arbiter

And there you should see our rvcpu driver, loaded and ready to go.

However: If you were to look for it through running ls against /dev/, the reference point we used previously to look for devices, it wouldn't be there. Why is that?

Step 4: Registering the device fully

Well, there are layers to why it isn't showing up. When our module calls the MiscDeviceRegistration::register function, the kernel does what it's supposed to do: It allocates a minor number, adds an entry to its miscdevice table, makes /sys/class/misc/rvcpu/ appear, and is ready to handle opens on the major/minor pairing.

What it does not do, however, is create the /dev/rvcpu file. The /dev/rvcpu file is just a filesystem object: a special kind, as covered earlier, but still an entry in whatever filesystem sits at /dev. In our current setup nothing populates that directory automatically, so we need something taking care of that half of the setup.

We'll go through the entire registration flow by hand once. To do that, we need to slightly expand our emulated system's capabilities, by adding to the set of commands it supports. So it's time to add another symlink to busybox.

Step 4.1: Supporting mknod in our emulated system

In your project directory, cd into your rootfs dir, then into the bin directory, and create a new symlink towards busybox for the mknod command:

~ # cd rootfs/bin
~ # ln -s busybox mknod

And just like that, we have the tool we need. Rebuild your initramfs, and restart your emulator. Remember to once again load the module with insmod, and validate that it still correctly shows up in /sys/class/misc/.

Step 4.2: Registering the device node

To use mknod, we need to supply it 4 inputs: The name of the device node we'd like to create (i.e., /dev/rvcpu), the type of device we're registering (i.e., a character device, denoted with a c), the major number (i.e., 10, because we're registering a miscdevice), and the minor number. We know the answer to all of these except the minor number.

Luckily, as established earlier, the kernel has already sorted this out for us - so we just need to go looking for it.

To find the assigned minor number of our driver, simply read from the rvcpu/dev entry in /sys/class/misc/:

~ # cat /sys/class/misc/rvcpu/dev
10:258

The minor number might vary on your machine, and that's all good - the kernel has just made sure that it was an available minor number. On my setup, the minor ended up as 258, which is what I'll be using in the example to come.
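If you ever want to automate this lookup, the major:minor string is easy to take apart. Below is a small userspace Rust sketch, with function names of my own choosing and the sysfs contents hard-coded, so it runs anywhere. It parses the string and assembles the four mknod inputs described above.

```rust
/// Parse the contents of a sysfs `dev` attribute such as "10:258\n"
/// into a (major, minor) pair. Returns None on malformed input.
fn parse_dev(contents: &str) -> Option<(u32, u32)> {
    let (major, minor) = contents.trim().split_once(':')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}

/// Build the argument list for `mknod <path> c <major> <minor>`.
fn mknod_args(path: &str, dev: (u32, u32)) -> Vec<String> {
    vec![
        path.to_string(),
        "c".to_string(), // character device
        dev.0.to_string(),
        dev.1.to_string(),
    ]
}

fn main() {
    // On the target this string would come from /sys/class/misc/rvcpu/dev.
    if let Some(dev) = parse_dev("10:258\n") {
        println!("mknod {}", mknod_args("/dev/rvcpu", dev).join(" "));
    }
}
```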

Now that we've been armed with the missing parameter for mknod, we can register our driver like so:

~ # mknod /dev/rvcpu c 10 258

And if that exits without complaining, you should be able to attempt an open and a read from the now-created /dev/rvcpu:

~ # cat /dev/rvcpu
[ 7657.164932] misc rvcpu: Opening Rust Misc Device Sample
[ 7657.166669] misc rvcpu: Reading from Rust Misc Device Sample
[ 7657.171795] misc rvcpu: Exiting the Rust Misc Device Sample

And that's the end-to-end registration flow, with our cat command triggering the open, read_iter, and PinnedDrop implementations in turn. The absence of errors from cat tells us that the read of our (still empty) buffer returned cleanly, and that cat exited with a 0 exit code. So far, so great!

Step 5: Making step 4 obsolete

For learning purposes, going through the mknod flow is all well and good. But now that we've been through it, let's make sure that there's something in our userspace that can take over this part of the flow, and finish the work that our kernel starts when the module is registered.

To handle the userspace side of miscdevice registration, we need to mount devtmpfs on boot. devtmpfs is a virtual file system, which handles the automatic population of device nodes. So by mounting it on /dev, it'll take care of all the device registrations in user space for us - pretty neat!

To mount it on boot, we need to modify our init script, which we wrote back in part 1. At the current point in time, it should look something like this:

#!/bin/sh
mount -t proc proc /proc
mount -t sysfs sysfs /sys
exec /bin/sh

At any point between the shebang and the launch of the interactive shell, add the command:

mount -t devtmpfs devtmpfs /dev

This will mount devtmpfs to our /dev directory, and outsource all the userspace-specific handling of registering to this handy, virtual file system.

Use your Makefile to rebuild your initramfs, restart your emulator, and once again load your module with insmod. Now, running ls -l against /dev should greet you with a fully registered device node:

~ # ls -l /dev/ | grep " 10,"
crw-r--r--    1 0        0          10, 235 Jan  1  1970 autofs
crw-------    1 0        0          10, 257 May 22 11:23 cpu_dma_latency
crw-------    1 0        0          10, 183 Jan  1  1970 hwrng
crw-------    1 0        0          10, 237 Jan  1  1970 loop-control
crw-------    1 0        0          10, 258 May 22 11:43 rvcpu
crw-------    1 0        0          10, 256 Jan  1  1970 vga_arbiter

And there it is, given the minor number 258! Great, now we can move on to having our node actually contain some data.

Step 6: Writing data to the node

Time for us to once again dive into our Rust code, and read some RISC-V-specific registers.

Step 6.1: Bringing back our RISC-V CSR macro

For a start, let's steal some of our own code. Head into the code for my_csr_module from part 1, and fetch the read_csr! macro we defined there. Add it somewhere at the top of your new miscdevice module, and remember to import the asm! macro as well:

use core::arch::asm; // For the `asm!` macro.

macro_rules! read_csr {
    ($csr:ident) => {{
        let value: u64;
        // SAFETY: reading a CSR is a pure read with no side effects
        unsafe {
            asm!(
                concat!("csrr {0}, ", stringify!($csr)),
                out(reg) value
            );
        }
        value
    }};
}

Alright, we're now once again armed with the ability to read RISC-V CSRs from our Rust kernel module. For this little project, we'll read, and subsequently expose, the same 3 CSRs as in part 1: instret, cycle, and time. Let's wrap those in a data object, and make a simple function for reading out all 3:

#[repr(C)]
#[derive(Debug, Copy, Clone)]
struct RvcpuSnapshot {
    time: u64,
    cycle: u64,
    instret: u64,
}

// SAFETY: `#[repr(C)]` struct of three `u64`s, no padding bytes, no interior mutability.
unsafe impl AsBytes for RvcpuSnapshot {}

const RVCPU_IOC_SNAPSHOT: u32 = _IOR::<RvcpuSnapshot>('|' as u32, 0x80);

#[cfg(target_arch = "riscv64")]
fn take_snapshot() -> RvcpuSnapshot {
    RvcpuSnapshot {
        time: read_csr!(time),
        cycle: read_csr!(cycle),
        instret: read_csr!(instret),
    }
}

And for your imports, make sure you're including:

use kernel::{
    transmute::AsBytes,
    uaccess::{UserPtr, UserSlice}, // `UserPtr` is needed by the ioctl handler later on.
    ioctl::_IOR,
};

This little chunk of code defines a struct for holding our CSR readings, and a function for producing an instance of the struct, by leveraging our read_csr! macro.

It's worth noting the architecture gate #[cfg(target_arch = "riscv64")] over the take_snapshot() function - this ensures that the Rust compiler will only compile the function when the target build architecture is RISC-V. As we discussed in part 1, it would also be viable to add the gate to the macro definition, but since this is the only context in the module where the macro is called, adding the gate here implicitly also excludes the macro from compilation (since macros are "expanded" during compilation). That being said, you could opt for a belt-and-suspenders approach, and architecture gate every single piece of code that should only be included for RISC-V contexts.
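As a small illustration of the gating mechanism itself, here is a userspace sketch of a slightly different arrangement than the one in our module: a gated function paired with a fallback for all other targets, so that callers compile everywhere. The function name and the returned strings are made up for the example.

```rust
/// Compiled only when the build target is 64-bit RISC-V.
#[cfg(target_arch = "riscv64")]
fn snapshot_backend() -> &'static str {
    "riscv64 CSR reads"
}

/// Fallback for every other target, so callers compile everywhere.
#[cfg(not(target_arch = "riscv64"))]
fn snapshot_backend() -> &'static str {
    "stub (no CSRs on this target)"
}

fn main() {
    // cfg!() evaluates the same condition to a compile-time boolean.
    println!("riscv64 build: {}", cfg!(target_arch = "riscv64"));
    println!("backend: {}", snapshot_backend());
}
```

Exactly one of the two definitions survives compilation, which is why they can share a name without clashing.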

It's also worth noting the #[repr(C)] annotation on the RvcpuSnapshot struct. This tells the Rust compiler to store instances of this struct in memory, in a way that's compatible with the C programming language - this'll come in handy later, when we make our userspace client for interacting with our driver!
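To see what that layout guarantee buys us, here is a userspace sketch that inspects the same struct definition with plain std tools. The field_offsets helper is my own, written for the illustration.

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
#[derive(Debug, Copy, Clone)]
struct RvcpuSnapshot {
    time: u64,
    cycle: u64,
    instret: u64,
}

/// Byte offset of each field from the start of the struct.
fn field_offsets(s: &RvcpuSnapshot) -> [usize; 3] {
    let base = s as *const RvcpuSnapshot as usize;
    [
        &s.time as *const u64 as usize - base,
        &s.cycle as *const u64 as usize - base,
        &s.instret as *const u64 as usize - base,
    ]
}

fn main() {
    let snap = RvcpuSnapshot { time: 1, cycle: 2, instret: 3 };
    println!("snapshot = {:?}", snap);
    println!("size     = {}", size_of::<RvcpuSnapshot>());
    println!("align    = {}", align_of::<RvcpuSnapshot>());
    println!("offsets  = {:?}", field_offsets(&snap));
}
```

With #[repr(C)], the fields sit in declaration order, so a C struct with three 64-bit unsigned members in the same order describes the same 24 bytes.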

For now, take note of, but largely ignore, the RVCPU_IOC_SNAPSHOT const and the AsBytes implementation for the struct, as they'll become more relevant later on. Notice how, just like in part 1, we make sure to accompany every use of the unsafe keyword with a SAFETY comment, to let it be known that we thought through the use of this "unsafe" bit of code.

Step 6.2: Understanding the structure of our device specification

Alright, we have all the pieces in place to read data from our RISC-V registers, now we "just" need to write that data to our device's buffer. When going through this step, I found myself needing to iterate a ton, as writing to the KVVec buffer just wasn't playing ball with me. I'll be referencing only the final solution in this post, but know that some level of struggling is to be expected here, especially if you're pretty new to low-level programming, like I was and still am.

Another way I'm slightly "cheating" as we're writing this here, is that I'm already now setting the code up for our eventual userspace client program. So there's a few refactorings that I'm cutting out of this post, largely because I've pretty much lost the overview on the journey the module codebase went through during that session of debugging and refactors.

With those disclaimers out of the way, let's get our writing setup done.

To summarize, what we're trying to achieve is to write the contents of our RvcpuSnapshot to the internal buffer of our miscdevice driver. Once that data is written, our open, read_iter, and ioctl implementations will be able to perform operations on the data - we'll keep things simple for now, and simply support reading and updating the snapshot.

Let's have a quick look at how our device construct actually exists in our codebase, and how it registers with the kernel:

#[pin_data]
struct RustMiscDeviceModule {
    #[pin]
    _miscdev: MiscDeviceRegistration<RustMiscDevice>,
}

impl kernel::InPlaceModule for RustMiscDeviceModule {
    fn init(_module: &'static ThisModule) -> impl PinInit<Self, Error> {
        pr_info!("Initialising Rust Misc Device Sample\n");

        let options = MiscDeviceOptions {
            name: c"rvcpu",
        };

        try_pin_init!(Self {
            _miscdev <- MiscDeviceRegistration::register(options),
        })
    }
}

struct Inner {
    buffer: KVVec<u8>,
}

#[pin_data(PinnedDrop)]
struct RustMiscDevice {
    #[pin]
    inner: Mutex<Inner>,
    dev: ARef<Device>,
}

Worth noticing right out of the gate, is that we're leveraging a lot of convenient, built-in Rust-for-Linux libraries for handling the registration of our miscdevice. This simplifies our registration a fair bit, especially for a simple device like ours - we simply supply a name for the device in our options, and we're ready to register.

The main thing we're about to wrestle with is our Inner struct, which wraps a KVVec of unsigned 8-bit integers. KVVec is a Rust-for-Linux type alias for the kernel's own Vec<T, KVmalloc> - an analog of std's Vec, adapted for kernel use with a specific allocator. That KVmalloc allocator follows the kernel's kvmalloc strategy: it first attempts the faster, physically-contiguous kmalloc, and falls back to the virtually-contiguous vmalloc for larger allocations that can't be satisfied contiguously.

This Inner buffer is where we want to write our data. But we don't just want it to hold plain ol' text data. Since we plan on introducing a client program for the driver written in C, we want the data in the buffer to be written in a format understandable by both of the programs, across their differing programming languages - this time, we're opting for binary!

Note: As we go on from here, you'll notice us leveraging various pin annotations and macros. Understanding the full depth of what these tools provide us with is a bit outside of the scope of this project, but know that they serve an important role in kernel development, as they control how these structs are placed in memory - they get structurally pinned, meaning they will not be moved once initialized. This is not super important for the learning outcome we're pursuing in this post, but it is very important for kernel development in general - I can only encourage you to dive deeper into it, if you're curious.

To achieve the functionality we want from our driver, we'll be implementing the open, read_iter, and ioctl functions, as defined in the MiscDevice trait. We'll be taking them step by step.

Step 6.3: Implementing open

Our open function will be called any time the userspace calls open("/dev/rvcpu"). This will cause the misc subsystem to dispatch the open syscall to our driver.

In your impl MiscDevice for RustMiscDevice block, we'll first be implementing the open function:

    fn open(_file: &File, misc: &MiscDeviceRegistration<Self>) -> Result<Pin<KBox<Self>>> {

        let dev = ARef::from(misc.device());
        dev_info!(dev, "Opening Rust Misc Device Sample\n");


        let mut buffer = KVVec::new();

        #[cfg(target_arch = "riscv64")]
        {
            let snap = take_snapshot();
            buffer.extend_from_slice(snap.as_bytes(), GFP_KERNEL)?;
        }

        KBox::try_pin_init(
            try_pin_init! {
                RustMiscDevice {
                    inner <- new_mutex!(Inner {
                        buffer: buffer,
                    }),
                    dev: dev,
                }
            },
            GFP_KERNEL,
        )
    }

In this function we have 4 main points of interest, happening in order as the function executes:

  1. Our take_snapshot() function is called, which executes our CSR-reading macro 3 times, using in-line assembly to read from our RISC-V machine's registers. It returns these readings as an instance of the RvcpuSnapshot struct.
  2. Remember when we added the unsafe impl AsBytes for RvcpuSnapshot {} line of code? That was all leading up to this moment: We can now call as_bytes() on our snapshot, which will take our struct of three u64 values, and return a &[u8] view of the raw memory - 24 bytes in total.
  3. We then append these bytes to our KVVec buffer, using the built-in extend_from_slice() function.
  4. Finally, it all gets tied together. We'll again be slightly glossing over the nitty-gritty details, but: KBox::try_pin_init allocates a kernel heap box big enough for our RustMiscDevice instance (which we initialize with our filled buffer in the inner field), and then constructs the instance in place inside that allocation. The Mutex and the KVVec handle end up living inside that one heap allocation, pinned to that address, while the KVVec's bytes live in the vector's own allocation. Our open function returns this pinned box to the misc subsystem, which associates it with the open file for the lifetime of the open.
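Points 2 and 3 above can be tried out in plain userspace Rust. In this sketch, as_bytes is a hand-written stand-in for the kernel's AsBytes method, a std Vec stands in for the KVVec, and from_bytes is a hypothetical decoder showing what a reader on the other side would do with the 24 bytes.

```rust
#[repr(C)]
#[derive(Debug, Copy, Clone, PartialEq)]
struct RvcpuSnapshot {
    time: u64,
    cycle: u64,
    instret: u64,
}

/// View the struct's memory as raw bytes (stand-in for the kernel's as_bytes()).
fn as_bytes(snap: &RvcpuSnapshot) -> &[u8] {
    // SAFETY: a `#[repr(C)]` struct of three `u64`s has no padding, so every
    // byte is initialized, and the slice cannot outlive the borrow of `snap`.
    unsafe {
        std::slice::from_raw_parts(
            snap as *const RvcpuSnapshot as *const u8,
            std::mem::size_of::<RvcpuSnapshot>(),
        )
    }
}

/// Decode 24 native-endian bytes back into a snapshot, as a reader would.
fn from_bytes(bytes: &[u8]) -> Option<RvcpuSnapshot> {
    if bytes.len() != 24 {
        return None;
    }
    let word = |i: usize| {
        let mut raw = [0u8; 8];
        raw.copy_from_slice(&bytes[i * 8..i * 8 + 8]);
        u64::from_ne_bytes(raw)
    };
    Some(RvcpuSnapshot { time: word(0), cycle: word(1), instret: word(2) })
}

fn main() {
    let snap = RvcpuSnapshot { time: 1, cycle: 2, instret: 3 };
    let mut buffer: Vec<u8> = Vec::new(); // std Vec standing in for KVVec
    buffer.extend_from_slice(as_bytes(&snap));
    println!("{} bytes, decoded: {:?}", buffer.len(), from_bytes(&buffer));
}
```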

Notice that we're once again architecture-gating our RISC-V-specific code. So in the event that this module was to be built for another target architecture, our device's buffer would simply be empty, since the entire snapshot loading section would be skipped.

At the end of this function, after a successful open() syscall from the userspace, the kernel now holds a per-open RustMiscDevice instance. Future read() and ioctl() syscalls on that device will be dispatched against this specific instance.

Step 6.4: Implementing read_iter

Our read_iter implementation will be called any time userspace issues a read() syscall on the device. Our implementation looks like so:

    fn read_iter(mut kiocb: Kiocb<'_, Self::Ptr>, iov: &mut IovIterDest<'_>) -> Result<usize> {
        let me = kiocb.file();
        dev_info!(me.dev, "Reading from Rust Misc Device Sample\n");

        let inner = me.inner.lock();
        // Read the buffer contents, taking the file position into account.
        let read = iov.simple_read_from_buffer(kiocb.ki_pos_mut(), &inner.buffer)?;

        Ok(read)
    }

A bit less going on here, but 3 interesting points nonetheless:

  1. The let me = kiocb.file(); line returns a reference to the per-open RustMiscDevice instance we created in our open() implementation. The kernel keeps track of which instance to return to us.
  2. Before reading from our inner field, we must call lock() on it. This is the idiomatic Rust-for-Linux pattern for any shared mutable state. Even though our specific module never mutates the buffer after open(), the mutex guards against any concurrent access, and ensures that the code is ready for possible future extension.
  3. Finally, we read the content of the buffer. simple_read_from_buffer takes the file position (kiocb.ki_pos) and the source buffer, and copies as many bytes as fit into the userspace destination (the IovIterDest parameter). It then advances the file position, and returns how many bytes were copied. We finish up the function by returning that number of bytes.

There's some very elegant, clever functionality going on here, which we unfortunately can't take credit for - it all pertains to the tracking of the file position:

The first read syscall has ki_pos = 0, so it copies from offset 0 to the end of the buffer (24 bytes, due to our 3 x 8 bytes), advances ki_pos to 24, and returns 24. The second read syscall has ki_pos = 24, which is past the end of our 24-byte buffer, so it copies 0 bytes and returns 0. cat interprets the 0 return as EOF, and exits.

That's how cat /dev/rvcpu works correctly: Two read calls, with the helper handling the position bookkeeping for us.
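The position bookkeeping is simple enough to model in a few lines of userspace Rust. The following sketch is my own approximation of the behaviour described above, not the kernel helper itself: read_at copies from a source buffer starting at a position, advances the position, and reports how many bytes it copied.

```rust
/// Copy from `src`, starting at `*pos`, into `dst`; advance `*pos` and
/// return the number of bytes copied (0 means end of data).
fn read_at(pos: &mut usize, src: &[u8], dst: &mut [u8]) -> usize {
    if *pos >= src.len() {
        return 0;
    }
    let n = dst.len().min(src.len() - *pos);
    dst[..n].copy_from_slice(&src[*pos..*pos + n]);
    *pos += n;
    n
}

fn main() {
    let buffer = [0xABu8; 24]; // stands in for the 24-byte snapshot
    let mut pos = 0;
    let mut user_buf = [0u8; 4096];
    loop {
        let n = read_at(&mut pos, &buffer, &mut user_buf);
        println!("read returned {}, position now {}", n, pos);
        if n == 0 {
            break; // a reader like cat treats 0 as EOF
        }
    }
}
```

The same helper also copes with a reader that asks for fewer than 24 bytes at a time: it hands out the data in pieces, and still ends with a 0.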

Step 6.5: Implementing ioctl

This is where the real magic happens. In most of the miscdevice drivers out there, the heavy lifting and complex functionality will take place in ioctl, due to it being easily extensible through the handling of ioctl commands.

We'll once again be looking back at some of our early code in this module, where we defined the constant const RVCPU_IOC_SNAPSHOT: u32 = _IOR::<RvcpuSnapshot>('|' as u32, 0x80); - this defines an ioctl number, which we then map to a command in our ioctl implementation. Userspace clients that wish to interact with our driver, will need to know this ioctl number to interact with this piece of functionality. We'll dive into how to ensure that a bit later.
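An ioctl number is a set of bit fields rather than an arbitrary ID. As a rough userspace sketch, this is how the _IOR encoding can be re-implemented by hand. It assumes the asm-generic field layout (8 bits of number, 8 bits of type, 14 bits of size, 2 bits of direction), which to my knowledge is what RISC-V uses; a few other architectures lay the fields out differently. The ior function is my own stand-in, not the kernel's _IOR.

```rust
// Assumption: asm-generic ioctl layout, from low bits to high: nr | type | size | dir.
const NR_SHIFT: u32 = 0;
const TYPE_SHIFT: u32 = 8;
const SIZE_SHIFT: u32 = 16;
const DIR_SHIFT: u32 = 30;
const IOC_READ: u32 = 2; // userspace reads: data flows from kernel to userspace

/// Hand-written stand-in for the C `_IOR(type, nr, T)` macro.
const fn ior(ty: u32, nr: u32, size: usize) -> u32 {
    (IOC_READ << DIR_SHIFT)
        | ((size as u32) << SIZE_SHIFT)
        | (ty << TYPE_SHIFT)
        | (nr << NR_SHIFT)
}

/// The snapshot is three u64 values, so 24 bytes.
const SNAPSHOT_SIZE: usize = 3 * std::mem::size_of::<u64>();
const RVCPU_IOC_SNAPSHOT: u32 = ior('|' as u32, 0x80, SNAPSHOT_SIZE);

fn main() {
    println!("RVCPU_IOC_SNAPSHOT = {:#010x}", RVCPU_IOC_SNAPSHOT);
    println!("dir  = {}", RVCPU_IOC_SNAPSHOT >> DIR_SHIFT);
    println!("size = {}", (RVCPU_IOC_SNAPSHOT >> SIZE_SHIFT) & 0x3fff);
    println!("type = {:#x}", (RVCPU_IOC_SNAPSHOT >> TYPE_SHIFT) & 0xff);
    println!("nr   = {:#x}", RVCPU_IOC_SNAPSHOT & 0xff);
}
```

Because the size of the struct is baked into the number, changing the layout of RvcpuSnapshot also changes the command number, which is a handy guard against a client and a driver disagreeing about the struct.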

Let's implement handling for our RVCPU_IOC_SNAPSHOT command:

    fn ioctl(me: Pin<&RustMiscDevice>, _file: &File, cmd: u32, arg: usize) -> Result<isize> {
        dev_info!(me.dev, "IOCTL on Rust Misc Device Sample (cmd: {})\n", cmd);

        match cmd {
            RVCPU_IOC_SNAPSHOT => {
                #[cfg(target_arch = "riscv64")]
                {
                    let snap = take_snapshot();
                    let user_arg = UserPtr::from_addr(arg);
                    let size = core::mem::size_of::<RvcpuSnapshot>();
                    UserSlice::new(user_arg, size)
                        .writer()
                        .write::<RvcpuSnapshot>(&snap)?;
                }
                Ok(0)
            }
            _ => {
                dev_err!(me.dev, "Unrecognised IOCTL command: {}\n", cmd);
                Err(ENOTTY)
            }
        }
    }

In this function, we set up a pattern-matching block on the cmd parameter, which holds an ioctl number. We're just looking to handle our one piece of functionality (fetching a fresh snapshot without dispatching another open syscall), so that's the only pattern we match against - everything else returns an ENOTTY ("not a typewriter" - the historical error convention for when a device doesn't recognize the ioctl command given to it).

Our function goes through the following steps as it executes:

  1. Executes the pattern-matching against the cmd parameter.
  2. In our architecture-gate, we create a new snapshot.
  3. Now comes the tricky part: We want to return this snapshot to userspace. We can't simply dereference arg, since it holds a userspace address, and we're working at the kernel level - userspace memory is not to be trusted, as it can be unmapped, changed, or tampered with for malicious reasons at any time. Luckily, the kernel contains special functions for handling these scenarios in a safe manner. To use these, we first create a new UserPtr from the arg parameter, to get a pointer to userspace. We then use one of the Rust wrappers for the special kernel functions, UserSlice::new(user_arg, size).writer().write::<RvcpuSnapshot>(&snap), to do the following:
     a. Wrap the userspace pointer in a UserSlice of the declared size.
     b. Ask for a writer, i.e., a one-shot sink that writes to the userspace memory the caller handed us.
     c. Call write::<T>, where our snapshot falls under the T: AsBytes bound, due to us implementing AsBytes previously. write knows how to work with our snapshot as bytes, and how to move it across the boundary from the kernel, and into userspace.
     d. If the userspace pointer is invalid, the helper returns an error rather than crashing the kernel. The ? propagates that error to userspace, as the return value of the ioctl call.
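The overall shape of the handler (dispatch on the command, copy through a fallible writer, reject everything else) can be mimicked in userspace. The sketch below is a toy model under loud assumptions: ToyWriter is a made-up type that only checks lengths, whereas the real kernel helpers detect bad userspace addresses through the memory-management machinery. The command value is the asm-generic encoding of our ioctl number, and the errno values are the usual Linux ones.

```rust
const ENOTTY: i32 = 25; // Linux errno: inappropriate ioctl for device
const EFAULT: i32 = 14; // Linux errno: bad address
const CMD_SNAPSHOT: u32 = 0x8018_7c80; // assumed value of RVCPU_IOC_SNAPSHOT

/// Toy stand-in for a writer over a userspace region: a bounded, fallible sink.
struct ToyWriter<'a> {
    dst: &'a mut [u8],
}

impl ToyWriter<'_> {
    /// Copy `src` into the region, or fail if it does not fit.
    fn write(&mut self, src: &[u8]) -> Result<(), i32> {
        if src.len() > self.dst.len() {
            return Err(EFAULT);
        }
        self.dst[..src.len()].copy_from_slice(src);
        Ok(())
    }
}

/// Dispatch on the command number, mirroring the shape of the handler above.
fn toy_ioctl(cmd: u32, user_region: &mut [u8]) -> Result<isize, i32> {
    match cmd {
        CMD_SNAPSHOT => {
            let snapshot = [1u64, 2, 3]; // placeholder for time, cycle, instret
            let mut bytes = Vec::with_capacity(24);
            for word in snapshot.iter() {
                bytes.extend_from_slice(&word.to_ne_bytes());
            }
            // `?` forwards a failed copy to the caller as the error value.
            ToyWriter { dst: user_region }.write(&bytes)?;
            Ok(0)
        }
        _ => Err(ENOTTY),
    }
}

fn main() {
    let mut region = [0u8; 24];
    println!("snapshot: {:?}", toy_ioctl(CMD_SNAPSHOT, &mut region));
    println!("unknown:  {:?}", toy_ioctl(0, &mut region));
}
```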

And that's it! That's our entire implementation of our miscdevice. Now let's take it for a spin.

Step 7: Testing out the driver

By now, you should have the following implementations for your driver:

#[vtable]
impl MiscDevice for RustMiscDevice {
    type Ptr = Pin<KBox<Self>>;

    fn open(_file: &File, misc: &MiscDeviceRegistration<Self>) -> Result<Pin<KBox<Self>>> {

        let dev = ARef::from(misc.device());
        dev_info!(dev, "Opening Rust Misc Device Sample\n");

        // Backing storage for read(); filled with the snapshot taken at open time.
        let mut buffer = KVVec::new();

        #[cfg(target_arch = "riscv64")]
        {
            let snap = take_snapshot();
            buffer.extend_from_slice(snap.as_bytes(), GFP_KERNEL)?;
        }

        KBox::try_pin_init(
            try_pin_init! {
                RustMiscDevice {
                    inner <- new_mutex!(Inner {
                        buffer: buffer,
                    }),
                    dev: dev,
                }
            },
            GFP_KERNEL,
        )
    }

    fn read_iter(mut kiocb: Kiocb<'_, Self::Ptr>, iov: &mut IovIterDest<'_>) -> Result<usize> {
        let me = kiocb.file();
        dev_info!(me.dev, "Reading from Rust Misc Device Sample\n");

        let inner = me.inner.lock();
        // Read the buffer contents, taking the file position into account.
        let read = iov.simple_read_from_buffer(kiocb.ki_pos_mut(), &inner.buffer)?;

        Ok(read)
    }

    fn ioctl(me: Pin<&RustMiscDevice>, _file: &File, cmd: u32, arg: usize) -> Result<isize> {
        dev_info!(me.dev, "IOCTL on Rust Misc Device Sample (cmd: {})\n", cmd);

        match cmd {
            RVCPU_IOC_SNAPSHOT => {
                #[cfg(target_arch = "riscv64")]
                {
                    let snap = take_snapshot();
                    let user_arg = UserPtr::from_addr(arg);
                    let size = core::mem::size_of::<RvcpuSnapshot>();
                    UserSlice::new(user_arg, size)
                        .writer()
                        .write::<RvcpuSnapshot>(&snap)?;
                }
                Ok(0)
            }
            _ => {
                dev_err!(me.dev, "Unrecognised IOCTL command: {}\n", cmd);
                Err(ENOTTY)
            }
        }
    }
}

#[pinned_drop]
impl PinnedDrop for RustMiscDevice {
    fn drop(self: Pin<&mut Self>) {
        dev_info!(self.dev, "Exiting the Rust Misc Device Sample\n");
    }
}

impl RustMiscDevice {
}

If that is all compiling without complaints, we'll need to quickly add another symlink to our initramfs. Currently, we have no tool that'll let us actually read the binary output in human-readable format. We'll call on hexdump to scratch that itch for us.

In your rootfs dir, run:

~ # cd bin
~ # ln -s busybox hexdump

Repackage your initramfs and rebuild your module through your Makefile, and let's try messing around with our driver:

~ # insmod lib/modules/my_misc_device_module.ko
[    8.298482] my_misc_device_module: loading out-of-tree module taints kernel.
[    8.334272] rust_misc_device: Initialising Rust Misc Device Sample
~ # cat /dev/rvcpu | hexdump -C
[   18.676697] misc rvcpu: Opening Rust Misc Device Sample
[   18.683358] misc rvcpu: Reading from Rust Misc Device Sample
[   18.687349] misc rvcpu: Reading from Rust Misc Device Sample
[   18.688903] misc rvcpu: Exiting the Rust Misc Device Sample
00000000  5b 44 78 0b 00 00 00 00  9a 8c f1 39 ae 99 06 00  |[Dx........9....|
00000010  c2 3d f3 39 ae 99 06 00                           |.=.9....|
00000018

Hopefully, you're seeing at least somewhat similar output from your emulator, because this is exactly what we've been chasing! We have the console outputs from our driver, and more importantly, we have a neat little lineup of hexadecimal values (hexdump making the binary reading a bit easier on us).
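If you'd like to sanity-check what those 24 bytes mean, here's a small C sketch that decodes them by hand. It assumes the snapshot consists of three little-endian 64-bit fields in the order time, cycle, instret (RISC-V Linux is little-endian, and that's the field order I'm assuming for our Rust struct). The names snapshot_fields, le64 and decode_dump are made up for this example.

```c
#include <stdint.h>

/* Assumed field order of the snapshot: time, cycle, instret. */
struct snapshot_fields {
    uint64_t time;
    uint64_t cycle;
    uint64_t instret;
};

/* The 24 bytes printed by hexdump in the session above. */
static const uint8_t dump[24] = {
    0x5b, 0x44, 0x78, 0x0b, 0x00, 0x00, 0x00, 0x00,
    0x9a, 0x8c, 0xf1, 0x39, 0xae, 0x99, 0x06, 0x00,
    0xc2, 0x3d, 0xf3, 0x39, 0xae, 0x99, 0x06, 0x00,
};

/* Assemble a little-endian u64 byte by byte, independent of host endianness. */
static uint64_t le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Split the raw bytes into the three 8-byte fields. */
static struct snapshot_fields decode_dump(const uint8_t *bytes)
{
    struct snapshot_fields s = {
        .time    = le64(bytes),
        .cycle   = le64(bytes + 8),
        .instret = le64(bytes + 16),
    };
    return s;
}
```

Running decode_dump(dump) gives time = 0x0b78445b, cycle = 0x000699ae39f18c9a and instret = 0x000699ae39f33dc2. Reading each group of eight bytes "backwards" in the hexdump output gets you the same values.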

Now that we have everything working on the driver side of things, let's introduce a client program to be executed in userspace.

Step 8: Defining our ABI

For our userspace application to talk to our kernel-level driver, they need a shared understanding of which actions the driver supports, and what the expected return values from those actions look like. Most developers, and a whole heap of non-developers as well, will be familiar with the term API: Application Programming Interface. That is an interface exposing parts of a system's functionality, along with a "contract" specifying how to communicate with that system. An Application Binary Interface (ABI) is the same general idea, but since things are happening at a very low level on shared hardware, the contract is expressed in binary terms: which numbers identify which commands, and exactly how the bytes of each struct are laid out. So it's a pretty familiar concept.

To get started on our client programme, we'll first make a new directory in our project directory:

~ # mkdir rvcpu_client

In there, we'll be making a header file: rvcpu_uapi.h. This file will contain definitions for our ioctl commands, and our RvcpuSnapshot struct. Add the following C code to the file:

/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _RVCPU_UAPI_H
#define _RVCPU_UAPI_H

#include <linux/ioctl.h>
#include <linux/types.h>

struct rvcpu_snapshot {
    __u64 time;
    __u64 cycle;
    __u64 instret;
};

#define RVCPU_IOC_MAGIC '|'
#define RVCPU_IOC_SNAPSHOT _IOR(RVCPU_IOC_MAGIC, 0x80, struct rvcpu_snapshot)

#endif /* _RVCPU_UAPI_H */

A few conventions worth knowing for kernel development, that are visible in this file:

  • __u64, with two underscores, is the kernel-userspace-portable type from <linux/types.h>. It is always 64 bits, leaving no room for surprises across different compilers.
  • The _IOR macro from <linux/ioctl.h> is the C twin of our Rust _IOR<T> import, which we used to define our ioctl command. Same bit layout, and same direction encoding.
  • The RVCPU_IOC_MAGIC definition sets the "magic number" of our ioctl number (i.e., our command). It's a mechanism used by the kernel community to partition the namespace of ioctl numbers, and avoid collisions between them - if this were a proper upstream driver, we would have to go through the kernel docs, and find an unused "magic byte". Coincidentally, the magic we configured here is partially claimed by linux/media.h for nr values 0x00-0x7F. Luckily, our nr = 0x80 falls outside that range, so we're not actually colliding. Had this been upstream-bound, however, the convention would be to pick a different magic byte entirely, to avoid even the appearance of conflict.
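To make the "bit layout" part a bit more tangible, here's a short C sketch that takes our command apart again, using the _IOC_* helper macros from <linux/ioctl.h>. The ioctl_fields struct and the split_ioctl function are hypothetical names, only there for illustration.

```c
#include <linux/ioctl.h>
#include <linux/types.h>

struct rvcpu_snapshot {
    __u64 time;
    __u64 cycle;
    __u64 instret;
};

#define RVCPU_IOC_MAGIC '|'
#define RVCPU_IOC_SNAPSHOT _IOR(RVCPU_IOC_MAGIC, 0x80, struct rvcpu_snapshot)

/* Hypothetical helper type: the four fields packed into an ioctl number. */
struct ioctl_fields {
    unsigned int dir;   /* transfer direction; _IOC_READ for an _IOR command */
    unsigned int type;  /* the magic byte */
    unsigned int nr;    /* command number within that magic */
    unsigned int size;  /* sizeof the argument struct */
};

/* Pull the fields back out of a packed ioctl number. */
static struct ioctl_fields split_ioctl(unsigned int cmd)
{
    struct ioctl_fields f = {
        .dir  = _IOC_DIR(cmd),
        .type = _IOC_TYPE(cmd),
        .nr   = _IOC_NR(cmd),
        .size = _IOC_SIZE(cmd),
    };
    return f;
}
```

Feeding RVCPU_IOC_SNAPSHOT through split_ioctl gives back the read direction, the magic '|' (0x7C), nr 0x80, and a size of 24 bytes. On architectures that use the generic ioctl encoding (RISC-V and x86 among them), the packed value works out to 0x80187C80, or 2149088384 in decimal, which is the cmd number our driver prints in its log output in Step 10.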

Alright, that's our ABI - now on to the client using it!

Step 9: Our userspace client

For the userspace client, we'll once again be turning to C as our language of choice. The client we're writing will simply be exercising the open, read, and ioctl syscalls against the driver, so nothing too crazy going on. To really exercise our ioctl command, we'll be requesting multiple fresh snapshots on the same fd (the file descriptor we get back from opening the device), in a loop with a short busy-wait delay in between.

In the same directory as our ABI specification, go ahead and create a file named rvcpu_client.c. The client code will end up looking something like this:

/* SPDX-License-Identifier: GPL-2.0 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>

#include "rvcpu_uapi.h"

static void print_snapshot(const char *label, const struct rvcpu_snapshot *s)
{
    printf("%s: time=%llu cycle=%llu instret=%llu\n",
        label,
        (unsigned long long)s->time,
        (unsigned long long)s->cycle,
        (unsigned long long)s->instret);
}

int main(void)
{
    int fd = open("/dev/rvcpu", O_RDONLY);
    if (fd < 0) {
        perror("open /dev/rvcpu");
        return 1;
    }

    /* Path 1: read() returns the snapshot taken at open() time. */
    struct rvcpu_snapshot snap;
    ssize_t n = read(fd, &snap, sizeof(snap));
    if (n != (ssize_t)sizeof(snap)) {
        fprintf(stderr, "short read: got %zd, expected %zu\n", n, sizeof(snap));
        close(fd);
        return 1;
    }
    print_snapshot("open-time read()", &snap);

    /* Path 2: ioctl() takes a fresh snapshot each call. */
    for (int i = 0; i < 5; i++) {
        struct rvcpu_snapshot fresh;
        if (ioctl(fd, RVCPU_IOC_SNAPSHOT, &fresh) < 0) {
            perror("ioctl RVCPU_IOC_SNAPSHOT");
            close(fd);
            return 1;
        }
        char label[32];
        snprintf(label, sizeof(label), "ioctl #%d", i);
        print_snapshot(label, &fresh);

        /* tiny busy delay so successive snapshots differ */
        for (volatile int j = 0; j < 1000000; j++);
    }

    close(fd);
    return 0;
}

One of the longer code-snippets we've introduced so far, but luckily there are just a few main points of interest in it. I won't be going deep into the C semantics of it all, both due to it being outside the scope of what we're doing here, and due to my C knowledge being incredibly surface-level at absolute best. But still, let's dissect the code, just for a bit:

  • #include "rvcpu_uapi.h" - where we import our ABI, and the types defined in it.
  • int fd = open("/dev/rvcpu", O_RDONLY); - the open syscall being exercised.
  •   struct rvcpu_snapshot snap;
      ssize_t n = read(fd, &snap, sizeof(snap));
    
    The read syscall being dispatched. Its return value is the number of bytes read, while the snapshot itself is written into snap, an instance of our rvcpu_snapshot definition from the .h file.
  •   struct rvcpu_snapshot fresh;
      if (ioctl(fd, RVCPU_IOC_SNAPSHOT, &fresh) < 0) {
          perror("ioctl RVCPU_IOC_SNAPSHOT");
          close(fd);
          return 1;
      }
    
    All of this takes place in our loop, and is where we exercise the ioctl syscall. The ioctl function takes the file descriptor, an ioctl number, and a pointer to where to put the output - that's where both programs agreeing upon what's being sent back and forth gets really important. Just like before, the output lands in our agreed-upon rvcpu_snapshot struct, and the RVCPU_IOC_SNAPSHOT ioctl number comes from our .h file.
  • Finally, we call close() on the fd. Once the last reference to the open file is gone, the PinnedDrop implementation in our Rust code is triggered, freeing up any allocated memory.

A small detail worth noting is that this is why we tagged our Rust struct with the #[repr(C)] annotation. Like we briefly covered at the time, this ensures that the Rust compiler lays out instances of this struct the same way a C compiler would, which is part of facilitating this cross-boundary, cross-language communication.
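On the C side, we can pin that layout down at compile time. The sketch below is an optional addition rather than something the client needs: it uses C11's static_assert together with offsetof, so the build fails if the struct ever stops being three tightly packed 64-bit fields. The expected offsets are my assumption of what the #[repr(C)] Rust struct looks like in memory.

```c
#include <assert.h>      /* static_assert (C11) */
#include <stddef.h>      /* offsetof */
#include <linux/types.h>

struct rvcpu_snapshot {
    __u64 time;
    __u64 cycle;
    __u64 instret;
};

/* Compile-time checks: the build breaks if the layout ever drifts. */
static_assert(sizeof(struct rvcpu_snapshot) == 24,
              "snapshot must be exactly 24 bytes");
static_assert(offsetof(struct rvcpu_snapshot, time) == 0,
              "time must sit at offset 0");
static_assert(offsetof(struct rvcpu_snapshot, cycle) == 8,
              "cycle must sit at offset 8");
static_assert(offsetof(struct rvcpu_snapshot, instret) == 16,
              "instret must sit at offset 16");
```

These lines could live directly in rvcpu_uapi.h, right below the struct definition. They cost nothing at runtime, and they turn a silent ABI mismatch into a compiler error.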

And that's all there is to it! Let's put it all together, and start getting this session wrapped up.

Step 10: Putting it all together

In part 1, we installed quite a few tools - one of them was a C cross-compiler for RISC-V, named riscv64-linux-gnu-gcc. In case you don't have it installed, it's time to do so, as we'll need it to compile our C client.

In your rvcpu_client directory, go ahead and run:

riscv64-linux-gnu-gcc -static -O2 -Wall -Wextra -o rvcpu_client rvcpu_client.c
file rvcpu_client

If everything went well, your file command should return something looking roughly like:

rvcpu_client: ELF 64-bit LSB executable, UCB RISC-V, RVC, double-float ABI, version 1 (GNU/Linux), statically linked, BuildID[sha1]=95654ddbfa198ec3088b244458adc5498e36decb, for GNU/Linux 4.15.0, with debug_info, not stripped

If that's the case, cross-compilation worked, and you have yourself a RISC-V executable.

Copy that executable into your rootfs's bin directory, rebuild your initramfs, and launch your emulator. You'll want to once again load the kernel module using insmod, and then run the client programme:

~ # insmod lib/modules/my_misc_device_module.ko
[   50.592436] my_misc_device_module: loading out-of-tree module taints kernel.
[   50.630363] rust_misc_device: Initialising Rust Misc Device Sample
~ # rvcpu_client
[   55.529992] misc rvcpu: Opening Rust Misc Device Sample
[   55.532903] misc rvcpu: Reading from Rust Misc Device Sample
open-time read(): time=560769893 cycle=2110735679848996 instret=2110735679852672
[   55.538702] misc rvcpu: IOCTL on Rust Misc Device Sample (cmd: 2149088384)
ioctl #0: time=560879585 cycle=2110735679848996 instret=2110735679852672
[   55.559247] misc rvcpu: IOCTL on Rust Misc Device Sample (cmd: 2149088384)
ioctl #1: time=561062739 cycle=2110735679848996 instret=2110735679852672
[   55.575940] misc rvcpu: IOCTL on Rust Misc Device Sample (cmd: 2149088384)
ioctl #2: time=561228227 cycle=2110735679848996 instret=2110735679852672
[   55.590878] misc rvcpu: IOCTL on Rust Misc Device Sample (cmd: 2149088384)
ioctl #3: time=561377821 cycle=2110735679848996 instret=2110735679852672
[   55.605857] misc rvcpu: IOCTL on Rust Misc Device Sample (cmd: 2149088384)
ioctl #4: time=561527599 cycle=2110735679848996 instret=2110735679852672
[   55.620967] misc rvcpu: Exiting the Rust Misc Device Sample

And that right there, is what we were after 10 steps ago!

The output tells the full story: Our client opens the driver, reads from it, and then gets 5 fresh readings in a loop. You can even see the small increments in the time CSR value as we go through the loops.

Note: The cycle and instret values are frozen in time - this is a side-effect of our emulator setup. On real hardware, these would be changing as well.
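If you're curious how much wall-clock time those time increments represent, here's a rough C sketch that converts the gaps between the five ioctl readings into microseconds. The big assumption is the timebase: I'm using 10 MHz, which I believe is the default for QEMU's virt machine, so double-check the timebase-frequency property in your own device tree before trusting the numbers. The function and constant names are made up for this example.

```c
#include <stdint.h>

/* ASSUMPTION: 10 MHz timebase (believed to be the QEMU virt default). */
#define ASSUMED_TIMEBASE_HZ 10000000ULL

/* The time values printed by the five ioctl calls in the session above. */
static const uint64_t ioctl_times[5] = {
    560879585ULL, 561062739ULL, 561228227ULL, 561377821ULL, 561527599ULL,
};

/* Convert a tick delta to whole microseconds at the given timebase. */
static uint64_t ticks_to_us(uint64_t ticks, uint64_t timebase_hz)
{
    return ticks * 1000000ULL / timebase_hz;
}

/* Microseconds elapsed between snapshot i and snapshot i + 1 (i in 0..3). */
static uint64_t gap_us(int i)
{
    return ticks_to_us(ioctl_times[i + 1] - ioctl_times[i],
                       ASSUMED_TIMEBASE_HZ);
}
```

Under that assumption, the four gaps come out to 18315, 16548, 14959 and 14977 microseconds. That's in the same ballpark as the gaps between the kernel log timestamps of the ioctl calls (roughly 15 to 20 milliseconds), which suggests the 10 MHz guess is reasonable for this setup.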

And that about does it - it's not much as we're looking at it in the terminal, but building it from scratch gives a completely different perspective of how much is actually going on.

How Everything Pieces Together

I thought it might be beneficial to quickly circle back, and look at both what we built, and how it all connects.

The full line-up of what we made:

  • A miscdevice driver, written in Rust for the Linux kernel.
  • A header file for userspace clients to know how to interact with our kernel-level driver.
  • A client programme that consumes that ABI, and uses it to interact with our driver through the open, read, and ioctl syscalls.

Pretty productive, and a bunch of interesting steps forward from our part 1 kernel module.

Now, just to close the loop on everything, this is how everything pieces together:

  1. On startup, we load our kernel module.
  2. The kernel module registers a driver under miscdevice.
  3. The kernel handles registering the driver in its internal miscdevice index. The driver now shows up in /sys/class/misc/.
  4. The virtual filesystem we mounted in our init script, devtmpfs, handles creating the device node. The driver now shows up in /dev/.
  5. We execute our client application. It uses the provided header file to know how to communicate with the driver, and how to handle the data that comes back from the driver. The #[repr(C)] annotation on our Rust struct and the unsafe impl AsBytes together let the kernel safely view our snapshot as raw bytes, while ensuring the bytes match the C struct layout the userspace client expects.
  6. The client application dispatches the syscalls we implemented, making our driver read the RISC-V CSRs through inline assembly, and return the values to userspace in a safe manner.
  7. The driver is closed at the exit of the application.

Incredible stuff once it's all written down like this! We've touched a lot of moving parts in terms of how the kernel works, and how data moves between the kernel and userspace.

Thanks for reading along this far if you did - I had a lot of headaches and a lot of fun making this, and I hope it was of some sort of value to you as a reader.

Concluding thoughts

This was a slightly heavy, but also very natural-feeling next step in my little learning journey regarding RISC-V, Rust, and the Linux kernel. I hope that anyone following along with this series might feel the same way.

Going forward, I think I will be pursuing some slight "spin-offs" (spiritual successors, if you will), rather than immediately going into part 3. I can almost guarantee that part 3 will pertain to working with the device tree, so stay tuned for that when the time comes. If you're keen on some of the upcoming still-RISC-V-flavoured guides, then that's even better - I'm itching to get started on them.

As always, thank you for your time!