2026-05-24
RISC-V Linux Kernel Modules in Rust - Part 2: Misc. Device Boogaloo
In part 2 of my RISC-V + Linux + Rust series, we'll be diving into how to work with Miscellaneous Device drivers, and build a user-space program in C to communicate with our device.
Introduction
In part 1 of this blog post series, we covered a ton of ground: We got our local environment set up to allow us to cross-compile the Linux kernel for RISC-V CPUs, we set up our Rust-for-Linux toolchain, we set up our QEMU environment for running the RISC-V kernel, and we set up a nice workflow for writing, building, and executing our own kernel modules. We also discussed using LLMs as learning tools, and the general motivations behind this blog post series - I highly recommend giving it a read if you haven't already!
This time, we'll be diving a bit deeper into how the Linux kernel works, and we'll try to register a miscellaneous device on our emulated RISC-V Linux kernel. Once we get that wired up, we'll whip up a small user-space program in C, which we'll use to communicate with our kernel device, by defining an ABI (Application Binary Interface)! So we have a ton to do once again - let's get right into it.
Note: This blog post presumes that you have all of our environment configurations, tools, toolchains, and Makefiles from part 1 up and running, and will not be re-explaining any of them. All the code, and some very rough notes on the flow, can be found on my GitHub repository for the project: https://github.com/grammeaway/linux-rust-riscv-project
Table of Contents
- Explaining miscdevice
  - Some Neat Real-World Examples
- What We'll Be Building
- Step by Step
  - Step 1: The Rust miscdevice sample code
  - Step 2: Setting up our first miscdevice module
  - Step 3: Loading our device
  - Step 4: Registering the device fully
  - Step 5: Making step 4 obsolete
  - Step 6: Writing data to the node
  - Step 7: Testing out the driver
  - Step 8: Defining our ABI
  - Step 9: Our userspace client
  - Step 10: Putting it all together
- How Everything Pieces Together
- Concluding thoughts
Explaining miscdevice
So right out of the gate, let's talk about what miscdevice even is. We'll be getting a bit more technical and low-level than in part 1, so allow yourself the luxury of not fully understanding things at first - it'll get clearer as we press on, and it's a lot of fun!
In Unix-like operating systems, like good ol' Linux, there are two types of "devices": One is called a block device, and one is called a character device. Regardless of the type, a device is represented as a special type of file.
Block devices handle data which comes in fixed-size chunks, e.g. hard drives, SSDs, and floppy disks.
Character devices are stream-oriented, and handle byte-by-byte data flows, in a sequential manner.
Character devices are the type you as a user will most commonly interact with directly, and also the type we'll be working on today.
Every single device on a Linux system is identified with a number pair: A major and a minor.
You can validate that on your QEMU setup from part 1, by launching it, and running:
~ # ls -l /dev | grep -E "null|zero|random"
crw-rw-rw- 1 0 0 1, 3 Jan 1 1970 null
crw-rw-rw- 1 0 0 1, 8 Jan 1 1970 random
crw-rw-rw- 1 0 0 1, 9 Jan 1 1970 urandom
crw-rw-rw- 1 0 0 1, 5 Jan 1 1970 zero
If you omit the grep portion, you'll get a ton of output, so I opted to zero in on a few common ones. Now there's a bit to unpack here: You'll notice that the filetype is being denoted with a c. This denotes a character device. The numbers are the previously mentioned major/minor pair that all devices get assigned. So e.g. /dev/null (a special device acting as a "black hole" data sink / guaranteed EOF-provider when read from), has the major/minor combo 1,3.
For the major/minor pairs, the major identifies the associated driver, and the minor identifies the specific device handled by that driver. The kernel maintains an internal table, mapping each major to its driver, which in turn uses the minor to pick out the specific device. This is used to direct syscalls to their correct receiver.
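To make the pairing a bit more concrete, here's a minimal userspace Rust sketch of how the kernel packs a major and a minor into a single device number internally. The 20-bit minor width corresponds to MINORBITS in the kernel's include/linux/kdev_t.h; the function names mkdev, major, and minor are just illustrative stand-ins for the kernel's MKDEV, MAJOR, and MINOR C macros, not something from our module. Note that the encoding userspace sees through stat() is laid out differently.

```rust
/// Number of bits the kernel reserves for the minor number (MINORBITS).
const MINOR_BITS: u32 = 20;
const MINOR_MASK: u32 = (1 << MINOR_BITS) - 1;

/// Illustrative stand-in for the kernel's MKDEV macro.
fn mkdev(major: u32, minor: u32) -> u32 {
    (major << MINOR_BITS) | (minor & MINOR_MASK)
}

/// Illustrative stand-in for the kernel's MAJOR macro.
fn major(dev: u32) -> u32 {
    dev >> MINOR_BITS
}

/// Illustrative stand-in for the kernel's MINOR macro.
fn minor(dev: u32) -> u32 {
    dev & MINOR_MASK
}

fn main() {
    // /dev/null is major 1, minor 3 in the listing above.
    let null = mkdev(1, 3);
    println!(
        "/dev/null -> dev {:#x}, major {}, minor {}",
        null,
        major(null),
        minor(null)
    );
}
```

The driver lookup only ever needs the upper part (the major), while the driver itself uses the lower part (the minor) to find the right device.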
Now, if we were to allocate a real-deal, all the bells and whistles character device, the steps would roughly look like this:
- Allocate a major number, or request a specific major.
- Allocate a range of minor numbers under the major.
- Create a cdev structure with all file_operations callbacks.
- Use cdev_add to register it.
- Create a device class, so udev, mdev, or devtmpfs can know about the device.
- Call device_create, to trigger the creation of a /dev/yournewdevice node.
And the reverse of all of these steps for any clean-up flows.
This is a lot of wiring. If you were implementing something more heavy-duty, e.g. a GPU driver exposing multiple individual devices, with its own sysfs hierarchy, this is the appropriate, and best-practice way of going about it.
But: For the vast majority of devices, this is significant overkill. For a whole lot of devices, the needs just boil down to a single file in /dev, offering a single piece of functionality. And that is the use case for miscdevice.
The kernel offers us a convenient shortcut, through the misc subsystem (short for "miscellaneous"). misc is a pre-existing character device driver, that owns the major number 10, and is fully open for drivers to register themselves as minors under.
To see the current miscdevices running in our emulated system, run another ls with a grep against /dev:
~ # ls -l /dev | grep " 10,"
crw-r--r-- 1 0 0 10, 235 Jan 1 1970 autofs
crw------- 1 0 0 10, 257 May 22 11:23 cpu_dma_latency
crw------- 1 0 0 10, 183 Jan 1 1970 hwrng
crw------- 1 0 0 10, 237 Jan 1 1970 loop-control
crw------- 1 0 0 10, 256 Jan 1 1970 vga_arbiter
Just 5 registered nodes in our aggressively simple emulated system, but all of them are registered under major number 10, and each one is a small driver exposing a single /dev/foo node, on which it handles open/read/write/ioctl (input/output control) commands.
When registering a miscdevice, you provide the kernel with a name (which becomes /dev/<name>), an optional specific minor number, and your driver's file_operations (i.e., the callbacks for read/write/open/ioctl). The kernel takes care of things from there, and wires up all the steps described earlier - much easier to handle for us as kernel developers.
This is why misc exists - there are tons of reasons for needing a driver in the kernel, and many of them are quite small in scale. For these use cases, the entire device+driver registration and clean-up flow were just a bit too labor-intensive. So you can think of misc as a very convenient helper function from the kernel, to let you focus on the fun parts.
Some Neat Real-World Examples
As mentioned, there's an abundance of miscdevice examples out there, but our very simple emulated system doesn't ship with all that many of them. But a few fun ones to know from a more fully-fledged Linux system would be:
- /dev/kvm: The entry-point to KVM (Kernel-based Virtual Machine). Open it and ioctl on it to create Virtual Machines (how cool is that?).
- /dev/fuse: Userspace filesystem driver entrypoint.
- /dev/uinput: Userspace input device emulation.
The list goes on, but they all share the property of simply needing the kernel to expose a simple control file, for the userspace to interact with. Typically, the real interface of these is ioctl, with read/write/open playing secondary roles, or no roles at all.
What We'll Be Building
With all of that covered, let's get into what we'll be building. Our goal for today is to build the miscdevice driver /dev/rvcpu. rvcpu will only support read operations, and will offer access to some of the data that we read from the RISC-V registers in part 1. This isn't exactly the most true-to-real-life use case for misc, but it'll get us through some core concepts, and do so without things getting too hairy (I remind you that I'm actively learning these things as I write the posts, so things are kept simple and slow, mainly for my own sake).
Once we have our miscdevice driver wired up, we'll build a small userspace client for it in C, and define an ABI (Application Binary Interface) for how to interact with the driver.
Sounds good? Then let's get into the meat and potatoes of it all.
Step by Step
Just like in part 1, we'll be tackling this in small steps at a time, evaluating and learning as we go along.
Step 1: The Rust miscdevice sample code
In your local clone of the linux codebase, go have a look at samples/rust/rust_misc_device.rs. As a start, just have a quick glance at the code - it's much more feature-complete than what we'll end up having at the end of this post. This makes it both a good learning tool, and also a bit overwhelming.
One thing worth noticing, is the portion of the code where the author implements the MiscDevice trait for the RustMiscDevice struct:
impl MiscDevice for RustMiscDevice {
    type Ptr = Pin<KBox<Self>>;
    fn open(_file: &File, misc: &MiscDeviceRegistration<Self>) -> Result<Pin<KBox<Self>>> {
        …
    }
    fn read_iter(mut kiocb: Kiocb<'_, Self::Ptr>, iov: &mut IovIterDest<'_>) -> Result<usize> {
        …
    }
    fn write_iter(mut kiocb: Kiocb<'_, Self::Ptr>, iov: &mut IovIterSource<'_>) -> Result<usize> {
        …
    }
    fn ioctl(me: Pin<&RustMiscDevice>, _file: &File, cmd: u32, arg: usize) -> Result<isize> {
        …
    }
}
Ignoring the actual function implementations, you'll notice all of the pieces of functionality we talked about earlier: Open, read, write, ioctl. No matter how complicated or simple the driver, these remain the basic building-blocks of a miscdevice.
Step 2: Setting up our first miscdevice module
For our use case and learning purposes, we can settle for a lot less. For a start, create a directory named my-misc-device-module in your project directory from part 1. This will need the same Kbuild and Makefile as the previous modules (note that we're still building this as an out-of-tree kernel module, which registers a miscdevice when loaded).
Now copy the Rust miscdevice sample into your directory structure:
~ # cp /path/to/torvalds/linux/samples/rust/rust_misc_device.rs /path/to/your/project/my-misc-device-module/my_misc_device_module.rs
Now, we'll be drastically stripping down the example - for a start, we'll shave off everything pertaining to write_iter, ioctl, set_value, get_value and hello. So we'll just be leaving the implementations of open and read_iter in the codebase.
This little exercise is pretty dull (sorry), but when combined with our Makefile setup from part 1, it will give you a neat chance to see first-hand some of the strengths of the Rust compiler - as you chop away and re-build, the Rust compiler will keep letting you know where things are broken, and warn you about imports that are now obsolete. Keep going at it, until you have something that compiles - refer to the GitHub repo as needed.
Note: Make sure that you find the MiscDeviceOptions config, and rename the driver to rvcpu!
Step 3: Loading our device
Once you have everything compiling without complaints from the Rust compiler, go ahead and run the top-level Makefile, and launch your QEMU environment.
Just like last time, use insmod to load your new module - note that the name of the module, and the driver are separate entities, defined in separate portions of the codebase:
~ # insmod lib/modules/my_misc_device_module.ko
This should give you zero output, unless you made a pr_info! macro call in your module init code.
To validate the existence of the device, go ahead and run an ls command against /sys/class/misc/:
~ # ls -l /sys/class/misc/
total 0
lrwxrwxrwx 1 0 0 0 May 22 12:35 autofs -> ../../devices/virtual/misc/autofs
lrwxrwxrwx 1 0 0 0 May 22 12:35 cpu_dma_latency -> ../../devices/virtual/misc/cpu_dma_latency
lrwxrwxrwx 1 0 0 0 May 22 12:35 hw_random -> ../../devices/virtual/misc/hw_random
lrwxrwxrwx 1 0 0 0 May 22 12:35 loop-control -> ../../devices/virtual/misc/loop-control
lrwxrwxrwx 1 0 0 0 May 22 12:35 rvcpu -> ../../devices/virtual/misc/rvcpu
lrwxrwxrwx 1 0 0 0 May 22 12:35 vga_arbiter -> ../../devices/virtual/misc/vga_arbiter
And there you should see our rvcpu driver, loaded and ready to go.
However: If you were to look for it through running ls against /dev/, the reference point we used previously to look for devices, it wouldn't be there. Why is that?
Step 4: Registering the device fully
Well, there are layers to why it isn't showing up. When our module calls the MiscDeviceRegistration::register function, the kernel does what it's supposed to do: It allocates a minor number, adds an entry to its miscdevice table, makes /sys/class/misc/rvcpu/ appear, and is ready to handle opens on the major/minor pairing.
What it does not do, however, is create the /dev/rvcpu file. In our current setup, that file is a pure userspace concern - the /dev/rvcpu file is just a filesystem object. It's a special kind, as covered earlier, but still an entry in the filesystem that userspace sees. So we need something taking care of that half of the setup.
We'll go through the entire manual registration flow once. To do that, we need to slightly expand our emulated system's capabilities, by adding to the set of commands it supports. So it's time to add some more symlinks to busybox.
Step 4.1: Supporting mknod in our emulated system
In your project directory, cd into your rootfs dir, then into the bin directory, and create a new symlink towards busybox for the mknod command:
~ # cd rootfs/bin
~ # ln -s busybox mknod
And just like that, we have the tool we need. Rebuild your initramfs, and restart your emulator. Remember to once again load the module with insmod, and validate that it still correctly shows up in /sys/class/misc/.
Step 4.2: Registering the device node
To use mknod, we need to supply it with 4 inputs: The name of the device node we'd like to create (i.e., /dev/rvcpu), the type of device we're registering (i.e., a character device, denoted with a c), the major number (i.e., 10, because we're registering a miscdevice), and the minor number. We know all of these except the minor number.
Luckily, as established earlier, the kernel has already sorted this out for us - so we just need to go looking for it.
To find the assigned minor number of our driver, simply read from the rvcpu/dev entry in /sys/class/misc/:
~ # cat /sys/class/misc/rvcpu/dev
10:258
The minor number might vary on your machine, and that's all good - the kernel has just made sure that it was an available minor number. On my setup, the minor ended up as 258, which is what I'll be using in the example to come.
Now that we've been armed with the missing parameter for mknod, we can create the device node for our driver like so:
~ # mknod /dev/rvcpu c 10 258
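If you'd rather not copy the numbers over by hand, the lookup can be scripted. Here's a small userspace Rust sketch, assuming the MAJOR:MINOR format we just saw in the sysfs dev entry. The helpers parse_dev and mknod_command are hypothetical names made up for illustration, and the sketch only builds the command string rather than executing it:

```rust
/// Parse the "MAJOR:MINOR" contents of a sysfs `dev` entry, e.g. "10:258\n".
/// Returns None if the text does not have that shape.
fn parse_dev(contents: &str) -> Option<(u32, u32)> {
    let (major, minor) = contents.trim().split_once(':')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}

/// Build the mknod invocation for a character device node.
fn mknod_command(node: &str, dev: (u32, u32)) -> String {
    format!("mknod {} c {} {}", node, dev.0, dev.1)
}

fn main() {
    // In a real setup this string would come from reading
    // /sys/class/misc/rvcpu/dev; it is hard-coded to keep the sketch runnable anywhere.
    let contents = "10:258\n";
    match parse_dev(contents) {
        Some(dev) => println!("{}", mknod_command("/dev/rvcpu", dev)),
        None => eprintln!("unexpected dev entry: {:?}", contents),
    }
}
```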
And if that exits without complaining, you should be able to attempt an open and a read from the now-created /dev/rvcpu:
~ # cat /dev/rvcpu
[ 7657.164932] misc rvcpu: Opening Rust Misc Device Sample
[ 7657.166669] misc rvcpu: Reading from Rust Misc Device Sample
[ 7657.171795] misc rvcpu: Exiting the Rust Misc Device Sample
And that's the end-to-end registration flow, with our cat command triggering the open, read_iter, and PinnedDrop implementations in turn. The absence of errors from cat tells us that the read of the still-empty buffer returned cleanly, and that cat exited with code 0. So far, so great!
Step 5: Making step 4 obsolete
For learning purposes, going through the mknod flow is all well and good. But now that we've been through it, let's make sure that there's something in our userspace that can take over this part of the flow, and finish the work that our kernel starts when the module is registered.
To handle the /dev side of miscdevice registration, we need to mount devtmpfs on boot. devtmpfs is a virtual file system, maintained by the kernel itself, which automatically gets populated with a device node for every registered device. So by mounting it on /dev, the device nodes get created for us, without any manual mknod calls - pretty neat!
To mount it on boot, we need to modify our init script, which we wrote back in part 1. At the current point in time, it should look something like this:
#!/bin/sh
mount -t proc proc /proc
mount -t sysfs sysfs /sys
exec /bin/sh
At any point between the shebang and the launch of the interactive shell, add the command:
mount -t devtmpfs devtmpfs /dev
This will mount devtmpfs to our /dev directory, and outsource all the userspace-specific handling of registering to this handy, virtual file system.
Use your Makefile to rebuild your initramfs, restart your emulator, and once again load your module with insmod. Now, running ls -l against /dev should greet you with a fully registered device node:
~ # ls -l /dev/ | grep " 10,"
crw-r--r-- 1 0 0 10, 235 Jan 1 1970 autofs
crw------- 1 0 0 10, 257 May 22 11:23 cpu_dma_latency
crw------- 1 0 0 10, 183 Jan 1 1970 hwrng
crw------- 1 0 0 10, 237 Jan 1 1970 loop-control
crw------- 1 0 0 10, 258 May 22 11:43 rvcpu
crw------- 1 0 0 10, 256 Jan 1 1970 vga_arbiter
And there it is, given the minor number 258! Great, now we can move on to having our node actually contain some data.
Step 6: Writing data to the node
Time for us to once again dive into our Rust code, and read some RISC-V-specific registers.
Step 6.1: Bringing back our RISC-V CSR macro
For a start, let's steal some of our own code. Head into the code for my_csr_module from part 1, and fetch the read_csr! macro we defined there. Add this somewhere at the top of your new miscdevice module, and remember to import the asm! macro as well:
use core::arch::asm; // For the `asm!` macro.
macro_rules! read_csr {
    ($csr:ident) => {{
        let value: u64;
        // SAFETY: reading a CSR is a pure read with no side effects
        unsafe {
            asm!(
                concat!("csrr {0}, ", stringify!($csr)),
                out(reg) value
            );
        }
        value
    }};
}
Alright, we're now once again armed with the ability to read RISC-V CSRs from our Rust kernel module. For this little project, we'll read and subsequently write the same 3 CSRs as in part 1: instret, cycle, and time. Let's wrap those in a data object, and make a simple function for reading out all 3:
#[repr(C)]
#[derive(Debug, Copy, Clone)]
struct RvcpuSnapshot {
    time: u64,
    cycle: u64,
    instret: u64,
}
// SAFETY: `#[repr(C)]` struct of three `u64`s, no padding bytes, no interior mutability.
unsafe impl AsBytes for RvcpuSnapshot {}
const RVCPU_IOC_SNAPSHOT: u32 = _IOR::<RvcpuSnapshot>('|' as u32, 0x80);
#[cfg(target_arch = "riscv64")]
fn take_snapshot() -> RvcpuSnapshot {
    RvcpuSnapshot {
        time: read_csr!(time),
        cycle: read_csr!(cycle),
        instret: read_csr!(instret),
    }
}
And for your imports, make sure you're including:
use kernel::{
    …
    transmute::AsBytes,
    uaccess::{UserPtr, UserSlice}, // UserPtr is used by our ioctl handler later on
    ioctl::_IOR,
    …
};
This little chunk of code defines a struct for holding our CSR readings, and a function for producing an instance of the struct, by leveraging our read_csr! macro.
It's worth noting the architecture gate #[cfg(target_arch = "riscv64")] over the take_snapshot() function - this ensures that the Rust compiler will only compile the function when the target build architecture is RISC-V. As we discussed in part 1, it would also be viable to add the gate to the macro definition, but since this is the only context in the module where the macro is called, adding the gate here implicitly also excludes the macro from compilation (since macros are "expanded" during compilation). That being said, you could opt for a belt-and-suspenders approach, and architecture gate every single piece of code that should only be included for RISC-V contexts.
It's also worth noting the #[repr(C)] annotation on the RvcpuSnapshot struct. This tells the Rust compiler to store instances of this struct in memory, in a way that's compatible with the C programming language - this'll come in handy later, when we make our userspace client for interacting with our driver!
For now, take note of the RVCPU_IOC_SNAPSHOT const and the AsBytes implementation for the struct, but largely ignore them, as they'll become more relevant later on. Notice how, just like in part 1, we make sure to accompany any use of the unsafe keyword with a SAFETY comment, to let it be known that we thought through the use of this "unsafe" bit of code.
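To get a feel for what #[repr(C)] buys us, here's a userspace Rust sketch of the same struct, showing that it occupies exactly 24 bytes, with the fields laid out in declaration order. The helpers snapshot_to_bytes and snapshot_from_bytes are hypothetical, safe stand-ins written for this illustration - the kernel module relies on AsBytes and as_bytes() instead - and the field values are made up:

```rust
use std::mem::size_of;

#[repr(C)]
#[derive(Debug, Copy, Clone, PartialEq)]
struct RvcpuSnapshot {
    time: u64,
    cycle: u64,
    instret: u64,
}

/// Hypothetical stand-in for `as_bytes()`: copy each field out in
/// declaration order, using the machine's native byte order.
fn snapshot_to_bytes(s: &RvcpuSnapshot) -> [u8; 24] {
    let mut out = [0u8; 24];
    out[0..8].copy_from_slice(&s.time.to_ne_bytes());
    out[8..16].copy_from_slice(&s.cycle.to_ne_bytes());
    out[16..24].copy_from_slice(&s.instret.to_ne_bytes());
    out
}

/// What a reader on the other side does: slice the 24 bytes back into
/// three u64 values, in the same order.
fn snapshot_from_bytes(b: &[u8; 24]) -> RvcpuSnapshot {
    let field = |i: usize| {
        let mut raw = [0u8; 8];
        raw.copy_from_slice(&b[i * 8..i * 8 + 8]);
        u64::from_ne_bytes(raw)
    };
    RvcpuSnapshot {
        time: field(0),
        cycle: field(1),
        instret: field(2),
    }
}

fn main() {
    // Made-up values, purely for demonstration.
    let snap = RvcpuSnapshot { time: 1, cycle: 2, instret: 3 };
    let bytes = snapshot_to_bytes(&snap);
    println!("{} bytes: {:?}", size_of::<RvcpuSnapshot>(), bytes);
    assert_eq!(snapshot_from_bytes(&bytes), snap);
}
```

Since both sides agree on the order and width of the fields, a C program can declare a struct of three 64-bit unsigned integers and read the very same bytes.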
Step 6.2: Understanding the structure of our device specification
Alright, we have all the pieces in place to read data from our RISC-V registers, now we "just" need to write them. I found myself needing to iterate a ton on this step, as writing to the KVVec buffer just wasn't playing ball with me. I'll be referencing only the final solution in this post, but know that some level of struggling is to be expected here, especially if you're pretty new to low-level programming, like I both was and am.
Another way I'm slightly "cheating" as we're writing this here, is that I'm already now setting the code up for our eventual userspace client program. So there's a few refactorings that I'm cutting out of this post, largely because I've pretty much lost the overview on the journey the module codebase went through during that session of debugging and refactors.
With those disclaimers out of the way, let's get our writing setup done.
To summarize, what we're trying to achieve is to write the contents of our RvcpuSnapshot to the internal buffer of our miscdevice driver. Once that data is written, our open, read_iter, and ioctl implementations will be able to perform operations on the data - we'll keep things simple for now, and simply support reading and updating the snapshot.
Let's have a quick look at how our device construct actually exists in our codebase, and how it registers with the kernel:
#[pin_data]
struct RustMiscDeviceModule {
    #[pin]
    _miscdev: MiscDeviceRegistration<RustMiscDevice>,
}
impl kernel::InPlaceModule for RustMiscDeviceModule {
    fn init(_module: &'static ThisModule) -> impl PinInit<Self, Error> {
        pr_info!("Initialising Rust Misc Device Sample\n");
        let options = MiscDeviceOptions {
            name: c"rvcpu",
        };
        try_pin_init!(Self {
            _miscdev <- MiscDeviceRegistration::register(options),
        })
    }
}
struct Inner {
    buffer: KVVec<u8>,
}
#[pin_data(PinnedDrop)]
struct RustMiscDevice {
    #[pin]
    inner: Mutex<Inner>,
    dev: ARef<Device>,
}
Worth noticing right out of the gate, is that we're leveraging a lot of convenient, built-in Rust-for-Linux libraries for handling the registration of our miscdevice. This simplifies our registration a fair bit, especially for a simple device like ours - we simply supply a name for the device in our options, and we're ready to register.
The main thing we're about to wrestle with, is our Inner struct, which wraps a KVVec of unsigned 8-bit integers. In Rust-for-Linux, KVVec<T> is a type alias for the kernel's own Vec<T, KVmalloc> - an analog of std's Vec, adapted for kernel use with a specific allocator. That allocator, KVmalloc, follows the kvmalloc allocation strategy: It first attempts the faster, physically-contiguous kmalloc, and falls back to the virtually-contiguous vmalloc for larger allocations that can't be satisfied contiguously.
This Inner buffer is where we want to write our data. But we don't just want it to hold plain ol' text data. Since we plan on introducing a client program for the driver written in C, we want the data in the buffer to be written in a format understandable by both of the programs, across their differing programming languages - this time, we're opting for binary!
Note: As we go on from here, you'll notice us leveraging various pin annotations and macros. Understanding the full depth of what these tools provide us with is a bit outside of the scope of this project, but know that they serve an important role in kernel development, as they control the way that these structs are placed in memory - they get structurally pinned. This is not super important for the learning outcome we're pursuing in this post, but it is very important for kernel development in general - I can only encourage you to dive deeper into it, if you're curious.
To achieve the functionality we want from our driver, we'll be implementing the open, read_iter, and ioctl functions, as defined in the MiscDevice trait. We'll be taking them step by step.
Step 6.3: Implementing open
Our open function will be called any time the userspace calls open("/dev/rvcpu"). This will cause the misc subsystem to dispatch the open syscall to our driver.
In your impl MiscDevice for RustMiscDevice block, we'll first be implementing the open function:
fn open(_file: &File, misc: &MiscDeviceRegistration<Self>) -> Result<Pin<KBox<Self>>> {
    let dev = ARef::from(misc.device());
    dev_info!(dev, "Opening Rust Misc Device Sample\n");
    let mut buffer = KVVec::new();
    #[cfg(target_arch = "riscv64")]
    {
        let snap = take_snapshot();
        buffer.extend_from_slice(snap.as_bytes(), GFP_KERNEL)?;
    }
    KBox::try_pin_init(
        try_pin_init! {
            RustMiscDevice {
                inner <- new_mutex!(Inner {
                    buffer: buffer,
                }),
                dev: dev,
            }
        },
        GFP_KERNEL,
    )
}
In this function we have 4 main points of interest, happening in order as the function executes:
- Our take_snapshot() function is called, which executes our CSR-reading macro 3 times, which uses in-line assembly to read from our RISC-V machine's registers. It returns these readings as an instance of the RvcpuSnapshot struct.
- Remember when we added the unsafe impl AsBytes for RvcpuSnapshot {} line of code? That was all leading up to this moment: We can now call as_bytes() on our snapshot, which will take our struct of three u64 values, and return a &[u8] view of the raw memory - taking up 24 bytes.
- We then push these bytes into our allocated KVVec buffer, using the built-in extend_from_slice() function.
- Finally, it all gets tied together. We'll again be slightly glossing over the nitty-gritty details, but: KBox::try_pin_init allocates a kernel heap box big enough for our RustMiscDevice instance (which we have initialized with our extended buffer as the inner field). It then constructs it in place inside that allocation. The Mutex and the KVVec end up living inside one heap allocation, pinned to that address. Our function hands this pinned box back to the misc subsystem, which associates it with the open file descriptor for the lifetime of the open.
Notice that we're once again architecture-gating our RISC-V-specific code. So in the event that this module was to be built for another target architecture, our device's buffer would simply be empty, since the entire snapshot loading section would be skipped.
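The effect of the gate is easy to try out in plain userspace Rust. In this sketch, build_buffer is a hypothetical stand-in for the buffer-filling part of our open function, with 24 zero bytes taking the place of a real snapshot:

```rust
/// Hypothetical stand-in for the buffer setup in `open`.
/// On riscv64 the gated block runs; on any other target it is
/// removed before compilation, and the buffer stays empty.
#[allow(unused_mut)]
fn build_buffer() -> Vec<u8> {
    let mut buffer = Vec::new();
    #[cfg(target_arch = "riscv64")]
    {
        // Placeholder for `take_snapshot().as_bytes()`.
        buffer.extend_from_slice(&[0u8; 24]);
    }
    buffer
}

fn main() {
    let buffer = build_buffer();
    println!(
        "built for riscv64: {}, buffer length: {}",
        cfg!(target_arch = "riscv64"),
        buffer.len()
    );
}
```

Building and running this on a typical x86-64 or ARM development machine gives a buffer length of 0, which is the same thing a read from the device would see if the module was built for another architecture.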
At the end of this function, after a successful open() syscall from the userspace, the kernel now holds a per-open RustMiscDevice instance. Future read() and ioctl() syscalls on that device will be dispatched against this specific instance.
Step 6.4: Implementing read_iter
Our read_iter implementation, will be called any time userspace dispatches a read() syscall. Our implementation looks like so:
fn read_iter(mut kiocb: Kiocb<'_, Self::Ptr>, iov: &mut IovIterDest<'_>) -> Result<usize> {
    let me = kiocb.file();
    dev_info!(me.dev, "Reading from Rust Misc Device Sample\n");
    let inner = me.inner.lock();
    // Read the buffer contents, taking the file position into account.
    let read = iov.simple_read_from_buffer(kiocb.ki_pos_mut(), &inner.buffer)?;
    Ok(read)
}
Bit less going on here, but 3 interesting points nonetheless:
- The let me = kiocb.file(); line returns a reference to the per-open RustMiscDevice instance we created in our open() implementation. The kernel keeps track of which instance to return to us.
- Before reading from our inner field, we must call lock() on it. This is the idiomatic Rust-for-Linux pattern for any shared mutable state. Even though our specific module never mutates the buffer after open(), the mutex guards against any concurrent access, and ensures that the code is ready for possible future extension.
- Finally, we read the content of the buffer. simple_read_from_buffer takes the file position (kiocb.ki_pos) and the source buffer, and copies as many bytes as fit into the userspace destination (the IovIterDest parameter). It then advances the file position, and returns how many bytes were copied. We finish up the function by returning that byte count.
There's some very elegant, clever functionalities going on here, and we unfortunately can't take credit for them - they all pertain to the tracking of the file position:
The first read syscall has ki_pos = 0, so it copies from offset 0 to the end of the buffer (24 bytes, due to our 3 x 8 bytes), advances ki_pos to 24, and returns 24. The second read syscall has ki_pos = 24, which is past the end of our 24-byte buffer, so it copies 0 bytes and returns 0. cat interprets the 0 return as EOF, and exits.
That's how cat /dev/rvcpu works correctly: Two read calls, with the helper handling the position bookkeeping for us.
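The bookkeeping described above can be modelled in a few lines of userspace Rust. This is only a sketch of the behaviour, not the kernel's implementation: simple_read is a hypothetical function operating on plain slices, where the real helper copies into userspace memory through the IovIterDest:

```rust
/// Hypothetical userspace model of the position handling:
/// copy as much of `src` (starting at `*pos`) as fits into `dst`,
/// advance `*pos`, and return the number of bytes copied.
fn simple_read(pos: &mut usize, src: &[u8], dst: &mut [u8]) -> usize {
    // At or past the end of the buffer: nothing left, which readers treat as EOF.
    if *pos >= src.len() {
        return 0;
    }
    let n = dst.len().min(src.len() - *pos);
    dst[..n].copy_from_slice(&src[*pos..*pos + n]);
    *pos += n;
    n
}

fn main() {
    let device_buffer = [0xABu8; 24]; // stands in for our 24-byte snapshot
    let mut user_buffer = [0u8; 4096]; // stands in for cat's read buffer
    let mut pos = 0;

    let first = simple_read(&mut pos, &device_buffer, &mut user_buffer);
    let second = simple_read(&mut pos, &device_buffer, &mut user_buffer);
    println!("first read: {}, second read: {}, position: {}", first, second, pos);
}
```

A reader with a smaller destination buffer would simply need more calls: with a 10-byte buffer, the same model hands out 10, 10, 4, and then 0 bytes.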
Step 6.5: Implementing ioctl
This is where the real magic happens. In most of the miscdevice drivers out there, the heavy lifting and complex functionality will take place in ioctl, due to it being easily extensible through the handling of ioctl commands.
We'll once again be looking back at some of our early code in this module, where we defined the constant const RVCPU_IOC_SNAPSHOT: u32 = _IOR::<RvcpuSnapshot>('|' as u32, 0x80); - this defines an ioctl number, which we then map to a command in our ioctl implementation. Userspace clients that wish to interact with our driver, will need to know this ioctl number to interact with this piece of functionality. We'll dive into how to ensure that a bit later.
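It can help to see what that number actually consists of. The sketch below rebuilds it in userspace Rust, assuming the generic Linux ioctl encoding that RISC-V uses (8 bits of command number, 8 bits of type, 14 bits of size, 2 bits of direction); a few other architectures use different field widths and direction values. The ior function is a hypothetical stand-in that takes the size as a plain argument, where the kernel's _IOR::<T> derives it from the type:

```rust
// Field positions in the generic Linux ioctl number encoding.
const IOC_NRSHIFT: u32 = 0;
const IOC_TYPESHIFT: u32 = 8;
const IOC_SIZESHIFT: u32 = 16;
const IOC_DIRSHIFT: u32 = 30;
// Direction value meaning "userspace reads data from the kernel".
const IOC_READ: u32 = 2;

/// Hypothetical stand-in for `_IOR`: pack direction, payload size,
/// type character, and command number into one u32.
const fn ior(ty: u32, nr: u32, size: usize) -> u32 {
    (IOC_READ << IOC_DIRSHIFT)
        | ((size as u32) << IOC_SIZESHIFT)
        | (ty << IOC_TYPESHIFT)
        | (nr << IOC_NRSHIFT)
}

// 24 is the size of our snapshot struct: three u64 values.
const RVCPU_IOC_SNAPSHOT: u32 = ior('|' as u32, 0x80, 24);

fn main() {
    println!("RVCPU_IOC_SNAPSHOT = {:#010x}", RVCPU_IOC_SNAPSHOT);
    println!("  direction: {}", RVCPU_IOC_SNAPSHOT >> IOC_DIRSHIFT);
    println!("  size:      {}", (RVCPU_IOC_SNAPSHOT >> IOC_SIZESHIFT) & 0x3fff);
    println!("  type:      {:#x}", (RVCPU_IOC_SNAPSHOT >> IOC_TYPESHIFT) & 0xff);
    println!("  number:    {:#x}", RVCPU_IOC_SNAPSHOT & 0xff);
}
```

One practical consequence: since the size of the struct is baked into the number, changing the layout of RvcpuSnapshot also changes the ioctl number that clients need to use.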
Let's implement handling for our RVCPU_IOC_SNAPSHOT command:
fn ioctl(me: Pin<&RustMiscDevice>, _file: &File, cmd: u32, arg: usize) -> Result<isize> {
    dev_info!(me.dev, "IOCTL on Rust Misc Device Sample (cmd: {})\n", cmd);
    match cmd {
        RVCPU_IOC_SNAPSHOT => {
            #[cfg(target_arch = "riscv64")]
            {
                let snap = take_snapshot();
                let user_arg = UserPtr::from_addr(arg);
                let size = core::mem::size_of::<RvcpuSnapshot>();
                UserSlice::new(user_arg, size)
                    .writer()
                    .write::<RvcpuSnapshot>(&snap)?;
            }
            Ok(0)
        }
        _ => {
            dev_err!(me.dev, "Unrecognised IOCTL command: {}\n", cmd);
            Err(ENOTTY)
        }
    }
}
In this function, we set up a pattern matching block on the cmd parameter, which holds an ioctl number. We're just looking to handle our one piece of functionality (fetching a fresh snapshot without dispatching another open syscall), so that's the only pattern we match against - everything else returns an ENOTTY ("not a typewriter" - the historical error convention for when a device doesn't recognize the ioctl command given to it).
Our function goes through the following steps as it executes:
- Executes the pattern-matching against the cmd parameter.
- In our architecture-gate, we create a new snapshot.
- Now comes the tricky part: We want to return this snapshot to userspace. We can't write directly through arg, since it's an address in userspace memory, and we're working at the kernel-level - userspace memory is not to be trusted, as it is both very fluid, and can be tampered with for malicious reasons. Luckily, the kernel contains special functions for handling these scenarios in a safe manner. To use these, first we create a new UserPtr from the arg parameter, to get a pointer to userspace. We then use one of the Rust wrappers for the special kernel functions, UserSlice::new(user_arg, size).writer().write::<RvcpuSnapshot>(&snap), to do the following:
  a. Wrap the userspace pointer in a UserSlice of the declared size.
  b. Ask for a writer, i.e., a one-shot sink that writes to the userspace memory behind the pointer.
  c. Call write::<T>, where our snapshot falls under the T: AsBytes bound, due to us implementing AsBytes previously. write knows how to work with our snapshot as bytes, and how to move it across the boundary from the kernel, and into userspace.
  d. If the userspace pointer is invalid, the helper returns an error rather than crashing the kernel. The ? propagates that error to userspace, as the return value of the ioctl call.
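The "declared size" idea from sub-steps a through d can be illustrated with a toy model in userspace Rust. To be clear about the assumptions: SliceWriter and CopyError are hypothetical types invented for this sketch, and a plain byte slice stands in for userspace memory. The real UserSlice additionally deals with pointers that don't refer to valid memory at all, which a safe userspace program can't reproduce:

```rust
#[derive(Debug, PartialEq)]
enum CopyError {
    /// Models the kernel-side failure of a copy that can't be completed.
    Fault,
}

/// Toy model of a one-shot writer over a destination of declared size.
struct SliceWriter<'a> {
    dest: &'a mut [u8],
}

impl<'a> SliceWriter<'a> {
    fn new(dest: &'a mut [u8]) -> Self {
        SliceWriter { dest }
    }

    /// Copy `bytes` into the remaining destination, or fail without
    /// writing anything if they don't fit the declared size.
    fn write(&mut self, bytes: &[u8]) -> Result<(), CopyError> {
        if bytes.len() > self.dest.len() {
            return Err(CopyError::Fault);
        }
        let dest = std::mem::take(&mut self.dest);
        let (head, tail) = dest.split_at_mut(bytes.len());
        head.copy_from_slice(bytes);
        // Whatever is left over is all a later write may touch.
        self.dest = tail;
        Ok(())
    }
}

fn main() {
    let snapshot_bytes = [0x11u8; 24]; // stands in for snap.as_bytes()
    let mut big_enough = [0u8; 24];
    let mut too_small = [0u8; 8];

    println!("{:?}", SliceWriter::new(&mut big_enough).write(&snapshot_bytes));
    println!("{:?}", SliceWriter::new(&mut too_small).write(&snapshot_bytes));
}
```

The point of the pattern is that a failed copy surfaces as an ordinary error value, which the caller can propagate with ?, rather than as a crash.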
And that's it! That's our entire implementation of our miscdevice. Now let's take it for a spin.
Step 7: Testing out the driver
By now, you should have the following implementations for your driver:
#[vtable]
impl MiscDevice for RustMiscDevice {
    type Ptr = Pin<KBox<Self>>;

    fn open(_file: &File, misc: &MiscDeviceRegistration<Self>) -> Result<Pin<KBox<Self>>> {
        let dev = ARef::from(misc.device());
        dev_info!(dev, "Opening Rust Misc Device Sample\n");
        let mut buffer = KVVec::new();
        #[cfg(target_arch = "riscv64")]
        {
            let snap = take_snapshot();
            buffer.extend_from_slice(snap.as_bytes(), GFP_KERNEL)?;
        }
        KBox::try_pin_init(
            try_pin_init! {
                RustMiscDevice {
                    inner <- new_mutex!(Inner {
                        buffer: buffer,
                    }),
                    dev: dev,
                }
            },
            GFP_KERNEL,
        )
    }

    fn read_iter(mut kiocb: Kiocb<'_, Self::Ptr>, iov: &mut IovIterDest<'_>) -> Result<usize> {
        let me = kiocb.file();
        dev_info!(me.dev, "Reading from Rust Misc Device Sample\n");
        let inner = me.inner.lock();
        // Read the buffer contents, taking the file position into account.
        let read = iov.simple_read_from_buffer(kiocb.ki_pos_mut(), &inner.buffer)?;
        Ok(read)
    }

    fn ioctl(me: Pin<&RustMiscDevice>, _file: &File, cmd: u32, arg: usize) -> Result<isize> {
        dev_info!(me.dev, "IOCTL on Rust Misc Device Sample (cmd: {})\n", cmd);
        match cmd {
            RVCPU_IOC_SNAPSHOT => {
                #[cfg(target_arch = "riscv64")]
                {
                    let snap = take_snapshot();
                    let user_arg = UserPtr::from_addr(arg);
                    let size = core::mem::size_of::<RvcpuSnapshot>();
                    UserSlice::new(user_arg, size)
                        .writer()
                        .write::<RvcpuSnapshot>(&snap)?;
                }
                Ok(0)
            }
            _ => {
                dev_err!(me.dev, "Unrecognised IOCTL command: {}\n", cmd);
                Err(ENOTTY)
            }
        }
    }
}

#[pinned_drop]
impl PinnedDrop for RustMiscDevice {
    fn drop(self: Pin<&mut Self>) {
        dev_info!(self.dev, "Exiting the Rust Misc Device Sample\n");
    }
}

impl RustMiscDevice {
}
If that is all compiling without complaints, we'll need to quickly add another symlink to our initramfs. Currently, we have no tool that'll let us actually read the binary output in human-readable format. We'll call on hexdump to scratch that itch for us.
In your rootfs dir, run:
~ # cd bin
~ # ln -s busybox hexdump
Repackage your initramfs and rebuild your module through your Makefile, and let's try messing around with our driver:
~ # insmod lib/modules/my_misc_device_module.ko
[ 8.298482] my_misc_device_module: loading out-of-tree module taints kernel.
[ 8.334272] rust_misc_device: Initialising Rust Misc Device Sample
~ # cat /dev/rvcpu | hexdump -C
[ 18.676697] misc rvcpu: Opening Rust Misc Device Sample
[ 18.683358] misc rvcpu: Reading from Rust Misc Device Sample
[ 18.687349] misc rvcpu: Reading from Rust Misc Device Sample
[ 18.688903] misc rvcpu: Exiting the Rust Misc Device Sample
00000000 5b 44 78 0b 00 00 00 00 9a 8c f1 39 ae 99 06 00 |[Dx........9....|
00000010 c2 3d f3 39 ae 99 06 00 |.=.9....|
00000018
Hopefully, you're seeing at least somewhat similar output from your emulator, because this is exactly what we've been chasing! We have the console outputs from our driver, and more importantly, we have a neat little lineup of hexadecimal values (hexdump making the binary reading a bit easier on us).
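If you'd like to sanity-check what those 24 bytes actually mean, a small userspace sketch like the one below can decode them. It assumes the snapshot is three 64-bit fields in the order time, cycle, instret, stored little-endian (RISC-V Linux is little-endian); the Snapshot struct and decode_snapshot function are names I made up for this illustration, not part of the driver.

```rust
/// Userspace mirror of the snapshot; the field order is an assumption
/// (time, cycle, instret), not something read from the driver.
#[derive(Debug, PartialEq, Eq)]
struct Snapshot {
    time: u64,
    cycle: u64,
    instret: u64,
}

/// Decodes 24 raw bytes into three little-endian u64 fields.
fn decode_snapshot(raw: &[u8; 24]) -> Snapshot {
    // Pull out the i-th 8-byte word and interpret it as little-endian.
    let word = |i: usize| {
        let mut w = [0u8; 8];
        w.copy_from_slice(&raw[i * 8..i * 8 + 8]);
        u64::from_le_bytes(w)
    };
    Snapshot {
        time: word(0),
        cycle: word(1),
        instret: word(2),
    }
}

/// The 24 bytes from the hexdump output above.
const DUMP: [u8; 24] = [
    0x5b, 0x44, 0x78, 0x0b, 0x00, 0x00, 0x00, 0x00, // time
    0x9a, 0x8c, 0xf1, 0x39, 0xae, 0x99, 0x06, 0x00, // cycle
    0xc2, 0x3d, 0xf3, 0x39, 0xae, 0x99, 0x06, 0x00, // instret
];
```

Calling decode_snapshot(&DUMP) gives time = 0x0b78445b, cycle = 0x000699ae39f18c9a, and instret = 0x000699ae39f33dc2. Each value is simply the corresponding hexdump row read back to front, since the least significant byte is stored first.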
Now that we have everything working on the driver side of things, let's introduce a client program to be executed in userspace.
Step 8: Defining our ABI
For our userspace application to talk to our kernel-level driver, they need a shared understanding of which actions the driver supports, and what the expected return values from those actions look like. Most developers, and a whole heap of non-developers as well, will be familiar with the term API: Application Programming Interface. That's an interface exposing parts of a system's functionality, along with a "contract" specifying how to communicate with the system. An Application Binary Interface (ABI) is the same general idea, but since things are happening at a very low level on shared hardware, the contract is expressed in binary terms: exact sizes, field orders, and command numbers - so a pretty familiar concept.
To get started on our client programme, we'll first make a new directory in our project directory:
~ # mkdir rvcpu_client
In there, we'll be making a header file: rvcpu_uapi.h. This file will contain definitions for our ioctl commands, and our RvcpuSnapshot struct. Add the following C code to the file:
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _RVCPU_UAPI_H
#define _RVCPU_UAPI_H
#include <linux/ioctl.h>
#include <linux/types.h>
struct rvcpu_snapshot {
    __u64 time;
    __u64 cycle;
    __u64 instret;
};
#define RVCPU_IOC_MAGIC '|'
#define RVCPU_IOC_SNAPSHOT _IOR(RVCPU_IOC_MAGIC, 0x80, struct rvcpu_snapshot)
#endif /* _RVCPU_UAPI_H */
A few conventions worth knowing for kernel development, that are visible in this file:
- __u64, with two underscores, is the kernel-userspace-portable type from <linux/types.h>. It is always 64 bits, leaving no room for surprises across different compilers.
- The _IOR macro from <linux/ioctl.h> is the C twin of our Rust _IOR<T> import, which we used to define our ioctl command. Same bit layout, and same direction encoding.
- The RVCPU_IOC_MAGIC definition defines the "magic number" of our ioctl number (i.e., our command). It's a mechanism used by the kernel community to partition the namespace of ioctl numbers, to avoid collisions between ioctl numbers - if this were a proper upstream driver, we would have to go through the kernel docs, and find an unused "magic byte". Coincidentally, the magic we configured here is partially claimed by linux/media.h for nr values 0x00-0x7F. Luckily, our nr = 0x80 falls outside that range, so we're not actually colliding. Had this been upstream-bound, however, the convention would be to pick a different magic byte entirely to avoid even the appearance of conflict.
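To see what "same bit layout" means in practice, here is a small sketch that rebuilds the ioctl number by hand. It assumes the generic Linux ioctl encoding (the one RISC-V uses): 8 bits of nr, 8 bits of magic, 14 bits of size, and 2 bits of direction, with "read" encoded as 2. The function names ior and decode are my own, purely for illustration.

```rust
// Bit layout of an ioctl number in the generic Linux encoding:
// | dir (2 bits) | size (14 bits) | magic (8 bits) | nr (8 bits) |
const IOC_NRSHIFT: u32 = 0;
const IOC_TYPESHIFT: u32 = 8;
const IOC_SIZESHIFT: u32 = 16;
const IOC_DIRSHIFT: u32 = 30;
const IOC_READ: u32 = 2; // userspace reads data produced by the kernel

/// Hand-rolled equivalent of the C _IOR(magic, nr, type) macro,
/// taking the payload size directly instead of a type.
const fn ior(magic: u8, nr: u8, size: usize) -> u32 {
    (IOC_READ << IOC_DIRSHIFT)
        | ((size as u32) << IOC_SIZESHIFT)
        | ((magic as u32) << IOC_TYPESHIFT)
        | ((nr as u32) << IOC_NRSHIFT)
}

/// Splits a command number back into (dir, size, magic, nr).
fn decode(cmd: u32) -> (u32, u32, u8, u8) {
    (cmd >> 30, (cmd >> 16) & 0x3fff, (cmd >> 8) as u8, cmd as u8)
}
```

With our values, ior(b'|', 0x80, 24) evaluates to 0x80187c80, where 24 is the size of three __u64 fields. In decimal that is 2149088384, which is a handy number to recognise if your driver ever logs the raw cmd value.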
Alright, that's our ABI - now on to the client using it!
Step 9: Our userspace client
For the userspace client, we'll once again be turning to C as our language of choice. The client we're writing will simply be exercising the open, read, and ioctl syscalls against the driver, so nothing too crazy going on. For really exercising our ioctl command, we'll be requesting multiple fresh snapshots on the same fd (reference to the device after opening it), in a loop with a short sleep in between.
In the same directory as our ABI specification, go ahead and create a file named rvcpu_client.c. The client code will end up looking something like this:
/* SPDX-License-Identifier: GPL-2.0 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include "rvcpu_uapi.h"

static void print_snapshot(const char *label, const struct rvcpu_snapshot *s)
{
    printf("%s: time=%llu cycle=%llu instret=%llu\n",
           label,
           (unsigned long long)s->time,
           (unsigned long long)s->cycle,
           (unsigned long long)s->instret);
}

int main(void)
{
    int fd = open("/dev/rvcpu", O_RDONLY);
    if (fd < 0) {
        perror("open /dev/rvcpu");
        return 1;
    }

    /* Path 1: read() returns the snapshot taken at open() time. */
    struct rvcpu_snapshot snap;
    ssize_t n = read(fd, &snap, sizeof(snap));
    if (n != (ssize_t)sizeof(snap)) {
        fprintf(stderr, "short read: got %zd, expected %zu\n", n, sizeof(snap));
        close(fd);
        return 1;
    }
    print_snapshot("open-time read()", &snap);

    /* Path 2: ioctl() takes a fresh snapshot each call. */
    for (int i = 0; i < 5; i++) {
        struct rvcpu_snapshot fresh;
        if (ioctl(fd, RVCPU_IOC_SNAPSHOT, &fresh) < 0) {
            perror("ioctl RVCPU_IOC_SNAPSHOT");
            close(fd);
            return 1;
        }
        char label[32];
        snprintf(label, sizeof(label), "ioctl #%d", i);
        print_snapshot(label, &fresh);
        /* tiny busy delay so successive snapshots differ */
        for (volatile int j = 0; j < 1000000; j++);
    }

    close(fd);
    return 0;
}
One of the longer code snippets we've introduced so far, but there are luckily only a few points of main interest in it. I won't be going deep into the C semantics of it all, both due to it being outside the scope of what we're doing here, and due to my C knowledge being incredibly surface-level at absolute best. But still, let's dissect the code, just for a bit:
- #include "rvcpu_uapi.h" - where we import our ABI, and the types defined in it.
- int fd = open("/dev/rvcpu", O_RDONLY); - the open syscall being exercised.
- struct rvcpu_snapshot snap; ssize_t n = read(fd, &snap, sizeof(snap)); - the read syscall being dispatched. The bytes that come back are written straight into an instance of our rvcpu_snapshot definition from the .h file, while the return value n tells us how many bytes we actually got.
- struct rvcpu_snapshot fresh; if (ioctl(fd, RVCPU_IOC_SNAPSHOT, &fresh) < 0) { perror("ioctl RVCPU_IOC_SNAPSHOT"); close(fd); return 1; } - all of this takes place in our loop, and is where we exercise the ioctl syscall. The ioctl function takes the file descriptor for the device, an ioctl number, and a pointer to where to put the output - that's where both programs agreeing upon what's being sent back and forth gets really important. Just like before, we're writing into our agreed-upon rvcpu_snapshot struct, and getting our RVCPU_IOC_SNAPSHOT ioctl number from our .h file.
- Finally, we call close() on the fd. Once the last reference to the open file is gone, the PinnedDrop implementation in our Rust code is triggered, freeing up any allocated memory.
A small detail worth noting: this is why we tagged our Rust struct with the #[repr(C)] annotation. Like we briefly covered at the time, this ensures that the Rust compiler lays out the fields of the struct in the same order, and with the same padding rules, as a C compiler would, which is part of facilitating this cross-boundary, cross-language communication.
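As a rough illustration of what #[repr(C)] buys us, the sketch below checks the layout of a struct shaped like our snapshot from plain userspace Rust. The field names are assumed to mirror the C header (time, cycle, instret), and field_offsets is a helper invented for this example; the driver itself doesn't need any of this.

```rust
/// Userspace copy of the snapshot layout. The field names are assumed to
/// mirror struct rvcpu_snapshot from the C header.
#[repr(C)]
struct RvcpuSnapshot {
    time: u64,
    cycle: u64,
    instret: u64,
}

/// Returns the byte offset of each field, measured from the start of the
/// struct. Hypothetical helper, only here to make the layout visible.
fn field_offsets(s: &RvcpuSnapshot) -> (usize, usize, usize) {
    let base = s as *const RvcpuSnapshot as usize;
    (
        &s.time as *const u64 as usize - base,
        &s.cycle as *const u64 as usize - base,
        &s.instret as *const u64 as usize - base,
    )
}
```

With #[repr(C)], the offsets come out as (0, 8, 16) and std::mem::size_of::<RvcpuSnapshot>() is 24, which is exactly what the C side expects from three consecutive __u64 fields. Without the annotation, the Rust compiler is free to reorder fields, so those offsets would no longer be something the two sides could rely on.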
And that's all there is to it! Let's put it all together, and start getting this session wrapped up.
Step 10: Putting it all together
In part 1, we installed quite a few tools - one of them was a C cross-compiler for RISC-V, named riscv64-linux-gnu-gcc. In case you don't have it installed, it's time to do so, as we'll need it to compile our C client.
In your rvcpu_client directory, go ahead and run:
riscv64-linux-gnu-gcc -static -O2 -Wall -Wextra -o rvcpu_client rvcpu_client.c
file rvcpu_client
If everything went well, your file command should return something looking roughly like:
rvcpu_client: ELF 64-bit LSB executable, UCB RISC-V, RVC, double-float ABI, version 1 (GNU/Linux), statically linked, BuildID[sha1]=95654ddbfa198ec3088b244458adc5498e36decb, for GNU/Linux 4.15.0, with debug_info, not stripped
If that's the case, cross-compilation worked, and you have yourself a RISC-V executable.
Copy that executable into your rootfs's bin directory, rebuild your initramfs, and launch your emulator. You'll want to once again load the kernel module using insmod, and then run the client programme:
~ # insmod lib/modules/my_misc_device_module.ko
[ 50.592436] my_misc_device_module: loading out-of-tree module taints kernel.
[ 50.630363] rust_misc_device: Initialising Rust Misc Device Sample
~ # rvcpu_client
[ 55.529992] misc rvcpu: Opening Rust Misc Device Sample
[ 55.532903] misc rvcpu: Reading from Rust Misc Device Sample
open-time read(): time=560769893 cycle=2110735679848996 instret=2110735679852672
[ 55.538702] misc rvcpu: IOCTL on Rust Misc Device Sample (cmd: 2149088384)
ioctl #0: time=560879585 cycle=2110735679848996 instret=2110735679852672
[ 55.559247] misc rvcpu: IOCTL on Rust Misc Device Sample (cmd: 2149088384)
ioctl #1: time=561062739 cycle=2110735679848996 instret=2110735679852672
[ 55.575940] misc rvcpu: IOCTL on Rust Misc Device Sample (cmd: 2149088384)
ioctl #2: time=561228227 cycle=2110735679848996 instret=2110735679852672
[ 55.590878] misc rvcpu: IOCTL on Rust Misc Device Sample (cmd: 2149088384)
ioctl #3: time=561377821 cycle=2110735679848996 instret=2110735679852672
[ 55.605857] misc rvcpu: IOCTL on Rust Misc Device Sample (cmd: 2149088384)
ioctl #4: time=561527599 cycle=2110735679848996 instret=2110735679852672
[ 55.620967] misc rvcpu: Exiting the Rust Misc Device Sample
And that right there, is what we were after 10 steps ago!
The output tells the full story: Our client opens the driver, reads from it, and then gets 5 fresh readings in a loop. You can even see the small increments in the time CSR value as we go through the loops.
Note: The cycle and instret values are frozen in time - this is a side-effect from our emulator setup. On real hardware, these would be changing as well.
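If you want to put numbers on those increments, a tiny sketch like this computes the difference between successive time readings. The TIMES array is simply copied from the client output above, and deltas is a helper name I picked for the example; wrapping_sub is used so the arithmetic would still behave if a counter ever wrapped around.

```rust
/// The time values printed by the client run above, in order.
const TIMES: [u64; 6] = [
    560_769_893, // open-time read()
    560_879_585, // ioctl #0
    561_062_739, // ioctl #1
    561_228_227, // ioctl #2
    561_377_821, // ioctl #3
    561_527_599, // ioctl #4
];

/// Differences between consecutive samples of a free-running counter.
fn deltas(samples: &[u64]) -> Vec<u64> {
    samples
        .windows(2)
        .map(|w| w[1].wrapping_sub(w[0]))
        .collect()
}
```

For the run above, deltas(&TIMES) yields 109692, 183154, 165488, 149594, and 149778 ticks. Running the same helper over the cycle or instret readings would give all zeroes, which is the "frozen" behaviour from the note in numeric form.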
And that about does it - it's not much as we're looking at it in the terminal, but building it from scratch gives a completely different perspective of how much is actually going on.
How Everything Pieces Together
I thought it might be beneficial to quickly circle back, and look at both what we built, and how it all connects.
The full line-up of what we made:
- A miscdevice driver, written in Rust for the Linux kernel.
- A header file for userspace clients to know how to interact with our kernel-level driver.
- A client programme that consumes that ABI, and uses it to interact with our driver through the open, read, and ioctl syscalls.
Pretty productive, and a bunch of interesting steps forward from our part 1 kernel module.
Now, just to close the loop on everything, this is how everything pieces together:
- On startup, we load our kernel module.
- The kernel module registers a driver under miscdevice.
- The kernel handles registering the driver in its internal miscdevice index. The driver now shows up in /sys/class/misc/.
- The virtual filesystem we mounted in our init script, devtmpfs, handles creating the device node. The driver now shows up in /dev/.
- We execute our client application. It uses the provided header file to know how to communicate with the driver, and how to handle the data that comes back from the driver. The #[repr(C)] annotation on our Rust struct and the unsafe impl AsBytes together let the kernel safely view our snapshot as raw bytes, while ensuring the bytes match the C struct layout the userspace client expects.
- The client application dispatches the syscalls we implemented, making our driver read from the RISC-V CSRs through in-line assembly code, returning it back to userspace in a safe manner.
- The driver is closed at the exit of the application.
Incredible stuff once it's all written down like this! We've touched a lot of moving parts in terms of how the kernel works, and how data moves between the kernel and userspace.
Thanks for reading along this far if you did - I had a lot of headaches and a lot of fun making this, and I hope it was of some sort of value to you as a reader.
Concluding thoughts
This was a slightly heavy, but also very natural-feeling next step in my little learning journey regarding RISC-V, Rust, and the Linux kernel. I hope that anyone following along with this series might feel the same way.
Going forward, I think I will be pursuing some slight "spin-offs" (spiritual successors, if you will), rather than immediately going into part 3. I can almost guarantee that part 3 will pertain to working with the device tree, so stay tuned for that when the time comes. If you're keen on some of the upcoming still-RISC-V-flavoured guides, then that's even better - I'm itching to get started on them.
As always, thank you for your time!