Difference between revisions of "Virtual I/O Internals"
Revision as of 15:07, 27 April 2022
The following document will attempt to detail the internals of a Virtual Function IO (VFIO) driven Mediated Device (Mdev).
RPC Mode | SR-IOV Mode |
---|---|
Host requires insight into the guest workload. | Host ignorance of guest workload. |
Error reporting. | No guest driver error reporting. |
In depth dynamic monitoring. | Basic dynamic monitoring. |
Software defined MMU guest separation. | Firmware defined MMU guest separation. |
Requires deferred instructions to be supported by host software (support libraries). | Guest is ignorant of host supported software such as support libraries. |
Both Modes
This section will cover concepts which apply both to RPC Mode and SR-IOV Mode.

Binding VFIO devices
Figure 1: Binding devices to the vfio-pci driver results in VFIO group nodes.
Opening the file "/dev/vfio/vfio" creates a VFIO Container.
Figure 2: The ioctl call IOCTL(GROUP, VFIO_GROUP_SET_CONTAINER, &CONTAINER) places the VFIO group inside the VFIO container.
Programming the IOMMU
Once the group is inside the container, IOCTL(CONTAINER, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU) can be used to set an IOMMU type for the container, which places it in a state the user can interact with.
Once the IOMMU type has been set and the VFIO container is usable, additional VFIO groups may be added to the container without setting the IOMMU type again, because newly added groups automatically inherit the container's IOMMU context.
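The container, group and IOMMU steps above can be sketched in C as follows. This is a minimal sketch, not a definitive implementation: the function name `vfio_setup_container` and its error handling are assumptions made for illustration, and the group path is supplied by the caller.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/* Hypothetical helper: opens a container, attaches the group found at
 * group_path, and sets a Type1 IOMMU on the container.
 * Returns the container fd, or -1 on any failure. On success the group
 * fd is stored in *group_fd_out. */
int vfio_setup_container(const char *group_path, int *group_fd_out)
{
    /* Opening /dev/vfio/vfio creates a new, empty container. */
    int container = open("/dev/vfio/vfio", O_RDWR);
    if (container < 0)
        return -1;

    /* Check the API version and that a Type1 IOMMU is supported. */
    if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION ||
        ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) <= 0) {
        close(container);
        return -1;
    }

    int group = open(group_path, O_RDWR);
    if (group < 0) {
        close(container);
        return -1;
    }

    /* A group is viable only if all of its devices are bound to VFIO. */
    struct vfio_group_status status = { .argsz = sizeof(status) };
    if (ioctl(group, VFIO_GROUP_GET_STATUS, &status) < 0 ||
        !(status.flags & VFIO_GROUP_FLAGS_VIABLE) ||
        /* Place the group inside the container (Figure 2). */
        ioctl(group, VFIO_GROUP_SET_CONTAINER, &container) < 0 ||
        /* Set the IOMMU type; the container becomes usable. */
        ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU) < 0) {
        close(group);
        close(container);
        return -1;
    }

    *group_fd_out = group;
    return container;
}
```

A caller would pass a group node such as one found under /dev/vfio/ and keep both descriptors open for as long as the device is in use.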
VFIO Memory Mapped IO
Figure 3: Once the VFIO groups have been placed inside the VFIO container and the IOMMU type has been set, the user may map and unmap memory, which automatically inserts or removes the corresponding DMA mapping entries in the IOMMU and pins or unpins pages as necessary. This can be accomplished using IOCTL(CONTAINER, VFIO_IOMMU_MAP_DMA, &MAP) for map/pin and IOCTL(CONTAINER, VFIO_IOMMU_UNMAP_DMA, &UNMAP) for unmap/unpin.
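The map and unmap calls above might look like the following in C. This is a sketch under stated assumptions: the helper names `vfio_map_buffer` and `vfio_unmap_buffer` are hypothetical, and the caller is assumed to supply a page-aligned buffer, I/O virtual address and size.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Hypothetical helper: maps (and pins) the buffer at vaddr so the device
 * can reach it at I/O virtual address iova. Returns 0 or -1. */
int vfio_map_buffer(int container, void *vaddr, uint64_t iova, uint64_t size)
{
    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        .vaddr = (uint64_t)(uintptr_t)vaddr, /* process virtual address */
        .iova  = iova,                       /* address seen by the device */
        .size  = size,
    };
    return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
}

/* Hypothetical helper: removes the mapping and unpins the pages. */
int vfio_unmap_buffer(int container, uint64_t iova, uint64_t size)
{
    struct vfio_iommu_type1_dma_unmap unmap = {
        .argsz = sizeof(unmap),
        .iova  = iova,
        .size  = size,
    };
    return ioctl(container, VFIO_IOMMU_UNMAP_DMA, &unmap);
}
```

Both helpers simply forward the ioctl result, so a return value of -1 leaves the reason in `errno`.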
Getting the VFIO Group File Descriptor
Figure 4: Once the device has been bound to a VFIO driver, its group has been placed in a VFIO container, the container has had its IOMMU type set, and the memory map/page pin for the VFIO device has been completed, a file descriptor can be obtained for the device. This file descriptor can be used to issue ioctls, to probe for information about the BAR regions, and to configure the IRQs.
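Obtaining the device file descriptor could be sketched as below. The wrapper name `vfio_get_device_fd` is an assumption for illustration; the underlying request, `VFIO_GROUP_GET_DEVICE_FD`, is issued on the group file descriptor and takes the device's name within the group (for PCI devices, its address string).

```c
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Hypothetical wrapper: returns a new file descriptor for the named
 * device in the group, or -1 on failure. For a PCI device the name is
 * its address string, e.g. "0000:06:0d.0" (an example value only). */
int vfio_get_device_fd(int group, const char *name)
{
    return ioctl(group, VFIO_GROUP_GET_DEVICE_FD, name);
}
```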
VFIO device file descriptor
VFIO device file descriptors are divided into regions, and each region maps to a device resource. The region count and per-region info (file offset, allowable access, etc.) can be discovered through ioctl calls. Each file descriptor region corresponding to a PCI resource is represented as a file offset.
In the case of RPC Mode this structure is emulated whereas in SR-IOV Mode the structure is mapped to a real PCI resource.
00:00.0 VGA compatible controller |
---|
Region 0 Bar0 (starts at offset 0) |
Region 1 Bar1 (MSI) |
Region 2 Bar2 (MSIX) |
Region 3 Bar3 |
Region 4 Bar4 |
Region 5 Bar5 (IO port space) |
Expansion ROM |
Below is what the file offsets look like internally for each BAR region: the first region starts at offset 0, and each subsequent region starts where the combined sizes of all preceding regions end.
<- File Offset -> | ||||
---|---|---|---|---|
0 -> A | A -> (A+B) | (A+B) -> (A+B+C) | (A+B+C) -> (A+B+C+D) | ... |
Region 0 (size A) | Region 1 (size B) | Region 2 (size C) | Region 3 (size D) | ... |
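The cumulative layout in the table can be expressed as a running sum. The sketch below only illustrates the layout described here; the function name `region_offsets` is hypothetical, and since a driver is free to choose a different layout, real code should use the offset reported for each region by VFIO_DEVICE_GET_REGION_INFO rather than computing it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given the size of each region, stores the file
 * offset at which each region starts, following the table above
 * (region 0 at 0, region 1 at A, region 2 at A+B, ...). */
void region_offsets(const uint64_t *sizes, size_t n, uint64_t *offsets)
{
    uint64_t next = 0;
    for (size_t i = 0; i < n; i++) {
        offsets[i] = next; /* region i starts where region i-1 ended */
        next += sizes[i];
    }
}
```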
VFIO Interrupts
Guests and the host signal each other via VFIO Interrupt Requests (IRQs). The host signals into the guest via an irqfd (IRQ File Descriptor), while the host itself receives interrupt notifications via an eventfd (Event File Descriptor). The resulting data can be returned via a callback.
IRQs
Device properties are discovered via ioctl calls.
Get Device Info
VFIO_DEVICE_GET_INFO (struct vfio_device_info) | Flag values |
---|---|
argsz | |
flags | VFIO_DEVICE_FLAGS_PCI, VFIO_DEVICE_FLAGS_PLATFORM, VFIO_DEVICE_FLAGS_RESET |
num_irqs | |
num_regions | |
The ioctl VFIO_DEVICE_GET_INFO can provide information to distinguish between PCI and platform devices, as well as the number of regions and IRQs for a particular device.
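Querying and interpreting the device info might be sketched as follows. The helper names `vfio_query_device` and `vfio_device_kind` are assumptions for illustration; the struct fields and flag names come from `<linux/vfio.h>`.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Hypothetical helper: fills *info for the device. Returns 0 or -1. */
int vfio_query_device(int device, struct vfio_device_info *info)
{
    memset(info, 0, sizeof(*info));
    info->argsz = sizeof(*info); /* tells the kernel how much it may write */
    return ioctl(device, VFIO_DEVICE_GET_INFO, info);
}

/* Hypothetical helper: classifies a device from its flags word. */
const char *vfio_device_kind(unsigned int flags)
{
    if (flags & VFIO_DEVICE_FLAGS_PCI)
        return "pci";
    if (flags & VFIO_DEVICE_FLAGS_PLATFORM)
        return "platform";
    return "other";
}
```

After a successful query, `info.num_regions` and `info.num_irqs` give the upper bounds for the region and IRQ indices probed in the following sections.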
Get Region Info
VFIO_DEVICE_GET_REGION_INFO (struct vfio_region_info) | Flag values |
---|---|
argsz | |
cap_offset | |
flags | VFIO_REGION_INFO_FLAG_CAPS, VFIO_REGION_INFO_FLAG_MMAP, VFIO_REGION_INFO_FLAG_READ, VFIO_REGION_INFO_FLAG_WRITE |
index | |
offset | |
size | |
Once the user knows the number of regions within a VFIO device, they can use the ioctl VFIO_DEVICE_GET_REGION_INFO to probe each region for additional information. This ioctl returns information such as whether the region can be read from or written to, whether the region supports mmap, and the offset and size of the region within the VFIO device file descriptor.
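Probing a region and mapping it when the flags allow could be sketched like this. The helper names `vfio_get_region` and `vfio_map_region` are hypothetical, and the sketch assumes the caller falls back to read/write on the file descriptor when mapping is not available.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <linux/vfio.h>

/* Hypothetical helper: fills *reg for region `index`. Returns 0 or -1. */
int vfio_get_region(int device, unsigned int index,
                    struct vfio_region_info *reg)
{
    memset(reg, 0, sizeof(*reg));
    reg->argsz = sizeof(*reg);
    reg->index = index;
    return ioctl(device, VFIO_DEVICE_GET_REGION_INFO, reg);
}

/* Hypothetical helper: maps the region if it advertises mmap support.
 * Returns MAP_FAILED when the region is empty or cannot be mapped. */
void *vfio_map_region(int device, const struct vfio_region_info *reg)
{
    if (!(reg->flags & VFIO_REGION_INFO_FLAG_MMAP) || reg->size == 0)
        return MAP_FAILED;

    int prot = 0;
    if (reg->flags & VFIO_REGION_INFO_FLAG_READ)
        prot |= PROT_READ;
    if (reg->flags & VFIO_REGION_INFO_FLAG_WRITE)
        prot |= PROT_WRITE;

    /* The region's offset within the device fd selects what is mapped. */
    return mmap(NULL, reg->size, prot, MAP_SHARED, device,
                (off_t)reg->offset);
}
```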
Get IRQ Info
VFIO_DEVICE_GET_IRQ_INFO (struct vfio_irq_info) | Flag values |
---|---|
argsz | |
count | |
flags | VFIO_IRQ_INFO_AUTOMASKED, VFIO_IRQ_INFO_EVENTFD, VFIO_IRQ_INFO_MASKABLE, VFIO_IRQ_INFO_NORESIZE |
index | |
VFIO_IRQ_INFO_AUTOMASKED indicates that the interrupt is masked automatically when it fires, which protects the host; the user must unmask it to receive further interrupts.
Set IRQs
VFIO_DEVICE_SET_IRQS (struct vfio_irq_set) | Flag values |
---|---|
argsz | |
count | |
data[] | |
flags | VFIO_IRQ_SET_ACTION_MASK, VFIO_IRQ_SET_ACTION_TRIGGER, VFIO_IRQ_SET_ACTION_UNMASK, VFIO_IRQ_SET_DATA_BOOL, VFIO_IRQ_SET_DATA_EVENTFD, VFIO_IRQ_SET_DATA_NONE |
index | |
start | |
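As an illustration of how the fields above fit together, the sketch below routes one interrupt to an eventfd. The function name `vfio_irq_to_eventfd` is hypothetical, and the choice of a single sub-interrupt starting at 0 is an assumption made to keep the example short.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/* Hypothetical helper: asks VFIO to signal a new eventfd whenever
 * sub-interrupt 0 of IRQ `index` fires. Returns the eventfd or -1. */
int vfio_irq_to_eventfd(int device, unsigned int index)
{
    int efd = eventfd(0, EFD_CLOEXEC);
    if (efd < 0)
        return -1;

    /* data[] is variable length: one 32-bit fd per interrupt in count. */
    size_t sz = sizeof(struct vfio_irq_set) + sizeof(int32_t);
    struct vfio_irq_set *set = calloc(1, sz);
    if (!set) {
        close(efd);
        return -1;
    }

    set->argsz = (uint32_t)sz;
    set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
    set->index = index;
    set->start = 0;
    set->count = 1;
    int32_t fd32 = efd;
    memcpy(set->data, &fd32, sizeof(fd32));

    int ret = ioctl(device, VFIO_DEVICE_SET_IRQS, set);
    free(set);
    if (ret < 0) {
        close(efd);
        return -1;
    }
    return efd;
}
```

The returned descriptor can then be read or polled; each read reports how many times the interrupt has fired since the previous read.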
RPC Mode
Instruction Execution
RPC Mode moves instruction information across a virtual function (VF) device using Remote Procedure Calls, generally by way of ioctl calls. These signals may be passed over file descriptors such as irqfd and ioeventfd. The Interrupt Request File Descriptor (irqfd) may be used to signal from the host into the guest, whereas the I/O Event File Descriptor (ioeventfd) may be used to signal from the guest into the host. Guest GPU instructions passed from the guest as Remote Procedure Calls are just-in-time recompiled on the host for execution by a device driver.
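Both irqfd and ioeventfd are built on the eventfd primitive, so the signalling itself can be illustrated in plain user space without KVM. The sketch below is only a stand-in for that mechanism: the function name `eventfd_roundtrip` is hypothetical, and a single process plays both the signalling and the receiving side.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: signals an eventfd `times` times, then reads it
 * once. Returns the counter value read back, or 0 if nothing was
 * pending or an error occurred. */
uint64_t eventfd_roundtrip(unsigned int times)
{
    int efd = eventfd(0, EFD_NONBLOCK);
    if (efd < 0)
        return 0;

    uint64_t one = 1, total = 0;
    for (unsigned int i = 0; i < times; i++) {
        /* Each write adds to the kernel-side counter (a "signal"). */
        if (write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
            close(efd);
            return 0;
        }
    }

    /* One read returns the accumulated count and resets it to zero. */
    if (read(efd, &total, sizeof(total)) != (ssize_t)sizeof(total))
        total = 0;

    close(efd);
    return total;
}
```

Because the counter accumulates, several signals that arrive before the receiver runs are delivered as a single wake-up carrying the total.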
IRQ remapping
Interrupt Requests (IRQs) must be remapped (trapped for virtualized execution) to protect the host from sensitive instructions which may affect global memory state.
Memory Management
Region Passthrough
Guests may be presented with emulated memory regions which use indirect emulated communication requiring a VM-exit (slow) or instead the guest may be presented with passthrough memory regions which use direct communication requiring no VM-exit (fast).
EPT Page Violations
Guest Memory Mapped IO (MMIO) accesses trigger Extended Page Table (EPT) violations, which are trapped by the host MMU. KVM services the EPT violations and forwards them to the QEMU VFIO PCI driver. QEMU then converts the request from KVM into read/write accesses to the Mdev File Descriptor (FD). Reads and writes are then handled by the host GPU device driver via mediated callbacks (CBs) and VFIO-mdev.
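The read/write access to the file descriptor in the third step can be sketched as positioned I/O at the region's offset plus the register offset. The helper names `region_read32` and `region_write32` are hypothetical and this is not QEMU's code; it only shows the shape of such an access for a 32-bit register.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads a 32-bit register at `reg` within the
 * region that starts at `region_offset` in the fd. Returns 0 or -1. */
int region_read32(int fd, uint64_t region_offset, uint64_t reg,
                  uint32_t *val)
{
    ssize_t n = pread(fd, val, sizeof(*val), (off_t)(region_offset + reg));
    return n == (ssize_t)sizeof(*val) ? 0 : -1;
}

/* Hypothetical helper: writes a 32-bit register. Returns 0 or -1. */
int region_write32(int fd, uint64_t region_offset, uint64_t reg,
                   uint32_t val)
{
    ssize_t n = pwrite(fd, &val, sizeof(val), (off_t)(region_offset + reg));
    return n == (ssize_t)sizeof(val) ? 0 : -1;
}
```

On a mediated device each such call ends up in the vendor driver's read or write callback, which is where the emulation of the register takes place.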
Scheduling
Scheduling is handled by the host mdev driver.
RPC Mode Requirements:
Sensitive Instruction List.
Instruction Shim/Binary Translator.
HPA<->GPA Boundary Enforcement.
SR-IOV Mode
Instruction Execution
SR-IOV Mode involves the communication of instructions from a virtual function (VF) through direct communication to the PCI BAR.
Memory Management
Guests are presented with passthrough memory regions by the device firmware.
Scheduling
Scheduling may be handled by the host mdev driver and/or the device firmware.
SR-IOV Mode Requirements:
Device SR-IOV support.
HPA<->GPA Boundary Enforcement.
Talks & Reading Material
eventfd - root/virt/kvm/eventfd.c
(kernel diff: KVM: irqfd)
(kernel diff: KVM: add ioeventfd support)
VFIO - Virtual Function I/O - root/virt/kvm/vfio.c
[2016] An Introduction to PCI Device Assignment with VFIO by Alex Williamson
Intel GVT-g: From Production to Upstream - Zhi Wang, Intel
Hardware-Assisted Mediated Pass-Through with VFIO by Kevin Tian
[2016] vGPU on KVM - A VFIO Based Framework by Neo Jia & Kirti Wankhede
[2017] Generic Buffer Sharing Mechanism for Mediated Devices by Tina Zhang