Difference between revisions of "Virtual I/O Internals"

From Open-IOV
Line 20: Line 20:
|Requires deferred instructions to be supported by host software (support libraries).
|Guest is ignorant of host supported software such as support libraries.
|}
== Both Modes ==
=== VFIO file descriptor ===
VFIO exposes a device through a file descriptor, and each region of the IO device is addressed at its own file offset.
In RPC Mode this structure is emulated, whereas in SR-IOV Mode it is mapped to a real PCI resource.
{| class="wikitable"
|+BAR regions in a VGA PCI device.
!00:00.0 VGA compatible controller
|-
|Region 0 Bar0 (starts at offset 0)
|-
|Region 1 Bar1
|-
|Region 2 Bar2
|-
|Region 3 Bar3
|-
|Region 4 Bar4
|-
|Region 5 Bar5 (IO port space)
|-
|Expansion ROM
|}
{| class="wikitable"
|+VFIO representation of PCI BAR regions offsets.
! colspan="4" |<- File Offset ->
!
|-
!0 -> A
!A -> (A+B)
!(A+B) -> (A+B+C)
!(A+B+C) -> (A+B+C+D)
!...
|-
|Region 0 (size A)
|Region 1 (size B)
|Region 2 (size C)
|Region 3 (size D)
|...
|}


Line 26: Line 69:
=== Instruction Execution ===
RPC Mode moves instruction information across a virtual function (VF) interface using [https://infogalactic.com/info/Remote_procedure_call Remote Procedure Calls], generally by way of [https://infogalactic.com/info/Interrupt soft interrupts] (IOCTLs). GPU instructions passed from the guest as Remote Procedure Calls are [https://infogalactic.com/info/Just-in-time_compilation just-in-time] recompiled on the host for execution by the device driver.
==== IRQ remapping ====
Interrupt Requests (IRQs) must be remapped (trapped for virtualized execution) to protect the host from sensitive instructions which may affect global memory state.


=== Memory Management ===
==== Region Passthrough ====
Guests may be presented with emulated memory regions, which use indirect emulated communication and require a VM-exit (slow), or with passthrough memory regions, which use direct communication and require no VM-exit (fast).
==== EPT Page Violations ====
Guest [https://infogalactic.com/info/Memory-mapped_I/O Memory Mapped IO (MMIO)] accesses trip Extended Page Table (EPT) violations, which are trapped by the host MMU. KVM services each EPT violation and forwards it to the QEMU VFIO PCI driver. QEMU then converts the request from KVM into a read or write on the [https://infogalactic.com/info/File_descriptor Mdev File Descriptor (FD)]. Reads and writes are then handled by the host GPU device driver via mediated [https://infogalactic.com/info/Callback_(computer_programming) callbacks (CBs)] and [https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api/vfio-mediated-device.rst VFIO-mdev].


=== Scheduling ===
Scheduling


Scheduling is handled by the host mdev driver.


'''RPC Mode Requirements:'''
Line 52: Line 102:


=== Scheduling ===
Scheduling may be handled by the host mdev driver and/or the device firmware.




Line 59: Line 111:


HPA<->GPA Boundary Enforcement.
== Both Modes ==
=== VFIO file descriptor ===
VFIO devices are mapped based on a file offset representing the virtual PCI device.
{| class="wikitable"
|+BAR regions in a VGA PCI device.
!00:00.0 VGA compatible controller
|-
|Region 0 Bar0 (starts at offset 0)
|-
|Region 1 Bar1
|-
|Region 2 Bar2
|-
|Region 3 Bar3
|-
|Region 4 Bar4
|-
|Region 5 Bar5 (IO port space)
|-
|Expansion ROM
|}
{| class="wikitable"
|+VFIO representation of PCI BAR regions offsets.
! colspan="4" |<- File Offset ->
!
|-
!0 -> A
!A -> A+B
!A+B -> A+B+C
!A+B+C -> A+B+C+D
!...
|-
|Region 0 (A)
|Region 1 (B)
|Region 2 (C)
|Region 3 (D)
|...
|}
=== IRQ remapping ===

Revision as of 21:41, 24 April 2022

This document details the internals of a Virtual Function IO (VFIO) driven Mediated Device (Mdev).

{| class="wikitable"
|+Comparison of Approaches
!RPC Mode
!SR-IOV Mode
|-
|Host requires insight into the guest workload.
|Host is ignorant of the guest workload.
|-
|Error reporting.
|No guest driver error reporting.
|-
|In depth dynamic monitoring.
|Basic dynamic monitoring.
|-
|Software defined MMU guest separation.
|Firmware defined MMU guest separation.
|-
|Requires deferred instructions to be supported by host software (support libraries).
|Guest is ignorant of host supported software such as support libraries.
|}

Both Modes

VFIO file descriptor

VFIO exposes a device through a file descriptor, and each region of the IO device is addressed at its own file offset.

In RPC Mode this structure is emulated, whereas in SR-IOV Mode it is mapped to a real PCI resource.

{| class="wikitable"
|+BAR regions in a VGA PCI device.
!00:00.0 VGA compatible controller
|-
|Region 0 Bar0 (starts at offset 0)
|-
|Region 1 Bar1
|-
|Region 2 Bar2
|-
|Region 3 Bar3
|-
|Region 4 Bar4
|-
|Region 5 Bar5 (IO port space)
|-
|Expansion ROM
|}
{| class="wikitable"
|+VFIO representation of PCI BAR regions offsets.
! colspan="4" |<- File Offset ->
!
|-
!0 -> A
!A -> (A+B)
!(A+B) -> (A+B+C)
!(A+B+C) -> (A+B+C+D)
!...
|-
|Region 0 (size A)
|Region 1 (size B)
|Region 2 (size C)
|Region 3 (size D)
|...
|}
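
The offset layout above can be read as a running sum of region sizes. The following is a minimal Python sketch of that arithmetic only, not VFIO code: the function names `region_offsets` and `find_region` and the example sizes are assumptions made for illustration. A real VFIO driver reports each region's offset and size through the `VFIO_DEVICE_GET_REGION_INFO` ioctl, and the offsets it reports need not be packed back to back.

```python
from bisect import bisect_right
from itertools import accumulate


def region_offsets(sizes):
    """Return (start, end) file offsets for regions packed back to back."""
    ends = list(accumulate(sizes))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


def find_region(sizes, offset):
    """Return (region index, offset inside that region) for a file offset."""
    ends = list(accumulate(sizes))
    if not ends or offset < 0 or offset >= ends[-1]:
        raise ValueError("offset is outside every region")
    # The first region whose end lies beyond the offset contains it.
    index = bisect_right(ends, offset)
    start = ends[index - 1] if index else 0
    return index, offset - start


# Hypothetical sizes A, B, C, D for Regions 0 to 3.
sizes = [0x1000, 0x2000, 0x100, 0x400]
for index, (start, end) in enumerate(region_offsets(sizes)):
    print(f"Region {index}: {start:#x} -> {end:#x}")
print(find_region(sizes, 0x1004))
```

With these sizes, file offset `0x1004` falls four bytes into Region 1, because Region 0 occupies the first `0x1000` bytes.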

RPC Mode

Instruction Execution

RPC Mode moves instruction information across a virtual function (VF) interface using Remote Procedure Calls, generally by way of soft interrupts (IOCTLs). GPU instructions passed from the guest as Remote Procedure Calls are just-in-time recompiled on the host for execution by the device driver.

IRQ remapping

Interrupt Requests (IRQs) must be remapped (trapped for virtualized execution) to protect the host from sensitive instructions which may affect global memory state.

Memory Management

Region Passthrough

Guests may be presented with emulated memory regions, which use indirect emulated communication and require a VM-exit (slow), or with passthrough memory regions, which use direct communication and require no VM-exit (fast).
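
The cost difference between the two kinds of region can be illustrated with a toy model. This is only a sketch: `ToyRegion` and `ToyGuest` are hypothetical names, and nothing here corresponds to a KVM or QEMU interface. It simply counts one VM-exit for every access to an emulated region and none for a passthrough region.

```python
class ToyRegion:
    """A guest-visible memory region; `passthrough` selects the fast path."""

    def __init__(self, size, passthrough):
        self.data = bytearray(size)
        self.passthrough = passthrough


class ToyGuest:
    """Counts the VM-exits a guest would take while touching its regions."""

    def __init__(self, regions):
        self.regions = regions  # dict: region name -> ToyRegion
        self.vm_exits = 0

    def write_byte(self, name, offset, value):
        region = self.regions[name]
        if not region.passthrough:
            # Emulated region: the access traps to the host before it lands.
            self.vm_exits += 1
        region.data[offset] = value


guest = ToyGuest({
    "emulated": ToyRegion(16, passthrough=False),
    "passthrough": ToyRegion(16, passthrough=True),
})
for i in range(4):
    guest.write_byte("emulated", i, 0xFF)
    guest.write_byte("passthrough", i, 0xFF)
print(guest.vm_exits)
```

Both regions receive the same four writes, but only the emulated region contributes to the exit count.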

EPT Page Violations

Guest Memory Mapped IO (MMIO) accesses trip Extended Page Table (EPT) violations, which are trapped by the host MMU. KVM services each EPT violation and forwards it to the QEMU VFIO PCI driver. QEMU then converts the request from KVM into a read or write on the Mdev File Descriptor (FD). Reads and writes are then handled by the host GPU device driver via mediated callbacks (CBs) and VFIO-mdev.
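
The hand-off described above (guest access, EPT violation, KVM, QEMU, Mdev FD, driver callbacks) can be sketched as a chain of plain Python objects. Every class and method name below is hypothetical and stands in for a component rather than reproducing its real interface; the region's file offset of `0x1000` is likewise an assumed value.

```python
class ToyMdevDriver:
    """Stands in for the host GPU driver's mediated read/write callbacks."""

    def __init__(self, size):
        self.space = bytearray(size)  # everything reachable through the FD

    def _check(self, offset, length):
        if offset < 0 or length < 0 or offset + length > len(self.space):
            raise ValueError("access outside the device file")

    def read(self, offset, length):
        self._check(offset, length)
        return bytes(self.space[offset:offset + length])

    def write(self, offset, data):
        self._check(offset, len(data))
        self.space[offset:offset + len(data)] = data
        return len(data)


class ToyQemu:
    """Turns a forwarded MMIO request into a read/write at a file offset."""

    def __init__(self, driver, region_file_offset):
        self.driver = driver
        self.region_file_offset = region_file_offset

    def mmio_read(self, region_offset, length):
        return self.driver.read(self.region_file_offset + region_offset, length)

    def mmio_write(self, region_offset, data):
        return self.driver.write(self.region_file_offset + region_offset, data)


class ToyKvm:
    """Counts EPT violations and forwards each trapped access to ToyQemu."""

    def __init__(self, qemu):
        self.qemu = qemu
        self.ept_violations = 0

    def guest_read(self, region_offset, length):
        self.ept_violations += 1
        return self.qemu.mmio_read(region_offset, length)

    def guest_write(self, region_offset, data):
        self.ept_violations += 1
        return self.qemu.mmio_write(region_offset, data)


driver = ToyMdevDriver(0x2000)
kvm = ToyKvm(ToyQemu(driver, region_file_offset=0x1000))
kvm.guest_write(4, b"\xab\xcd")
print(kvm.guest_read(4, 2), kvm.ept_violations)
```

The guest only ever names an offset inside its region; the translation to a file offset and the final callback both happen on the host side of the trap.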

Scheduling

Scheduling is handled by the host mdev driver.

RPC Mode Requirements:

Sensitive Instruction List.

Instruction Shim/Binary Translator.

HPA<->GPA Boundary Enforcement.
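
The last requirement, HPA<->GPA boundary enforcement, amounts to refusing any guest physical address (GPA) that does not translate into the host physical address (HPA) range assigned to that guest. The sketch below assumes a single contiguous window per guest, which is a simplification; `GuestWindow` and its fields are hypothetical names, and a real implementation would work at page granularity through the MMU or IOMMU page tables.

```python
class GuestWindow:
    """One guest's contiguous GPA range and the HPA range backing it."""

    def __init__(self, gpa_base, hpa_base, size):
        self.gpa_base = gpa_base
        self.hpa_base = hpa_base
        self.size = size

    def translate(self, gpa, length=1):
        """Return the HPA for `gpa`, or raise if the access leaves the window."""
        if length <= 0:
            raise ValueError("length must be positive")
        if gpa < self.gpa_base or gpa + length > self.gpa_base + self.size:
            raise PermissionError("access crosses the guest's boundary")
        return self.hpa_base + (gpa - self.gpa_base)


# Hypothetical layout: 64 KiB of guest memory backed at HPA 0x8000_0000.
window = GuestWindow(gpa_base=0x0, hpa_base=0x8000_0000, size=0x10000)
print(hex(window.translate(0x1234, length=4)))
```

An access that starts inside the window but runs past its end is rejected as a whole, so a guest cannot reach a neighbour's memory with an oversized read or write.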

SR-IOV Mode

Instruction Execution

In SR-IOV Mode, instructions are communicated from a virtual function (VF) directly to the PCI BAR.

Memory Management

Guests are presented with passthrough memory regions by the device firmware.

Scheduling

Scheduling may be handled by the host mdev driver and/or the device firmware.


SR-IOV Mode Requirements:

Device SR-IOV support.

HPA<->GPA Boundary Enforcement.