Open-IOV - User contributions [en]

Articles

2024-04-19T16:27:15Z

Arthur:

This page indexes the articles contained within Open-IOV.

If you're new to GPU virtualization, start by reading the '''[[Introduction]]''' article.
=== Start Here ===
[[Introduction]]

[https://open-iov.org/index.php/Open-IOV:About About Open-IOV (CC-BY-4.0)]

===Abstract===
[[Introductory Concepts & Definitions|Glossary]]

[[Virtualization Fundamentals]]

[[Merged Drivers]]

=== Design Documents ===
[[Virtual IO Internals|Virtual I/O Internals]]

[[GPU Driver Internals]]
=== Driver Integration Documents ===
[https://open-iov.org/index.php/OpenRM Nvidia]

[[Intel SR-IOV APIs|Intel]]

[https://open-iov.org/index.php/AMDGPU AMD]

===Projects===
[https://open-iov.org/index.php/LIME_Is_Mediated_Emulation LIME Is Mediated Emulation]

[https://open-iov.org/index.php/Looking_Glass_KVMFR Looking Glass]

[https://openxt.atlassian.net/wiki/spaces/OD/pages/10747915/What+is+OpenXT OpenXT]

[https://gitlab.com/vglass OpenXT: vGlass]

[https://github.com/OpenXT/surfman OpenXT: Surfman (legacy DRM)]

[https://www.bromium.com/opensource/ Bromium/uXen]

[https://xenproject.org/help/documentation/ Xen Project]

[https://www.qubes-os.org/doc/ Qubes OS]

[https://projectacrn.github.io/2.1/tutorials/using_celadon_as_uos.html Intel Celadon]

[https://open-iov.org/index.php/VGPU_Unlock vGPU_Unlock]
=== Device Support===
[[GPU Support]]

[[CPU Support]]

[[GPU Firmware]]

=== Software Support ===
[https://open-iov.org/index.php/Hypervisor_Support Hypervisor Support]

[[GPU Software Bill Of Materials (SBOM)]]

=== API Documentation ===

==== Kernel APIs ====
[https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api Kernel.org Driver Core Documentation]

[https://docs.microsoft.com/en-us/windows-hardware/drivers/display/iommu-based-gpu-isolation NT Kernel (Windows) IOMMU-based GPU Isolation]

[https://elixir.bootlin.com/linux/latest/source/Documentation/driver-api/vfio.rst VFIO] - [https://github.com/torvalds/linux/blob/master/include/uapi/linux/vfio.h vfio.h] - [https://elixir.bootlin.com/linux/latest/source/include/linux/mdev.h mdev.h]

[https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api/vfio-mediated-device.rst?h=driver-core-next&id=7de3697e9cbd4bd3d62bafa249d57990e1b8f294 VFIO Mediated Device]
==== Driver APIs ====
[https://projectacrn.github.io/2.1/api/GVT-g_api.html i915 GVT-g API]

[https://nouveau.freedesktop.org/Development.html Nouveau Tools & API]
==== Sample Code ====
GPLv2 sources mirrored from [https://elixir.bootlin.com/linux/latest/source/samples/vfio-mdev/ elixir.bootlin.com] with [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/Makefile simple makefile changes].

[https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mtty.c mtty.c] - [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mdpy.c mdpy.c] - [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mdpy-fb.c mdpy-fb.c] - [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mdpy-defs.h mdpy-defs.h] - [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mbochs.c mbochs.c]

==== Virtualization APIs ====
[https://open-iov.org/index.php/Mdev-GPU#Mdev-CLI GVM/Mdev-CLI API]

[https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html QEMU Machine Protocol (QMP) Reference Manual]

[https://projectacrn.github.io/2.1/developer-guides/hld/ivshmem-hld.html Inter-VM Shared Memory (IVSHMEM)]
===User Guides===
[https://arccompute.com/blog/libvfio-commodity-gpu-multiplexing/ LibVF.IO Setup Guide]

[https://looking-glass.io/docs/stable/ Looking Glass Quickstart Guide]

[https://github.com/intel/gvt-linux/wiki/GVTg_Setup_Guide Intel GVT-g Setup Guide]

[https://github.com/GPUOpen-LibrariesAndSDKs/MxGPU-Virtualization/tree/master/docs AMD GPU-IOV Module Docs]

[https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF PCI passthrough via OVMF]

[https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_deployment_and_administration_guide/index Red Hat Virtualization Guide]

=== Developer Guides ===
[https://rayanfam.com/tags/hypervisor/ Hypervisor From Scratch]

[https://lwn.net/Kernel/LDD3/ Linux Device Drivers (3rd Edition)]

[https://dri.freedesktop.org/docs/drm/gpu/ GPU Driver Developer's Guide]

[https://dri.freedesktop.org/docs/drm/PCI/pci.html# How To Write PCI Drivers]

[https://doc.dpdk.org/guides-16.04/prog_guide/ivshmem_lib.html Data Plane Development Kit: IVSHMEM Programming Guide]

=== Specifications ===
[https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs Hyper-V Hypervisor Top Level Functional Specification (TLFS)]

=== Communities & Mailing Lists ===
[https://discord.gg/Rb9K9DYxKK Open-IOV Discord]

[https://lists.freedesktop.org/mailman/listinfo/intel-gfx Intel-gfx Mailing List]

[https://lists.freedesktop.org/mailman/listinfo/nouveau Nouveau Mailing List]

[https://lists.freedesktop.org/mailman/listinfo/amd-gfx AMD-gfx Mailing List]

[https://listman.redhat.com/mailman/listinfo/vfio-users VFIO-users Mailing List]

[https://forum.level1techs.com/c/software/vfio/132 <nowiki>Level1Techs Forum [VFIO Topic]</nowiki>]

[https://old.reddit.com/r/VFIO/ VFIO Subreddit]

Articles

2024-04-19T16:16:14Z

Arthur:

This page indexes the articles contained within Open-IOV.

If you're new to GPU virtualization, start by reading the '''[[Introduction]]''' article.
=== Start Here ===
[[Introduction]]

[https://open-iov.org/index.php/Open-IOV:About About Open-IOV (CC-BY-4.0)]

===Abstract===
[[Introductory Concepts & Definitions|Glossary]]

[[Virtualization Fundamentals]]

[[Merged Drivers]]

=== Design Documents ===
[[Virtual IO Internals|Virtual I/O Internals]]

[[GPU Driver Internals]]
=== Driver Integration Documents ===
[https://open-iov.org/index.php/OpenRM Nvidia]

[[Intel SR-IOV APIs|Intel]]

[https://open-iov.org/index.php/AMDGPU AMD]

===Projects===
[https://open-iov.org/index.php/LIME_Is_Mediated_Emulation LIME Is Mediated Emulation]

[https://open-iov.org/index.php/Looking_Glass_KVMFR Looking Glass]

[https://openxt.atlassian.net/wiki/spaces/OD/pages/10747915/What+is+OpenXT OpenXT]

[https://gitlab.com/vglass OpenXT: vGlass]

[https://github.com/OpenXT/surfman OpenXT: Surfman (legacy DRM)]

[https://www.bromium.com/opensource/ Bromium/uXen]

[https://xenproject.org/help/documentation/ Xen Project]

[https://www.qubes-os.org/doc/ Qubes OS]

[https://projectacrn.github.io/2.1/tutorials/using_celadon_as_uos.html Intel Celadon]

[https://open-iov.org/index.php/VGPU_Unlock vGPU_Unlock]

[[LibRM]]
=== Device Support===
[[GPU Support]]

[[CPU Support]]

[[GPU Firmware]]

=== Software Support ===
[https://open-iov.org/index.php/Hypervisor_Support Hypervisor Support]

[[GPU Software Bill Of Materials (SBOM)]]

=== API Documentation ===

==== Kernel APIs ====
[https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api Kernel.org Driver Core Documentation]

[https://docs.microsoft.com/en-us/windows-hardware/drivers/display/iommu-based-gpu-isolation NT Kernel (Windows) IOMMU-based GPU Isolation]

[https://elixir.bootlin.com/linux/latest/source/Documentation/driver-api/vfio.rst VFIO] - [https://github.com/torvalds/linux/blob/master/include/uapi/linux/vfio.h vfio.h] - [https://elixir.bootlin.com/linux/latest/source/include/linux/mdev.h mdev.h]

[https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api/vfio-mediated-device.rst?h=driver-core-next&id=7de3697e9cbd4bd3d62bafa249d57990e1b8f294 VFIO Mediated Device]
==== Driver APIs ====
[https://projectacrn.github.io/2.1/api/GVT-g_api.html i915 GVT-g API]

[https://nouveau.freedesktop.org/Development.html Nouveau Tools & API]
==== Sample Code ====
GPLv2 sources mirrored from [https://elixir.bootlin.com/linux/latest/source/samples/vfio-mdev/ elixir.bootlin.com] with [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/Makefile simple makefile changes].

[https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mtty.c mtty.c] - [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mdpy.c mdpy.c] - [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mdpy-fb.c mdpy-fb.c] - [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mdpy-defs.h mdpy-defs.h] - [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mbochs.c mbochs.c]

==== Virtualization APIs ====
[https://open-iov.org/index.php/Mdev-GPU#Mdev-CLI GVM/Mdev-CLI API]

[https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html QEMU Machine Protocol (QMP) Reference Manual]

[https://projectacrn.github.io/2.1/developer-guides/hld/ivshmem-hld.html Inter-VM Shared Memory (IVSHMEM)]
===User Guides===
[https://arccompute.com/blog/libvfio-commodity-gpu-multiplexing/ LibVF.IO Setup Guide]

[https://looking-glass.io/docs/stable/ Looking Glass Quickstart Guide]

[https://github.com/intel/gvt-linux/wiki/GVTg_Setup_Guide Intel GVT-g Setup Guide]

[https://github.com/GPUOpen-LibrariesAndSDKs/MxGPU-Virtualization/tree/master/docs AMD GPU-IOV Module Docs]

[https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF PCI passthrough via OVMF]

[https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_deployment_and_administration_guide/index Red Hat Virtualization Guide]

=== Developer Guides ===
[https://rayanfam.com/tags/hypervisor/ Hypervisor From Scratch]

[https://lwn.net/Kernel/LDD3/ Linux Device Drivers (3rd Edition)]

[https://dri.freedesktop.org/docs/drm/gpu/ GPU Driver Developer's Guide]

[https://dri.freedesktop.org/docs/drm/PCI/pci.html# How To Write PCI Drivers]

[https://doc.dpdk.org/guides-16.04/prog_guide/ivshmem_lib.html Data Plane Development Kit: IVSHMEM Programming Guide]

=== Specifications ===
[https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs Hyper-V Hypervisor Top Level Functional Specification (TLFS)]

=== Communities & Mailing Lists ===
[https://discord.gg/Rb9K9DYxKK Open-IOV Discord]

[https://lists.freedesktop.org/mailman/listinfo/intel-gfx Intel-gfx Mailing List]

[https://lists.freedesktop.org/mailman/listinfo/nouveau Nouveau Mailing List]

[https://lists.freedesktop.org/mailman/listinfo/amd-gfx AMD-gfx Mailing List]

[https://listman.redhat.com/mailman/listinfo/vfio-users VFIO-users Mailing List]

[https://forum.level1techs.com/c/software/vfio/132 <nowiki>Level1Techs Forum [VFIO Topic]</nowiki>]

[https://old.reddit.com/r/VFIO/ VFIO Subreddit]

Virtual I/O Internals

2023-06-19T17:54:32Z

Arthur: /* References (Talks & Reading Material) */

The following document details the internals of a '''VFIO (Virtual Function I/O)''' driven '''Shared''' '''I/O Device.'''<blockquote>An absence of critical technical documentation has historically slowed growth and adoption of developer ecosystems for GPU virtualization.

This [https://creativecommons.org/licenses/by/4.0/ CC-BY-4.0] licensed content can either be used with attribution, or used as inspiration for new documentation, created by GPU vendors for public commercial distribution as developer documentation.

Where possible, this documentation will clearly label dates and versions of observed-but-not-guaranteed behaviour vs. vendor-documented stable interfaces/behaviour with guarantees of forward or backward compatibility.</blockquote>

This article places emphasis on the '''Virtual GPU (vGPU)''' use case; however, these concepts apply generically to virtualization of I/O devices (TPUs, NICs, storage peripherals, etc.).
{| class="wikitable"
|+Comparison of Assistance Modes
!Mdev Mode
!SR-IOV Mode
!SIOV Mode
|-
|No hardware assistance needed.
|Hardware assistance needed.
|Hardware assistance needed.
|-
|Host requires insight about guest workload.
|Host ignorance of guest workload.
|Host requires insight about guest workload.
|-
|Error reporting.
|No guest driver error reporting.
|Error reporting.
|-
|In depth dynamic monitoring.
|Basic dynamic monitoring.
|In depth dynamic monitoring.
|-
|Software defined MMU guest separation.
|Firmware defined MMU guest separation.
|Firmware defined MMU guest separation.
|-
|Requires deferred instructions to be supported by host software (support libraries).
|Guest is ignorant of host supported software such as support libraries.
|
|-
|Routing interrupts.
|Routing interrupts.
|Routing interrupts.
|-
|Device reset.
|Device reset.
|Device reset.
|-
|Enable/Disable device.
|Enable/Disable device.
|Enable/Disable device.
|-
|Support for multiple scheduling techniques.
|Support for multiple scheduling techniques.
|Support for multiple scheduling techniques.
|-
|Single PCI requester ID.
|Multiple PCI requester IDs.
|Multiple PCI requester IDs.
|}

== All Modes ==
This section covers concepts which apply to all three modes: [https://open-iov.org/index.php/Mediated_Device_Internals#Mdev_Mode Mdev Mode], [https://open-iov.org/index.php/Mediated_Device_Internals#SR-IOV_Mode SR-IOV Mode], & [https://open-iov.org/index.php/Mediated_Device_Internals#SIOV_Mode SIOV Mode].

=== Knowledge Resources Used ===
This section is supported by significant contributions in open source by Alex Williamson.

See references 2, 14, & 22 in the [https://open-iov.org/index.php/Virtual_IO_Internals#References_(Talks_&_Reading_Material) References (Talks & Reading Material)] section.

===Binding VFIO devices===
[[File:Vfio-pci driver bindings.png|thumb|'''Figure 1:''' VFIO group nodes are the unit of ownership that VFIO uses. See slides from: [https://open-iov.org/index.php/Mediated_Device_Internals#References_(Talks_&_Reading_Material) <nowiki>[2]</nowiki>]

]]
[[File:IOCTL set VFIO container.png|thumb|'''Figure 2:''' IOCTL(GROUP, VFIO_GROUP_SET_CONTAINER, &CONTAINER) places the VFIO Group inside the VFIO Container. See slides from: [https://open-iov.org/index.php/Mediated_Device_Internals#References_(Talks_&_Reading_Material) <nowiki>[2]</nowiki>]

]]
[[File:Ioctl-VFIO IOMMU MAP-UNMAP DMA.png|thumb|'''Figure 3:''' Using the ioctls IOCTL(CONTAINER, VFIO_IOMMU_MAP_DMA, &MAP) and IOCTL(CONTAINER, VFIO_IOMMU_UNMAP_DMA, &UNMAP) to map memory and pin pages. See slides from: [https://open-iov.org/index.php/Mediated_Device_Internals#References_(Talks_&_Reading_Material) <nowiki>[2]</nowiki>]

]]
[[File:Ioctl-VFIO GROUP GET FD.png|thumb|'''Figure 4:''' Using the ioctl IOCTL(GROUP2, VFIO_GROUP_GET_DEVICE_FD, "0000:01:00.0") to obtain the VFIO device file descriptor. See slides from: [https://open-iov.org/index.php/Mediated_Device_Internals#References_(Talks_&_Reading_Material) <nowiki>[2]</nowiki>]

]]

'''Figure 1:''' Binding devices to the vfio-pci driver results in VFIO group nodes.

Opening the file "/dev/vfio/vfio" creates a VFIO Container.

'''Figure 2:''' The ioctl '''<code>IOCTL(GROUP, VFIO_GROUP_SET_CONTAINER, &CONTAINER)</code>''' places the VFIO group inside the VFIO container.
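
The container and group steps can be sketched in Python with the standard <code>fcntl</code> module. This is a minimal sketch rather than a reference implementation. The ioctl numbers are derived from the <code>VFIO_TYPE</code> (';') and <code>VFIO_BASE</code> (100) definitions in the upstream vfio.h. The helper names <code>vfio_io</code> and <code>attach_group</code> are hypothetical, and the function only works on a host where a device has already been bound to vfio-pci.

```python
import fcntl
import os
import struct

# ioctl numbers follow _IO(VFIO_TYPE, VFIO_BASE + n) from the upstream vfio.h.
VFIO_TYPE = ord(";")
VFIO_BASE = 100


def vfio_io(nr):
    """Equivalent of the C macro _IO(VFIO_TYPE, VFIO_BASE + nr)."""
    return (VFIO_TYPE << 8) | (VFIO_BASE + nr)


VFIO_GET_API_VERSION = vfio_io(0)
VFIO_GROUP_GET_STATUS = vfio_io(3)
VFIO_GROUP_SET_CONTAINER = vfio_io(4)

VFIO_API_VERSION = 0
VFIO_GROUP_FLAGS_VIABLE = 1 << 0


def attach_group(group_path, container_path="/dev/vfio/vfio"):
    """Open a container and place one VFIO group inside it (hypothetical helper).

    Returns (container_fd, group_fd). Requires a VFIO-capable host.
    """
    # Opening /dev/vfio/vfio creates a new, empty container.
    container_fd = os.open(container_path, os.O_RDWR)
    if fcntl.ioctl(container_fd, VFIO_GET_API_VERSION) != VFIO_API_VERSION:
        os.close(container_fd)
        raise RuntimeError("unexpected VFIO API version")

    group_fd = os.open(group_path, os.O_RDWR)
    # struct vfio_group_status { __u32 argsz; __u32 flags; }
    status = bytearray(struct.pack("=II", 8, 0))
    fcntl.ioctl(group_fd, VFIO_GROUP_GET_STATUS, status)
    _, flags = struct.unpack("=II", status)
    if not flags & VFIO_GROUP_FLAGS_VIABLE:
        os.close(group_fd)
        os.close(container_fd)
        raise RuntimeError("group is not viable: bind all of its devices to VFIO")

    # The ioctl argument is a pointer to the container file descriptor.
    fcntl.ioctl(group_fd, VFIO_GROUP_SET_CONTAINER, struct.pack("=i", container_fd))
    return container_fd, group_fd
```

On a suitable host this could be called as <code>attach_group("/dev/vfio/26")</code>, where the group number 26 is only an example.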

===Programming the IOMMU===
Once a group has been placed in the container, '''<code>IOCTL(CONTAINER, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU)</code>''' can be used to set an IOMMU type for the container, which places it in a user-interactable state.

Once the IOMMU type has been set and the VFIO container is interactable, additional VFIO groups may be added to the container without setting the IOMMU type again: newly added groups automatically inherit the container's IOMMU context.
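
A minimal sketch of selecting the Type1 IOMMU backend, assuming a container file descriptor that already holds at least one group. The ioctl numbers and the value of <code>VFIO_TYPE1_IOMMU</code> are taken from the upstream vfio.h; the helper name <code>set_type1_iommu</code> is hypothetical.

```python
import fcntl

VFIO_TYPE = ord(";")
VFIO_BASE = 100
VFIO_CHECK_EXTENSION = (VFIO_TYPE << 8) | (VFIO_BASE + 1)
VFIO_SET_IOMMU = (VFIO_TYPE << 8) | (VFIO_BASE + 2)

VFIO_TYPE1_IOMMU = 1


def set_type1_iommu(container_fd):
    """Select the Type1 IOMMU backend for a container (hypothetical helper).

    The container must already hold at least one group; otherwise the
    kernel rejects VFIO_SET_IOMMU.
    """
    # VFIO_CHECK_EXTENSION returns non-zero when the backend is available.
    if not fcntl.ioctl(container_fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU):
        raise RuntimeError("Type1 IOMMU backend is not supported on this host")
    # Both ioctls take the IOMMU type by value, not by pointer.
    fcntl.ioctl(container_fd, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU)
```

Groups attached after this call need no further IOMMU setup, matching the inheritance behaviour described above.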

===VFIO Memory Mapped IO===
'''Figure 3:''' Once the VFIO groups have been placed inside the VFIO container and the IOMMU type has been set, the user may map and unmap memory. Mapping automatically inserts Memory Mapped IO (MMIO) entries into the IOMMU and pins pages as necessary; unmapping removes the entries and unpins the pages. This can be accomplished using '''<code>IOCTL(CONTAINER, VFIO_IOMMU_MAP_DMA, &MAP)</code>''' for map/pin and '''<code>IOCTL(CONTAINER, VFIO_IOMMU_UNMAP_DMA, &UNMAP)</code>''' for unmap/unpin.
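
The <code>&MAP</code> and <code>&UNMAP</code> arguments are small fixed-layout structures. The sketch below packs them in Python, assuming the <code>struct vfio_iommu_type1_dma_map</code> and <code>struct vfio_iommu_type1_dma_unmap</code> layouts from the upstream vfio.h. The helper names are hypothetical, and the addresses used in the usage note are placeholders.

```python
import struct

VFIO_IOMMU_MAP_DMA = (ord(";") << 8) | (100 + 13)
VFIO_IOMMU_UNMAP_DMA = (ord(";") << 8) | (100 + 14)

VFIO_DMA_MAP_FLAG_READ = 1 << 0   # device may read from the mapping
VFIO_DMA_MAP_FLAG_WRITE = 1 << 1  # device may write to the mapping

# struct vfio_iommu_type1_dma_map
#   { __u32 argsz; __u32 flags; __u64 vaddr; __u64 iova; __u64 size; }
DMA_MAP = struct.Struct("=IIQQQ")
# struct vfio_iommu_type1_dma_unmap
#   { __u32 argsz; __u32 flags; __u64 iova; __u64 size; }
DMA_UNMAP = struct.Struct("=IIQQ")


def pack_dma_map(vaddr, iova, size,
                 flags=VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE):
    """Build the argument for VFIO_IOMMU_MAP_DMA (hypothetical helper)."""
    return DMA_MAP.pack(DMA_MAP.size, flags, vaddr, iova, size)


def pack_dma_unmap(iova, size):
    """Build the argument for VFIO_IOMMU_UNMAP_DMA (hypothetical helper)."""
    return bytearray(DMA_UNMAP.pack(DMA_UNMAP.size, 0, iova, size))
```

A caller would pass the result to <code>fcntl.ioctl(container_fd, VFIO_IOMMU_MAP_DMA, pack_dma_map(vaddr, iova, size))</code>, where <code>vaddr</code> is the process virtual address of a page-aligned buffer and <code>iova</code> is the address the device will use for the same memory.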

=== Getting the VFIO Group File Descriptor ===
'''Figure 4:''' Once the device has been bound to a VFIO driver, its group has been set in a VFIO container, the VFIO container has had its IOMMU type set, and a memory map/page pin for the VFIO device has been completed, a file descriptor can be obtained for the device. This file descriptor can be used to issue ioctls, to probe for information about the BAR regions, and to configure the IRQs.
===VFIO device file descriptor ===
VFIO device file descriptors are divided into regions, and each region maps to a device resource. The region count and per-region info (file offset, allowable access, etc.) can be discovered through ioctls. Each file descriptor region corresponding to a PCI resource is represented as a file offset.

In the case of Mdev Mode this structure is emulated, whereas in SR-IOV Mode the structure is mapped to a real PCI resource.

{| class="wikitable"
|+ BAR regions in a VGA PCI device.
!00:00.0 VGA compatible controller
|-
|Region 0 Bar0 (Config Space) starts at offset 0
|-
|Region 1 Bar1 (MSI - Message Signaled Interrupts)
|-
|Region 2 Bar2 (MSIX)
|-
|Region 3 Bar3
|-
|Region 4 Bar4
|-
|Region 5 Bar5 (IO port space)
|-
|Expansion ROM
|}
Below is what the file offsets look like internally for each BAR region: the first region starts at offset 0, and each subsequent region starts at the sum of the sizes of the regions before it.
{| class="wikitable"
|+VFIO representation of PCI BAR regions offsets.
! colspan="5" |<- File Offset ->
|-
!0 -> A
! A -> (A+B)
!(A+B) -> (A+B+C)
!(A+B+C) -> (A+B+C+D)
!...
|-
|Region 0 (size A)
|Region 1 (size B)
|Region 2 (size C)
|Region 3 (size D)
|...
|}
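
The cumulative layout in the table can be expressed as a short calculation. This sketch follows the table only; the function name <code>region_file_offsets</code> and the region sizes in the usage note are illustrative assumptions. A real driver is free to place regions at other offsets, so callers should always use the offset reported by the driver for each region.

```python
from itertools import accumulate


def region_file_offsets(sizes):
    """Return (start, end) file offsets for consecutive regions.

    Region 0 starts at 0; each later region starts where the previous
    one ended, i.e. at the sum of all preceding region sizes.
    """
    ends = list(accumulate(sizes))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))
```

For example, regions of size A = 0x100, B = 0x1000, and C = 0x4000 occupy the ranges 0 -> A, A -> (A+B), and (A+B) -> (A+B+C).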

===VFIO Interrupts===
Guests communicate with the host via VFIO Interrupt Requests ([https://infogalactic.com/info/Interrupt_request_(PC_architecture) IRQs]). These are sent via an irqfd (IRQ [https://infogalactic.com/info/File_descriptor File Descriptor]). Similarly, the host receives these interrupts via [https://man7.org/linux/man-pages/man2/eventfd.2.html eventfd] (Event File Descriptor). The resulting data can be returned via a [https://infogalactic.com/info/Callback_(computer_programming) callback].

====IRQs====
Device properties are discovered via ioctl.

=====Get Device Info=====
{| class="wikitable"
|+
! colspan="3" |VFIO_DEVICE_GET_INFO
|-
| colspan="3" |struct vfio_device_info
|-
| rowspan="7" |
|argsz
|
|-
|flags
|
|-
| rowspan="3" |
|VFIO_DEVICE_FLAGS_PCI
|-
|VFIO_DEVICE_FLAGS_PLATFORM
|-
|VFIO_DEVICE_FLAGS_RESET
|-
|num_irqs
|
|-
|num_regions
|
|}
The ioctl '''<code>VFIO_DEVICE_GET_INFO</code>''' provides information to distinguish between PCI and platform devices, as well as the number of regions and IRQs for a particular device.

The upstream API can be read '''[https://github.com/torvalds/linux/blob/47700948a4abb4a5ae13ef943ff682a7f327547a/include/uapi/linux/vfio.h#L194 here (vfio.h)]'''.

'''Sample mdev code''' to service this request can be found '''[https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/c0adb9957c0b22bfdbc18395e81dd2b476addac5/mdpy.c#L470 here]''', and '''[https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/c0adb9957c0b22bfdbc18395e81dd2b476addac5/mdpy.c#L522 here]'''.
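
From the caller's side, the request is a buffer with <code>argsz</code> filled in, which the kernel completes. The sketch below assumes the four leading fields of <code>struct vfio_device_info</code> and the flag values from the upstream vfio.h; newer kernels may append further fields, which this sketch ignores. The helper names are hypothetical.

```python
import struct

VFIO_DEVICE_GET_INFO = (ord(";") << 8) | (100 + 7)

VFIO_DEVICE_FLAGS_RESET = 1 << 0     # device supports reset
VFIO_DEVICE_FLAGS_PCI = 1 << 1       # vfio-pci device
VFIO_DEVICE_FLAGS_PLATFORM = 1 << 2  # vfio-platform device

# Leading fields of struct vfio_device_info:
#   { __u32 argsz; __u32 flags; __u32 num_regions; __u32 num_irqs; }
DEVICE_INFO = struct.Struct("=IIII")


def new_device_info_request():
    """Buffer to pass to VFIO_DEVICE_GET_INFO; argsz tells the kernel its size."""
    return bytearray(DEVICE_INFO.pack(DEVICE_INFO.size, 0, 0, 0))


def parse_device_info(buf):
    """Decode the buffer after the kernel has filled it in."""
    _argsz, flags, num_regions, num_irqs = DEVICE_INFO.unpack_from(buf)
    return {
        "pci": bool(flags & VFIO_DEVICE_FLAGS_PCI),
        "platform": bool(flags & VFIO_DEVICE_FLAGS_PLATFORM),
        "reset": bool(flags & VFIO_DEVICE_FLAGS_RESET),
        "num_regions": num_regions,
        "num_irqs": num_irqs,
    }
```

With a device file descriptor in hand, the call would look like <code>buf = new_device_info_request(); fcntl.ioctl(device_fd, VFIO_DEVICE_GET_INFO, buf)</code>, followed by <code>parse_device_info(buf)</code>.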

=====Get Region Info=====
{| class="wikitable"
|+
! colspan="3" |VFIO_DEVICE_GET_REGION_INFO
|-
| colspan="3" |struct vfio_region_info
|-
| rowspan="10" |
|argsz
|
|-
|cap_offset
|
|-
|flags
|
|-
| rowspan="4" |
|VFIO_REGION_INFO_FLAG_CAPS
|-
|VFIO_REGION_INFO_FLAG_MMAP
|-
|VFIO_REGION_INFO_FLAG_READ
|-
|VFIO_REGION_INFO_FLAG_WRITE
|-
| index
|
|-
|offset
|
|-
|size
|
|}
Once the user knows the number of regions within a VFIO device, they can use the ioctl '''<code>VFIO_DEVICE_GET_REGION_INFO</code>''' to probe each region for additional information. This ioctl returns information such as whether the region can be read from or written to, whether it supports MMAP, and the offset and size of the region within the VFIO file descriptor.

The upstream API can be read [https://github.com/torvalds/linux/blob/47700948a4abb4a5ae13ef943ff682a7f327547a/include/uapi/linux/vfio.h#L240 '''here (vfio.h)'''].

'''Sample mdev code''' to service this request can be found [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/c0adb9957c0b22bfdbc18395e81dd2b476addac5/mdpy.c#L425 '''here'''], and [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/c0adb9957c0b22bfdbc18395e81dd2b476addac5/mdpy.c#L545 '''here'''].
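
A sketch of the caller's side of this ioctl, assuming the <code>struct vfio_region_info</code> layout and flag values from the upstream vfio.h. The caller sets <code>argsz</code> and <code>index</code>; the kernel fills in the rest. The helper names are hypothetical.

```python
import struct

VFIO_DEVICE_GET_REGION_INFO = (ord(";") << 8) | (100 + 8)

VFIO_REGION_INFO_FLAG_READ = 1 << 0
VFIO_REGION_INFO_FLAG_WRITE = 1 << 1
VFIO_REGION_INFO_FLAG_MMAP = 1 << 2
VFIO_REGION_INFO_FLAG_CAPS = 1 << 3

# struct vfio_region_info
#   { __u32 argsz; __u32 flags; __u32 index; __u32 cap_offset;
#     __u64 size; __u64 offset; }
REGION_INFO = struct.Struct("=IIIIQQ")


def new_region_info_request(index):
    """Buffer asking the kernel to describe region number `index`."""
    return bytearray(REGION_INFO.pack(REGION_INFO.size, 0, index, 0, 0, 0))


def parse_region_info(buf):
    """Decode the buffer after the kernel has filled it in."""
    _argsz, flags, index, cap_offset, size, offset = REGION_INFO.unpack_from(buf)
    return {
        "index": index,
        "readable": bool(flags & VFIO_REGION_INFO_FLAG_READ),
        "writable": bool(flags & VFIO_REGION_INFO_FLAG_WRITE),
        "mmap": bool(flags & VFIO_REGION_INFO_FLAG_MMAP),
        "caps": bool(flags & VFIO_REGION_INFO_FLAG_CAPS),
        "cap_offset": cap_offset,
        "size": size,      # region size in bytes
        "offset": offset,  # where the region starts within the device fd
    }
```

Looping <code>index</code> from 0 to <code>num_regions - 1</code> and reading <code>size</code> and <code>offset</code> for each region gives the caller the file layout described in the BAR offset table above.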

===== Get IRQ Info=====
{| class="wikitable"
|+
! colspan="3" | VFIO_DEVICE_GET_IRQ_INFO
|-
| colspan="3" |struct vfio_irq_info
|-
| rowspan="8" |
| argsz
|
|-
|count
|
|-
| flags
|
|-
| rowspan="4" |
|VFIO_IRQ_INFO_AUTOMASKED
|-
|VFIO_IRQ_INFO_EVENTFD
|-
| VFIO_IRQ_INFO_MASKABLE
|-
|VFIO_IRQ_INFO_NORESIZE
|-
|index
|
|}
<code>'''VFIO_DEVICE_GET_IRQ_INFO'''</code> is used to retrieve information about a device IRQ.

'''<code>VFIO_IRQ_INFO_AUTOMASKED</code>''' indicates that the interrupt is masked automatically when it occurs, which protects the host.

The upstream API can be read '''[https://github.com/torvalds/linux/blob/47700948a4abb4a5ae13ef943ff682a7f327547a/include/uapi/linux/vfio.h#L480 here (vfio.h)]'''.

'''Sample mdev code''' to service this request can be found [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/c0adb9957c0b22bfdbc18395e81dd2b476addac5/mdpy.c#L463 '''here'''], and [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/c0adb9957c0b22bfdbc18395e81dd2b476addac5/mdpy.c#L570 '''here'''].
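
The flags in the table can be decoded as follows. This is a sketch that assumes the <code>struct vfio_irq_info</code> layout and flag values from the upstream vfio.h; the helper names are hypothetical.

```python
import struct

VFIO_DEVICE_GET_IRQ_INFO = (ord(";") << 8) | (100 + 9)

VFIO_IRQ_INFO_EVENTFD = 1 << 0     # IRQ can be signalled through an eventfd
VFIO_IRQ_INFO_MASKABLE = 1 << 1    # IRQ supports explicit mask/unmask
VFIO_IRQ_INFO_AUTOMASKED = 1 << 2  # IRQ is masked automatically when it fires
VFIO_IRQ_INFO_NORESIZE = 1 << 3    # the IRQ set cannot be resized piecemeal

# struct vfio_irq_info { __u32 argsz; __u32 flags; __u32 index; __u32 count; }
IRQ_INFO = struct.Struct("=IIII")


def new_irq_info_request(index):
    """Buffer asking the kernel to describe IRQ index `index`."""
    return bytearray(IRQ_INFO.pack(IRQ_INFO.size, 0, index, 0))


def parse_irq_info(buf):
    """Decode the buffer after the kernel has filled it in."""
    _argsz, flags, index, count = IRQ_INFO.unpack_from(buf)
    return {
        "index": index,
        "count": count,  # number of interrupts available at this index
        "eventfd": bool(flags & VFIO_IRQ_INFO_EVENTFD),
        "maskable": bool(flags & VFIO_IRQ_INFO_MASKABLE),
        "automasked": bool(flags & VFIO_IRQ_INFO_AUTOMASKED),
        "noresize": bool(flags & VFIO_IRQ_INFO_NORESIZE),
    }
```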

=====Set IRQs=====
{| class="wikitable"
|+
! colspan="3" |VFIO_DEVICE_SET_IRQS
|-
| colspan="3" |struct vfio_irq_set
|-
| rowspan="12" |
|argsz
|
|-
|count
|
|-
|data[]
|
|-
|flags
|
|-
| rowspan="6" |
|VFIO_IRQ_SET_ACTION_MASK
|-
|VFIO_IRQ_SET_ACTION_TRIGGER
|-
|VFIO_IRQ_SET_ACTION_UNMASK
|-
|VFIO_IRQ_SET_DATA_BOOL
|-
|VFIO_IRQ_SET_DATA_EVENTFD
|-
|VFIO_IRQ_SET_DATA_NONE
|-
|index
|
|-
|start
|
|}
<code>'''VFIO_DEVICE_SET_IRQS'''</code> is used to set up IRQs. The action flags select what is configured: trigger (signalling when the device raises an interrupt), mask, or unmask. The Bool and None data types are used for loopback testing of the device. Index selects the IRQ index, and start and count select a subrange within that index.

The upstream API can be read '''[https://github.com/torvalds/linux/blob/47700948a4abb4a5ae13ef943ff682a7f327547a/include/uapi/linux/vfio.h#L524 here (vfio.h)]'''.
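
A sketch of building the variable-length <code>struct vfio_irq_set</code> argument, assuming the layout and flag values from the upstream vfio.h, where the <code>data[]</code> array holds one 32-bit eventfd per interrupt when <code>VFIO_IRQ_SET_DATA_EVENTFD</code> is used. The helper names and the file descriptor numbers in the usage note are hypothetical.

```python
import struct

VFIO_DEVICE_SET_IRQS = (ord(";") << 8) | (100 + 10)

VFIO_IRQ_SET_DATA_NONE = 1 << 0
VFIO_IRQ_SET_DATA_BOOL = 1 << 1
VFIO_IRQ_SET_DATA_EVENTFD = 1 << 2
VFIO_IRQ_SET_ACTION_MASK = 1 << 3
VFIO_IRQ_SET_ACTION_UNMASK = 1 << 4
VFIO_IRQ_SET_ACTION_TRIGGER = 1 << 5

# Fixed header of struct vfio_irq_set:
#   { __u32 argsz; __u32 flags; __u32 index; __u32 start; __u32 count; }
# followed by the variable-length data[] array.
IRQ_SET_HEADER = struct.Struct("=IIIII")


def pack_irq_set_eventfds(index, start, eventfds):
    """Route interrupts [start, start + len(eventfds)) of `index` to eventfds."""
    data = struct.pack("=%di" % len(eventfds), *eventfds)
    flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER
    header = IRQ_SET_HEADER.pack(
        IRQ_SET_HEADER.size + len(data), flags, index, start, len(eventfds))
    return header + data


def pack_irq_set_mask(index, start, count, unmask=False):
    """Mask (or unmask) a subrange of interrupts; DATA_NONE carries no payload."""
    action = VFIO_IRQ_SET_ACTION_UNMASK if unmask else VFIO_IRQ_SET_ACTION_MASK
    flags = VFIO_IRQ_SET_DATA_NONE | action
    return IRQ_SET_HEADER.pack(IRQ_SET_HEADER.size, flags, index, start, count)
```

For example, <code>pack_irq_set_eventfds(2, 0, [7, 8])</code> describes routing the first two interrupts of IRQ index 2 to eventfds 7 and 8; the result would be passed to <code>fcntl.ioctl(device_fd, VFIO_DEVICE_SET_IRQS, ...)</code>.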

=== Device Decomposure/Recomposure ===
[[File:Device Decomposure and Recomposure via VFIO.png|alt=Figure 5: Device Decomposure and Recomposure via VFIO.|thumb|'''Figure 5:''' Device Decomposure and Recomposure via VFIO. See slides from: [https://open-iov.org/index.php/Mediated_Device_Internals#References_(Talks_&_Reading_Material) <nowiki>[2]</nowiki>]

]]
'''Figure 5:''' Virtual Function IO (VFIO) devices are deconstructed in userspace into a set of VFIO primitives (MMIO pages, VFIO/IOMMU Groups, VFIO IRQs, File Descriptors). Recomposure of these devices occurs upon assignment of a Virtual Function (VF) to a QEMU virtual machine.

=== Memory Management Unit (MMU) ===
[[File:Figure 6- MMU - GMMU IOVA translation-isolation..png|alt=Figure 6: MMU - GMMU IOVA translation/isolation.|thumb|'''Figure 6:''' MMU - GMMU IOVA translation/isolation.]]
'''Figure 6:''' This section will touch upon the mechanisms used for enforcement of Host Physical Address (HPA) to Guest Physical Address (GPA) isolation.

==== MMIO Isolation (Platform MMU) ====
The platform's CPU communicates with the GPU by reading/writing to and from pinned MMIO pages in [https://infogalactic.com/info/Random-access_memory Random Access Memory (RAM)]. MMIO pages within the RAM are subject to IO Virtual Address (IOVA) translations by the platform's discrete MMU controller which is programmed by the CPU. These IOVA translations serve as a mechanism to enforce HPA to GPA isolation in the context of the platform.

==== VRAM Isolation (GPU GMMU) ====
The GPU core performs virtualized operations by reading/writing to and from shadow page tables in onboard [[wikipedia:Video_random_access_memory|Video Random Access Memory (VRAM)]]. Shadow pages within the VRAM are subject to IO Virtual Address (IOVA) translations by the GPU's discrete GPU MMU controller (GMMU) which is programmed by the Embedded CPU (GPU co-processor). These IOVA translations serve as a mechanism to enforce HPA to GPA isolation in the context of the virtual GPUs and multi-process isolation in single user environments.

==== Platform MMIO <-> GPU Shadow Pages ====
In the context of VFIO, pinned MMIO pages in RAM act as an interface to communicate with VRAM shadow pages, allowing GPU drivers on the platform to send instructions to the GPU. When the GPU or platform alters memory contained in a shadow page or pinned MMIO page, the change is mirrored in the corresponding IO Virtual Address (IOVA). For example, if shadow page 0 is changed by the GPU, this change is mirrored in MMIO page 0 on the platform (the reverse also applies). When communications occur between the platform and GPU, the information first moves through the MMU/GMMU and is then written to RAM/VRAM.
[[File:Figure 7- A depiction of region overlays within a VFIO file descriptor..png|alt=Figure 7: A depiction of region overlays within a VFIO file descriptor.|thumb|'''Figure 7:''' A depiction of region overlays within a VFIO file descriptor. See slides from: [https://open-iov.org/index.php/Mediated_Device_Internals#References_(Talks_&_Reading_Material) <nowiki>[2]</nowiki>]

]]

=== VFIO Quirks (region traps) ===
'''Figure 7:''' Some regions of the PCI BAR require emulation (slow path) via emulated traps (known as quirks). These regions are primarily in PCI configuration space. QEMU may place an overlay over such a region; a read or write to the overlay triggers a VM-exit so that the access can be trapped and emulated, allowing appropriate translations to occur (such as those concerned with IO Virtual Address - IOVA).
[[File:Screen Shot 2022-05-06 at 10.46.59 AM.png|alt=Peer-to-Peer DMA Isolation under IOMMU|thumb|'''Figure 8:''' Peer-to-Peer DMA Isolation under IOMMU. See slides from: [https://open-iov.org/index.php/Mediated_Device_Internals#References_(Talks_&_Reading_Material) <nowiki>[2]</nowiki>]

]]
=== Known IOMMU Issues ===
'''DMA Aliasing'''

* Not all devices generate unique IDs.
* Not all devices generate the IDs they should.

'''DMA Isolation'''

* '''Figure 8:''' Peer-to-Peer DMA Isolation. In many circumstances IO Virtual Address (IOVA) translations do not occur properly in the context of DMA peering. Transactions that occur through the IOMMU are unaffected.

[[File:Figure 0- Approches to GPU Virtualization..png|alt=Figure 0: Approaches to GPU Virtualization.|thumb|'''Figure 0:''' Approaches to GPU Virtualization. See slides from: [https://open-iov.org/index.php/Virtual_I/O_Internals#References_(Talks_&_Reading_Material) <nowiki>[61]</nowiki>]]]

==Mdev Mode==
'''Mdev Mode (VFIO Mediated Device)''' is a method of virtualizing I/O devices enabling full API capabilities without the requirement for hardware assistance.

=== Knowledge Resources Used ===
This section is supported by significant contributions in open source by Neo Jia, Kirti Wankhede, Kevin Tian, Yiying Zhang, David Cowperthwaite, Kun Tian, Yaozu Dong, Tina Zhang, Gerd Hoffmann, & Zhi Wang.

See references 1, 7, 8, 17, 18, 19, 70 in the [https://open-iov.org/index.php/Virtual_IO_Internals#References_(Talks_&_Reading_Material) References (Talks & Reading Material)] section.

=== Mediated Core ===
The mediated device framework made 3 major changes to the VFIO driver.

* '''1: [https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/vfio/mdev/mdev_core.c Mediated core module (new)]'''
** Mediated bus driver, create mediated device.
** Physical device interface for vendor callbacks.
** Generic Mediated device management user interface (sysfs).
* '''2: [https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/vfio/mdev/mdev_driver.c Mediated device module (new)]'''
** Manage created mediated device, fully compatible with VFIO user API (UAPI).
* '''3: [https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/vfio/vfio_iommu_type1.c VFIO IOMMU driver (enhancement)]'''
** VFIO IOMMU API Type1 compatible, easy to extend to non-Type1.
The full list of these changes can be seen in the [https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg03922.html '''lists.gnu.org Qemu-devel''' mailing list archive].

=== Device Initialization ===
This section will deal with how an Mdev driver is initialized.

'''Sample mdev code''' for device initialization can be found '''[https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/c0adb9957c0b22bfdbc18395e81dd2b476addac5/mdpy.c#L229 here]'''.

==== Registering VFIO MDEV as a driver ====
VFIO Mdev must be registered as a driver. This registration occurs between the Mdev Driver Register Interface and the VFIO MDEV interface.

==== Registering PCIE Device with the Mediated Core ====
[[File:Mdev-gpu.png|alt=Registering the mediated device.|thumb|'''Figure 9:''' Registering the mediated device. This step may be accomplished via the vendor driver or via [https://docs.linux-gvm.org/ GVM/Mdev-GPU] for drivers which do not include support for Mdev functionality and/or do not support arbitrary Mdev types.]]
'''Figure 9:''' The vendor driver must register with the Mediated Core's Device Register Interface. The example shown uses the [https://docs.linux-gvm.org/ GVM Project]'s [https://docs.linux-gvm.org/mdev-gpu/ Mdev-GPU] component to accomplish this step on behalf of the vendor driver.

==== Registering Mediated Callbacks (CBs) ====
'''Figure 9:''' The vendor driver must now register Mediated Callbacks which it expects to receive from Mdev devices. The example shown uses the [https://docs.linux-gvm.org/ GVM Project]'s [https://docs.linux-gvm.org/mdev-gpu/ Mdev-GPU] component to accomplish this step on behalf of the vendor driver.

==== Creating a Mediated Device via mdev-sysfsdev API ====

[[File:Echoing a UUID into the sysfsdev API.png|thumb|'''Figure 10:''' Echoing a UUID into the mdev-sysfsdev API. This functionality is included in [https://github.com/mdevctl/mdevctl mdevctl] and in [https://libvf.io/ LibVF.IO]. See slides from: [https://open-iov.org/index.php/Mediated_Device_Internals#References_(Talks_&_Reading_Material) <nowiki>[1]</nowiki>]

]]
'''Figure 10:''' The user of an mdev capable device driver may echo values such as a UUID into the mdev-sysfsdev interface to create a unique mediated device. UUIDs must be unique per mdev device within a host.
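
The echo step can also be scripted. The sketch below assumes the sysfs layout described in the kernel's VFIO Mediated Device documentation, where each parent device listed under <code>/sys/class/mdev_bus/</code> exposes <code>mdev_supported_types/&lt;type-id&gt;/create</code>. The helper names are hypothetical, and the parent address and type name in the usage note are placeholders that vary by device and driver.

```python
import os
import uuid


def mdev_create_path(parent, type_id, sysfs_root="/sys/class/mdev_bus"):
    """Path of the sysfs `create` attribute for one mdev type of a parent device."""
    return os.path.join(sysfs_root, parent, "mdev_supported_types", type_id, "create")


def normalize_uuid(value=None):
    """Return a canonical UUID string, generating a random one if none is given."""
    return str(uuid.UUID(str(value))) if value is not None else str(uuid.uuid4())


def create_mdev(parent, type_id, mdev_uuid=None, sysfs_root="/sys/class/mdev_bus"):
    """Create a mediated device by writing a UUID to the `create` attribute.

    Equivalent to `echo <uuid> > .../create`; needs root and an mdev-capable
    parent driver. Returns the UUID that was written.
    """
    mdev_uuid = normalize_uuid(mdev_uuid)
    with open(mdev_create_path(parent, type_id, sysfs_root), "w") as attr:
        attr.write(mdev_uuid)
    return mdev_uuid
```

A call such as <code>create_mdev("0000:01:00.0", "example-type")</code> would create one mediated device with a freshly generated UUID; because UUIDs must be unique per mdev device within a host, generating a random UUID per call is a simple way to avoid collisions.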
[[File:QEMU creates VFIO-mdev IOMMU binding and acquires mdev file descriptor..png|alt=Figure 11: QEMU creates VFIO-mdev IOMMU binding and acquires mdev file descriptor.|thumb|'''Figure 11:''' QEMU creates VFIO-mdev IOMMU binding and acquires mdev file descriptor. See slides from: [https://open-iov.org/index.php/Mediated_Device_Internals#References_(Talks_&_Reading_Material) <nowiki>[1]</nowiki>]

]]

==== QEMU adds VFIO device to IOMMU container-group ====
'''Figure 11:''' When starting a QEMU process with a VFIO-mdev attached QEMU calls the VFIO API to add the VFIO device to an IOMMU container/group. QEMU then runs the IOCTL to obtain a file descriptor for the device.

==== QEMU passes mdev device file descriptor to VM ====
[[File:QEMU presenting the VFIO file descriptor into the virtual machine..png|alt=Figure 12: QEMU presenting the VFIO file descriptor into the virtual machine.|thumb|'''Figure 12:''' QEMU presenting the VFIO file descriptor into the virtual machine. See slides from: [https://open-iov.org/index.php/Mediated_Device_Internals#References_(Talks_&_Reading_Material) <nowiki>[1]</nowiki>]

]]
'''Figure 12:''' Once QEMU has obtained the VFIO file descriptor for the Mdev device via IOCTL it is then QEMU's job to present the file descriptor into the virtual machine so that the mdev may be used by the guest.

===Instruction Execution===
[[File:Ioeventfd-and-irqfd.png|thumb| '''Figure 13:''' A simple diagram of signalling from host to guest (via irqfd) & guest to host (via ioeventfd) From [http://blog.allenx.org/2015/07/05/kvm-irqfd-and-ioeventfd blog.allenx]]]
'''Figure 13:''' Mdev Mode moves instruction information across a virtual function (VF) device using [https://infogalactic.com/info/Remote_procedure_call Remote Procedure Calls], generally by way of [https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api/ioctl.rst IOCTL] system calls. Signals to and from the guest and the host GPU driver may be passed over file descriptors such as the [https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=721eecbf4fe995ca94a9edec0c9843b1cc0eaaf3 Interrupt Request File Descriptor (irqfd)] and [https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d34e6b175e61821026893ec5298cc8e7558df43a IO Event File Descriptor (ioeventfd)].
The irqfd may be used to signal from the host into the guest whereas the ioeventfd may be used to signal from the guest into the host. Guest GPU instructions which would normally serialize as pRPCs (physical Remote Procedure Calls) are instead serialized from the guest as vRPCs (virtual Remote Procedure Calls) which are executed by the host mediated driver.
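
The two signalling directions can be illustrated with a small model. This is a sketch under stated assumptions: <code>EventFdModel</code> imitates only the counter behaviour of an eventfd in plain Python (a real eventfd read would block or fail on a zero counter), and the <code>Channel</code> class and its method names are hypothetical, not part of KVM or any vendor driver.

```python
class EventFdModel:
    """Plain-Python stand-in for an eventfd counter (non-semaphore mode)."""

    def __init__(self):
        self.counter = 0

    def signal(self, value=1):
        # A write adds to the counter.
        self.counter += value

    def consume(self):
        # A read returns the accumulated count and resets it to zero.
        value, self.counter = self.counter, 0
        return value


class Channel:
    """Hypothetical pairing of the two descriptors for one virtual device."""

    def __init__(self):
        self.irqfd = EventFdModel()      # host -> guest
        self.ioeventfd = EventFdModel()  # guest -> host

    def guest_doorbell(self):
        # Guest notifies the host, for example after queuing a vRPC.
        self.ioeventfd.signal()

    def host_service(self):
        # Host drains guest notifications, then raises a completion interrupt.
        pending = self.ioeventfd.consume()
        if pending:
            self.irqfd.signal()
        return pending

    def guest_take_interrupt(self):
        # Guest observes whether an interrupt was injected.
        return self.irqfd.consume() > 0
```

Two guest doorbells before the host runs are coalesced into one servicing pass, which is the property that makes a counter-based descriptor suitable for notifications.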

====IRQ remapping====
Interrupt Requests (IRQs) must be remapped (trapped for virtualized execution) to protect the host from sensitive instructions which may affect global memory state.

==== Interrupt Injection ====
'''--'''

'''--Researching--'''

'''--'''

===Memory Management ===
Mdev memory management is handled by vendor driver software.

====Region Passthrough====
Guests may be presented with emulated memory regions, passthrough regions, or a mixture of the two, for example a device whose BAR 0 configuration space is emulated while its other regions are passed through.

===== Emulated Regions =====
Emulated memory regions use indirect emulated communication requiring a VM-exit (slow). These regions are often used for virtual PCI config space such as [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/c0adb9957c0b22bfdbc18395e81dd2b476addac5/mdpy.c#L112 in this sample code].

===== Passthrough Regions =====
Passthrough memory regions use direct communication requiring no VM-exit (fast).

===== Region Access =====
[[File:QEMU gets region info via VFIO UAPI from vendor driver through vfio-mdev and Mediated CBs.png|alt=Figure 14: QEMU gets region info via VFIO UAPI from vendor driver through vfio-mdev and Mediated CBs|thumb|'''Figure 14:''' QEMU gets region info via VFIO UAPI from vendor driver through vfio-mdev and Mediated CBs. See slides from: [https://open-iov.org/index.php/Mediated_Device_Internals#References_(Talks_&_Reading_Material) <nowiki>[1]</nowiki>]]]
'''Figure 14:''' QEMU gets region information via VFIO User API (UAPI) from the vendor driver through VFIO-mdev and Mediated Callbacks.
[[File:Guest driver accessing Mdev MMIO space backed by mdev file descriptor triggers EPT violation..png|alt=Figure 15: Guest driver accessing Mdev MMIO space backed by mdev file descriptor triggers EPT violation.|thumb|'''Figure 15:''' Guest driver accessing Mdev MMIO space backed by mdev file descriptor triggers EPT violation. See slides from: [https://open-iov.org/index.php/Mediated_Device_Internals#References_(Talks_&_Reading_Material) <nowiki>[1]</nowiki>]]]
'''Figure 15:''' The guest's vendor driver accesses the trapped Mdev MMIO region, which is backed by an mdev file descriptor (fd). The access triggers an Extended Page Table (EPT) violation.
[[File:KVM services EPT violation and forwards to QEMU VFIO PCI driver..png|alt=Figure 16: KVM services EPT violation and forwards to QEMU VFIO PCI driver.|thumb|'''Figure 16:''' KVM services EPT violation and forwards to QEMU VFIO PCI driver. See slides from: [https://open-iov.org/index.php/Mediated_Device_Internals#References_(Talks_&_Reading_Material) <nowiki>[1]</nowiki>]]]
'''Figure 16:''' KVM services the EPT violation and forwards it to the QEMU VFIO PCI driver.
[[File:QEMU convert request from KVM to Read-Write acess to Mdev file descriptor..png|alt=Figure 17: QEMU converts request from KVM to Read/Write access to Mdev file descriptor.|thumb|'''Figure 17:''' QEMU converts request from KVM to Read/Write access to Mdev file descriptor. See slides from: [https://open-iov.org/index.php/Mediated_Device_Internals#References_(Talks_&_Reading_Material) <nowiki>[1]</nowiki>]]]
'''Figure 17:''' QEMU converts the request from KVM into read/write (R/W) access to the Mdev file descriptor.
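
The region query from Figure 14 exchanges a small fixed-layout structure. The sketch below packs and unpacks it with Python's <code>struct</code> module, assuming the field order of <code>struct vfio_region_info</code> in <code>vfio.h</code> (four 32-bit fields followed by two 64-bit fields). The helper names <code>region_query</code> and <code>describe</code> are hypothetical, and no ioctl is actually issued.

```python
import struct

# Assumed layout: argsz, flags, index, cap_offset (u32 each); size, offset (u64 each).
REGION_INFO = struct.Struct("=IIIIQQ")

VFIO_REGION_INFO_FLAG_READ = 1 << 0
VFIO_REGION_INFO_FLAG_WRITE = 1 << 1
VFIO_REGION_INFO_FLAG_MMAP = 1 << 2


def region_query(index):
    """Build the buffer a caller would pass in to ask about one region."""
    return REGION_INFO.pack(REGION_INFO.size, 0, index, 0, 0, 0)


def describe(raw):
    """Decode a filled-in reply into the properties that matter to the guest."""
    argsz, flags, index, cap_offset, size, offset = REGION_INFO.unpack(raw)
    return {
        "index": index,
        "size": size,
        "offset": offset,
        "readable": bool(flags & VFIO_REGION_INFO_FLAG_READ),
        "writable": bool(flags & VFIO_REGION_INFO_FLAG_WRITE),
        # Without the MMAP flag every access goes through read/write (trapped).
        "mmap": bool(flags & VFIO_REGION_INFO_FLAG_MMAP),
    }
```

A region reported without the mmap flag corresponds to an emulated region, while a region with it can be mapped directly and behaves as a passthrough region.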

==== EPT Page Violations====
Guest [https://infogalactic.com/info/Memory-mapped_I/O Memory Mapped IO (MMIO)] accesses to trapped regions trigger Extended Page Table (EPT) violations, which are trapped by the host MMU. KVM services each EPT violation and forwards it to the QEMU VFIO PCI driver. QEMU then converts the request from KVM into R/W access to the [https://infogalactic.com/info/File_descriptor Mdev File Descriptor (FD)]. Reads and writes are then handled by the host GPU device driver via mediated [https://infogalactic.com/info/Callback_(computer_programming) callbacks (CBs)] and [https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api/vfio-mediated-device.rst VFIO-mdev].
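
The trapped access path above can be modelled end to end. This is a simplified sketch: <code>VendorDriver</code> stands in for the host vendor driver's read and write callbacks, and the 40-bit shift that places the region index in the upper bits of the file offset is an assumption borrowed from the convention in the mtty.c sample. Real callers should take region offsets from the region information reported by the device.

```python
OFFSET_SHIFT = 40  # assumption: region index lives above bit 40 of the fd offset
OFFSET_MASK = (1 << OFFSET_SHIFT) - 1


def region_offset(index, offset_in_region):
    """File position on the mdev fd for a given region and offset."""
    return (index << OFFSET_SHIFT) | offset_in_region


class VendorDriver:
    """Hypothetical stand-in for the mediated read/write callbacks."""

    def __init__(self, region_sizes):
        self.regions = {i: bytearray(n) for i, n in region_sizes.items()}

    def _locate(self, pos, count):
        index, off = pos >> OFFSET_SHIFT, pos & OFFSET_MASK
        region = self.regions[index]
        if off + count > len(region):
            raise ValueError("access beyond end of region")
        return region, off

    def write(self, pos, data):
        region, off = self._locate(pos, len(data))
        region[off:off + len(data)] = data
        return len(data)

    def read(self, pos, count):
        region, off = self._locate(pos, count)
        return bytes(region[off:off + count])


def guest_mmio_write(driver, index, offset, data):
    """Trace one trapped write: EPT violation, KVM exit, QEMU, then the callback."""
    path = ["ept_violation", "kvm_exit", "qemu_vfio_pci"]
    written = driver.write(region_offset(index, offset), data)
    path.append("vendor_write_callback")
    return path, written
```

Every trapped access walks the whole path, which is why emulated regions are described as the slow path.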

==== Mediated DMA Translations ====
'''--'''

'''--Researching--'''

'''--'''

Because memory is not statically allocated by the vendor driver under the mediated device framework, there is no requirement to use traditional VFIO pinned pages (via the vfio-pci module); instead, MMIO memory can be mapped incrementally at runtime. As a result, non-standard mediated device VFIO stub modules may be used.

'''Figure 18:''' Memory regions get added by QEMU.
[[File:Memory Regions Added by QEMU.png|alt=Memory Regions Added by QEMU|thumb|'''Figure 18:''' Memory Regions Added by QEMU. See slides from: [https://open-iov.org/index.php/Mediated_Device_Internals#References_(Talks_&_Reading_Material) <nowiki>[1]</nowiki>]]]

'''Figure 19:''' QEMU calls VFIO_DMA_MAP via its memory listener. Device memory, not just guest physical memory, is added through this memory listener.
[[File:QEMU calls VFIO DMA MAP via Memory listener.png|alt=QEMU calls VFIO DMA MAP via Memory listener|thumb|'''Figure 19:''' QEMU calls VFIO DMA MAP via Memory listener. See slides from: [https://open-iov.org/index.php/Mediated_Device_Internals#References_(Talks_&_Reading_Material) <nowiki>[1]</nowiki>]]]
[[File:Type 1 IOMMU Tracks VA-GFN.png|alt=Type 1 IOMMU Tracks VA-GFN|thumb|'''Figure 20:''' Type 1 IOMMU Tracks <VA, GFN>. See slides from: [https://open-iov.org/index.php/Mediated_Device_Internals#References_(Talks_&_Reading_Material) <nowiki>[1]</nowiki>]]]
'''Figure 20:''' Type 1 IOMMU tracks <VA, GFN (Guest Frame Number)>. A table is built that lists the QEMU VAs and tracks their mapping to Guest Frame Numbers (GFNs).
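
The <VA, GFN> tracking in Figure 20 can be sketched as a small lookup table. The structure layout below assumes <code>struct vfio_iommu_type1_dma_map</code> from <code>vfio.h</code> (taken here to be the argument behind the VFIO_DMA_MAP step), a 4 KiB page size is assumed, and <code>VaGfnTable</code> is a hypothetical illustration rather than the kernel's data structure.

```python
import struct

PAGE_SHIFT = 12  # assumption: 4 KiB pages

# Assumed layout: argsz, flags (u32 each); vaddr, iova, size (u64 each).
DMA_MAP = struct.Struct("=IIQQQ")
VFIO_DMA_MAP_FLAG_READ = 1 << 0
VFIO_DMA_MAP_FLAG_WRITE = 1 << 1


class VaGfnTable:
    """Tracks which QEMU virtual address backs each guest frame number."""

    def __init__(self):
        self.entries = []  # (first_gfn, page_count, vaddr)

    def map_dma(self, vaddr, iova, size):
        """Record a mapping and return the argument a map request would carry."""
        self.entries.append((iova >> PAGE_SHIFT, size >> PAGE_SHIFT, vaddr))
        flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE
        return DMA_MAP.pack(DMA_MAP.size, flags, vaddr, iova, size)

    def gfn_to_va(self, gfn):
        """Translate a guest frame number to the QEMU virtual address behind it."""
        for first, count, vaddr in self.entries:
            if first <= gfn < first + count:
                return vaddr + ((gfn - first) << PAGE_SHIFT)
        raise KeyError(gfn)
```

Keeping the table keyed by frame ranges lets pages be looked up, and later pinned, only when the vendor driver actually needs them.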
===Scheduling===

Scheduling is handled by the host mdev driver.

=== Kernel API ===
Kernel documentation used for implementing a VFIO Mediated Device may be found at [https://www.kernel.org/doc/html/latest/driver-api/vfio-mediated-device.html kernel.org].

=== Sample Code ===
Sample code for various mdev implementations may be found below:

'''Mediated Virtual PCI Display Host Device:'''

# [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mdpy.c mdpy.c]
# [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mdpy-fb.c mdpy-fb.c]
# [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mbochs.c mbochs.c]

'''Serial PCI Port-based Mediated Device:'''

# [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mtty.c mtty.c]

====== Compilation ======
To compile the kernel modules linked above, you should have your distro's equivalent of the build-essential package installed, along with the headers for your running kernel. The included Makefile should provide all that is necessary to compile the kernel modules. Run the following command from within the directory to compile the modules:

<code>make</code>

Once make has completed successfully, the directory will contain .ko files. These files are the binary kernel modules.

====== Loading Kernel Modules ======
Now that you have compiled the kernel modules, you may load them via the insmod command.

For example, to load the '''mtty.ko''' module, run the following command from within the directory where you built the modules:

<code>insmod mtty.ko</code>

====== Unloading Kernel Modules ======
To unload any of the kernel modules you may make use of the rmmod command.

For example, to unload the '''mtty.ko''' module, run the following command:

<code>rmmod mtty.ko</code>

====== Additional Documentation ======
An additional guide explaining how to make use of the mtty.c sample code may be found at [https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api/vfio-mediated-device.rst?h=driver-core-next&id=7de3697e9cbd4bd3d62bafa249d57990e1b8f294#n308 line 308] of the kernel.org VFIO Mediated Device documentation.
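
Once a sample module is loaded, a mediated device is created by writing a UUID to a sysfs node. The sketch below only builds the relevant paths; the mtty parent path, the <code>mtty-2</code> type name, and the <code>sysfsdev</code> QEMU argument follow the kernel's VFIO mediated device documentation as understood here and may differ between kernel versions. <code>mdev_paths</code> and <code>create_mdev</code> are hypothetical helper names.

```python
import uuid


def mdev_paths(parent, type_id, dev_uuid):
    """Build the sysfs paths involved in creating and removing one mdev."""
    uuid.UUID(dev_uuid)  # raises ValueError if the string is not a valid UUID
    device = f"/sys/bus/mdev/devices/{dev_uuid}"
    return {
        "create": f"{parent}/mdev_supported_types/{type_id}/create",
        "device": device,
        "remove": f"{device}/remove",
        "qemu_arg": f"vfio-pci,sysfsdev={device}",
    }


def create_mdev(paths, dev_uuid):
    """Write the UUID to the create node (needs root and a loaded sample module)."""
    with open(paths["create"], "w") as node:
        node.write(dev_uuid)
```

For the mtty sample the parent would be <code>/sys/devices/virtual/mtty/mtty</code>; the returned <code>qemu_arg</code> string is intended for QEMU's <code>-device</code> option.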
=== Mdev Mode Requirements ===
Driver Support.

Software HPA<->GPA Boundary Enforcement.

==SR-IOV Mode==
'''SR-IOV Mode (Single Root I/O Virtualization)''' involves hardware-assisted virtualization on I/O peripherals.

=== Knowledge Resources Used ===
This section is supported by significant contributions in open source by Zheng Xiao, Jerry Jiang, & Ken Xue.

See reference 6 in the [https://open-iov.org/index.php/Virtual_IO_Internals#References_(Talks_&_Reading_Material) References (Talks & Reading Material)] section.

=== Instruction Execution===
SR-IOV communicates instructions from a virtual function (VF) directly to the [https://infogalactic.com/info/PCI_configuration_space PCI BAR].

===Memory Management===
Guests are presented with passthrough memory regions by the device firmware.

=== Scheduling===
Scheduling may be handled by the host mdev driver and/or the device firmware.
=== SR-IOV Mode Requirements ===
Driver Support.

Device SR-IOV support.

Firmware HPA<->GPA Boundary Enforcement.
== SIOV Mode ==
'''SIOV (Scalable I/O Virtualization)''' combines concepts from both [https://open-iov.org/index.php/Mediated_Device_Internals#SR-IOV_Mode SR-IOV Mode] and [https://open-iov.org/index.php/Mediated_Device_Internals#Mdev_Mode Mdev Mode] with novel concepts such as shared IOMMU aware buffers and offloading of PCI config space VM-exits (slow path) to a discrete controller.

'''Revision 1.0 of the SIOV specification''' can be read on the [https://www.opencompute.org/documents/ocp-scalable-io-virtualization-technical-specification-revision-1-v1-2-pdf Open Compute Project website].

=== Knowledge Resources Used ===
This section is supported by significant contributions in open source by Kevin Tian, Tina Zhang, Xin Zeng, Yi Liu.

See references 3, 4, 5, 16, 17, 18, 19, 20, 21 in the [https://open-iov.org/index.php/Virtual_IO_Internals#References_(Talks_&_Reading_Material) References (Talks & Reading Material)] section.

===Memory Management===
Enhancements to [https://edc.intel.com/content/www/us/en/design/ipla/software-development-platforms/client/platforms/alder-lake-desktop/12th-generation-intel-core-processors-datasheet-volume-1-of-2/002/intel-virtualization-technology-for-directed-i-o/ Intel VT-d] introduce a new 'Scalable Mode' which allows the platform to assign more granular IOMMU allocations to mediated devices ([https://open-iov.org/index.php/Mediated_Device_Internals#Memory_Management mdev memory management]). This change is referred to as an IOMMU Aware Mediated Device.

==== IOMMU Aware Mediated Device ====
SIOV made several changes to the VFIO driver, Intel IOMMU, and Mediated Device Framework.

The full list of these changes can be seen in the [https://lwn.net/ml/linux-kernel/20190222021927.13132-1-baolu.lu@linux.intel.com/ mailing list archive on '''lwn.net''' under '''(vfio/mdev: IOMMU aware mediated device)'''].

==== Shared Hardware Workqueues ====
SIOV makes use of Shared Hardware Workqueues which may be accessed by processes or Virtual Machines.

[https://www.kernel.org/doc/html/latest/x86/sva.html According to '''kernel.org''']: ''"In order to allow the hardware to distinguish the context for which work is being executed in the hardware by SWQ interface, SIOV uses Process Address Space ID (PASID), which is a 20-bit number defined by the PCIe SIG. PASID value is encoded in all transactions from the device. This allows the IOMMU to track I/O on a per-PASID granularity in addition to using the PCIe Resource Identifier (RID) which is the Bus/Device/Function."''
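
The identifiers in the quotation can be made concrete with a little bit arithmetic. This sketch packs a Bus/Device/Function triple into the 16-bit RID (8 bits of bus, 5 of device, 3 of function) and checks that a PASID fits in 20 bits; the function names are hypothetical and the <code>(rid, pasid)</code> key is only an illustration of per-PASID tracking, not an IOMMU data structure.

```python
PASID_BITS = 20


def make_rid(bus, device, function):
    """Pack Bus/Device/Function into the 16-bit PCIe Resource Identifier."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("Bus/Device/Function out of range")
    return (bus << 8) | (device << 3) | function


def split_rid(rid):
    """Unpack a RID back into (bus, device, function)."""
    return rid >> 8, (rid >> 3) & 0x1F, rid & 0x7


def context_key(rid, pasid):
    """Key that distinguishes work items which share one RID."""
    if not 0 <= pasid < (1 << PASID_BITS):
        raise ValueError("PASID must fit in 20 bits")
    return (rid, pasid)
```

Two guests submitting to the same shared workqueue carry the same RID but different PASIDs, so their keys differ and their I/O can be translated separately.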

=== SIOV Mode Requirements ===
Driver Support.

Device SIOV support.

Firmware HPA<->GPA Boundary Enforcement.

== References (Talks & Reading Material) ==

# [https://www.youtube.com/watch?v=Xs0TJU_sIPc <nowiki>[2016] vGPU on KVM - A VFIO Based Framework by Neo Jia & Kirti Wankhede</nowiki>] - [https://www.linux-kvm.org/images/5/59/02x03-Neo_Jia_and_Kirti_Wankhede-vGPU_on_KVM-A_VFIO_based_Framework.pdf slides]
# [https://www.youtube.com/watch?v=WFkdTFTOTpA <nowiki>[2016] An Introduction to PCI Device Assignment with VFIO by Alex Williamson</nowiki>] - [http://events17.linuxfoundation.org/sites/events/files/slides/An%20Introduction%20to%20PCI%20Device%20Assignment%20with%20VFIO%20-%20Williamson%20-%202016-08-30_0.pdf slides]
#[https://events19.linuxfoundation.cn/wp-content/uploads/2017/11/Intel%C2%AE-Scalable-I_O-Virtualization_Kevin-Tian.pdf <nowiki>[2017] Scalable I/O Virtualization by Kevin Tian</nowiki>]
#[https://www.youtube.com/watch?v=G6D-jaCs6sc <nowiki>[2019] Bring a Scalable IOV Capable Device into Linux World by Xin Zeng & Yi Liu</nowiki>] - [https://static.sched.com/hosted_files/kvmforum2019/5e/Bring%20a%20scalable%20IOV%20capable%20device%20into%20Linux%20-%20KVM%202019.pdf slides]
#[https://www.opencompute.org/documents/ocp-scalable-io-virtualization-technical-specification-revision-1-v1-2-pdf Scalable I/O Virtualization Revision 1.0]
#[https://www.youtube.com/watch?v=_tB3EbFDcRQ <nowiki>[2018] Live Migration Support for GPU with SRIOV by Zheng Xiao, Jerry Jiang & Ken Xue</nowiki>] - [https://events19.linuxfoundation.org/wp-content/uploads/2017/12/Live-Migration-Support-for-GPU-with-SRIOV-Challenges-and-Solution-Zheng-Xiao-Alibaba-Cloud-Jerry-Jiang-Ken-Xue-AMD.pdf slides]
#[https://www.youtube.com/watch?v=UODxW1opfn0 <nowiki>[2017] Intel GVT-g: From Production to Upstream - Zhi Wang, Intel</nowiki>]
#[https://www.youtube.com/watch?v=DKYvQ3FdFeo <nowiki>[2016] Qemu Graphics Update 2016 by Gerd Hoffmann</nowiki>]
# [https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api/ioctl.rst IOCTL]
# [https://man7.org/linux/man-pages/man2/eventfd.2.html eventfd] - [https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/virt/kvm/eventfd.c root/virt/kvm/eventfd.c]
# (kernel diff: [https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=721eecbf4fe995ca94a9edec0c9843b1cc0eaaf3 KVM: irqfd])
# (kernel diff: [https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d34e6b175e61821026893ec5298cc8e7558df43a KVM: add ioeventfd support])
# [https://web.archive.org/web/20220120223711/http://blog.allenx.org/2015/07/05/kvm-irqfd-and-ioeventfd KVM irqfd & ioeventfd]
# [https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api/vfio.rst VFIO - Virtual Function I/O] - [https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/virt/kvm/vfio.c root/virt/kvm/vfio.c]
# [https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api/vfio-mediated-device.rst VFIO Mediated Devices]
#[https://lwn.net/ml/linux-kernel/20190222021927.13132-1-baolu.lu@linux.intel.com/ IOMMU Aware Mediated Device]
# [https://www.youtube.com/watch?v=95KSKrZM8oQ Hardware-Assisted Mediated Pass-Through with VFIO by Kevin Tian]
# [https://www.youtube.com/watch?v=cHMLBcHplhk <nowiki>[2017] Generic Buffer Sharing Mechanism for Mediated Devices by Tina Zhang</nowiki>]
# [https://www.youtube.com/watch?v=KWRKx_uxUDI <nowiki>[2019] Toward a Virtualization World Built on Mediated Pass-Through - Kevin Tian</nowiki>]
# [https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api/auxiliary_bus.rst?h=driver-core-next&id=7de3697e9cbd4bd3d62bafa249d57990e1b8f294 Auxiliary Bus]
#[https://www.kernel.org/doc/html/latest/x86/sva.html Shared Virtual Addressing]
#[https://vfio.blogspot.com/ vfio.blogspot.com]
#[https://01.org/sites/default/files/xengt.pdf XenGT by Kevin Tian]
#[https://web.archive.org/web/20130721094438/http://labs.vmware.com/academic/publications/gpu-virtualization VMWare Academic Publications: GPU Virtualization by Micah Dowty & Jeremy Sugerman]
#[https://pcisig.com/specifications/iov PCI SIG I/O Virtualization]
#[https://web.archive.org/web/20120108034526/http://sysweb.cs.toronto.edu/vmgl VMGL by the University of Toronto Computer Science Department]
#[https://web.archive.org/web/20120518125815/http://www.nvidia.com/object/vgx-hypervisor.html Kepler VGX Hypervisor]
#[https://web.archive.org/web/20090218010221/http://software.intel.com/en-us/articles/intel-virtualization-technology-for-directed-io-vt-d-enhancing-intel-platforms-for-efficient-virtualization-of-io-devices Intel Virtualization Technology for Directed I/O (VT-d): Enhancing Intel platforms for efficient virtualization of I/O devices]
#[https://lwn.net/2001/0712/a/dma-interface.php3 DMA-Mapping.txt (Dynamic DMA Mapping)]
#[https://events.static.linuxfound.org/slides/2011/linuxcon-japan/lcj2011_guangrong.pdf KVM MMU Virtualization by Xiao Guangrong]
#[https://wiki.osdev.org/PCI OSDEV: PCI]
#[http://xillybus.com/tutorials/pci-express-tlp-pcie-primer-tutorial-guide-1 Down to the TLP: How PCI express devices talk (Part I)]
#[http://xillybus.com/tutorials/pci-express-tlp-pcie-primer-tutorial-guide-2 Down to the TLP: How PCI express devices talk (Part II)]
#[https://projectacrn.github.io/latest/tutorials/sriov_virtualization.html Intel ACRN: Enabling SR-IOV Virtualization]
#[https://docs.kernel.org/admin-guide/abi-testing.html#abi-file-testing-sysfs-bus-pci Kernel.org sysfs-bus-pci]
#[https://linux-kernel-labs.github.io/refs/heads/master/labs/memory_mapping.html Linux Kernel Labs: Memory Mapping]
#[https://rayanfam.com/topics/inside-windows-page-frame-number-part1/ Inside Windows Page Frame Number (PFN) - Part 1]
#[https://en.wikipedia.org/wiki/Direct_Rendering_Infrastructure Direct Rendering Infrastructure (DRI)]
#[https://dri.sourceforge.net/doc/DRIuserguide.html DRI User Guide]
#[https://linux-kernel-labs.github.io/refs/heads/master/lectures/virt.html#i-o-virtualization Linux Kernel Labs: IO Virtualization]
#[https://dri.freedesktop.org/doxygen/gallium/ Gallium3D: Main Page]
#[https://web.archive.org/web/20080107043445/http://www.tungstengraphics.com/wiki/index.php/Gallium3D Gallium3D Wiki]
#[https://web.archive.org/web/20090219182518/http://www.tungstengraphics.com/wiki/files/gallium3d-xds2007.pdf Gallium3D talk from XDS 2007]
#[https://projectacrn.github.io/latest/developer-guides/hld/hv-memmgt.html Memory Management High Level Design (ACRN)]
#[https://web.archive.org/web/20080111122628/http://www.digit-life.com/articles2/gffx/nv40-part1-a.html Block Architecture Diagrams for Geforce series]
#[http://freenv.svn.sourceforge.net/viewvc/freenv/doc/shaderinsnformat/bitdiagen/ Block diagrams for 40 series instruction format]
#[[wikipedia:PCI_configuration_space|PCI Configuration Space]]
#[https://learn.microsoft.com/en-us/windows-hardware/drivers/display/gpu-virtual-memory-in-wddm-2-0 GPU virtual memory in WDDM 2.0]
#[https://01.org/linuxgraphics/documentation/hardware-specification-prms Intel Graphics Programmer's Reference Manuals (PRM)]
#[https://bwidawsk.net/blog/2013/1/i915-hardware-contexts-and-some-bits-about-batchbuffers/ i915: Hardware Contexts (and some bits about batchbuffers)]
#[https://bwidawsk.net/blog/2014/6/the-global-gtt-part-1/ i915: The Global GTT Part 1]
#[https://bwidawsk.net/blog/2014/6/aliasing-ppgtt-part-2/ i915: Aliasing PPGTT Part 2]
#[https://bwidawsk.net/blog/2014/7/true-ppgtt-part-3/ i915: True PPGTT Part 3]
#[https://bwidawsk.net/blog/2014/7/future-ppgtt-part-4-dynamic-page-table-allocations-64-bit-address-space-gpu-mirroring-and-yeah-something-about-relocs-too/ i915: Future PPGTT Part 4 (Dynamic page table allocations, 64 bit address space, GPU "mirroring", and yeah, something about relocs too)]
#[https://igor-blue.github.io/2021/02/10/graphics-part1.html i915: Security of the Intel Graphics Stack - Part 1 - Introduction]
#[https://igor-blue.github.io/2021/02/24/graphics-part2.html i915: Security of the Intel Graphics Stack - Part 2 - FW <-> GuC]
#[https://airbus-seclab.github.io/qemu_blog/pci_slave.html A deep dive into QEMU: PCI slave devices]
#[https://qemu-project.gitlab.io/qemu/system/gdb.html Debugging QEMU Guests with GDB (start & stop, examine state like registers & memory, set breakpoints & watchpoints)]
#[https://terenceli.github.io/%E6%8A%80%E6%9C%AF/2019/08/04/iommu-introduction IOMMU Introduction]
#[https://wiki.archlinux.org/title/intel_graphics Arch Wiki: Intel Graphics]
#[https://01.org/sites/default/files/documentation/an_introduction_to_intel_gvt-g_for_external.pdf An Introduction to Intel Graphics Virtualization Technology (legacy GVT-g) by Zhi Wang]
#[https://patchwork.kernel.org/project/qemu-devel/patch/1478293856-8191-11-git-send-email-kwankhede@nvidia.com/ Kernel.org: vfio iommu type1: Add support for mediated devices]
#[https://archive.ll.mit.edu/HPEC/agendas/proc07/Day3/10_Hensley_Abstract.pdf Hardware and Compute Abstraction Layers For Accelerated Computing Using Graphics Hardware and Conventional CPUs]
#[https://www.blackhat.com/docs/us-14/materials/us-14-Torrey-MoRE-Shadow-Walker-The-Progression-Of-TLB-Splitting-On-x86.pdf More Shadow Walker: The Progression of TLB-Splitting on x86]
#[https://revers.engineering/mmu-ept-technical-details/ MMU Virtualization Via Intel EPT: Technical Details]
#[https://github.com/awilliam/linux-vfio/tree/next Alex Williamson Github: VFIO Development (tree next)]
#[https://events.static.linuxfound.org/slides/2011/linuxcon-japan/lcj2011_linming.pdf GPU PMU: performance monitoring with perf event]
#[https://book.systemsapproach.org/e2e/rpc.html Remote Procedure Call (RPC)]
#[http://www.virtualopensystems.com/en/products/api-remoting/ Virtual Open Systems: API Remoting]
#[https://www.usenix.org/conference/atc14/technical-sessions/presentation/tian A Full GPU Virtualization Solution with Mediated Pass-Through by Yiying Zhang, David Cowperthwaite, Kun Tian, and Yaozu Dong] - [https://www.usenix.org/system/files/conference/atc14/atc14-paper-tian.pdf <nowiki>[Paper]</nowiki>] - [https://www.youtube.com/watch?v=vvYmDQKZ6MQ <nowiki>[Video]</nowiki>] - [https://www.usenix.org/sites/default/files/conference/protected-files/atc14_slides_tian.pdf <nowiki>[Slides 1]</nowiki>] - [https://cseweb.ucsd.edu/~yiying/cse291j-winter20/reading/GPU-Virtualization.pdf <nowiki>[Slides 2]</nowiki>]
#[https://www.youtube.com/watch?v=-iuIu7_GuEo <nowiki>[2014] KvmGT: A Full GPU Virtualization Solution by Jike Song</nowiki>]
#[https://docs.kernel.org/gpu/drm-mm.html docs.kernel.org: DRM Memory Management]
#[https://docs.kernel.org/PCI/index.html docs.kernel.org: PCI Bus Subsystem]
#[https://openglbook.com/chapter-0-preface-what-is-opengl.html openglbook.com: What is OpenGL?]
#[https://yewtu.be/watch?v=haes4_Xnc5Q <nowiki>[2020] Getting pixels on screen: introduction to Kernel Mode Setting (KMS) by Simon Ser</nowiki>] - [https://fs.emersion.fr/protected/presentations/present.html?src=kms-foss-north/index.md#1 slides]
#[https://sdic.sjtu.edu.cn/wp-content/uploads/2019/12/Mediated-Pass-Through-MPT-NVIDIA-vGPU.pptx Dynamic Mediation for Live Migration VFIO-PCI]
#[https://lwn.net/Articles/746127/ DRM Management Via CGroups]
#[https://lwn.net/Articles/292583/ DRM: Add GEM ("graphics execution manager") to i915 driver.]

GPU Driver Internals

2023-06-19T17:53:55Z

Arthur: /* References (Talks & Reading Material) */

This page will detail the internals of various GPU drivers for use with I/O Virtualization.<blockquote>An absence of critical technical documentation has historically slowed growth and adoption of developer ecosystems for GPU virtualization.

This [https://creativecommons.org/licenses/by/4.0/ CC-BY-4.0] licensed content can either be used with attribution, or used as inspiration for new documentation, created by GPU vendors for public commercial distribution as developer documentation.

Where possible, this documentation will clearly label dates and versions of observed-but-not-guaranteed behaviour vs. vendor-documented stable interfaces/behaviour with guarantees of forward or backward compatibility.</blockquote>

== Article Structure ==
This article will aim to provide information about the following details of GPU drivers:

=== High Level Architecture ===
This section will detail high level architectures of each driver.

=== Initialization ===
This section will cover how the GPU's embedded components, the GPU driver, and virtualization functions are initialized.

=== Scheduling ===
This section will detail how the device schedules instructions for execution.

This will attempt to provide a comprehensive view of scheduling from virtualized processes down to execution within the device.

==== In-VM Scheduling ====
The In-VM Scheduling section will detail how instructions are scheduled within the virtual machine's GPU device driver.

==== Between-VM Scheduling ====
The Between-VM Scheduling section will detail how the host kernel module and/or virtual GPU helper functions handle scheduling / context swaps between virtual machines on a computer system.

==== Firmware Scheduling (if applicable) ====
The Firmware Scheduling section will cover the GPU's internal scheduling model if a deferred execution pathway is used (like i915's GuC or OpenRM's GSP).

Where an intermediate scheduling microcontroller is not used, this section may be less applicable (e.g. vExeclist scheduling under Intel vGPUs).

=== Memory Management ===
The Memory Management section will cover how the GPU driver manages memory for virtual GPUs and host processes.

This will aim to detail the paging abstractions used for global memory translation and per-process memory translation.

=== Display Surface Virtualization ===
The Display Surface Virtualization section will detail how virtual displays are provided to guests and/or graphics rendering buffers (pixel surfaces) are shared from guest to host if such functions are provided via the driver.

==== Graphics Buffer Sharing ====
This section will cover methods provided by the GPU driver suitable for high performance graphics sharing.

== Vendor Neutral ==
This section will detail functions in common between GPU drivers.

Also see [https://github.com/intel/Display-Virtualization-for-Windows-OS Display Virtualization for Windows OS].

=== Knowledge Resources Used ===
This section is supported by significant contributions to documentation and open source by '''Satyeshwar Singh'''.

=== Display Surface Virtualization ===
The Display Surface Virtualization section will detail how virtual displays are provided to guests and/or graphics rendering buffers (pixel surfaces) are shared from guest to host if such functions are provided via the driver.

==== Graphics Buffer Sharing ====
This section will deal with graphics buffer sharing for vendor neutral GPUs.

===== Display Flow in Linux =====
Userspace applications provide their buffers to the compositor.

The compositor asks Mesa to create a framebuffer.

The framebuffer is allocated through the vendor driver.

The vendor driver flips the framebuffer onto the screen.

===== SR-IOV and Mdev =====
SR-IOV VFs don't have access to the display controller (only PF does).

A known issue is displaying a VF's framebuffer on the screen.

===== VFs in QEMU Hypervisor (VirtIO-GPU) =====
SR-IOV and Mdev VFs run in QEMU Hypervisor.

QEMU has full access to all pages in VF's address space.

VirtIO-GPU allows for allocating buffers (not API remoting in this case).

Mesa's KMSRO allocates framebuffer via virtio-gpu.

Mesa's KMSRO asks the vendor driver to import the framebuffer (no specific compositor dependency).

===== Transport =====
A Scatter Gather List (SGL) of physical pages is constructed by VirtIO-GPU of VF.

====== QEMU Host-Side u-dma-buf ======
Host QEMU uses u-dma-buf driver to reconstruct a virtual memory pointer from the SGL shared by VF.

u-dma-buf driver allocates contiguous memory blocks in kernel as DMA buffers.
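
The scatter-gather transport can be illustrated with a toy model. This is a sketch under stated assumptions: physical memory is represented by a <code>bytearray</code>, <code>build_sgl</code> and <code>gather</code> are hypothetical helpers, and the code does not use the real VirtIO-GPU or u-dma-buf interfaces. It only shows how a list of (address, length) entries describes a buffer whose pages are not contiguous.

```python
def build_sgl(page_addrs, page_size=4096):
    """Coalesce physically adjacent pages into (address, length) entries."""
    sgl = []
    for addr in page_addrs:
        if sgl and sgl[-1][0] + sgl[-1][1] == addr:
            # Page directly follows the previous entry: extend that entry.
            sgl[-1] = (sgl[-1][0], sgl[-1][1] + page_size)
        else:
            sgl.append((addr, page_size))
    return sgl


def gather(memory, sgl):
    """Reassemble the logical buffer by reading each entry in order."""
    return b"".join(bytes(memory[addr:addr + length]) for addr, length in sgl)
```

The guest side corresponds to <code>build_sgl</code> and the host side to <code>gather</code>; in the real stack the host maps the pages rather than copying them.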

====== QEMU Host-Side Modules ======
Once virtio-qemu has a DMA buffer, it shares it with the QEMU UI.

The QEMU UI supports several toolkits, such as GTK and SDL, with GTK being the default.

GTK uses EGL and passes it the DMABUF as a texture.

===== Windows VFs =====
Windows OS doesn't support DMABUF capability so we can't allocate buffers from VirtIO-GPU and share them with the Windows Kernel Mode Driver (KMD).

Windows differs in that the framebuffer (FB) is allocated by the OS rather than by the miniport driver.

A modified version of the Red Hat virtio-gpu-do (short for Display Only) driver is used.

===== Windows Driver Stack =====
OS (DWM) allocates frame buffers and asks GPU to write to them through Miniport.

DVServer UMD (IDD) asks the OS for the FB copy.

DVServer UMD (IDD) passes the virtual memory address to DVServer KMD.

===== DVServer KMD =====
DVServer KMD finds the SGL (Scatter Gather List) of physical pages for this virtual memory address.

This SGL is passed via virt-queue from the VF to the host QEMU.

===== Android VFs =====
Android uses Linux kernel underneath. VirtIO-GPU is present in Android's kernel.

The KMSRO patch to allocate the framebuffer via VirtIO-GPU has also been ported over to Android.

Modifications are needed in Android's minigbm library to connect with the KMSRO part.

Rest of stack looks identical to Linux VF/PF.

== i915 ==

=== Knowledge Resources Used ===
This section is supported by significant contributions to documentation and open source by '''Zhi Wang''', '''Ben Widawsky''', and '''Igor Bogdanov'''.

See references 2, 3, 4, 5, 6, 7, 8, and 9 in the [https://open-iov.org/index.php/GPU_Driver_Internals#References_(Talks_&_Reading_Material) References (Talks & Reading Material)] section.

=== High Level Architecture ===

==== i915 Clients ====
Processes which make use of the Intel i915 driver receive an i915 Client ID.

=== Initialization ===
During initialization of the i915 driver, the GuC binary blob is offloaded into the Graphics Translation Table (GTT). This allows the GuC to read the GTT-loaded binary blob from shared framebuffer memory so that it may boot.

=== Scheduling ===
[[File:Figure 0- i915 vGPU Scheduling.png|alt=Figure 0: i915 vGPU Scheduling|thumb|'''Figure 0:''' i915 vGPU high-level Scheduling. See slides from: [https://open-iov.org/index.php/GPU_Driver_Internals#References_(Talks_&_Reading_Material) <nowiki>[9]</nowiki>]]]
There are several abstractions for GPU virtual machine scheduling. Those are as follows:
* Guest kernel (i915.ko)
* Host kernel (i915.ko)
* Device Firmware (GuC)

==== In-VM Scheduling ====
===== Guest kernel (i915.ko) =====

====== vExeclist ======
The vExeclist is a method to submit commands directly to the GPU without the use of an intermediate microcontroller.

====== vGuC ======
vGuC is a command submission interface used to process commands to the Intel [https://open-iov.org/index.php/GPU_Firmware#GuC Graphics Microcontroller (GuC)].

==== Between-VM Scheduling ====
[[File:I915 Scheduling Events and Requests.png|alt=Figure 1: i915 vGPU Scheduling Requests & Events|thumb|'''Figure 1:''' i915 vGPU Scheduling Requests & Events. See slides from: [https://open-iov.org/index.php/GPU_Driver_Internals#References_(Talks_&_Reading_Material) <nowiki>[9]</nowiki>]]]

===== Host kernel (i915.ko) =====
Submitting commands to the GPU under i915 can take several paths. One pathway makes use of direct command submission without an intermediate micro-controller whereas the other uses an intermediate micro-controller. The intermediate micro-controller approach increases the amount of binary-blob code used and abstracts the kernel module from the device's internal scheduling model.

====== Execlist ======
Execlist executes commands synchronously on the device without an intermediate microcontroller. This is the preferred method of executing commands for some driver developers because of its stability and transparency under current i915 development.

====== GuC ======
GuC provides an execution pathway with an intermediate microcontroller providing a scheduling abstraction for Intel's preferred internal scheduling model.
[[File:Figure 2- Intel vGPU Scheduler.png|alt=Figure 2: Intel vGPU Scheduler request flow.|thumb|'''Figure 2:''' i915 vGPU Scheduler request flow. See slides from: [https://open-iov.org/index.php/GPU_Driver_Internals#References_(Talks_&_Reading_Material) <nowiki>[9]</nowiki>]]]

===== Device Firmware (GuC) =====

=== Memory Management ===

===== Translation Tables =====

====== GTT (Graphics Translation Table) ======
GPU memory on-device is part of a GTT, or Graphics Translation Table. This table stores information globally for all graphics processes within the system. Some processes, such as [[wikipedia:Direct_Rendering_Infrastructure|DRI]], access the Global Graphics Translation Table (GGTT), while others receive a Per Process Graphics Translation Table (PPGTT) buffer based on their i915 Client ID.

====== GGTT (Global Graphics Translation Table) ======

====== PPGTT (Per Process Graphics Translation Table) ======
Process-specific memory buffers are stored inside a Per Process Graphics Translation Table or PPGTT. This is a [https://open-iov.org/index.php/Virtual_IO_Internals#VRAM_Isolation_(GPU_GMMU) GPU MMU] translated subregion or IOVA of global GPU memory specific to a GPU process's client ID.

====== Aliasing PPGTT ======
Aliasing PPGTT (Per Process Graphics Translation Table) refers to partially separated GPU resources (process context).

====== Real PPGTT ======
Real PPGTT (Per Process Graphics Translation Table) refers to fully separated GPU resources (process context).

=== Display Surface Virtualization ===
The Intel vGPU makes use of two modes for Display Surface Virtualization.

* VirtIO-GPU (Linux)
* Indirect Display Driver (Windows)

==== VirtIO-GPU ====

==== IDD (Indirect Display Driver) ====
The IDD ([https://learn.microsoft.com/en-us/windows-hardware/drivers/display/indirect-display-driver-model-overview Indirect Display Driver]) provides virtual display functions with arbitrary resolutions to any rendering device virtual or physical.

Intel's IDD can be found in their [https://github.com/intel/Display-Virtualization-for-Windows-OS/ Display Virtualization for Windows OS] repository.

==== Graphics Buffer Sharing ====
Intel's i915 driver provides functionality to directly map display memory from a [https://open-iov.org/index.php/Merged_Drivers guest vGPU Virtual Function (either SR-IOV or VFIO-Mdev) into the host GPU's Physical Function] without slow memory copies or graphics compression. For users running GPU virtualization on their local device this results in a significant performance uplift compared to traditional graphics sharing functionality built for remote access use-cases such as VDI.

Intel's [https://github.com/intel/Display-Virtualization-for-Windows-OS 0copy display virtualization tools] are simple to implement as sharing does not rely upon an added [https://www.qemu.org/docs/master/system/devices/ivshmem.html IVSHMEM (Inter-VM Shared Memory device)] - rather the host directly maps the guest's display memory via [https://lwn.net/Articles/758903/ udmabuf] as the buffer sharing functions are provided within the i915 open source driver.

== OpenRM ==
[[File:Figure 3- GPU BAR to Timeshared Syhededuling via Channel IO.png|alt=Figure 3: GPU BAR to Timeshared Scheduling via Channel IO|thumb|'''Figure 3:''' GPU BAR to Timeshared Scheduling via Channel IO.]]

=== Knowledge Resources Used ===
This section is supported by significant contributions to documentation and open source by '''Andy Currid''', '''Neo Jia''', '''John Fanelli''', and '''Neha Joshi'''.

=== High Level Architecture ===
The Open Resource Manager (RM driver) follows a strongly object-oriented design composed of multiple "engines" which act as micro-services for servicing driver requests.

The Open Resource Manager driver (also known as [https://open-iov.org/index.php/OpenRM OpenRM]) refers to Nvidia's [https://github.com/NVIDIA/open-gpu-kernel-modules open-kernel-modules].

Broadly speaking the OpenRM driver consists of two parts.

* The Platform RM (OpenRM)
* The Firmware RM (GSP RM / RM Core)
The Platform RM is loaded into the Linux Kernel as nvidia.ko. This module communicates with the GSP RM via [[wikipedia:Remote_procedure_call|Remote Procedure Calls (RPCs)]] to reach engines in the RM Core.

==== RM Clients ====
Processes (local, remote, or virtualized) which make use of the RM driver receive an RM Client ID.

==== RM Server ====
The RM Server (or Resource Server) tracks RM Clients as well as the hardware and software resources they control, allocate, and free.

==== RM API ====
API to control the Resource Manager Server.

==== RM Core ====
Core functions of the RM driver controlling resource locking, mapping, unmapping, control calls, constructors, and destructors. Under OpenRM the RM Core runs within the GPU System Processor (GSP micro-controller), while in pre-open source versions of the Resource Manager the RM Core ran within the nvidia.ko kernel module.

=== Initialization ===
During hardware bring-up, several binary blobs are loaded from embedded [[wikipedia:Boot_ROM|Boot ROM]] memory to bootstrap the embedded controller. From that point, additional software is loaded from onboard [[wikipedia:Serial_Peripheral_Interface|SPI]] flash memory.

SPI flash holds the software needed to fully initialize the Falcon/NvRISC processor, as well as a cached version of the software needed to run the GPU System Processor (GSP).

Once the platform has completed POST, it is ready to communicate with the host platform's RM driver. The OpenRM driver offloads a binary blob containing the RM Core to the [https://open-iov.org/index.php/GPU_Firmware#GSP GPU System Processor (GSP)]; this blob is likely to be more recent than the cached version held in on-board SPI flash.

=== Scheduling ===
There are several abstractions for GPU virtual machine scheduling. Those are as follows:

* Guest kernel (nvidia.ko)
* Host kernel (nvidia.ko)
* Host usermode (nvidia-vgpu-mgr / libnvidia-vgpu.so)
* Device Firmware (GSP)

==== Command Submission ====

===== Runlist =====

==== In-VM Scheduling ====
Virtual machines contain their own GPU scheduling within the Nvidia kernel module in the guest OS.

===== Guest kernel (nvidia.ko) =====
Within a virtual machine running the Nvidia driver messages to the GPU are first sent to the guest nvidia.ko kernel module.

The guest then determines whether a vRPC (virtual Remote Procedure Call) or a pRPC (physical Remote Procedure Call) should be sent. Both pRPCs and vRPCs are sent through the host RM driver (Resource Manager).<blockquote>''Note: Unclear on execution pathway for pRPCs vs vRPCs. pRPCs may go directly to device.''</blockquote>

==== Between-VM Scheduling ====
In addition to scheduling which occurs within the virtual machine the Resource Manager driver also schedules messages to the GPU between GPU-accelerated virtual machines and host processes.

===== Host kernel (nvidia.ko) =====
Messages sent by the guest (via vRPC or pRPC) are received by the host nvidia.ko driver.

nvidia.ko contains a virtual GPU state machine which holds status information for the virtual GPU.

nvidia.ko also contains a virtual GPU kernel scheduler which interacts with virtual GPU objects.

nvidia.ko also contains an RM Call scheduler which schedules calls on an RM class.

The nvidia.ko kernel module exits to userspace to execute the nvidia-vgpu-mgr and VMIOP (Virtual Machine Input Output Plugin).

===== Host usermode (nvidia-vgpu-mgr / libnvidia-vgpu.so) =====
After exiting to userspace, a daemon process (nvidia-vgpu-mgr and its library libnvidia-vgpu.so) is executed to schedule the VM-exits described below.

===== nvidia-vgpu-mgr =====
The nvidia-vgpu-mgr is a process which provides the spawning of virtual GPU stubs and population with capability information.

This daemon process interacts with the libnvidia-vgpu.so, nvidia.ko, and nvidia-vgpu-vfio.ko components.

This process and libnvidia-vgpu.so contain and execute the VMIOP.

This process (and the VMIOP) schedules RPCs sent by the guest, receives VFIO BAR-exits, and relays requests to allocate, deallocate, pin, unpin, map, unmap to the [https://open-iov.org/index.php/GPU_Driver_Internals#RM_Core RM Core].

====== vmiop ======
VMIOP (Virtual Machine Input Output Plugin) handles presenting virtualized functionality into the guest.

The Virtual Machine Input Output Plugin software handles virtual displays, compute API offload, and most importantly [https://open-iov.org/index.php/Virtual_I/O_Internals#VFIO_Quirks_(region_traps) BAR (Base Address Register) quirks].

The VMIOP is an [[wikipedia:Software_development_kit|SDK (Software Development Kit)]] provided in binary format, split between libnvidia-vgpu.so and nvidia-vgpu-mgr, which provides userland helper functions for GPU virtualization.

Messages to the VMIOP are scheduled by Linux kernel [https://www.learnlinux.org.za/courses/build/internals/ch07s02.html niceness] (scheduling abstraction).

===== Device Firmware (GSP) =====
Messages received by the host RM driver (Resource Manager) are then scheduled by the RM Core contained within the GSP (GPU System Processor). The GSP handles the device's internal scheduling model.

=== Memory Management ===
Managing virtual machine memory is very important to the security of virtualization.

This section will cover the method by which secure memory "enclaves" or [https://terenceli.github.io/%E6%8A%80%E6%9C%AF/2019/08/04/iommu-introduction IO Virtual Addresses (IOVAs)] may be provisioned and separation enforced by hardware constructs within the GPU.

===== Programming the MMU =====
To enforce separation in hardware between memory allocated to Virtual Machines (VMs), virtualization software must program the GPU's MMU (GMMU controller) to create IO Virtual Addresses (IOVAs).

To create such configurations, several abstractions translate high level representations of virtualization, programmed via the vmiop and nvidia-vgpu-mgr, into practical, architecture-specific instructions.

====== AMAPLibrary ======
The AMAPLibrary acts as a device abstraction framework for GPU driver software to program using high level representations of MMU configuration.

The AMAPLibrary translates high level representations of GPU virtualization into graphics architecture-specific logic contained within architecture HALs (Hardware Abstraction Layers).

====== Architecture HALs ======
GPU [https://archive.ll.mit.edu/HPEC/agendas/proc07/Day3/10_Hensley_Abstract.pdf Hardware Abstraction Layers (HALs)] contain logic specific to graphics architectures, for instance the precise method by which the GPU driver may interact with the Falcon to provision MMU protected memory.

====== DMA from Falcon / NvRISC-V to Frame Buffer Interface (FBIF) ======
The Falcon (FAst Logic CONtroller) / NvRISC-V embedded controller emits DMAs to the Frame Buffer Interface (FBIF) in order to interact with the device's GMMU controller.

User programs and VMs have their memory translated through the GMMU.

Once created, memory translations within a virtual machine's IO Virtual Address (IOVA) space are protected by the device's GMMU controller.

===== Translation Tables =====

====== vmiop_gva ======

=== Display Surface Virtualization ===

== amdgpu ==

== References (Talks & Reading Material) ==
#[https://01.org/linuxgraphics/documentation/hardware-specification-prms Intel Graphics Programmer's Reference Manuals (PRM)]
#[https://bwidawsk.net/blog/2013/1/i915-hardware-contexts-and-some-bits-about-batchbuffers/ i915: Hardware Contexts (and some bits about batchbuffers)]
#[https://bwidawsk.net/blog/2014/6/the-global-gtt-part-1/ i915: The Global GTT Part 1]
#[https://bwidawsk.net/blog/2014/6/aliasing-ppgtt-part-2/ i915: Aliasing PPGTT Part 2]
#[https://bwidawsk.net/blog/2014/7/true-ppgtt-part-3/ i915: True PPGTT Part 3]
#[https://bwidawsk.net/blog/2014/7/future-ppgtt-part-4-dynamic-page-table-allocations-64-bit-address-space-gpu-mirroring-and-yeah-something-about-relocs-too/ i915: Future PPGTT Part 4 (Dynamic page table allocations, 64 bit address space, GPU "mirroring", and yeah, something about relocs too)]
#[https://igor-blue.github.io/2021/02/10/graphics-part1.html i915: Security of the Intel Graphics Stack - Part 1 - Introduction]
#[https://igor-blue.github.io/2021/02/24/graphics-part2.html i915: Security of the Intel Graphics Stack - Part 2 - FW <-> GuC]
#[https://01.org/sites/default/files/documentation/an_introduction_to_intel_gvt-g_for_external.pdf i915: An Introduction to Intel GVT-g (with new architecture)]
#[https://lwn.net/Articles/758903/ lwn.net: Add udmabuf misc device]
#[https://riscv.org/wp-content/uploads/2016/07/Tue1100_Nvidia_RISCV_Story_V2.pdf Nvidia RISC-v Story]
#[https://terenceli.github.io/%E6%8A%80%E6%9C%AF/2019/08/04/iommu-introduction IOMMU Introduction]
#[https://archive.ll.mit.edu/HPEC/agendas/proc07/Day3/10_Hensley_Abstract.pdf Hardware and Compute Abstraction Layers For Accelerated Computing Using Graphics Hardware and Conventional CPUs]
#[https://envytools.readthedocs.io/en/latest/hw/intro.html nVidia GPU Introduction (envytools)]
#[https://on-demand.gputechconf.com/gtc/2014/presentations/S4725-hi-perf-graphics-nvidia-grid-virtual-gpus.pdf Delivering High Performance Remote Graphics With Nvidia GRID Virtual GPU]
#[https://nehajoshi.dev/post/nvidia_mig_feature/ NVIDIA Multi-Instance GPU]
#[https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-kbl-vol05-memory_views.pdf <nowiki>i915: Kaby Lake Intel Graphics Programmer's Reference Manual [Volume 5: Memory Views]</nowiki>]
#[https://developer.nvidia.com/content/life-triangle-nvidias-logical-pipeline Lifecycle of a Triangle - Nvidia's logical pipeline]
#[https://docs.nvidia.com/cuda/parallel-thread-execution/ Nvidia Parallel Thread Execution (PTX)]
#[https://blog.ffwll.ch/2013/01/i915gem-crashcourse-overview.html i915/GEM Crashcourse]
#[https://lwn.net/Articles/292583/ drm: Add GEM ("graphics execution manager") to i915 driver.]

Virtual I/O Internals

2023-06-19T17:52:39Z

Arthur:

GPU Driver Internals

2023-06-19T17:51:34Z

Arthur: /* References (Talks & Reading Material) */

GPU Driver Internals

2023-06-16T20:54:27Z

Arthur: /* QEMU Host-Side Modules */

GPU Driver Internals

2023-06-16T20:54:13Z

Arthur:

This page will detail the internals of various GPU drivers for use with I/O Virtualization.<blockquote>An absence of critical technical documentation has historically slowed growth and adoption of developer ecosystems for GPU virtualization.

This [https://creativecommons.org/licenses/by/4.0/ CC-BY-4.0] licensed content can either be used with attribution, or used as inspiration for new documentation, created by GPU vendors for public commercial distribution as developer documentation.

Where possible, this documentation will clearly label dates and versions of observed-but-not-guaranteed behaviour vs. vendor-documented stable interfaces/behaviour with guarantees of forward or backward compatibility.</blockquote>

== Article Structure ==
This article will aim to provide information about the following details of GPU drivers:

=== High Level Architecture ===
This section will detail high level architectures of each driver.

=== Initialization ===
This section will cover how the GPU's embedded components, the GPU driver, and virtualization functions are initialized.

=== Scheduling ===
This section will detail how the device schedules instructions for execution.

This will attempt to provide a comprehensive view of scheduling from virtualized processes down to execution within the device.

==== In-VM Scheduling ====
The In-VM Scheduling section will detail how instructions are scheduled within the virtual machine's GPU device driver.

==== Between-VM Scheduling ====
The Between-VM Scheduling section will detail how the host kernel module and/or virtual GPU helper functions handle scheduling / context swaps between virtual machines on a computer system.

==== Firmware Scheduling (if applicable) ====
The Firmware Scheduling section will cover the GPU's internal scheduling model if a deferred execution pathway is used (like i915's GuC or OpenRM's GSP).

In the case an intermediate scheduling microcontroller is not used this section may be less applicable (ie: vExeclist scheduling under Intel vGPUs).

=== Memory Management ===
The Memory Management section will cover how the GPU driver manages memory for virtual GPUs and host processes.

This will aim to detail paging abstractions used for global memory translation and per-process memory translation.

=== Display Surface Virtualization ===
The Display Surface Virtualization section will detail how virtual displays are provided to guests and/or graphics rendering buffers (pixel surfaces) are shared from guest to host if such functions are provided via the driver.

==== Graphics Buffer Sharing ====
This section will cover methods provided by the GPU driver suitable for high performance graphics sharing.

== Vendor Neutral ==
This section will detail functions in common between GPU drivers.

Also see [https://github.com/intel/Display-Virtualization-for-Windows-OS Display Virtualization for Windows OS].

=== Knowledge Resources Used ===
This section is supported by significant contributions to documentation and open source by '''Satyeshwar Singh'''.

=== Display Surface Virtualization ===
The Display Surface Virtualization section will detail how virtual displays are provided to guests and/or graphics rendering buffers (pixel surfaces) are shared from guest to host if such functions are provided via the driver.

==== Graphics Buffer Sharing ====
This section will deal with graphics buffer sharing for vendor neutral GPUs.

===== Display Flow in Linux =====
Userspace applications provide their buffers to compositor.

Compositor asks Mesa to create a framebuffer.

Framebuffer is allocated through vendor driver.

Vendor driver flips the framebuffer on the screen.

===== SR-IOV and Mdev =====
SR-IOV VFs don't have access to the display controller (only PF does).

Known issue is displaying a VF's framebuffer on the screen.

===== VFs in QEMU Hypervisor (VirtIO-GPU) =====
SR-IOV and Mdev VFs run in QEMU Hypervisor.

QEMU has full access to all pages in VF's address space.

VirtIO-GPU allows for allocating buffers (not API remoting in this case).

Mesa's KMSRO allocates framebuffer via virtio-gpu.

Mesa's KMSRO asks vendor driver to import framebuffer (no specific compositor dependency).

===== Transport =====
A Scatter Gather List (SGL) of physical pages is constructed by VirtIO-GPU of VF.
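
The construction of such a list can be sketched as follows. This is a minimal illustration, not the VirtIO-GPU implementation: the 4 KiB page size, the <code>build_sgl</code> helper, and the dictionary standing in for the guest's page tables are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

PAGE_SIZE = 4096  # assumed page size for this sketch


def build_sgl(vaddr: int, length: int, page_table: Dict[int, int]) -> List[Tuple[int, int]]:
    """Return [(physical_address, byte_count), ...] covering [vaddr, vaddr + length).

    page_table maps a virtual page number to a physical page number. It is a
    hypothetical stand-in for the guest's real page tables.
    """
    sgl: List[Tuple[int, int]] = []
    offset = vaddr
    end = vaddr + length
    while offset < end:
        vpn, in_page = divmod(offset, PAGE_SIZE)
        # Never cross a page boundary within one step.
        chunk = min(PAGE_SIZE - in_page, end - offset)
        paddr = page_table[vpn] * PAGE_SIZE + in_page
        if sgl and sgl[-1][0] + sgl[-1][1] == paddr:
            # Physically adjacent to the previous entry: extend it.
            sgl[-1] = (sgl[-1][0], sgl[-1][1] + chunk)
        else:
            sgl.append((paddr, chunk))
        offset += chunk
    return sgl
```

Merging physically adjacent pages keeps the list short, which matters when each entry has to be sent to the host over a virt-queue.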

====== QEMU Host-Side u-dma-buf ======
Host QEMU uses u-dma-buf driver to reconstruct a virtual memory pointer from the SGL shared by VF.

u-dma-buf driver allocates contiguous memory blocks in kernel as DMA buffers.

====== '''QEMU Host-Side Modules''' ======
Once virtio-qemu has a DMA buffer, it shares it with the QEMU UI.

QEMU UI supports several toolkits such as GTK and SDL, with GTK being the default.

GTK uses EGL and passes it the DMABUF as a texture.

===== Windows VFs =====
Windows OS doesn't support DMABUF capability so we can't allocate buffers from VirtIO-GPU and share them with the Windows Kernel Mode Driver (KMD).

Windows OS is different in the way that the FB is allocated via the OS rather than the miniport driver.

A modified version of the RedHat virtio-gpu-do (short for Display Only) driver is used.

===== Windows Driver Stack =====
OS (DWM) allocates frame buffers and asks GPU to write to them through Miniport.

DVServer UMD (IDD) asks the OS for the FB copy.

DVServer UMD (IDD) passes the virtual memory address to DVServer KMD.

===== DVServer KMD =====
DVServer KMD finds the SGL (Scatter Gather List) of physical pages for this virtual memory address.

This SGL is passed via virt-queue from the VF to the host QEMU.

===== Android VFs =====
Android uses Linux kernel underneath. VirtIO-GPU is present in Android's kernel.

KMSRO patch to allocate framebuffer via VirtIO-GPU also ported over to Android.

Need modifications in the minigbm lib of Android to connect with KMSRO part.

Rest of stack looks identical to Linux VF/PF.

== i915 ==

=== Knowledge Resources Used ===
This section is supported by significant contributions to documentation and open source by '''Zhi Wang''', '''Ben Widawsky''', and '''Igor Bogdanov'''.

See references 2, 3, 4, 5, 6, 7, 8, and 9 in the [https://open-iov.org/index.php/GPU_Driver_Internals#References_(Talks_&_Reading_Material) References (Talks & Reading Material)] section.

=== High Level Architecture ===

==== i915 Clients ====
Processes which make use of the Intel i915 driver receive an i915 Client ID.

=== Initialization ===
During initialization of the i915 driver the GuC binary blob is offloaded into the Graphics Translation Table (GTT). This allows the GuC to read the GTT-loaded binary blob from shared framebuffer memory so that it may boot.

=== Scheduling ===
[[File:Figure 0- i915 vGPU Scheduling.png|alt=Figure 0: i915 vGPU Scheduling|thumb|'''Figure 0:''' i915 vGPU high-level Scheduling. See slides from: [https://open-iov.org/index.php/GPU_Driver_Internals#References_(Talks_&_Reading_Material) <nowiki>[9]</nowiki>]]]
There are several abstractions for GPU virtual machine scheduling. Those are as follows:
* Guest kernel (i915.ko)
* Host kernel (i915.ko)
* Device Firmware (GuC)

==== In-VM Scheduling ====
===== Guest kernel (i915.ko) =====

====== vExeclist ======
The vExeclist is a method to submit commands directly to the GPU without the use of an intermediate microcontroller.

====== vGuC ======
vGuC is a command submission interface used to process commands to the Intel [https://open-iov.org/index.php/GPU_Firmware#GuC Graphics Microcontroller (GuC)].

==== Between-VM Scheduling ====
[[File:I915 Scheduling Events and Requests.png|alt=Figure 1: i915 vGPU Scheduling Requests & Events|thumb|'''Figure 1:''' i915 vGPU Scheduling Requests & Events. See slides from: [https://open-iov.org/index.php/GPU_Driver_Internals#References_(Talks_&_Reading_Material) <nowiki>[9]</nowiki>]]]

===== Host kernel (i915.ko) =====
Submitting commands to the GPU under i915 can take several paths. One pathway makes use of direct command submission without an intermediate micro-controller whereas the other uses an intermediate micro-controller. The intermediate micro-controller approach increases the amount of binary-blob code used and abstracts the kernel module from the device's internal scheduling model.

====== Execlist ======
Execlist executes commands synchronously on the device without an intermediate microcontroller. Some driver developers prefer this method because of its stability and transparency under current i915 development.

====== GuC ======
GuC provides an execution pathway with an intermediate microcontroller providing a scheduling abstraction for Intel's preferred internal scheduling model.
[[File:Figure 2- Intel vGPU Scheduler.png|alt=Figure 2: Intel vGPU Scheduler request flow.|thumb|'''Figure 2:''' i915 vGPU Scheduler request flow. See slides from: [https://open-iov.org/index.php/GPU_Driver_Internals#References_(Talks_&_Reading_Material) <nowiki>[9]</nowiki>]]]

===== Device Firmware (GuC) =====

=== Memory Management ===

===== Translation Tables =====

====== GTT (Graphics Translation Table) ======
GPU memory on-device is part of a GTT, or Graphics Translation Table. This table stores information globally for all graphics processes within the system. Some processes, such as [[wikipedia:Direct_Rendering_Infrastructure|DRI]], access the Global Graphics Translation Table (GGTT), while others receive a Per Process Graphics Translation Table (PPGTT) buffer based on their i915 Client ID.

====== GGTT (Global Graphics Translation Table) ======

====== PPGTT (Per Process Graphics Translation Table) ======
Process-specific memory buffers are stored inside a Per Process Graphics Translation Table or PPGTT. This is a [https://open-iov.org/index.php/Virtual_IO_Internals#VRAM_Isolation_(GPU_GMMU) GPU MMU] translated subregion or IOVA of global GPU memory specific to a GPU process's client ID.

====== Aliasing PPGTT ======
Aliasing PPGTT (Per Process Graphics Translation Table) refers to partially separated GPU resources (process context).

====== Real PPGTT ======
Real PPGTT (Per Process Graphics Translation Table) refers to fully separated GPU resources (process context).
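
The difference between the two PPGTT modes can be sketched with a toy page-table model. The sketch assumes, following references 4 and 5, that aliasing PPGTT uses one table shared by every context while real PPGTT gives each context its own table. The class names, the 4 KiB page size, and the single-level table are simplifications invented for the example and do not mirror i915's data structures.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class TranslationTable:
    """Single-level table: GPU virtual page number -> physical page number."""

    def __init__(self):
        self.entries = {}

    def map(self, gpu_vaddr, phys_addr):
        if gpu_vaddr % PAGE_SIZE or phys_addr % PAGE_SIZE:
            raise ValueError("addresses must be page aligned")
        self.entries[gpu_vaddr // PAGE_SIZE] = phys_addr // PAGE_SIZE

    def translate(self, gpu_vaddr):
        page, offset = divmod(gpu_vaddr, PAGE_SIZE)
        return self.entries[page] * PAGE_SIZE + offset  # KeyError if unmapped


class AddressSpaces:
    """Hypothetical holder for the GGTT and the PPGTT(s) of one GPU."""

    def __init__(self, real_ppgtt):
        self.ggtt = TranslationTable()
        self.real_ppgtt = real_ppgtt
        self._aliasing = TranslationTable()  # shared by all contexts
        self._per_context = {}               # context id -> private table

    def ppgtt_for(self, context_id):
        if not self.real_ppgtt:
            return self._aliasing
        return self._per_context.setdefault(context_id, TranslationTable())
```

With <code>real_ppgtt=True</code>, the same GPU virtual address can resolve to different physical pages in two contexts. With <code>real_ppgtt=False</code>, both contexts see the same mappings.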

=== Display Surface Virtualization ===
The Intel vGPU makes use of two modes for Display Surface Virtualization.

* VirtIO-GPU (Linux)
* Indirect Display Driver (Windows)

==== VirtIO-GPU ====

==== IDD (Indirect Display Driver) ====
The IDD ([https://learn.microsoft.com/en-us/windows-hardware/drivers/display/indirect-display-driver-model-overview Indirect Display Driver]) provides virtual display functions with arbitrary resolutions to any rendering device virtual or physical.

Intel's IDD can be found in their [https://github.com/intel/Display-Virtualization-for-Windows-OS/ Display Virtualization for Windows OS] repository.

==== Graphics Buffer Sharing ====
Intel's i915 driver provides functionality to directly map display memory from a [https://open-iov.org/index.php/Merged_Drivers guest vGPU Virtual Function (either SR-IOV or VFIO-Mdev) into the host GPU's Physical Function] without slow memory copies or graphics compression. For users running GPU virtualization on their local device this results in a significant performance uplift compared to traditional graphics sharing functionality built for remote access use-cases such as VDI.

Intel's [https://github.com/intel/Display-Virtualization-for-Windows-OS 0copy display virtualization tools] are simple to implement as sharing does not rely upon an added [https://www.qemu.org/docs/master/system/devices/ivshmem.html IVSHMEM (Inter-VM Shared Memory device)] - rather the host directly maps the guest's display memory via [https://lwn.net/Articles/758903/ udmabuf] as the buffer sharing functions are provided within the i915 open source driver.
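
The udmabuf misc device is driven through an ioctl on <code>/dev/udmabuf</code>. The sketch below only shows the request layout declared in the Linux UAPI header <code>linux/udmabuf.h</code> and derives the request number with the generic ioctl encoding. It does not open the device, the helper names are illustrative, and architectures with a non-generic ioctl encoding would produce a different number.

```python
import struct

# struct udmabuf_create { __u32 memfd; __u32 flags; __u64 offset; __u64 size; };
UDMABUF_CREATE_FMT = "=IIQQ"   # native byte order, no implicit padding
UDMABUF_FLAGS_CLOEXEC = 0x01


def _iow(type_char, nr, size):
    """Generic (asm-generic) _IOW encoding: dir=write, size, type, number."""
    return (1 << 30) | (size << 16) | (ord(type_char) << 8) | nr


# #define UDMABUF_CREATE _IOW('u', 0x42, struct udmabuf_create)
UDMABUF_CREATE = _iow("u", 0x42, struct.calcsize(UDMABUF_CREATE_FMT))


def pack_udmabuf_create(memfd, offset, size, flags=UDMABUF_FLAGS_CLOEXEC):
    """Build the request describing which range of a memfd to export.

    memfd is expected to be a sealed memfd file descriptor; offset and size
    are expected to be page aligned.
    """
    return struct.pack(UDMABUF_CREATE_FMT, memfd, flags, offset, size)
```

On success the ioctl returns a new dma-buf file descriptor, which can then be handed to an importer such as EGL.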

== OpenRM ==
[[File:Figure 3- GPU BAR to Timeshared Syhededuling via Channel IO.png|alt=Figure 3: GPU BAR to Timeshared Scheduling via Channel IO|thumb|'''Figure 3:''' GPU BAR to Timeshared Scheduling via Channel IO.]]

=== Knowledge Resources Used ===
This section is supported by significant contributions to documentation and open source by '''Andy Currid''', '''Neo Jia''', '''John Fanelli''', and '''Neha Joshi'''.

=== High Level Architecture ===
The Open Resource Manager (RM driver) follows a strongly object-oriented design composed of multiple "engines" which act as micro-services for servicing driver requests.

The Open Resource Manager driver (also known as [https://open-iov.org/index.php/OpenRM OpenRM]) refers to Nvidia's [https://github.com/NVIDIA/open-gpu-kernel-modules open-kernel-modules].

Broadly speaking the OpenRM driver consists of two parts.

* The Platform RM (OpenRM)
* The Firmware RM (GSP RM / RM Core)
The Platform RM is loaded into the Linux Kernel as nvidia.ko. This module communicates with the GSP RM via [[wikipedia:Remote_procedure_call|Remote Procedure Calls (RPCs)]] to reach engines in the RM Core.

==== RM Clients ====
Processes (local, remote, or virtualized) which make use of the RM driver receive an RM Client ID.

==== RM Server ====
The RM Server (or Resource Server) tracks RM Clients as well as the hardware and software resources they control, allocate, and free.
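
The bookkeeping described here can be sketched as a small ownership table. This is an illustration of the idea only: <code>ResourceServer</code> and its methods are hypothetical names and do not correspond to symbols in the OpenRM source.

```python
import itertools


class ResourceServer:
    """Toy model: tracks which client owns which resource handle."""

    def __init__(self):
        self._client_ids = itertools.count(1)
        self._handles = itertools.count(1)
        self._clients = {}  # client id -> {handle: resource kind}

    def register_client(self):
        client_id = next(self._client_ids)
        self._clients[client_id] = {}
        return client_id

    def alloc(self, client_id, kind):
        handle = next(self._handles)
        self._clients[client_id][handle] = kind
        return handle

    def free(self, client_id, handle):
        # Raises KeyError if the client does not own the handle, so one
        # client cannot release another client's resources.
        del self._clients[client_id][handle]

    def resources_of(self, client_id):
        return dict(self._clients[client_id])

    def unregister_client(self, client_id):
        # Everything still held by the client is released with it.
        return sorted(self._clients.pop(client_id))
```

Keying every resource by its owning client is what allows a server of this kind to clean up after a client that exits without freeing its allocations.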

==== RM API ====
API to control the Resource Manager Server.

==== RM Core ====
Core functions of the RM driver controlling resource locking, mapping, unmapping, control calls, constructors, and destructors. Under OpenRM the RM Core runs within the GPU System Processor (GSP micro-controller), while in pre-open source versions of the Resource Manager the RM Core ran within the nvidia.ko kernel module.

=== Initialization ===
During hardware bring-up, several binary blobs are loaded from embedded [[wikipedia:Boot_ROM|Boot ROM]] memory to bootstrap the embedded controller. From that point, additional software is loaded from onboard [[wikipedia:Serial_Peripheral_Interface|SPI]] flash memory.

SPI flash holds the software needed to fully initialize the Falcon/NvRISC processor, as well as a cached version of the software needed to run the GPU System Processor (GSP).

Once the platform has completed POST, it is ready to communicate with the host platform's RM driver. The OpenRM driver offloads a binary blob containing the RM Core to the [https://open-iov.org/index.php/GPU_Firmware#GSP GPU System Processor (GSP)]; this blob is likely to be more recent than the cached version held in on-board SPI flash.

=== Scheduling ===
There are several abstractions for GPU virtual machine scheduling. Those are as follows:

* Guest kernel (nvidia.ko)
* Host kernel (nvidia.ko)
* Host usermode (nvidia-vgpu-mgr / libnvidia-vgpu.so)
* Device Firmware (GSP)

==== Command Submission ====

===== Runlist =====

==== In-VM Scheduling ====
Virtual machines contain their own GPU scheduling within the Nvidia kernel module in the guest OS.

===== Guest kernel (nvidia.ko) =====
Within a virtual machine running the Nvidia driver messages to the GPU are first sent to the guest nvidia.ko kernel module.

The guest then determines whether a vRPC (virtual Remote Procedure Call) or a pRPC (physical Remote Procedure Call) should be sent. Both pRPCs and vRPCs are sent through the host RM driver (Resource Manager).<blockquote>''Note: Unclear on execution pathway for pRPCs vs vRPCs. pRPCs may go directly to device.''</blockquote>

==== Between-VM Scheduling ====
In addition to scheduling which occurs within the virtual machine the Resource Manager driver also schedules messages to the GPU between GPU-accelerated virtual machines and host processes.
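
As a rough mental model of time-shared scheduling between several submitters, the sketch below runs a plain round-robin with a fixed timeslice. It is a generic textbook policy chosen for illustration and is not a description of the scheduler inside nvidia.ko. The function name and the abstract "units of work" are assumptions.

```python
from collections import deque


def round_robin(pending, timeslice):
    """Return the order in which submitters run as [(name, units_run), ...].

    pending maps a submitter name (a VM or a host process) to the number of
    abstract work units it has queued.
    """
    queue = deque((name, work) for name, work in pending.items() if work > 0)
    timeline = []
    while queue:
        name, work = queue.popleft()
        ran = min(timeslice, work)
        timeline.append((name, ran))
        if work > ran:
            # Not finished: go to the back of the queue for another slice.
            queue.append((name, work - ran))
    return timeline
```

For example, <code>round_robin({"vm0": 5, "vm1": 2, "host": 3}, 2)</code> interleaves the three submitters so that none of them waits for another to drain its whole queue.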

===== Host kernel (nvidia.ko) =====
Messages sent by the guest (via vRPC or pRPC) are received by the host nvidia.ko driver.

nvidia.ko contains a virtual GPU state machine which holds status information for the virtual GPU.

nvidia.ko also contains a virtual GPU kernel scheduler which interacts with virtual GPU objects.

nvidia.ko also contains an RM Call scheduler which schedules calls on an RM class.

The nvidia.ko kernel module exits to userspace to execute the nvidia-vgpu-mgr and VMIOP (Virtual Machine Input Output Plugin).

===== Host usermode (nvidia-vgpu-mgr / libnvidia-vgpu.so) =====
After exiting to userspace, a daemon process (nvidia-vgpu-mgr and its library libnvidia-vgpu.so) is executed to schedule the VM-exits described below.

===== nvidia-vgpu-mgr =====
The nvidia-vgpu-mgr is a process which provides the spawning of virtual GPU stubs and population with capability information.

This daemon process interacts with the libnvidia-vgpu.so, nvidia.ko, and nvidia-vgpu-vfio.ko components.

This process and libnvidia-vgpu.so contain and execute the VMIOP.

This process (and the VMIOP) schedules RPCs sent by the guest, receives VFIO BAR-exits, and relays requests to allocate, deallocate, pin, unpin, map, unmap to the [https://open-iov.org/index.php/GPU_Driver_Internals#RM_Core RM Core].

====== vmiop ======
VMIOP (Virtual Machine Input Output Plugin) handles presenting virtualized functionality into the guest.

The Virtual Machine Input Output Plugin software handles virtual displays, compute API offload, and most importantly [https://open-iov.org/index.php/Virtual_I/O_Internals#VFIO_Quirks_(region_traps) BAR (Base Address Register) quirks].
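
A region trap can be pictured as a lookup from a BAR offset to a handler that emulates the access. The sketch below is a generic illustration of that dispatch step. <code>BarTrapTable</code> and its handler signature are hypothetical and are not taken from the VMIOP or from VFIO.

```python
import bisect


class BarTrapTable:
    """Maps trapped BAR offset ranges to emulation handlers."""

    def __init__(self):
        self._starts = []   # sorted region start offsets
        self._regions = []  # (start, length, handler), same order as _starts

    def add(self, start, length, handler):
        index = bisect.bisect_left(self._starts, start)
        self._starts.insert(index, start)
        self._regions.insert(index, (start, length, handler))

    def access(self, offset, value=None):
        """Dispatch a trapped access; value is None for a read."""
        index = bisect.bisect_right(self._starts, offset) - 1
        if index >= 0:
            start, length, handler = self._regions[index]
            if offset < start + length:
                # Handlers see offsets relative to their own region.
                return handler(offset - start, value)
        raise LookupError(f"no trap registered for BAR offset {offset:#x}")
```

Accesses that fall outside every registered range raise <code>LookupError</code> here. A real implementation would instead let untrapped ranges pass through to the device.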

The VMIOP is an [[wikipedia:Software_development_kit|SDK (Software Development Kit)]] provided in binary format, split between libnvidia-vgpu.so and nvidia-vgpu-mgr, which provides userland helper functions for GPU virtualization.

Messages to the VMIOP are scheduled by Linux kernel [https://www.learnlinux.org.za/courses/build/internals/ch07s02.html niceness] (scheduling abstraction).
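
Niceness is an ordinary per-process attribute on Linux, so it can be inspected and adjusted from userspace. The sketch below uses Python's standard <code>os</code> module to read the calling process's nice value and to lower its priority. The helper names are illustrative, and nothing here is specific to nvidia-vgpu-mgr.

```python
import os


def current_niceness() -> int:
    """Nice value of the calling process (-20 is highest priority, 19 lowest)."""
    return os.getpriority(os.PRIO_PROCESS, 0)


def lower_priority(increment: int = 1) -> int:
    """Raise the nice value by increment and return the new value.

    Unprivileged processes may only increase their nice value, that is,
    make themselves less favoured by the scheduler.
    """
    return os.nice(increment)
```

A process with a higher nice value receives a smaller share of CPU time when it competes with other runnable processes.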

===== Device Firmware (GSP) =====
Messages received by the host RM driver (Resource Manager) are then scheduled by the RM Core contained within the GSP (GPU System Processor). The GSP handles the device's internal scheduling model.

=== Memory Management ===
Managing virtual machine memory is very important to the security of virtualization.

This section will cover the method by which secure memory "enclaves" or [https://terenceli.github.io/%E6%8A%80%E6%9C%AF/2019/08/04/iommu-introduction IO Virtual Addresses (IOVAs)] may be provisioned and separation enforced by hardware constructs within the GPU.

===== Programming the MMU =====
To enforce separation in hardware between the memory allocated to different Virtual Machines (VMs), virtualization software must program the GPU's MMU (the GMMU controller) to create IO Virtual Addresses (IOVAs).

To create such configurations, several abstractions translate the high-level representations of virtualization programmed via the vmiop and nvidia-vgpu-mgr into practical, architecture-specific instructions.

====== AMAPLibrary ======
The AMAPLibrary acts as a device abstraction framework for GPU driver software to program using high level representations of MMU configuration.

The AMAPLibrary translates high level representations of GPU virtualization into graphics architecture-specific logic contained within architecture HALs (Hardware Abstraction Layers).

====== Architecture HALs ======
GPU [https://archive.ll.mit.edu/HPEC/agendas/proc07/Day3/10_Hensley_Abstract.pdf Hardware Abstraction Layers (HALs)] contain logic specific to graphics architectures, for instance the precise method by which the GPU driver may interact with the Falcon to provision MMU protected memory.

====== DMA from Falcon / NvRISC-V to Frame Buffer Interface (FBIF) ======
The Falcon (FAst Logic CONtroller) / NvRISC-V embedded controller emits DMAs to the Frame Buffer Interface (FBIF) in order to interact with the device's GMMU controller.

User programs and VMs have their memory translated through the GMMU.

Once created, memory translations within a virtual machine's IO Virtual Address (IOVA) space will be protected by the device's GMMU controller.

===== Translation Tables =====

====== vmiop_gva ======

=== Display Surface Virtualization ===

== amdgpu ==

== References (Talks & Reading Material) ==
#[https://01.org/linuxgraphics/documentation/hardware-specification-prms Intel Graphics Programmer's Reference Manuals (PRM)]
#[https://bwidawsk.net/blog/2013/1/i915-hardware-contexts-and-some-bits-about-batchbuffers/ i915: Hardware Contexts (and some bits about batchbuffers)]
#[https://bwidawsk.net/blog/2014/6/the-global-gtt-part-1/ i915: The Global GTT Part 1]
#[https://bwidawsk.net/blog/2014/6/aliasing-ppgtt-part-2/ i915: Aliasing PPGTT Part 2]
#[https://bwidawsk.net/blog/2014/7/true-ppgtt-part-3/ i915: True PPGTT Part 3]
#[https://bwidawsk.net/blog/2014/7/future-ppgtt-part-4-dynamic-page-table-allocations-64-bit-address-space-gpu-mirroring-and-yeah-something-about-relocs-too/ i915: Future PPGTT Part 4 (Dynamic page table allocations, 64 bit address space, GPU "mirroring", and yeah, something about relocs too)]
#[https://igor-blue.github.io/2021/02/10/graphics-part1.html i915: Security of the Intel Graphics Stack - Part 1 - Introduction]
#[https://igor-blue.github.io/2021/02/24/graphics-part2.html i915: Security of the Intel Graphics Stack - Part 2 - FW <-> GuC]
#[https://01.org/sites/default/files/documentation/an_introduction_to_intel_gvt-g_for_external.pdf i915: An Introduction to Intel GVT-g (with new architecture)]
#[https://lwn.net/Articles/758903/ lwn.net: Add udmabuf misc device]
#[https://riscv.org/wp-content/uploads/2016/07/Tue1100_Nvidia_RISCV_Story_V2.pdf Nvidia RISC-V Story]
#[https://terenceli.github.io/%E6%8A%80%E6%9C%AF/2019/08/04/iommu-introduction IOMMU Introduction]
#[https://archive.ll.mit.edu/HPEC/agendas/proc07/Day3/10_Hensley_Abstract.pdf Hardware and Compute Abstraction Layers For Accelerated Computing Using Graphics Hardware and Conventional CPUs]
#[https://envytools.readthedocs.io/en/latest/hw/intro.html nVidia GPU Introduction (envytools)]
#[https://on-demand.gputechconf.com/gtc/2014/presentations/S4725-hi-perf-graphics-nvidia-grid-virtual-gpus.pdf Delivering High Performance Remote Graphics With Nvidia GRID Virtual GPU]
#[https://nehajoshi.dev/post/nvidia_mig_feature/ NVIDIA Multi-Instance GPU]
#[https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-kbl-vol05-memory_views.pdf <nowiki>i915: Kaby Lake Intel Graphics Programmer's Reference Manual [Volume 5: Memory Views]</nowiki>]
#[https://developer.nvidia.com/content/life-triangle-nvidias-logical-pipeline Lifecycle of a Triangle - Nvidia's logical pipeline]
#[https://docs.nvidia.com/cuda/parallel-thread-execution/ Nvidia Parallel Thread Execution (PTX)]

GPU Software Bill Of Materials (SBOM)

2023-06-13T14:58:30Z

Arthur: Updated Nvidia VGX vGPU version to 15.2 in GPU SBOM.

This page will keep a running list of components used to achieve GPU Virtualization.<blockquote>An absence of critical technical documentation has historically slowed growth and adoption of developer ecosystems for GPU virtualization.

This [https://creativecommons.org/licenses/by/4.0/ CC-BY-4.0] licensed content can either be used with attribution, or used as inspiration for new documentation, created by GPU vendors for public commercial distribution as developer documentation.

Where possible, this documentation will clearly label dates and versions of observed-but-not-guaranteed behaviour vs. vendor-documented stable interfaces/behaviour with guarantees of forward or backward compatibility.</blockquote>
{| class="wikitable"
|+Component Table
!Vendor
!Component
!Description
!Version
!OSS or Blob
!Filesize
!Vendor Docs
!Release Date
!Interfaces / APIs
!Notes
|-
|Microsoft
|Indirect Display Driver (IDD)
|Driver
|
|[https://github.com/Microsoft/Windows-driver-samples/tree/main/video/IndirectDisplay OSS]
|
|[https://learn.microsoft.com/en-us/windows-hardware/drivers/display/indirect-display-driver-model-overview Indirect Display Driver Overview]
|
|[https://learn.microsoft.com/en-us/windows-hardware/drivers/wdf/overview-of-the-umdf UMDF], [https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/ KMDF]
|The Indirect Display Driver (IDD) enables GPUs to render graphics at arbitrary resolutions without a physical display connected on Windows OS systems.
|-
| rowspan="2" |RedHat
|vfio_pci
|Driver
|6.2-rc1
|[https://github.com/torvalds/linux/tree/master/drivers/vfio/pci OSS]
|in-kernel
|[https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/virtualization/chap-virtualization-pci_passthrough RedHat], [https://docs.kernel.org/driver-api/vfio.html kernel.org]
|2022.12.15
|[https://open-iov.org/index.php/Virtual_I/O_Internals#Instruction_Execution irqfd], [https://open-iov.org/index.php/Virtual_I/O_Internals#Instruction_Execution ioeventfd], [https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api/ioctl.rst IOCTL]
|Reference VFIO stub driver used for discrete assignment of I/O devices, and for assignment of some SR-IOV-backed vGPU devices, into virtual machines. This driver is commonly replaced with a vendor-built VFIO interface whose memory management and/or page pinning mechanisms are specific to the vGPU software internals.
|-
|Mediated Core (mdev.ko)
|Driver
|6.2-rc1
|OSS
|0.044 MB
|[https://docs.kernel.org/driver-api/vfio-mediated-device.html kernel.org]
|2016.xx.xx
|[https://open-iov.org/index.php/Virtual_I/O_Internals#Instruction_Execution irqfd], [https://open-iov.org/index.php/Virtual_I/O_Internals#Instruction_Execution ioeventfd], [https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api/ioctl.rst IOCTL], [https://open-iov.org/index.php/Virtual_I/O_Internals#Mediated_DMA_Translations Type 1 IOMMU]
|The Mediated Core driver provides a common interface for mediated device management that can be used by drivers of different devices. It is an IOMMU/device-agnostic framework for exposing direct device access to user space in a secure, IOMMU-protected environment (based on VFIO).
|-
| rowspan="2" |Arc Compute
|gvm-guest
| rowspan="2" |Daemon
|0.1.0
|[https://github.com/Open-IOV/GVM-guest OSS]
|
|[https://docs.linux-gvm.org/gvm-guest docs.linux-gvm.org/gvm-guest]
|2023.02.08
|[https://fedoraproject.org/wiki/Features/VirtioSerial Virtio-Serial], CLI
|Handles I/O between guests and the host over Virtio-Serial, serving multiple different guest modules.
|-
|gvm-cli
|1.0
|[https://github.com/Open-IOV/GVM-user OSS]
|0.084 MB
|[https://docs.linux-gvm.org/gvm-user docs.linux-gvm.org/gvm-user]
|2023.01.06
|CLI, [https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api/ioctl.rst IOCTL], [https://open-iov.org/index.php/GPU_Driver_Internals#RM_API RMAPI], [https://open-iov.org/index.php/Virtual_I/O_Internals#Mediated_Core Mediated Core]
|Configures the [https://open-iov.org/index.php/GPU_Driver_Internals#nvidia-vgpu-mgr nvidia-vgpu-mgr] process.
|-
| rowspan="4" |Intel
|i915 SR-IOV
|Driver
| rowspan="3" |[https://github.com/intel/linux-intel-lts/tree/5.15/ADL-linux-ER 5.15]
|OSS
|in-kernel
|
|
|
|Intel's open source GPU driver for [https://open-iov.org/index.php/GPU_Firmware#GuC GuC]-equipped graphics accelerators.
|-
|[https://open-iov.org/index.php/GPU_Firmware#GuC GuC] μOS
| rowspan="2" |Firmware
| rowspan="2" |Blob
|
|
|
|IOMMU Interrupts, Power Management Interrupts, [https://open-iov.org/index.php/GPU_Driver_Internals#GTT_(Graphics_Translation_Table) GTT]
|Handles scheduling and power management.
|-
|HuC
|
|
|
|[https://open-iov.org/index.php/GPU_Driver_Internals#GTT_(Graphics_Translation_Table) GTT]
|Handles video encoding/decoding.
|-
|[https://open-iov.org/index.php/GPU_Driver_Internals#Display_Surface_Virtualization_2 Display Virtualization for Windows OS]
|Driver
|[https://github.com/intel/Display-Virtualization-for-Windows-OS/releases/ 791]
|[https://github.com/intel/Display-Virtualization-for-Windows-OS OSS]
|
|[https://github.com/intel/Display-Virtualization-for-Windows-OS/blob/main/Readme.txt github.com/intel/Display-Virtualization-for-Windows-OS/blob/main/Readme.txt]
|
|[https://github.com/ikwzm/udmabuf udmabuf], [https://learn.microsoft.com/en-us/windows-hardware/drivers/wdf/overview-of-the-umdf UMDF], [https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/ KMDF]
|Intel's 'Display Virtualization for Windows OS' provides a virtual pixel surface used for hardware graphics rendering using Microsoft's open source 'Indirect Display Driver (IDD)'. Display virtualization for Windows OS makes use of display memory sharing primitives provided in-driver by the i915 host.
|-
| rowspan="7" |Nvidia
|[[OpenRM]]
|Driver
| rowspan="2" |525.85.12
|OSS
|
|[https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/README.md github.com/NVIDIA/open-gpu-kernel-modules/blob/main/README.md]
|2023.01.31
|
|Nvidia's open source GPU driver for [https://open-iov.org/index.php/GPU_Firmware#GSP GSP]-equipped graphics accelerators.
|-
|[https://open-iov.org/index.php/GPU_Firmware#GSP GSP] RM (uproc)
| rowspan="2" |Firmware
| rowspan="2" |Blob
|
|
|2023.01.31
|RPC
|Embedded firmware based on [https://lwn.net/Articles/637658/ LibOS] containing the [https://open-iov.org/index.php/GPU_Driver_Internals#RM_Core RM Core].
|-
|Falcon/NvRISC (uproc)
|
|
|
|
|FBIF (Frame Buffer Interface) / GMMU
|Embedded firmware which handles many aspects of the device including configuring the GPU's GMMU controller.
|-
|nvidia-vgpud
| rowspan="2" |Daemon
| rowspan="4" |[https://docs.nvidia.com/grid/15.0/whats-new-vgpu/index.html v15.2]
| rowspan="4" |Blob
|0.108 MB
| rowspan="4" |[https://docs.nvidia.com/grid/index.html docs.nvidia.com/grid]
|2023.01.xx
|CLI, [https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api/ioctl.rst IOCTL], [https://open-iov.org/index.php/GPU_Driver_Internals#RM_API RMAPI], [https://open-iov.org/index.php/Virtual_I/O_Internals#Mediated_Core Mediated Core]
|Configures the [https://open-iov.org/index.php/GPU_Driver_Internals#nvidia-vgpu-mgr nvidia-vgpu-mgr] process.
|-
|nvidia-vgpu-mgr
|0.132 MB
|2023.01.xx
|[https://open-iov.org/index.php/GPU_Driver_Internals#vmiop vmiop], [https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api/ioctl.rst IOCTL], [https://open-iov.org/index.php/GPU_Driver_Internals#RM_API RMAPI], [https://open-iov.org/index.php/GPU_Driver_Internals#Guest_kernel_(nvidia.ko) vRPC], pRPC, [https://open-iov.org/index.php/Virtual_I/O_Internals#Mediated_Core Mediated Core]
|Handles IO to and from guest RM, host RM, and hardware.
|-
|libnvidia-vgpu.so
|Library
|3.1 MB
|2023.01.xx
|[https://open-iov.org/index.php/GPU_Driver_Internals#vmiop vmiop], [https://open-iov.org/index.php/GPU_Driver_Internals#RM_API RMAPI], [https://open-iov.org/index.php/Virtual_I/O_Internals#Mediated_Core Mediated Core]
|Handles IO to and from guest RM, host RM, and hardware. Contains circular dependencies with [https://open-iov.org/index.php/GPU_Driver_Internals#nvidia-vgpu-mgr nvidia-vgpu-mgr].
|-
|nvidia-vgpu-vfio.ko
|Driver
|0.108 MB
|2023.01.xx
|[https://open-iov.org/index.php/Virtual_I/O_Internals#Mediated_DMA_Translations Type 1 IOMMU], [https://open-iov.org/index.php/Virtual_I/O_Internals#Instruction_Execution irqfd, ioeventfd], [https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api/ioctl.rst IOCTL], [https://open-iov.org/index.php/Virtual_I/O_Internals#Mediated_Core Mediated Core]
|Handles incremental memory mapping, as opposed to the page pinning performed by the standard vfio-pci driver.
|}

== More Information ==

#[https://riscv.org/wp-content/uploads/2016/07/Tue1100_Nvidia_RISCV_Story_V2.pdf Nvidia RISC-V Story]
#[https://linux-gvm.org linux-gvm.org]
#[https://01.org/linuxgraphics/documentation/hardware-specification-prms Intel Graphics Programmer's Reference Manuals (PRM)]
#[https://bwidawsk.net/blog/2013/1/i915-hardware-contexts-and-some-bits-about-batchbuffers/ i915: Hardware Contexts (and some bits about batchbuffers)]
#[https://bwidawsk.net/blog/2014/6/the-global-gtt-part-1/ i915: The Global GTT Part 1]
#[https://bwidawsk.net/blog/2014/6/aliasing-ppgtt-part-2/ i915: Aliasing PPGTT Part 2]
#[https://bwidawsk.net/blog/2014/7/true-ppgtt-part-3/ i915: True PPGTT Part 3]
#[https://bwidawsk.net/blog/2014/7/future-ppgtt-part-4-dynamic-page-table-allocations-64-bit-address-space-gpu-mirroring-and-yeah-something-about-relocs-too/ i915: Future PPGTT Part 4 (Dynamic page table allocations, 64 bit address space, GPU "mirroring", and yeah, something about relocs too)]
#[https://igor-blue.github.io/2021/02/10/graphics-part1.html i915: Security of the Intel Graphics Stack - Part 1 - Introduction]
#[https://igor-blue.github.io/2021/02/24/graphics-part2.html i915: Security of the Intel Graphics Stack - Part 2 - FW <-> GuC]
#[https://01.org/sites/default/files/documentation/an_introduction_to_intel_gvt-g_for_external.pdf i915: An Introduction to Intel GVT-g (with new architecture)]

Virtual I/O Internals

2023-06-12T17:02:36Z

Arthur:

GPU Driver Internals

2023-05-25T20:33:37Z

Arthur: /* References (Talks & Reading Material) */

This page will detail the internals of various GPU drivers for use with I/O Virtualization.<blockquote>An absence of critical technical documentation has historically slowed growth and adoption of developer ecosystems for GPU virtualization.

This [https://creativecommons.org/licenses/by/4.0/ CC-BY-4.0] licensed content can either be used with attribution, or used as inspiration for new documentation, created by GPU vendors for public commercial distribution as developer documentation.

Where possible, this documentation will clearly label dates and versions of observed-but-not-guaranteed behaviour vs. vendor-documented stable interfaces/behaviour with guarantees of forward or backward compatibility.</blockquote>

== Article Structure ==
This article will aim to provide information about the following details of GPU drivers:

=== High Level Architecture ===
This section will detail high level architectures of each driver.

=== Initialization ===
This section will cover how the GPU's embedded components, the GPU driver, and virtualization functions are initialized.

=== Scheduling ===
This section will detail how the device schedules instructions for execution.

This will attempt to provide a comprehensive view of scheduling from virtualized processes down to execution within the device.

==== In-VM Scheduling ====
The In-VM Scheduling section will detail how instructions are scheduled within the virtual machine's GPU device driver.

==== Between-VM Scheduling ====
The Between-VM Scheduling section will detail how the host kernel module and/or virtual GPU helper functions handle scheduling / context swaps between virtual machines on a computer system.

==== Firmware Scheduling (if applicable) ====
The Firmware Scheduling section will cover the GPU's internal scheduling model if a deferred execution pathway is used (like i915's GuC or OpenRM's GSP).

Where an intermediate scheduling microcontroller is not used, this section may be less applicable (e.g. vExeclist scheduling under Intel vGPUs).

=== Memory Management ===
The Memory Management section will cover how the GPU driver manages memory for virtual GPUs and host processes.

This will aim to detail the paging abstractions used for global memory translation and per-process memory translation.

=== Display Surface Virtualization ===
The Display Surface Virtualization section will detail how virtual displays are provided to guests and/or graphics rendering buffers (pixel surfaces) are shared from guest to host if such functions are provided via the driver.

==== Graphics Buffer Sharing ====
This section will cover methods provided by the GPU driver suitable for high performance graphics sharing.

== i915 ==

=== Knowledge Resources Used ===
This section is supported by significant contributions in open source by Zhi Wang, Ben Widawsky, and Igor Bogdanov.

See references 2, 3, 4, 5, 6, 7, 8, and 9 in the [https://open-iov.org/index.php/GPU_Driver_Internals#References_(Talks_&_Reading_Material) References (Talks & Reading Material)] section.

=== High Level Architecture ===

==== i915 Clients ====
Processes which make use of the Intel i915 driver receive an i915 Client ID.

=== Initialization ===
During initialization of the i915 driver, the GuC binary blob is offloaded into the Graphics Translation Table (GTT). This allows the GuC to read the GTT-loaded binary blob from shared framebuffer memory so that it may boot.

=== Scheduling ===
[[File:Figure 0- i915 vGPU Scheduling.png|alt=Figure 0: i915 vGPU Scheduling|thumb|'''Figure 0:''' i915 vGPU high-level Scheduling. See slides from: [https://open-iov.org/index.php/GPU_Driver_Internals#References_(Talks_&_Reading_Material) <nowiki>[9]</nowiki>]]]
There are several abstractions for GPU virtual machine scheduling. Those are as follows:
* Guest kernel (i915.ko)
* Host kernel (i915.ko)
* Device Firmware (GuC)

==== In-VM Scheduling ====
===== Guest kernel (i915.ko) =====

====== vExeclist ======
The vExeclist is a method to submit commands directly to the GPU without the use of an intermediate microcontroller.

====== vGuC ======
vGuC is a command submission interface used to process commands to the Intel [https://open-iov.org/index.php/GPU_Firmware#GuC Graphics Microcontroller (GuC)].

==== Between-VM Scheduling ====
[[File:I915 Scheduling Events and Requests.png|alt=Figure 1: i915 vGPU Scheduling Requests & Events|thumb|'''Figure 1:''' i915 vGPU Scheduling Requests & Events. See slides from: [https://open-iov.org/index.php/GPU_Driver_Internals#References_(Talks_&_Reading_Material) <nowiki>[9]</nowiki>]]]

===== Host kernel (i915.ko) =====
Submitting commands to the GPU under i915 can take several paths. One pathway makes use of direct command submission without an intermediate microcontroller, whereas the other uses an intermediate microcontroller. The intermediate microcontroller approach increases the amount of binary-blob code used and abstracts the kernel module from the device's internal scheduling model.

====== Execlist ======
Execlist executes commands synchronously on the device without an intermediate microcontroller. This is the preferred method of executing commands for some driver developers because of its stability and transparency under current i915 development.

====== GuC ======
GuC provides an execution pathway with an intermediate microcontroller providing a scheduling abstraction for Intel's preferred internal scheduling model.
[[File:Figure 2- Intel vGPU Scheduler.png|alt=Figure 2: Intel vGPU Scheduler request flow.|thumb|'''Figure 2:''' i915 vGPU Scheduler request flow. See slides from: [https://open-iov.org/index.php/GPU_Driver_Internals#References_(Talks_&_Reading_Material) <nowiki>[9]</nowiki>]]]
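
The difference between the two submission pathways can be sketched in a few lines of Python. This is a toy model only: <code>ToyDevice</code>, <code>ToyMicrocontroller</code>, and <code>submit_direct</code> are hypothetical names, and the priority-then-FIFO ordering is an assumed policy chosen for illustration, not the GuC's actual scheduling model.

```python
from collections import deque


class ToyDevice:
    """Stands in for the GPU; records the order commands are executed in."""

    def __init__(self):
        self.executed = []

    def execute(self, command):
        self.executed.append(command)


class ToyMicrocontroller:
    """Stands in for an intermediate scheduler such as the GuC."""

    def __init__(self, device):
        self.device = device
        self.pending = deque()

    def enqueue(self, priority, command):
        self.pending.append((priority, command))

    def run(self):
        # The firmware, not the kernel module, decides the execution order.
        # Assumed policy: lower priority value first, FIFO among equals
        # (sorted() is stable, so equal priorities keep arrival order).
        for _, command in sorted(self.pending, key=lambda item: item[0]):
            self.device.execute(command)
        self.pending.clear()


def submit_direct(device, command):
    """Direct path: the kernel module hands the command straight to the device."""
    device.execute(command)
```

In the direct path the caller fully determines execution order; in the microcontroller path the caller only enqueues work, which is the abstraction from the device's internal scheduling model described above.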

===== Device Firmware (GuC) =====

=== Memory Management ===

===== Translation Tables =====

====== GTT (Graphics Translation Table) ======
GPU memory on-device is a part of a GTT or Graphics Translation Table. This table stores information globally for all graphics processes within the system. Some processes, such as [[wikipedia:Direct_Rendering_Infrastructure|DRI]], access the Global Graphics Translation Table (GGTT), while others receive a Per Process Graphics Translation Table (PPGTT) buffer based on their i915 Client ID.

====== GGTT (Global Graphics Translation Table) ======

====== PPGTT (Per Process Graphics Translation Table) ======
Process-specific memory buffers are stored inside a Per Process Graphics Translation Table or PPGTT. This is a [https://open-iov.org/index.php/Virtual_IO_Internals#VRAM_Isolation_(GPU_GMMU) GPU MMU] translated subregion or IOVA of global GPU memory specific to a GPU process's client ID.
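
As a rough illustration of the global versus per-process split, the following Python sketch models one global table alongside one private table per client ID. The names (<code>ToyTranslationTable</code>, <code>ToyGtt</code>, <code>ppgtt_for</code>) and the 4 KiB page size are assumptions made for the example; they are not i915 interfaces, and real tables are multi-level hardware structures.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class ToyTranslationTable:
    """Maps GPU virtual page numbers to physical page numbers."""

    def __init__(self):
        self.entries = {}

    def map_page(self, virt_page, phys_page):
        self.entries[virt_page] = phys_page

    def translate(self, virt_addr):
        virt_page, offset = divmod(virt_addr, PAGE_SIZE)
        if virt_page not in self.entries:
            raise LookupError(f"no translation for address {virt_addr:#x}")
        return self.entries[virt_page] * PAGE_SIZE + offset


class ToyGtt:
    """One global table (GGTT) plus one private table (PPGTT) per client ID."""

    def __init__(self):
        self.ggtt = ToyTranslationTable()
        self.ppgtt = {}

    def ppgtt_for(self, client_id):
        # Create the per-process table lazily on first use.
        return self.ppgtt.setdefault(client_id, ToyTranslationTable())


gtt = ToyGtt()
gtt.ggtt.map_page(0, 100)          # visible through the global table
gtt.ppgtt_for(1).map_page(0, 200)  # private to client 1
gtt.ppgtt_for(2).map_page(0, 300)  # private to client 2
```

The same virtual page number resolves to a different physical page depending on which client's table is consulted, which is the property that separates one process's buffers from another's.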

====== Aliasing PPGTT ======
Aliasing PPGTT (Per Process Graphics Translation Table) refers to partially separated GPU resources (process context).

====== Real PPGTT ======
Real PPGTT (Per Process Graphics Translation Table) refers to fully separated GPU resources (process context).

=== Display Surface Virtualization ===
The Intel vGPU makes use of two modes for Display Surface Virtualization.

* VirtIO-GPU (Linux)
* Indirect Display Driver (Windows)

==== VirtIO-GPU ====

==== IDD (Indirect Display Driver) ====
The IDD ([https://learn.microsoft.com/en-us/windows-hardware/drivers/display/indirect-display-driver-model-overview Indirect Display Driver]) provides virtual display functions with arbitrary resolutions to any rendering device, virtual or physical.

Intel's IDD can be found in their [https://github.com/intel/Display-Virtualization-for-Windows-OS/ Display Virtualization for Windows OS] repository.

==== Graphics Buffer Sharing ====
Intel's i915 driver provides functionality to directly map display memory from a [https://open-iov.org/index.php/Merged_Drivers guest vGPU Virtual Function (either SR-IOV or VFIO-Mdev) into the host GPU's Physical Function] without slow memory copies or graphics compression. For users running GPU virtualization on their local device this results in a significant performance uplift compared to traditional graphics sharing functionality built for remote access use-cases such as VDI.

Intel's [https://github.com/intel/Display-Virtualization-for-Windows-OS 0copy display virtualization tools] are simple to implement as sharing does not rely upon an added [https://www.qemu.org/docs/master/system/devices/ivshmem.html IVSHMEM (Inter-VM Shared Memory device)] - rather the host directly maps the guest's display memory via [https://lwn.net/Articles/758903/ udmabuf] as the buffer sharing functions are provided within the i915 open source driver.
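
The benefit of mapping rather than copying can be shown with a small Python analogy. This sketch does not use udmabuf or any i915 interface: an anonymous <code>mmap</code> region stands in for guest display memory, and the function names and 16-byte frame size are assumptions for illustration.

```python
import mmap

FRAME_BYTES = 16  # assumed size of the toy frame buffer


def make_guest_frame():
    """Allocate an anonymous mapping that stands in for guest display memory."""
    return mmap.mmap(-1, FRAME_BYTES)


def share_zero_copy(frame):
    """Return a view onto the same memory; no pixel data is copied."""
    return memoryview(frame)


def share_by_copy(frame):
    """Return a snapshot; every byte is copied once per call."""
    return bytes(frame[:])


guest_frame = make_guest_frame()
host_view = share_zero_copy(guest_frame)
host_copy = share_by_copy(guest_frame)

guest_frame[0:4] = b"\xff\x00\x00\xff"  # the guest draws one pixel

print(bytes(host_view[0:4]))  # the view sees the new pixel
print(host_copy[0:4])         # the snapshot still holds the old contents
```

A copy-based path has to repeat the snapshot for every frame, whereas the mapped view stays current at no per-frame cost.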

== OpenRM ==
[[File:Figure 3- GPU BAR to Timeshared Syhededuling via Channel IO.png|alt=Figure 3: GPU BAR to Timeshared Scheduling via Channel IO|thumb|'''Figure 3:''' GPU BAR to Timeshared Scheduling via Channel IO.]]

=== High Level Architecture ===
The Open Resource Manager (RM driver) follows a highly object-oriented paradigm composed of multiple "engines" which act as micro-services for servicing driver requests.

The Open Resource Manager driver (also known as [https://open-iov.org/index.php/OpenRM OpenRM]) refers to Nvidia's [https://github.com/NVIDIA/open-gpu-kernel-modules open-gpu-kernel-modules].

Broadly speaking the OpenRM driver consists of two parts.

* The Platform RM (OpenRM)
* The Firmware RM (GSP RM / RM Core)
The Platform RM is loaded into the Linux kernel as nvidia.ko. This module communicates with the GSP RM via [[wikipedia:Remote_procedure_call|Remote Procedure Calls (RPCs)]] in order to reach engines within the RM Core.
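
The split between a platform-side caller and firmware-side engines can be sketched as a minimal RPC round trip. Everything here is hypothetical: the class names, the <code>toy_memory_engine</code> handler, and the JSON wire format are assumptions for illustration, and the actual message format used between nvidia.ko and the GSP is not described on this page.

```python
import json


class ToyFirmwareRm:
    """Stands in for the firmware side: routes each call to a named engine."""

    def __init__(self):
        self.engines = {}

    def register_engine(self, name, handler):
        self.engines[name] = handler

    def handle(self, wire_message):
        request = json.loads(wire_message)
        handler = self.engines.get(request["engine"])
        if handler is None:
            return json.dumps({"ok": False, "error": "unknown engine"})
        result = handler(request["method"], request["args"])
        return json.dumps({"ok": True, "result": result})


class ToyPlatformRm:
    """Stands in for the kernel-module side: packs calls and unpacks replies."""

    def __init__(self, firmware):
        self.firmware = firmware

    def call(self, engine, method, **args):
        wire_message = json.dumps(
            {"engine": engine, "method": method, "args": args}
        )
        reply = json.loads(self.firmware.handle(wire_message))
        if not reply["ok"]:
            raise RuntimeError(reply["error"])
        return reply["result"]


def toy_memory_engine(method, args):
    # A hypothetical engine that only knows how to report a size in pages.
    if method == "pages_for":
        return -(-args["size_bytes"] // 4096)  # ceiling division
    raise ValueError(f"unsupported method: {method}")
```

The caller never touches engine state directly; it only sees serialized requests and replies, which mirrors the separation between the Platform RM and the RM Core.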

==== RM Clients ====
Processes (local, remote, or virtualized) which make use of the RM driver receive an RM Client ID.

==== RM Server ====
The RM Server (or Resource Server) tracks RM Clients as well as the hardware and software resources they control, allocate, and free.
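
The bookkeeping role described here can be sketched as follows. <code>ToyResourceServer</code> and its methods are hypothetical names for illustration; the sketch assumes integer client IDs and handles and does not reflect the RM Server's real data structures.

```python
import itertools


class ToyResourceServer:
    """Tracks which client owns which resource handle."""

    def __init__(self):
        self._next_client = itertools.count(1)
        self._next_handle = itertools.count(1)
        self._resources = {}  # client_id -> {handle: description}

    def register_client(self):
        client_id = next(self._next_client)
        self._resources[client_id] = {}
        return client_id

    def allocate(self, client_id, description):
        handle = next(self._next_handle)
        self._resources[client_id][handle] = description
        return handle

    def free(self, client_id, handle):
        # A client may only free resources that it owns.
        if handle not in self._resources[client_id]:
            raise PermissionError(f"client {client_id} does not own {handle}")
        del self._resources[client_id][handle]

    def unregister_client(self, client_id):
        # Dropping a client releases everything it still holds.
        return sorted(self._resources.pop(client_id))
```

Keeping ownership per client lets the server reject a free request from the wrong client and reclaim everything when a client goes away.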

==== RM API ====
The RM API is the interface used to control the Resource Manager Server.

==== RM Core ====
The RM Core contains the core functions of the RM driver, controlling resource locking, mapping, unmapping, control calls, constructors, and destructors. Under OpenRM the RM Core runs within the GPU System Processor (GSP microcontroller), while in pre-open-source versions of the Resource Manager the RM Core ran within the nvidia.ko kernel module.

=== Initialization ===
During hardware bring-up, several binary blobs are loaded from embedded [[wikipedia:Boot_ROM|Boot ROM]] memory to bootstrap the embedded controller. From that point, additional software is loaded from onboard [[wikipedia:Serial_Peripheral_Interface|SPI]] flash memory.

The software loaded from SPI flash is necessary to fully initialize the Falcon/NvRISC processor. It also includes a cached version of the software necessary to run the GPU System Processor (GSP).

Once the platform has completed POST, it is ready to communicate with the host platform's RM driver. The OpenRM driver offloads a binary blob containing the RM Core to the [https://open-iov.org/index.php/GPU_Firmware#GSP GPU System Processor (GSP)]; this blob is likely to be more recent than the cached version contained in on-board SPI flash.

=== Scheduling ===
There are several abstractions for GPU virtual machine scheduling. Those are as follows:

* Guest kernel (nvidia.ko)
* Host kernel (nvidia.ko)
* Host usermode (nvidia-vgpu-mgr / libnvidia-vgpu.so)
* Device Firmware (GSP)

==== Command Submission ====

===== Runlist =====

==== In-VM Scheduling ====
Virtual machines contain their own GPU scheduling within the Nvidia kernel module in the guest OS.

===== Guest kernel (nvidia.ko) =====
Within a virtual machine running the Nvidia driver, messages to the GPU are first sent to the guest nvidia.ko kernel module.

The guest then determines whether a vRPC (virtual Remote Procedure Call) or a pRPC (physical Remote Procedure Call) should be sent. Both pRPCs and vRPCs are sent through the host RM driver (Resource Manager).<blockquote>''Note: Unclear on execution pathway for pRPCs vs vRPCs. pRPCs may go directly to device.''</blockquote>

==== Between-VM Scheduling ====
In addition to the scheduling which occurs within the virtual machine, the Resource Manager driver also schedules messages to the GPU between GPU-accelerated virtual machines and host processes.
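
One simple way to picture sharing a device between several senders is round-robin time slicing. The sketch below is an assumption for illustration only: <code>round_robin</code> and its <code>slice_size</code> parameter are hypothetical, and this page does not state which policy the Resource Manager actually applies.

```python
from collections import deque


def round_robin(queues, slice_size=1):
    """Interleave messages from per-VM queues, `slice_size` at a time.

    `queues` maps a VM (or host process) name to its pending messages.
    Returns the order in which messages would reach the device.
    """
    pending = {vm: deque(messages) for vm, messages in queues.items()}
    order = []
    while any(pending.values()):
        for vm, queue in pending.items():
            # Each sender gets at most one slice per round.
            for _ in range(min(slice_size, len(queue))):
                order.append((vm, queue.popleft()))
    return order
```

For example, <code>round_robin({"vm0": ["a", "b", "c"], "vm1": ["x"]})</code> serves one message from each sender per round, so a sender with a long queue cannot starve the others.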

===== Host kernel (nvidia.ko) =====
Messages sent by the guest (via vRPC or pRPC) are received by the host nvidia.ko driver.

nvidia.ko contains a virtual GPU state machine which contains status information for the virtual GPU.

nvidia.ko also contains a virtual GPU kernel scheduler which interacts with virtual GPU objects.

nvidia.ko also contains an RM Call scheduler which schedules calls on an RM class.

The nvidia.ko kernel module exits to userspace to execute the nvidia-vgpu-mgr and VMIOP (Virtual Machine Input Output Plugin).

===== Host usermode (nvidia-vgpu-mgr / libnvidia-vgpu.so) =====
After exiting to userspace, a daemon process (the nvidia-vgpu-mgr and its library, libnvidia-vgpu.so) is executed to schedule the VM-exits described below.

===== nvidia-vgpu-mgr =====
The nvidia-vgpu-mgr is a process which spawns virtual GPU stubs and populates them with capability information.

This daemon process interacts with the libnvidia-vgpu.so, nvidia.ko, and nvidia-vgpu-vfio.ko components.

This process and the libnvidia-vgpu.so contain and execute the VMIOP.

This process (and the VMIOP) schedules RPCs sent by the guest, receives VFIO BAR-exits, and relays requests to allocate, deallocate, pin, unpin, map, and unmap to the [https://open-iov.org/index.php/GPU_Driver_Internals#RM_Core RM Core].

====== vmiop ======
VMIOP (Virtual Machine Input Output Plugin) handles presenting virtualized functionality into the guest.

The Virtual Machine Input Output Plugin software handles virtual displays, compute API offload, and most importantly [https://open-iov.org/index.php/Virtual_I/O_Internals#VFIO_Quirks_(region_traps) BAR (Base Address Register) quirks].
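
The idea behind a region trap can be sketched as a BAR in which most offsets are accessed directly while selected ranges are diverted to an emulation handler. <code>ToyBar</code>, <code>add_trap</code>, and <code>write8</code> are hypothetical names for illustration; this is not the VFIO or VMIOP interface.

```python
class ToyBar:
    """A toy Base Address Register region with optional trapped ranges."""

    def __init__(self, size):
        self.backing = bytearray(size)  # stands in for directly mapped memory
        self.traps = []                 # list of (start, end, handler)

    def add_trap(self, start, length, handler):
        self.traps.append((start, start + length, handler))

    def write8(self, offset, value):
        for start, end, handler in self.traps:
            if start <= offset < end:
                # Emulated path: the access is handed to the plugin
                # instead of reaching the backing memory.
                handler(offset, value)
                return "trapped"
        # Fast path: the access goes straight to the backing memory.
        self.backing[offset] = value
        return "direct"
```

A handler registered with <code>add_trap</code> sees every write to its range, which is what allows virtualization software to intercept and emulate accesses to sensitive registers while leaving the rest of the region on the fast path.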

The VMIOP is an [[wikipedia:Software_development_kit|SDK (Software Development Kit)]] provided in binary format, split between libnvidia-vgpu.so and nvidia-vgpu-mgr, which provides userland helper functions for GPU virtualization.

Messages to the VMIOP are scheduled according to Linux kernel [https://www.learnlinux.org.za/courses/build/internals/ch07s02.html niceness] (a scheduling-priority abstraction).
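
On Linux, niceness is a per-process value from -20 (most favourable) to 19 (least favourable). The following Python sketch shows how a process can inspect and raise its own nice value using the standard <code>os</code> module; the helper names are hypothetical, and how nvidia-vgpu-mgr itself sets its niceness is not covered here.

```python
import os


def current_niceness():
    """Return the nice value of the calling process (Unix only)."""
    return os.getpriority(os.PRIO_PROCESS, 0)


def lower_priority(increment=1):
    """Add `increment` to the nice value and return the new value.

    Unprivileged processes can normally only increase their nice value,
    which lowers their scheduling priority.
    """
    return os.nice(increment)
```

Calling <code>lower_priority(0)</code> leaves the value unchanged and simply reports it, which makes it a safe way to check the current setting.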

===== Device Firmware (GSP) =====
Messages received by the host RM driver (Resource Manager) are then scheduled by the RM Core contained within the GSP (GPU System Processor). The GSP handles the device's internal scheduling model.

=== Memory Management ===
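Managing virtual machine memory correctly is central to the security of virtualization: one guest must not be able to read or write memory that belongs to another guest or to the host.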

This section will cover the method by which secure memory "enclaves" or [https://terenceli.github.io/%E6%8A%80%E6%9C%AF/2019/08/04/iommu-introduction IO Virtual Addresses (IOVAs)] may be provisioned and separation enforced by hardware constructs within the GPU.

===== Programming the MMU =====
To enforce separation in hardware between the memory allocated to different Virtual Machines (VMs), virtualization software must program the GPU's MMU (the GMMU controller) to create IO Virtual Addresses (IOVAs).

To create such configurations, several abstractions translate the high-level representations of virtualization programmed via the vmiop and nvidia-vgpu-mgr into practical, architecture-specific instructions.

====== AMAPLibrary ======
The AMAPLibrary acts as a device abstraction framework for GPU driver software to program using high level representations of MMU configuration.

The AMAPLibrary translates high level representations of GPU virtualization into graphics architecture-specific logic contained within architecture HALs (Hardware Abstraction Layers).

====== Architecture HALs ======
GPU [https://archive.ll.mit.edu/HPEC/agendas/proc07/Day3/10_Hensley_Abstract.pdf Hardware Abstraction Layers (HALs)] contain logic specific to graphics architectures, for instance the precise method by which the GPU driver may interact with the Falcon to provision MMU protected memory.
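
The layering of a neutral request over per-architecture logic can be sketched as a dispatch table. All names here are hypothetical (<code>MappingRequest</code>, <code>hal_arch_a</code>, <code>hal_arch_b</code>, <code>lower</code>), and the two page sizes are assumptions chosen to make the architectures differ; none of this reflects the AMAPLibrary's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRequest:
    """High-level, architecture-neutral description of a mapping."""
    vm_id: int
    size_bytes: int


def hal_arch_a(request):
    # Hypothetical architecture with 4 KiB pages.
    pages = -(-request.size_bytes // 4096)  # ceiling division
    return [("map_4k", request.vm_id, index) for index in range(pages)]


def hal_arch_b(request):
    # Hypothetical architecture with 64 KiB pages.
    pages = -(-request.size_bytes // 65536)  # ceiling division
    return [("map_64k", request.vm_id, index) for index in range(pages)]


HALS = {"arch_a": hal_arch_a, "arch_b": hal_arch_b}


def lower(request, architecture):
    """Translate a neutral request into architecture-specific operations."""
    try:
        hal = HALS[architecture]
    except KeyError:
        raise ValueError(f"no HAL for architecture {architecture!r}") from None
    return hal(request)
```

The caller builds one <code>MappingRequest</code> regardless of hardware generation; only the selected HAL knows how that request turns into concrete operations.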

====== DMA from Falcon / NvRISC-V to Frame Buffer Interface (FBIF) ======
The Falcon (FAst Logic CONtroller) / NvRISC-V embedded controller emits DMAs to the Frame Buffer Interface (FBIF) in order to interact with the device's GMMU controller.

User programs and VMs have their memory translated through the GMMU.

Once created, memory translations within a virtual machine's IO Virtual Address (IOVA) space will be protected by the device's GMMU controller.
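
The protection property can be sketched as a toy unit that hands each VM a disjoint range of pages and rejects any access outside it. <code>ToyGmmu</code>, <code>provision</code>, and <code>translate</code> are hypothetical names, and the simple contiguous carve-out is an assumption for illustration rather than a description of how the GMMU is actually programmed.

```python
class ToyGmmu:
    """Provisions disjoint page ranges per VM and checks every translation."""

    def __init__(self, total_pages):
        self.total_pages = total_pages
        self.next_free = 0
        self.ranges = {}  # vm_id -> (first_page, page_count)

    def provision(self, vm_id, page_count):
        if vm_id in self.ranges:
            raise ValueError(f"{vm_id!r} is already provisioned")
        if self.next_free + page_count > self.total_pages:
            raise MemoryError("not enough free pages")
        self.ranges[vm_id] = (self.next_free, page_count)
        self.next_free += page_count

    def translate(self, vm_id, iova_page):
        first_page, page_count = self.ranges[vm_id]
        if not 0 <= iova_page < page_count:
            # An access outside the VM's own range is refused.
            raise PermissionError(f"{vm_id!r} may not access page {iova_page}")
        return first_page + iova_page
```

Because each VM's addresses are resolved only against its own range, page 0 of one VM and page 0 of another land on different physical pages, and neither can name the other's memory.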

===== Translation Tables =====

====== vmiop_gva ======

=== Display Surface Virtualization ===

== amdgpu ==

== References (Talks & Reading Material) ==
#[https://01.org/linuxgraphics/documentation/hardware-specification-prms Intel Graphics Programmer's Reference Manuals (PRM)]
#[https://bwidawsk.net/blog/2013/1/i915-hardware-contexts-and-some-bits-about-batchbuffers/ i915: Hardware Contexts (and some bits about batchbuffers)]
#[https://bwidawsk.net/blog/2014/6/the-global-gtt-part-1/ i915: The Global GTT Part 1]
#[https://bwidawsk.net/blog/2014/6/aliasing-ppgtt-part-2/ i915: Aliasing PPGTT Part 2]
#[https://bwidawsk.net/blog/2014/7/true-ppgtt-part-3/ i915: True PPGTT Part 3]
#[https://bwidawsk.net/blog/2014/7/future-ppgtt-part-4-dynamic-page-table-allocations-64-bit-address-space-gpu-mirroring-and-yeah-something-about-relocs-too/ i915: Future PPGTT Part 4 (Dynamic page table allocations, 64 bit address space, GPU "mirroring", and yeah, something about relocs too)]
#[https://igor-blue.github.io/2021/02/10/graphics-part1.html i915: Security of the Intel Graphics Stack - Part 1 - Introduction]
#[https://igor-blue.github.io/2021/02/24/graphics-part2.html i915: Security of the Intel Graphics Stack - Part 2 - FW <-> GuC]
#[https://01.org/sites/default/files/documentation/an_introduction_to_intel_gvt-g_for_external.pdf i915: An Introduction to Intel GVT-g (with new architecture)]
#[https://lwn.net/Articles/758903/ lwn.net: Add udmabuf misc device]
#[https://riscv.org/wp-content/uploads/2016/07/Tue1100_Nvidia_RISCV_Story_V2.pdf Nvidia RISC-V Story]
#[https://terenceli.github.io/%E6%8A%80%E6%9C%AF/2019/08/04/iommu-introduction IOMMU Introduction]
#[https://archive.ll.mit.edu/HPEC/agendas/proc07/Day3/10_Hensley_Abstract.pdf Hardware and Compute Abstraction Layers For Accelerated Computing Using Graphics Hardware and Conventional CPUs]
#[https://envytools.readthedocs.io/en/latest/hw/intro.html nVidia GPU Introduction (envytools)]
#[https://on-demand.gputechconf.com/gtc/2014/presentations/S4725-hi-perf-graphics-nvidia-grid-virtual-gpus.pdf Delivering High Performance Remote Graphics With Nvidia GRID Virtual GPU]
#[https://nehajoshi.dev/post/nvidia_mig_feature/ NVIDIA Multi-Instance GPU]
#[https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-kbl-vol05-memory_views.pdf <nowiki>i915: Kaby Lake Intel Graphics Programmer's Reference Manual [Volume 5: Memory Views]</nowiki>]
#[https://developer.nvidia.com/content/life-triangle-nvidias-logical-pipeline Lifecycle of a Triangle - Nvidia's logical pipeline]
#[https://docs.nvidia.com/cuda/parallel-thread-execution/ Nvidia Parallel Thread Execution (PTX)]

Virtual I/O Internals

2023-05-22T21:20:20Z

Arthur: /* References (Talks & Reading Material) */

Virtual I/O Internals

2023-05-22T16:28:00Z

Arthur: /* References (Talks & Reading Material) */

Virtual I/O Internals

2023-05-05T20:54:27Z

Arthur: /* References (Talks & Reading Material) */

Articles

2023-05-02T20:54:52Z

Arthur: /* GVM Integration Documents */

This page indexes the articles contained within Open-IOV.

If you're new to GPU Virtualization start by reading the '''[[Introduction]]''' article.
=== Start Here ===
[[Introduction]]

[https://open-iov.org/index.php/Open-IOV:About About Open-IOV (CC-BY-4.0)]

===Abstract===
[[Introductory Concepts & Definitions|Glossary]]

[[Virtualization Fundamentals]]

[[Merged Drivers]]

=== Design Documents ===
[[Virtual IO Internals|Virtual I/O Internals]]

[[GPU Driver Internals]]
=== Driver Integration Documents ===
[https://open-iov.org/index.php/OpenRM Nvidia]

[[Intel SR-IOV APIs|Intel]]

[https://open-iov.org/index.php/AMDGPU AMD]

===Projects===
[https://open-iov.org/index.php/LibVF.IO LibVF.IO]

[[Hyperborea]]

[https://open-iov.org/index.php/LIME_Is_Mediated_Emulation LIME Is Mediated Emulation]

[https://open-iov.org/index.php/Looking_Glass_KVMFR Looking Glass]

[https://openxt.atlassian.net/wiki/spaces/OD/pages/10747915/What+is+OpenXT OpenXT]

[https://gitlab.com/vglass OpenXT: vGlass]

[https://github.com/OpenXT/surfman OpenXT: Surfman (legacy DRM)]

[https://www.bromium.com/opensource/ Bromium/uXen]

[https://xenproject.org/help/documentation/ Xen Project]

[https://www.qubes-os.org/doc/ Qubes OS]

[https://projectacrn.github.io/2.1/tutorials/using_celadon_as_uos.html Intel Celadon]

[https://open-iov.org/index.php/VGPU_Unlock vGPU_Unlock]

[[LibRM]]
=== Device Support===
[[GPU Support]]

[[CPU Support]]

[[GPU Firmware]]

=== Software Support ===
[https://open-iov.org/index.php/Hypervisor_Support Hypervisor Support]

[[GPU Software Bill Of Materials (SBOM)]]

=== API Documentation ===

==== Kernel APIs ====
[https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api Kernel.org Driver Core Documentation]

[https://docs.microsoft.com/en-us/windows-hardware/drivers/display/iommu-based-gpu-isolation NT Kernel (Windows) IOMMU-based GPU Isolation]

[https://elixir.bootlin.com/linux/latest/source/Documentation/driver-api/vfio.rst VFIO] - [https://github.com/torvalds/linux/blob/master/include/uapi/linux/vfio.h vfio.h] - [https://elixir.bootlin.com/linux/latest/source/include/linux/mdev.h mdev.h]

[https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api/vfio-mediated-device.rst?h=driver-core-next&id=7de3697e9cbd4bd3d62bafa249d57990e1b8f294 VFIO Mediated Device]
==== Driver APIs ====
[https://projectacrn.github.io/2.1/api/GVT-g_api.html i915 GVT-g API]

[https://nouveau.freedesktop.org/Development.html Nouveau Tools & API]
==== Sample Code ====
GPLv2 sources mirrored from [https://elixir.bootlin.com/linux/latest/source/samples/vfio-mdev/ elixir.bootlin.com] with [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/Makefile simple makefile changes].

[https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mtty.c mtty.c] - [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mdpy.c mdpy.c] - [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mdpy-fb.c mdpy-fb.c] - [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mdpy-defs.h mdpy-defs.h] - [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mbochs.c mbochs.c]

==== Virtualization APIs ====
[https://open-iov.org/index.php/Mdev-GPU#Mdev-CLI GVM/Mdev-CLI API]

[https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html QEMU Machine Protocol (QMP) Reference Manual]

[https://projectacrn.github.io/2.1/developer-guides/hld/ivshmem-hld.html Inter-VM Shared Memory (IVSHMEM)]
===User Guides===
[https://arccompute.com/blog/libvfio-commodity-gpu-multiplexing/ LibVF.IO Setup Guide]

[https://looking-glass.io/docs/stable/ Looking Glass Quickstart Guide]

[https://github.com/intel/gvt-linux/wiki/GVTg_Setup_Guide Intel GVT-g Setup Guide]

[https://github.com/GPUOpen-LibrariesAndSDKs/MxGPU-Virtualization/tree/master/docs AMD GPU-IOV Module Docs]

[https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF PCI passthrough via OVMF]

[https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_deployment_and_administration_guide/index RedHat Virtualization Guide]

=== Developer Guides ===
[https://rayanfam.com/tags/hypervisor/ Hypervisor From Scratch]

[https://lwn.net/Kernel/LDD3/ Linux Device Drivers (3rd Edition)]

[https://dri.freedesktop.org/docs/drm/gpu/ GPU Driver Developer's Guide]

[https://dri.freedesktop.org/docs/drm/PCI/pci.html# How To Write PCI Drivers]

[https://doc.dpdk.org/guides-16.04/prog_guide/ivshmem_lib.html Data Plane Development Kit: IVSHMEM Programming Guide]

=== Specifications ===
[https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs Hyper-V Hypervisor Top Level Functional Specification (TLFS)]

=== Communities & Mailing Lists ===
[https://discord.gg/Rb9K9DYxKK Open-IOV Discord]

[https://lists.freedesktop.org/mailman/listinfo/intel-gfx Intel-gfx Mailing List]

[https://lists.freedesktop.org/mailman/listinfo/nouveau Nouveau Mailing List]

[https://lists.freedesktop.org/mailman/listinfo/amd-gfx AMD-gfx Mailing List]

[https://listman.redhat.com/mailman/listinfo/vfio-users VFIO-users Mailing List]

[https://forum.level1techs.com/c/software/vfio/132 <nowiki>Level1Techs Forum [VFIO Topic]</nowiki>]

[https://old.reddit.com/r/VFIO/ VFIO Subreddit]

GPU Firmware

2023-04-09T17:13:39Z

Arthur: /* Firmware Images */

GPUs have become highly complex systems containing a number of different embedded controllers. This page will attempt to document embedded GPU firmware and support for IO virtualization through various firmware functions.

<blockquote>An absence of critical technical documentation has historically slowed growth and adoption of developer ecosystems for GPU virtualization.

This [https://creativecommons.org/licenses/by/4.0/ CC-BY-4.0] licensed content can either be used with attribution, or used as inspiration for new documentation, created by GPU vendors for public commercial distribution as developer documentation.

Where possible, this documentation will clearly label dates and versions of observed-but-not-guaranteed behaviour vs. vendor-documented stable interfaces/behaviour with guarantees of forward or backward compatibility.</blockquote>

== Intel ==

=== Firmware Images ===
This section will cover firmware images used in Intel GPUs.[[File:Screen Shot 2022-11-18 at 4.14.32 PM.png|alt=Figure 1: The FSP Binary Layout from Intel® Firmware Support Package External Architecture Specification.|thumb|Figure 1: The FSP Binary Layout from Intel® Firmware Support Package External Architecture Specification. [https://cdrdv2.intel.com/v1/dl/getContent/736809 Source]]]

==== Intel Firmware Support Package (FSP) ====
Much like CPUs, Intel's GPUs also contain a Firmware Support Package (FSP).

The Coreboot project provides public domain information on the FSP [https://doc.coreboot.org/soc/intel/fsp/index.html here].

===== FSP Configuration =====
In the context of GPUs the FSP configures several functions of the device.

Those functions are as follows:
{| class="wikitable"
|+GPU Firmware Support Package
!FSP Parameter
!Possible Values
|-
|GFSP Status
|0x00
|-
|FIVR SSC Value
|*.*%
|-
|FIVR RFI Value
|*.*MHz
|-
|GT Subsystem Vendor ID
|0x8086
|-
|GT Subsystem Device ID
|0x**
|-
|HDA Subsystem Vendor ID
|0x0000
|-
|HDA Subsystem Device ID
|0x0000
|-
|P2SB Enable
|Yes/No
|-
|LMEBAR
|Max
|-
|GTMMADDR Prefetch Capability
|Prefetch Enabled
|-
|[https://open-iov.org/index.php/Merged_Drivers Display Present]
|Enabled/Disabled
|-
|I2C For Third Party Devices
|Enabled/Disabled
|-
|I2C Device Address 1
|0x0000
|-
|I2C Device Address 2
|0x0000
|-
|I2C Bus Speed
|Standard mode (0 to 100Kbps)
|}
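
On Linux, the subsystem vendor and device IDs that a PCI device reports can be read from the <code>subsystem_vendor</code> and <code>subsystem_device</code> attributes in sysfs. The sketch below reads them for one device directory. Whether the values seen there match the FSP parameters in the table above for a given GPU is an assumption to verify on real hardware, and the PCI address in the docstring is a placeholder.

```python
from pathlib import Path


def read_subsystem_ids(device_dir):
    """Return (subsystem_vendor, subsystem_device) as integers.

    `device_dir` is a PCI device directory such as
    /sys/bus/pci/devices/0000:03:00.0 (the address is a placeholder).
    """
    base = Path(device_dir)
    # sysfs exposes these attributes as hexadecimal text, e.g. "0x8086\n".
    vendor = int((base / "subsystem_vendor").read_text().strip(), 16)
    device = int((base / "subsystem_device").read_text().strip(), 16)
    return vendor, device
```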

====== Editing FSP Configuration ======
The FSP configuration editor can be downloaded [https://github.com/tianocore/edk2/tree/master/IntelFsp2Pkg/Tools/ConfigEditor here] and its user manual is available [https://github.com/tianocore/edk2/blob/master/IntelFsp2Pkg/Tools/UserManuals/ConfigEditorUserManual.md here].

===== FSP Binary Format =====
The FSP's binary layout is detailed within the [https://cdrdv2.intel.com/v1/dl/getContent/736809 Intel® FSP External Architecture Specification v2.4] on page 14.

==== Known Firmware Package Variations ====
Some firmware packages may include an '''End of Manufacturing Flash Protection Mode''' status of Protected or Unprotected.
[[File:Figure 2- Firmware status information for an Intel DG2 device..png|alt=Figure 2: Firmware status information for an Intel DG2 device.|thumb|Figure 2: Firmware status information for an Intel DG2 device.]]
Similar [https://eclypsium.com/2022/09/19/firmware-security-realizations-part-3-spi-write-protections/ SPI Write Protection] functionality is made available through Intel CPUs under [[wikipedia:System_Management_Mode|System Management Mode (SMM)]].

=== Embedded Controllers ===
This section will cover firmware images as they apply to various embedded controllers within the Intel GPU.

==== GuC ====
The Graphics micro (µ) Controller (GuC) is an embedded controller contained within Intel's integrated and discrete graphics (DG*) series GPUs.

===== Hardware Architecture =====

''The following section is supported by [https://igor-blue.github.io/2021/02/10/graphics-part1.html igor-blue.github.io] (see reference [https://open-iov.org/index.php/GPU_Firmware#References_(Talks_&_Reading_Material) 1], [https://open-iov.org/index.php/GPU_Firmware#References_(Talks_&_Reading_Material) 2]):''

''"The GuC - an embedded [https://www.wikiwand.com/en/I486 i486] core that supports graphics scheduling, power management and firmware attestation."''
===== Software Architecture =====

''The following section is supported by [https://igor-blue.github.io/2021/02/10/graphics-part1.html igor-blue.github.io] (see reference [https://open-iov.org/index.php/GPU_Firmware#References_(Talks_&_Reading_Material) 1], [https://open-iov.org/index.php/GPU_Firmware#References_(Talks_&_Reading_Material) 2]):''

''"The μOS kernel runs in 32-bit protected mode, with no paging and old-style segments model (CS, DS, etc’). All code run in ring0. The OS handles HW/SW exceptions and crashes, and supplies debugging and logging services."''

''"It runs a single process - which initializes the system and then waits for interrupts/events in a loop."''

====== GuC Blob Checksum & Code Signing ======
''"The bootrom verifies the firmware with a digital signature using a SHA256 hash + PKCSv2.1 RSA signature, and if the test passes copies it to SRAM and starts executing."''
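
The hashing half of the check quoted above can be sketched with the Python standard library. This is not the GuC bootrom's implementation: it computes a SHA-256 digest over an arbitrary byte string and compares it to an expected value. The RSA signature verification is left out because it needs a cryptography library outside the standard library, and the real firmware header layout is not modelled.

```python
import hashlib
import hmac


def firmware_digest(blob: bytes) -> str:
    """Return the SHA-256 digest of a firmware blob as lowercase hex."""
    return hashlib.sha256(blob).hexdigest()


def digest_matches(blob: bytes, expected_hex: str) -> bool:
    """Compare the blob's digest against an expected hex digest."""
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(firmware_digest(blob), expected_hex.lower())
```

A digest match alone only shows that the blob is the expected one; it is the signature over that digest that ties the blob to the vendor's key.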
== Nvidia ==

=== Firmware Images ===
This section will cover firmware images used in Nvidia GPUs.

=== Embedded Controllers ===
This section will cover firmware images as they apply to various embedded controllers within the Nvidia GPU.

==== Falcon / NV-RISCV ====
The FAst Logic CONtroller (Falcon) and Nvidia RISC-V ([https://riscv.org/wp-content/uploads/2016/07/Tue1100_Nvidia_RISCV_Story_V2.pdf NV-RISCV]) processors run the NvOS.

==== GSP ====
The GPU System Processor (GSP) is an embedded controller used for offload of the RM Core.

The GSP runs [https://lwn.net/Articles/637658/ Library Operating System (LibOS)].

===== GSP Initialization & Offload =====
The [https://open-iov.org/index.php/GPU_Driver_Internals#Initialization_3 GSP is initialized multiple times] during the system's bring up and runtime.

GSP offload may occur during:

* Hardware bring up, when a cached version of the RM Core is loaded from SPI flash.
* Host driver bring up, when the RM Core is offloaded by the [https://open-iov.org/index.php/OpenRM OpenRM driver].
* Guest driver bring up, when the RM Core is offloaded.

{| class="wikitable"
|+Possible GSP Offloads
!Load Source
!Payload
!Notes
|-
|[https://wiki.segger.com/SPI_Flash SPI Flash]
|Cached RM Core
|Used as a fallback when no RM offload occurs.
|-
|OpenRM
|RM Core
|This is the RM Core which was traditionally contained in the proprietary RM driver.
|-
|VGX Guest
|Guest RM Core
|Future OpenRM guests may accomplish RM offload via GSP stubs (controlled via [https://github.com/NVIDIA/open-gpu-kernel-modules/blob/758b4ee8189c5198504cb1c3c5bc29027a9118a3/src/common/sdk/nvidia/inc/ctrl/ctrla081.h#L102 gspHeapSize]?).
|}

==== CMU ====

== AMD ==

== References (Talks & Reading Material) ==

#[https://igor-blue.github.io/2021/02/10/graphics-part1.html Security of the Intel Graphics Stack - Part 1 - Introduction]
#[https://igor-blue.github.io/2021/02/24/graphics-part2.html Security of the Intel Graphics Stack - Part 2 - FW <-> GuC]
#[https://eclypsium.com/2022/09/19/firmware-security-realizations-part-3-spi-write-protections/ Firmware Security Realizations Part 3: SPI Write Protections]
#[https://www.intel.com/content/www/us/en/intelligent-systems/intel-firmware-support-package/fsp-firmware-solutions-iot-video.html Intel® FSP: Firmware Solutions for the Internet of Things]

2023-03-28T19:50:00Z

Arthur: /* References (Talks & Reading Material) */

Glossary

2023-03-27T19:24:26Z

Arthur: /* Virtual GPU (vGPU) */

<blockquote>An absence of critical technical documentation has historically slowed growth and adoption of developer ecosystems for GPU virtualization.

This [https://creativecommons.org/licenses/by/4.0/ CC-BY-4.0] licensed content can either be used with attribution, or used as inspiration for new documentation, created by GPU vendors for public commercial distribution as developer documentation.

Where possible, this documentation will clearly label dates and versions of observed-but-not-guaranteed behaviour vs. vendor-documented stable interfaces/behaviour with guarantees of forward or backward compatibility.</blockquote>

== Definitions ==
This article will attempt to provide a glossary of common terms that are related to, or often used in connection with, virtualization, for readers who may need more context.

If you have suggestions on how to improve the glossary you can [https://openmdev.io/index.php/Main_Page#Getting_Started_Contributing_to_OpenMdev.io '''contribute'''] or join our [https://discord.gg/Rb9K9DYxKK '''community discussion thread'''] to make suggestions.

=== VFIO ===
<blockquote>Kernel.org defines VFIO or Virtual Function Input Output as '''"an IOMMU/device agnostic framework for exposing direct device access to userspace, in a secure, IOMMU protected environment." [https://www.kernel.org/doc/Documentation/vfio.txt <nowiki>[1]</nowiki>]'''</blockquote>
[[File:VFIO Diagram.png|thumb|A diagram depicting direct VFIO passthrough of a GPU adapter.]]
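
Because VFIO hands devices to userspace per IOMMU group, a common first step is finding which group a device belongs to. On Linux each PCI device directory in sysfs has an <code>iommu_group</code> symlink whose target ends in the group number, and the matching VFIO group node is <code>/dev/vfio/&lt;group&gt;</code>. The helper names below are hypothetical, and the sketch only resolves paths; it does not open or configure the device.

```python
import os


def iommu_group(device_dir):
    """Return the IOMMU group number for a PCI device directory in sysfs."""
    link = os.path.join(device_dir, "iommu_group")
    # The symlink target ends in the group number, e.g. .../iommu_groups/26
    return int(os.path.basename(os.path.realpath(link)))


def vfio_group_path(device_dir):
    """Return the /dev/vfio node that corresponds to the device's group."""
    return f"/dev/vfio/{iommu_group(device_dir)}"
```

A device directory here would look like <code>/sys/bus/pci/devices/0000:03:00.0</code>, where the address is a placeholder.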

=== Mdev ===
<blockquote>Kernel.org defines Virtual Function I/O (VFIO) Mediated devices as: '''"an IOMMU/device-agnostic framework for exposing direct device access to user space in a secure, IOMMU-protected environment... mediated core driver provides a common interface for mediated device management that can be used by drivers of different devices." [https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api/vfio-mediated-device.rst?h=driver-core-next&id=7de3697e9cbd4bd3d62bafa249d57990e1b8f294 <nowiki>[2]</nowiki>]'''</blockquote>
[[File:VFIO-Mdev Diagram.png|thumb|A diagram depicting VFIO Mediated Device interacting with two vGPU guests.]]
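
The kernel's mediated device documentation describes creating an mdev by writing a UUID to the <code>create</code> attribute of one of the parent device's <code>mdev_supported_types</code> entries. A minimal sketch of that step follows. The function name is hypothetical, the parent directory and type name depend entirely on the vendor driver, and writing to the real sysfs attribute requires root.

```python
import uuid
from pathlib import Path


def create_mdev(parent_dir, type_id, mdev_uuid=None):
    """Ask the parent device to create a mediated device; return its UUID.

    `parent_dir` is the parent device's sysfs directory and `type_id` is one
    of the names listed under its mdev_supported_types directory.
    """
    mdev_uuid = mdev_uuid or str(uuid.uuid4())
    create = Path(parent_dir) / "mdev_supported_types" / type_id / "create"
    if not create.exists():
        raise FileNotFoundError(f"{type_id!r} is not offered by {parent_dir}")
    create.write_text(mdev_uuid)
    return mdev_uuid
```

Passing the parent directory as a parameter keeps the sketch testable against a temporary directory instead of live sysfs.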

=== Virtual CPU (vCPU) ===
<blockquote>A Virtual CPU (vCPU) is a CPU which has been virtualized to represent a percentage of the total hardware resource via preemptive scheduling.</blockquote>

=== Virtual GPU (vGPU) ===
<blockquote>A Virtual GPU (vGPU) is a GPU which has been virtualized to represent a percentage of the total hardware resource via preemptive scheduling or thread pinning (to [https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2023-0/intel-iris-xe-gpu-architecture.html EUs], or [https://www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.pdf SMs]) and/or a partition of the device's Video Random Access Memory (VRAM).</blockquote>
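
The two ingredients in this definition, a share of execution time and a partition of VRAM, can be illustrated with a toy model. Equal-sized partitions and plain round-robin time slicing are simplifying assumptions made for the sketch; real vGPU schedulers and memory managers are vendor-specific and considerably more involved.

```python
from itertools import cycle, islice


def vram_partitions(total_mib, count):
    """Split VRAM into `count` equal, non-overlapping (start, size) ranges in MiB."""
    if count <= 0:
        raise ValueError("count must be positive")
    size = total_mib // count
    return [(i * size, size) for i in range(count)]


def round_robin(vgpus, slices):
    """Return which vGPU owns each of the next `slices` time slices."""
    return list(islice(cycle(vgpus), slices))
```

For example, <code>vram_partitions(8192, 4)</code> gives four 2048 MiB ranges, and <code>round_robin(["a", "b"], 5)</code> alternates ownership of the time slices between the two vGPUs.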
=== IVSHMEM ===
<blockquote>According to QEMU's GitHub repo: '''"The Inter-VM shared memory device (ivshmem) is designed to share a memory region between multiple QEMU processes running different guests and the host. In order for all guests to be able to pick up the shared memory area, it is modeled by QEMU as a PCI device exposing said memory to the guest as a PCI BAR."''' '''[https://github.com/qemu/qemu/blob/master/docs/specs/ivshmem-spec.txt <nowiki>[3]</nowiki>]'''</blockquote>
[[File:IVSHMEM.png|thumb|A diagram depicting an inter-vm shared memory device being used by two separate virtual machines.]]
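
The underlying idea, several parties mapping the same memory region, can be shown on a Unix host with two shared mappings of one file. This sketch does not talk to QEMU or to an ivshmem PCI device; it only demonstrates that a write through one mapping is visible through the other. The function name and the use of a regular file are assumptions made for illustration.

```python
import mmap
import os


def open_shared_region(path, size):
    """Map `size` bytes of the file at `path` as shared memory."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)
        # On Unix, mmap.mmap creates a MAP_SHARED mapping by default.
        return mmap.mmap(fd, size)
    finally:
        # The mapping stays valid after the descriptor is closed.
        os.close(fd)
```

Two callers that open the same path see each other's writes, which is the property a guest and host rely on when they exchange data through a shared region.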

=== KVMFR (Looking Glass) ===
<blockquote>looking-glass.io defines Looking Glass as: '''"an open source application that allows the use of a KVM (Kernel-based Virtual Machine) configured for VGA PCI Pass-through without an attached physical monitor, keyboard or mouse."''' '''[https://looking-glass.io/ <nowiki>[4]</nowiki>]'''</blockquote>
[[File:KVMFR Looking Glass.png|thumb|An image posted by Gnif (Creator of the Looking Glass project) to a changelog thread on the Level1techs forum - an active community of VFIO users. ]]

=== NUMA Node ===
<blockquote>Infogalactic defines NUMA as: '''"''Non-uniform memory access (NUMA) is a'' computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor. Under NUMA, a processor can access its own local memory faster than non-local memory (memory local to another processor or memory shared between processors)." [https://infogalactic.com/info/Non-uniform_memory_access <nowiki>[5]</nowiki>]'''</blockquote>
[[File:NUMA Node.png|thumb|A diagram depicting Non-Uniform Memory Access (NUMA) nodes.]]
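
On Linux, the CPUs that belong to a NUMA node are listed in <code>/sys/devices/system/node/node&lt;N&gt;/cpulist</code> using a compact range syntax such as <code>0-3,8-11</code>. The sketch below parses that syntax, which is useful when pinning a VM's vCPUs to the node closest to a passed-through device. The function names are hypothetical.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a sysfs cpulist string such as "0-3,8-11" into a list of CPU numbers."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty cpulist means the node has no CPUs
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def node_cpus(node, sysfs_root="/sys/devices/system/node"):
    """Return the CPU numbers that belong to NUMA node `node`."""
    return parse_cpulist((Path(sysfs_root) / f"node{node}" / "cpulist").read_text())
```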

=== Application Binary Interface (ABI) ===
<blockquote>Infogalactic defines an ABI as '''"the interface between two program modules, one of which is often a library or operating system, at the level of machine code."''' '''[https://infogalactic.com/info/Application_binary_interface <nowiki>[6]</nowiki>]'''</blockquote>
[[File:Application Binary Interface (ABI).png|thumb|A diagram contrasting API and ABI as well as their respective roles in software compatibility.]]
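
One concrete piece of an ABI is how a structure is laid out in memory. The sketch below uses Python's <code>ctypes</code> to show that two structures with the same fields in a different order can have different field offsets, so code compiled against one layout cannot safely exchange data with code compiled against the other. The structure names are hypothetical, and exact sizes depend on the platform's alignment rules.

```python
import ctypes


class Padded(ctypes.Structure):
    """A small field followed by a larger one forces padding before `value`."""
    _fields_ = [
        ("flag", ctypes.c_uint8),
        ("value", ctypes.c_uint32),
        ("tail", ctypes.c_uint8),
    ]


class Reordered(ctypes.Structure):
    """Same fields, largest first, so no padding is needed before `value`."""
    _fields_ = [
        ("value", ctypes.c_uint32),
        ("flag", ctypes.c_uint8),
        ("tail", ctypes.c_uint8),
    ]


def layout(struct):
    """Return {field_name: offset} plus the total size for a ctypes structure."""
    offsets = {name: getattr(struct, name).offset for name, _ in struct._fields_}
    offsets["sizeof"] = ctypes.sizeof(struct)
    return offsets
```

Calling <code>layout(Padded)</code> and <code>layout(Reordered)</code> on the same machine shows the differing offsets. The source-level API (the field names) is identical in both cases; only the binary interface changed.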

=== SR-IOV ===
<blockquote>docs.microsoft.com defines SR-IOV as '''"The single root I/O virtualization (SR-IOV) interface is an extension to the PCI Express (PCIe) specification. SR-IOV allows a device, such as a network adapter, to separate access to its resources among various PCIe hardware functions." [https://archive.is/FcDoK <nowiki>[7]</nowiki>]'''</blockquote>
[[File:SR-IOV.png|thumb|A diagram depicting host and guest interactions in a Single Root I/O Virtualization environment.]]
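
On Linux, a physical function that supports SR-IOV exposes <code>sriov_totalvfs</code> (the maximum number of virtual functions) and <code>sriov_numvfs</code> (the number currently enabled) in its sysfs directory. A minimal sketch of enabling VFs through those attributes follows. The function name is hypothetical, the device directory is supplied by the caller, and writing to real sysfs requires root and a driver that implements SR-IOV.

```python
from pathlib import Path


def enable_vfs(device_dir, count):
    """Enable `count` virtual functions on the physical function at `device_dir`."""
    base = Path(device_dir)
    total = int((base / "sriov_totalvfs").read_text())
    if not 0 <= count <= total:
        raise ValueError(f"requested {count} VFs, device supports 0..{total}")
    numvfs = base / "sriov_numvfs"
    # The kernel rejects changing one non-zero VF count to another,
    # so reset to 0 first when a different count is already active.
    if int(numvfs.read_text()) not in (0, count):
        numvfs.write_text("0")
    numvfs.write_text(str(count))
    return count
```

Each enabled VF then appears as its own PCI function, which can be bound to <code>vfio-pci</code> and passed to a guest.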

=== GVT-g ===
<blockquote>wiki.archlinux.org defines Intel GVT-g as '''"a technology that provides mediated device passthrough for Intel GPUs (Broadwell and newer). It can be used to virtualize the GPU for multiple guest virtual machines, effectively providing near-native graphics performance in the virtual machine and still letting your host use the virtualized GPU normally." [https://wiki.archlinux.org/title/Intel_GVT-g <nowiki>[8]</nowiki>]'''</blockquote>
[[File:SR-IOV-Diagram.png|thumb|A diagram of Intel's GVT-g (Graphics Virtualization Technology g).]]

=== VirGL ===
<blockquote>[[File:Virgl.png|thumb|A diagram depicting GPU instruction redirects to a single host graphics driver.]]lwn.net defines Virgl as: '''"a way for guests running in a virtual machine (VM) to access the host GPU using OpenGL and other APIs... The virgl stack consists of an application running in the guest that sends OpenGL to the Mesa virgl driver, which uses the virtio-gpu driver in the guest kernel to communicate with QEMU on the host."''' '''[https://lwn.net/Articles/767970/ <nowiki>[9]</nowiki>]'''</blockquote>

2023-03-02T00:34:34Z

Arthur:

This page indexes the articles contained within Open-IOV.

If you're new to GPU Virtualization start by reading the '''[[Introduction]]''' article.
=== Start Here ===
[[Introduction]]

===Abstract===
[[Introductory Concepts & Definitions|Glossary]]

[[Virtualization Fundamentals]]

[[Merged Drivers]]

=== Design Documents ===
[[Virtual IO Internals|Virtual I/O Internals]]

[[GPU Driver Internals]]
=== GVM Integration Documents ===
[https://open-iov.org/index.php/OpenRM <nowiki>GVM [Nvidia Open Kernel Modules]</nowiki>] (support documentation up-to-date)

[https://open-iov.org/index.php/AMDGPU <nowiki>GVM [AMDGPU]</nowiki>] (support documentation not up-to-date)

===Projects===
[https://linux-gvm.org/ GPU Virtual Machine (GVM)]

[https://open-iov.org/index.php/LibVF.IO LibVF.IO]

[[Hyperborea]]

[https://open-iov.org/index.php/LIME_Is_Mediated_Emulation LIME Is Mediated Emulation]

[https://open-iov.org/index.php/Looking_Glass_KVMFR Looking Glass]

[https://openxt.atlassian.net/wiki/spaces/OD/pages/10747915/What+is+OpenXT OpenXT]

[https://gitlab.com/vglass OpenXT: vGlass]

[https://github.com/OpenXT/surfman OpenXT: Surfman (legacy DRM)]

[https://www.bromium.com/opensource/ Bromium/uXen]

[https://xenproject.org/help/documentation/ Xen Project]

[https://www.qubes-os.org/doc/ Qubes OS]

[https://projectacrn.github.io/2.1/tutorials/using_celadon_as_uos.html Intel Celadon]

[https://open-iov.org/index.php/VGPU_Unlock vGPU_Unlock]

[[LibRM]]
=== Device Support===
[[GPU Support]]

[[CPU Support]]

[[GPU Firmware]]

=== Software Support ===
[https://open-iov.org/index.php/Hypervisor_Support Hypervisor Support]

[[GPU Software Bill Of Materials (SBOM)]]

=== API Documentation ===

==== Kernel APIs ====
[https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api Kernel.org Driver Core Documentation]

[https://docs.microsoft.com/en-us/windows-hardware/drivers/display/iommu-based-gpu-isolation NT Kernel (Windows) IOMMU-based GPU Isolation]

[https://elixir.bootlin.com/linux/latest/source/Documentation/driver-api/vfio.rst VFIO] - [https://github.com/torvalds/linux/blob/master/include/uapi/linux/vfio.h vfio.h] - [https://elixir.bootlin.com/linux/latest/source/include/linux/mdev.h mdev.h]

[https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api/vfio-mediated-device.rst?h=driver-core-next&id=7de3697e9cbd4bd3d62bafa249d57990e1b8f294 VFIO Mediated Device]
==== Driver APIs ====
[[Intel SR-IOV APIs|i915 SR-IOV API]]

[https://projectacrn.github.io/2.1/api/GVT-g_api.html i915 GVT-g API]

[https://nouveau.freedesktop.org/Development.html Nouveau Tools & API]
==== Sample Code ====
GPLv2 sources mirrored from [https://elixir.bootlin.com/linux/latest/source/samples/vfio-mdev/ elixir.bootlin.com] with [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/Makefile simple makefile changes].

[https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mtty.c mtty.c] - [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mdpy.c mdpy.c] - [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mdpy-fb.c mdpy-fb.c] - [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mdpy-defs.h mdpy-defs.h] - [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mbochs.c mbochs.c]

==== Virtualization APIs ====
[https://open-iov.org/index.php/Mdev-GPU#Mdev-CLI GVM/Mdev-CLI API]

[https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html QEMU Machine Protocol (QMP) Reference Manual]

[https://projectacrn.github.io/2.1/developer-guides/hld/ivshmem-hld.html Inter-VM Shared Memory (IVSHMEM)]
===User Guides===
[https://arccompute.com/blog/libvfio-commodity-gpu-multiplexing/ LibVF.IO Setup Guide]

[https://looking-glass.io/docs/stable/ Looking Glass Quickstart Guide]

[https://github.com/intel/gvt-linux/wiki/GVTg_Setup_Guide Intel GVT-g Setup Guide]

[https://github.com/GPUOpen-LibrariesAndSDKs/MxGPU-Virtualization/tree/master/docs AMD GPU-IOV Module Docs]

[https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF PCI passthrough via OVMF]

[https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_deployment_and_administration_guide/index RedHat Virtualization Guide]

=== Developer Guides ===
[https://rayanfam.com/tags/hypervisor/ Hypervisor From Scratch]

[https://lwn.net/Kernel/LDD3/ Linux Device Drivers (3rd Edition)]

[https://dri.freedesktop.org/docs/drm/gpu/ GPU Driver Developer's Guide]

[https://dri.freedesktop.org/docs/drm/PCI/pci.html# How To Write PCI Drivers]

[https://doc.dpdk.org/guides-16.04/prog_guide/ivshmem_lib.html Data Plane Development Kit: IVSHMEM Programming Guide]

=== Specifications ===
[https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs Hyper-V Hypervisor Top Level Functional Specification (TLFS)]

=== Communities & Mailing Lists ===
[https://discord.gg/Rb9K9DYxKK Open-IOV Discord]

[https://lists.freedesktop.org/mailman/listinfo/intel-gfx Intel-gfx Mailing List]

[https://lists.freedesktop.org/mailman/listinfo/nouveau Nouveau Mailing List]

[https://lists.freedesktop.org/mailman/listinfo/amd-gfx AMD-gfx Mailing List]

[https://listman.redhat.com/mailman/listinfo/vfio-users VFIO-users Mailing List]

[https://forum.level1techs.com/c/software/vfio/132 <nowiki>Level1Techs Forum [VFIO Topic]</nowiki>]

[https://old.reddit.com/r/VFIO/ VFIO Subreddit]

<blockquote>An absence of critical technical documentation has historically slowed growth and adoption of developer ecosystems for GPU virtualization.

This [https://creativecommons.org/licenses/by/4.0/ CC-BY-4.0] licensed content can either be used with attribution, or used as inspiration for new documentation, created by GPU vendors for public commercial distribution as developer documentation.

Where possible, this documentation will clearly label dates and versions of observed-but-not-guaranteed behaviour vs. vendor-documented stable interfaces/behaviour with guarantees of forward or backward compatibility.</blockquote>

AMDGPU

2023-03-02T00:32:45Z

Arthur:

The AMDGPU driver is AMD's open source GPU driver, which is included with the Linux kernel.

<blockquote>An absence of critical technical documentation has historically slowed growth and adoption of developer ecosystems for GPU virtualization.

This [https://creativecommons.org/licenses/by/4.0/ CC-BY-4.0] licensed content can either be used with attribution, or used as inspiration for new documentation, created by GPU vendors for public commercial distribution as developer documentation.

Where possible, this documentation will clearly label dates and versions of observed-but-not-guaranteed behaviour vs. vendor-documented stable interfaces/behaviour with guarantees of forward or backward compatibility.</blockquote>

This software's source code can be viewed [https://github.com/torvalds/linux/tree/master/drivers/gpu/drm/amd/amdgpu here].

This OpenMdev page should aim to document the APIs exposed to Linux developers, in the same way the [https://openmdev.io/index.php/OpenRM OpenRM Driver API] page does.

The following page may provide some of the required information, from which the details most relevant to graphics mediation can be consolidated:

https://dri.freedesktop.org/docs/drm/gpu/amdgpu.html

Articles

2023-03-02T00:32:18Z

Arthur:

This page indexes the articles contained within Open-IOV.

If you're new to GPU Virtualization start by reading the '''[[Introduction]]''' article.

<blockquote>An absence of critical technical documentation has historically slowed growth and adoption of developer ecosystems for GPU virtualization.

This [https://creativecommons.org/licenses/by/4.0/ CC-BY-4.0] licensed content can either be used with attribution, or used as inspiration for new documentation, created by GPU vendors for public commercial distribution as developer documentation.

Where possible, this documentation will clearly label dates and versions of observed-but-not-guaranteed behaviour vs. vendor-documented stable interfaces/behaviour with guarantees of forward or backward compatibility.</blockquote>

=== Start Here ===
[[Introduction]]

===Abstract===
[[Introductory Concepts & Definitions|Glossary]]

[[Virtualization Fundamentals]]

[[Merged Drivers]]

=== Design Documents ===
[[Virtual IO Internals|Virtual I/O Internals]]

[[GPU Driver Internals]]
=== GVM Integration Documents ===
[https://open-iov.org/index.php/OpenRM <nowiki>GVM [Nvidia Open Kernel Modules]</nowiki>] (support documentation up-to-date)

[https://open-iov.org/index.php/AMDGPU <nowiki>GVM [AMDGPU]</nowiki>] (support documentation not up-to-date)

===Projects===
[https://linux-gvm.org/ GPU Virtual Machine (GVM)]

[https://open-iov.org/index.php/LibVF.IO LibVF.IO]

[[Hyperborea]]

[https://open-iov.org/index.php/LIME_Is_Mediated_Emulation LIME Is Mediated Emulation]

[https://open-iov.org/index.php/Looking_Glass_KVMFR Looking Glass]

[https://openxt.atlassian.net/wiki/spaces/OD/pages/10747915/What+is+OpenXT OpenXT]

[https://gitlab.com/vglass OpenXT: vGlass]

[https://github.com/OpenXT/surfman OpenXT: Surfman (legacy DRM)]

[https://www.bromium.com/opensource/ Bromium/uXen]

[https://xenproject.org/help/documentation/ Xen Project]

[https://www.qubes-os.org/doc/ Qubes OS]

[https://projectacrn.github.io/2.1/tutorials/using_celadon_as_uos.html Intel Celadon]

[https://open-iov.org/index.php/VGPU_Unlock vGPU_Unlock]

[[LibRM]]
=== Device Support===
[[GPU Support]]

[[CPU Support]]

[[GPU Firmware]]

=== Software Support ===
[https://open-iov.org/index.php/Hypervisor_Support Hypervisor Support]

[[GPU Software Bill Of Materials (SBOM)]]

=== API Documentation ===

==== Kernel APIs ====
[https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api Kernel.org Driver Core Documentation]

[https://docs.microsoft.com/en-us/windows-hardware/drivers/display/iommu-based-gpu-isolation NT Kernel (Windows) IOMMU-based GPU Isolation]

[https://elixir.bootlin.com/linux/latest/source/Documentation/driver-api/vfio.rst VFIO] - [https://github.com/torvalds/linux/blob/master/include/uapi/linux/vfio.h vfio.h] - [https://elixir.bootlin.com/linux/latest/source/include/linux/mdev.h mdev.h]

[https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api/vfio-mediated-device.rst?h=driver-core-next&id=7de3697e9cbd4bd3d62bafa249d57990e1b8f294 VFIO Mediated Device]
==== Driver APIs ====
[[Intel SR-IOV APIs|i915 SR-IOV API]]

[https://projectacrn.github.io/2.1/api/GVT-g_api.html i915 GVT-g API]

[https://nouveau.freedesktop.org/Development.html Nouveau Tools & API]
==== Sample Code ====
GPLv2 sources mirrored from [https://elixir.bootlin.com/linux/latest/source/samples/vfio-mdev/ elixir.bootlin.com] with [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/Makefile minor Makefile changes].

[https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mtty.c mtty.c] - [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mdpy.c mdpy.c] - [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mdpy-fb.c mdpy-fb.c] - [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mdpy-defs.h mdpy-defs.h] - [https://github.com/OpenMdev/VFIO-Mdev_Samples/blob/master/mbochs.c mbochs.c]

==== Virtualization APIs ====
[https://open-iov.org/index.php/Mdev-GPU#Mdev-CLI GVM/Mdev-CLI API]

[https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html QEMU Machine Protocol (QMP) Reference Manual]

[https://projectacrn.github.io/2.1/developer-guides/hld/ivshmem-hld.html Inter-VM Shared Memory (IVSHMEM)]
===User Guides===
[https://arccompute.com/blog/libvfio-commodity-gpu-multiplexing/ LibVF.IO Setup Guide]

[https://looking-glass.io/docs/stable/ Looking Glass Quickstart Guide]

[https://github.com/intel/gvt-linux/wiki/GVTg_Setup_Guide Intel GVT-g Setup Guide]

[https://github.com/GPUOpen-LibrariesAndSDKs/MxGPU-Virtualization/tree/master/docs AMD GPU-IOV Module Docs]

[https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF PCI passthrough via OVMF]

[https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_deployment_and_administration_guide/index Red Hat Virtualization Guide]

=== Developer Guides ===
[https://rayanfam.com/tags/hypervisor/ Hypervisor From Scratch]

[https://lwn.net/Kernel/LDD3/ Linux Device Drivers (3rd Edition)]

[https://dri.freedesktop.org/docs/drm/gpu/ GPU Driver Developer's Guide]

[https://dri.freedesktop.org/docs/drm/PCI/pci.html# How To Write PCI Drivers]

[https://doc.dpdk.org/guides-16.04/prog_guide/ivshmem_lib.html Data Plane Development Kit: IVSHMEM Programming Guide]

=== Specifications ===
[https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs Hyper-V Hypervisor Top Level Functional Specification (TLFS)]

=== Communities & Mailing Lists ===
[https://discord.gg/Rb9K9DYxKK Open-IOV Discord]

[https://lists.freedesktop.org/mailman/listinfo/intel-gfx Intel-gfx Mailing List]

[https://lists.freedesktop.org/mailman/listinfo/nouveau Nouveau Mailing List]

[https://lists.freedesktop.org/mailman/listinfo/amd-gfx AMD-gfx Mailing List]

[https://listman.redhat.com/mailman/listinfo/vfio-users VFIO-users Mailing List]

[https://forum.level1techs.com/c/software/vfio/132 <nowiki>Level1Techs Forum [VFIO Topic]</nowiki>]

[https://old.reddit.com/r/VFIO/ VFIO Subreddit]