Merged Drivers

From Open-IOV
Latest revision as of 00:22, 2 March 2023

This page provides specifications and details on the current state of host DRM + VFIO-Mdev drivers from various vendors.

In the context of this page the term "Merged Driver" refers to drivers which allow simultaneous acceleration of the host using a device's PF (Physical Function) and acceleration of guests using one or more VFs (Virtual Functions) created using the same device.

Depending on the vendor and driver implementation, merged functionality is currently supported both by drivers that use VFIO-Mdev and by drivers that use SR-IOV.
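
On a Linux host, whether a PCI function can expose guest-facing functions is visible in sysfs. A PF that supports SR-IOV has `sriov_totalvfs` and `sriov_numvfs` attributes, a VF has a `physfn` symlink back to its PF, and a parent device with mdev support has a `mdev_supported_types` directory. The minimal sketch below assumes these standard kernel sysfs attributes. The helper name `describe_pci_function` and its output layout are hypothetical and exist only for illustration.

```python
from pathlib import Path


def describe_pci_function(bdf: str, sysfs_root: str = "/sys/bus/pci/devices") -> dict:
    """Summarize the SR-IOV and mdev capabilities of one PCI function via sysfs."""
    dev = Path(sysfs_root) / bdf

    def read_int(name: str):
        # sriov_totalvfs / sriov_numvfs only exist on SR-IOV capable PFs
        path = dev / name
        return int(path.read_text().strip()) if path.exists() else None

    # Parent devices with mdev support list one directory per mdev type
    mdev_dir = dev / "mdev_supported_types"
    mdev_types = sorted(p.name for p in mdev_dir.iterdir()) if mdev_dir.is_dir() else []

    # A "physfn" symlink is present on VFs and points back at their PF
    is_vf = (dev / "physfn").exists()

    return {
        "bdf": bdf,
        "is_vf": is_vf,
        "sriov_totalvfs": read_int("sriov_totalvfs"),
        "sriov_numvfs": read_int("sriov_numvfs"),
        "mdev_types": mdev_types,
    }
```

For example, `describe_pci_function("0000:00:02.0")` would report the mdev types offered by an Intel integrated GPU at its usual address, if GVT-g is enabled there. The `sysfs_root` parameter exists only so the function can be exercised against a mock directory tree.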

An absence of critical technical documentation has historically slowed growth and adoption of developer ecosystems for GPU virtualization.

This CC-BY-4.0 licensed content can be used either with attribution or as inspiration for new developer documentation created by GPU vendors for public commercial distribution.

Where possible, this documentation clearly labels dates and versions, distinguishing observed-but-not-guaranteed behaviour from vendor-documented stable interfaces and behaviour that carry guarantees of forward or backward compatibility.

Intel i915

Intel's slides mention the ability to accelerate up to '8 VMs plus DOM0'. Source: https://01.org/sites/default/files/documentation/an_introduction_to_intel_gvt-g_for_external.pdf

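Intel currently supports host DRM and VFIO-Mdev/SR-IOV functionality in its current i915 driver sources (VFIO-Mdev/GVT-g: https://github.com/torvalds/linux/tree/master/drivers/gpu/drm/i915) and in its upstreaming i915 driver sources (VFIO-Mdev/SR-IOV: https://github.com/intel/linux-intel-lts/commit/41ef979f0894326c202473807a56b599a2f3d04e).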
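
With GVT-g, guest vGPUs are created through the generic mdev sysfs interface. Writing a UUID to `mdev_supported_types/<type>/create` under the parent GPU asks i915 to instantiate a new mediated device. The sketch below assumes GVT-g is already enabled on the host (for example with the `i915.enable_gvt=1` kernel parameter) and that the script runs as root. Type names such as `i915-GVTg_V5_4` are examples that vary by platform. The helper `create_mdev` is hypothetical.

```python
import uuid
from pathlib import Path


def create_mdev(bdf: str, mdev_type: str,
                sysfs_root: str = "/sys/bus/pci/devices") -> str:
    """Create one mediated device of `mdev_type` on parent device `bdf`.

    Returns the UUID of the new mdev, which later appears under
    /sys/bus/mdev/devices/<uuid> on a real host.
    """
    type_dir = Path(sysfs_root) / bdf / "mdev_supported_types" / mdev_type

    # available_instances reports how many more mdevs of this type fit
    available = int((type_dir / "available_instances").read_text().strip())
    if available < 1:
        raise RuntimeError(f"no instances of {mdev_type} left on {bdf}")

    # Writing a UUID to "create" asks the parent driver (e.g. i915) to instantiate it
    mdev_uuid = str(uuid.uuid4())
    (type_dir / "create").write_text(mdev_uuid)
    return mdev_uuid
```

Checking `available_instances` first gives a clearer error than the write failure the kernel would otherwise return once the GPU's resources are exhausted.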

Known Issues

  1. SR-IOV functionality is undocumented in the i915 driver API documentation (https://01.org/linuxgraphics/gfx-docs/drm/gpu/i915.html).
    Confirmed affected versions: *

A diagram depicting i915's shared host + VFIO-Mdev driver model.

Resolved Issues

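  1. Multiplexing functionality requires the use of modified KVM and Xen hypervisors (KVMGT: https://01.org/sites/default/files/documentation/01x08b-kvmgt-a.pdf; XenGT: https://wiki.xenproject.org/wiki/XenGT).
    Confirmed affected versions: current i915 driver sources (GVT-g)
    Fixed in: upstreaming i915 driver sources (SR-IOV)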

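Nvidia

For more information on host DRM and VFIO-Mdev support in Nvidia drivers, refer to section 6 of the vGPU Community Wiki (https://docs.google.com/document/d/1pzrWJ9h-zANCtyqRgS7Vzla0Y8Ea2-5z2HEi4X75d2Q/).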

Known Issues

  1. Power management on laptops running mediated graphics functionality may cause graphical errors when the laptop is not plugged in to AC power.
    Confirmed affected versions: 460.32.01, 460.73.04
    Possible mitigation: lore.kernel.org: "vfio/pci: Change the PF power state to D0 before enabling VFs"
  2. VFIO-vmalloc errors may occur as a result of page collisions between host and guest on GPUs with smaller VRAM frame buffers.
    Confirmed affected versions: 460.32.01, 460.73.04
  3. Mdev service daemons may crash or load incorrectly during host runtime, requiring a service restart or a reboot.
    Confirmed affected versions: 460.32.01, 460.73.04, 510.xx.xx
  4. Guest drivers fail to initialize correctly when VFIO-Mdev devices are combined with some USB hubs passed through via VFIO.
    Confirmed affected versions: 460.73.01
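
The mitigation linked for issue 1 is a kernel change: vfio-pci moves the PF to D0 before VFs are enabled. The sketch below approximates that idea from userspace and is a hypothetical workaround that has not been verified against the affected drivers. It uses two standard sysfs attributes. Writing `on` to `power/control` disables runtime power management so the PF stays powered, and `sriov_numvfs` sets the VF count. It assumes root access. The helper `pin_pf_and_enable_vfs` is hypothetical, and `num_vfs` is optional because many mediated setups create no VFs at all.

```python
from pathlib import Path
from typing import Optional


def pin_pf_and_enable_vfs(bdf: str, num_vfs: Optional[int] = None,
                          sysfs_root: str = "/sys/bus/pci/devices") -> str:
    """Keep the PF out of runtime suspend, then optionally enable VFs.

    Returns the previous power/control value ("auto" or "on") so the
    caller can restore it later.
    """
    dev = Path(sysfs_root) / bdf
    control = dev / "power" / "control"
    previous = control.read_text().strip()

    # "on" disables runtime PM, so the device is kept powered (D0)
    control.write_text("on")

    if num_vfs is not None:
        numvfs = dev / "sriov_numvfs"
        # The kernel rejects changing a non-zero VF count directly; reset to 0 first
        if int(numvfs.read_text().strip()) != 0:
            numvfs.write_text("0")
        numvfs.write_text(str(num_vfs))
    return previous
```

Pinning the PF this way trades battery life for stability, so restoring the returned value once the guests are shut down is a reasonable pairing.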

Resolved Issues

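  1. When QEMU mdev device initialization (https://openmdev.io/index.php/Virtual_IO_Internals#QEMU_adds_VFIO_device_to_IOMMU_container-group) is executed for a second mdev, an IOMMU group binding error occurs in QEMU that prevents the device from being brought up.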
    Confirmed affected versions: 460.32.01, 460.73.04
    Fixed in: 510.xx.xx
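
When reproducing or diagnosing this issue on an affected version, it helps to check what QEMU is being asked to bind. In the mdev framework each mediated device is normally placed in its own IOMMU group, exposed as an `iommu_group` symlink under `/sys/bus/mdev/devices/<uuid>`. QEMU receives each mdev through the `sysfsdev` property of its `vfio-pci` device. The two helper functions below are a hypothetical sketch built on those assumptions.

```python
import os
from pathlib import Path
from typing import Dict, List, Optional


def mdev_iommu_groups(mdev_uuids: List[str],
                      mdev_root: str = "/sys/bus/mdev/devices") -> Dict[str, Optional[str]]:
    """Map each mdev UUID to its IOMMU group number, or None if unbound."""
    groups: Dict[str, Optional[str]] = {}
    for u in mdev_uuids:
        link = Path(mdev_root) / u / "iommu_group"
        # The symlink target ends in .../iommu_groups/<N>
        groups[u] = os.path.basename(os.readlink(link)) if link.is_symlink() else None
    return groups


def qemu_mdev_args(mdev_uuids: List[str],
                   mdev_root: str = "/sys/bus/mdev/devices") -> List[str]:
    """Build QEMU -device arguments that attach each mdev via vfio-pci."""
    if len(set(mdev_uuids)) != len(mdev_uuids):
        raise ValueError("each mdev can only be assigned to a VM once")
    args: List[str] = []
    for u in mdev_uuids:
        args += ["-device", f"vfio-pci,sysfsdev={mdev_root}/{u}"]
    return args
```

If two mdevs report the same group, or the second one reports none, the binding error is likely to come from the host side rather than from the QEMU command line.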

Module Configuration

Depending on your use case, some additional parameters may be helpful when booting the system.

The following parameters may be passed when loading the module, for example on the kernel command line configured through GRUB or systemd-boot.

Module Parameters

| Parameter | Description | Side-Effects |
|---|---|---|
| cudahost=1 | Allows use of CUDA on the host system. | Windows guests may fail. |
| nvidia.vgpukvm=0 | Disables GPU virtualization on the host. | |
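
On GRUB-based systems these parameters usually go into `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`. The GRUB configuration is then regenerated, for example with `update-grub` or `grub-mkconfig -o /boot/grub/grub.cfg` depending on the distribution. On the kernel command line, module parameters normally take a `module.` prefix, as `nvidia.vgpukvm=0` does. The table lists `cudahost=1` without one, so check which module it belongs to before adding it. The helper below is a hypothetical sketch that edits the file's text idempotently.

```python
import re
from typing import Iterable

# Matches e.g. GRUB_CMDLINE_LINUX_DEFAULT="quiet splash" on its own line
_CMDLINE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=)"(.*)"$', re.MULTILINE)


def add_kernel_params(grub_default: str, params: Iterable[str]) -> str:
    """Append params to GRUB_CMDLINE_LINUX_DEFAULT, skipping ones already set."""
    match = _CMDLINE.search(grub_default)
    if match is None:
        raise ValueError("GRUB_CMDLINE_LINUX_DEFAULT not found")

    current = match.group(2).split()
    for p in params:
        if p not in current:
            current.append(p)

    new_line = f'{match.group(1)}"{" ".join(current)}"'
    return grub_default[:match.start()] + new_line + grub_default[match.end():]
```

For systemd-boot, the same parameters would instead be appended to the `options` line of the relevant loader entry.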

AMDGPU

AMDGPU does not currently support VFIO-Mdev functionality. It may be possible to add Mdev Mode mediated device support (https://openmdev.io/index.php/Mediated_Device_Internals#Mdev_Mode), similar to the equivalent functions in nvidia.ko and i915.ko, to the Linux kernel's AMDGPU driver sources (https://github.com/torvalds/linux/tree/master/drivers/gpu/drm/amd/amdgpu) to produce a driver suitable for merged host + guest DRM on AMD GPU devices.

Known Issues

  1. Host DRM does not work alongside guest VFIO-Mdev.
    Confirmed affected versions: *
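  2. The amdgpu kernel module does not contain hooks for guest signalling via irqfd (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=721eecbf4fe995ca94a9edec0c9843b1cc0eaaf3) and ioeventfd (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d34e6b175e61821026893ec5298cc8e7558df43a), which are used for VFIO-Mdev callbacks.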
    Confirmed affected versions: *