Optimize ESXi for EMC XtremIO

For a project I’m currently working on, I was asked to document the recommended and optimal ESXi host settings required to get the best performance out of an EMC XtremIO storage array. This was a good opportunity for me to dive a little deeper into the configuration, do some performance testing, and share the results.


ESXi XtremIO Host Settings

Set the maximum number of consecutive “sequential” I/Os allowed from one VM before switching to another VM:
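A sketch of the matching esxcli command, assuming the commonly documented XtremIO value of 64 (the ESXi default is 8):

    esxcli system settings advanced set --int-value 64 --option /Disk/SchedQuantum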

Set the maximum I/O request size passed to storage devices. With XtremIO, it is required to change it from 32767 (default setting of 32MB) to 4096 (4MB):
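For example:

    esxcli system settings advanced set --int-value 4096 --option /Disk/DiskMaxIOSize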

Set the maximum number of active storage commands (I/Os) allowed at any given time at the VMkernel:
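On ESXi 5.1 and earlier this is a global advanced setting. A sketch, assuming the commonly documented XtremIO value of 256:

    esxcli system settings advanced set --int-value 256 --option /Disk/SchedNumReqOutstanding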

Note: on vSphere 5.5 this setting is per volume! So after adding a volume to an ESXi 5.5 host, the command below needs to be rerun.
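On ESXi 5.5 the per-device equivalent would look like this (naa.xxx stands in for an XtremIO volume’s device identifier):

    esxcli storage core device set --device naa.xxx --sched-num-req-outstanding 256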

Verify which HBA module is currently loaded:
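One way to list all storage adapters and the driver module each one uses:

    esxcfg-scsidevs -a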

For example, when using QLogic:
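A sketch; depending on the driver in use, the module name is typically qla2xxx (legacy driver) or qlnativefc (native driver):

    esxcli system module list | grep ql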

Set the queue depth (in this case QLogic):
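A sketch, assuming the legacy qla2xxx module and the commonly documented XtremIO queue depth of 256; substitute the module name if the native driver is loaded:

    esxcli system module parameters set -m qla2xxx -p ql2xmaxqdepth=256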

Reboot the host and then verify that the queue depth adjustment has been applied:
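For example:

    esxcli system module parameters list -m qla2xxx | grep ql2xmaxqdepth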

When using vSphere Native Multipathing (NMP):

Set the vSphere NMP Round Robin path switching frequency for XtremIO volumes from the default value (1000 I/O packets) to 1:
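A sketch for a single volume (naa.xxx is a placeholder for the device identifier):

    esxcli storage nmp psp roundrobin deviceconfig set --device naa.xxx --type=iops --iops=1

EMC also documents a SATP claim rule so that newly presented XtremIO volumes pick up Round Robin with IOPS=1 automatically; verify it against the Host Configuration Guide for your XtremIO version:

    esxcli storage nmp satp rule add -c tpgs_off -e "XtremIO Active/Active" -M XtremApp -P VMW_PSP_RR -O iops=1 -s VMW_SATP_DEFAULT_AA -t vendor -V XtremIO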

When using the EMC PowerPath software:

Upload the EMC PowerPath installer to a local datastore and run the following command (be sure to change the path and filename to match your environment!):
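For example (the datastore and filename below are placeholders):

    esxcli software vib install -d /vmfs/volumes/datastore1/EMCPower.VMWARE.x.x.x.zip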

vCenter Settings

The maximum number of concurrent full cloning operations should be adjusted based on the XtremIO cluster size. The vCenter Server parameter config.vpxd.ResourceManager.maxCostPerHost determines how many concurrent full clone operations are allowed (the default value is 8). Set it according to the cluster size as follows:

  • 10TB Starter X-Brick (5TB) and a single X-Brick – 8 concurrent full clone operations
  • Two X-Bricks – 16 concurrent full clone operations
  • Four X-Bricks – 32 concurrent full clone operations
  • Six X-Bricks – 48 concurrent full clone operations
     
[Screenshot: the config.vpxd.ResourceManager.maxCostPerHost setting in vCenter]

VAAI

Be sure to check whether VAAI is enabled:
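All three VAAI primitives should report an Int Value of 1 when enabled:

    esxcli system settings advanced list -o /DataMover/HardwareAcceleratedMove
    esxcli system settings advanced list -o /DataMover/HardwareAcceleratedInit
    esxcli system settings advanced list -o /VMFS3/HardwareAcceleratedLocking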

Why? Let me show you the difference between deploying a Windows 2012 R2 template (15GB) with VAAI enabled and disabled:

VAAI enabled: 34 seconds
VAAI disabled: 68 seconds

Now imagine the difference when using bigger templates or clones!

Performance Testing

Before and after changing the settings, I ran some simple IOmeter tests with the following configuration:

4x Windows 2012 R2 VMs, each with 4 vCPUs, 8GB RAM, a 40GB vDisk for the OS, and a 40GB vDisk connected to a Paravirtual SCSI adapter used for the IOmeter test file.

[Screenshot: IOmeter test configuration and results]

One of the VMs was used as the IOmeter manager/dynamo, and the other IOmeter dynamo processes connected to that manager, all configured with 4 workers per dynamo process. The VMs were all on the same ESXi host, so the results are comparable and no other influences could affect the tests.

Default ESXi Settings

Test Name                  IOPS    MBps
Max Throughput-100%Read    67604   2107
RealLife-60%Rand-65%Read   68803   528
Max Throughput-50%Read     44780   1392
Random-8k-70%Read          74179   574

Optimal ESXi Settings

Test Name                  IOPS    MBps
Max Throughput-100%Read    93876   2924
RealLife-60%Rand-65%Read   108679  841
Max Throughput-50%Read     39949   1240
Random-8k-70%Read          100129  773

Not a bad increase in IOPS and throughput, I must say! My advice? Apply the recommended settings! 🙂


3 thoughts on “Optimize ESXi for EMC XtremIO”

1. Hi Marco,

   Just stumbled into this. Why is the latency so high in your IOmeter screenshot?
   Also, are these results from a single VM or all 4 aggregated?

   1. Hi Guido,

      The results are aggregated from 4 managers with 4 workers each. And you’re right, the “Maximum I/O Response Time” is way too high for this setup. It may have something to do with the fact that while I was running the tests, the storage and network guys were also reconfiguring the setup. I didn’t run the tests as a baseline but to see if there were any differences in throughput and IOPS, so I also didn’t fine-tune the IOmeter configuration. Maybe when this project is almost finished, I will try to run a baseline test to see what this puppy can do.
