For a project I’m currently working on, I was asked to document the recommended/optimal ESXi host settings required to get the best performance from an EMC XtremIO storage array. This was a good opportunity for me to dive a little deeper into the configuration, do some performance testing and share the results.
ESXi XtremIO Host Settings
Set the maximum number of consecutive “sequential” I/Os allowed from one VM before switching to another VM:
~ # esxcfg-advcfg -s 64 /Disk/SchedQuantum
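You can read the value back with the -g flag to make sure it took effect (the output line below is roughly what esxcfg-advcfg prints; it may vary slightly per ESXi version):

~ # esxcfg-advcfg -g /Disk/SchedQuantum
Value of SchedQuantum is 64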
Set the maximum I/O request size passed to storage devices. The value is in KB; with XtremIO, it is required to change it from the default of 32767 (32MB) to 4096 (4MB):
~ # esxcfg-advcfg -s 4096 /Disk/DiskMaxIOSize
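And the same kind of check for this one (again, output format may differ slightly per version):

~ # esxcfg-advcfg -g /Disk/DiskMaxIOSize
Value of DiskMaxIOSize is 4096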
Set the maximum number of active storage commands (I/Os) allowed at any given time in the VMkernel:
For ESXi 5.0 & 5.1
~ # esxcfg-advcfg -s 256 /Disk/SchedNumReqOutstanding

For ESXi 5.5
~ # esxcli storage core device list | grep "Display Name: XtremIO" | awk -F'[()]' '{cmd="esxcli storage core device set -d " $2 " -O 256"; system(cmd)}'
Note: The setting for vSphere 5.5 is per volume! So after adding a volume to an ESXi 5.5 host, this command needs to be rerun.
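Because the 5.5 value lives on each device, a quick per-volume spot check looks something like this (the naa identifier is just a placeholder for one of your own XtremIO volumes, and the exact field name may differ slightly per ESXi build):

~ # esxcli storage core device list -d <naa-id-of-xtremio-volume> | grep -i outstanding
   No of outstanding IOs with competing worlds: 256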
Verify which HBA module is currently loaded:
Qlogic
~ # esxcli system module list | egrep "ql|Loaded"
Emulex
~ # esxcli system module list | egrep "lpfc|Loaded"
Cisco UCS FNIC
~ # esxcli system module list | egrep "fnic|Loaded"
For example, when using Qlogic:
~ # esxcli system module list | egrep "ql|Loaded"
Name        Is Loaded  Is Enabled
qlnativefc  true       true
Set the queue depth (in this case for Qlogic):
~ # esxcli system module parameters set -p ql2xmaxqdepth=256 -m qlnativefc
Reboot the host and then verify that the queue depth adjustment has been applied:
~ # esxcli system module parameters list -m qlnativefc | grep ql2xmaxqdepth
ql2xmaxqdepth  int  256  Maximum queue depth to report for target devices.
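For reference, the equivalent commands for the other two drivers should look roughly like this; the parameter names (lpfc_lun_queue_depth and fnic_max_qdepth) come from the Emulex and Cisco driver documentation, so double-check them against your installed driver version before applying:

Emulex
~ # esxcli system module parameters set -p lpfc_lun_queue_depth=256 -m lpfc
Cisco UCS FNIC
~ # esxcli system module parameters set -p fnic_max_qdepth=256 -m fnic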
When using vSphere Native Multipathing (NMP):
Set the vSphere NMP Round Robin path switching frequency for XtremIO volumes from the default value (1000 I/O packets) to 1:
~ # esxcli storage nmp satp rule add -c tpgs_off -e "XtremIO Active/Active" -M XtremApp -P VMW_PSP_RR -O iops=1 -s VMW_SATP_DEFAULT_AA -t vendor -V XtremIO
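To confirm the rule is registered (newly mapped XtremIO volumes will then pick it up when they are claimed):

~ # esxcli storage nmp satp rule list | grep XtremIO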
When using the EMC PowerPath software:
Upload the EMC PowerPath software installer to a local datastore and run the following command (be sure to change the path and filename to match your own environment!).
~ # esxcli software vib install -d /vmfs/volumes/VMFSVOLUME/EMCPower.VMWARE.5.9.SP1.P02.b054.zip
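Afterwards, a quick way to verify that the PowerPath package is actually installed:

~ # esxcli software vib list | grep -i power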
vCenter Settings
The maximum number of concurrent full clone operations should be adjusted based on the XtremIO cluster size. The vCenter Server parameter config.vpxd.ResourceManager.maxCostPerHost determines the maximum number of concurrent full clone operations allowed (the default value is 8). The parameter should be adjusted based on the XtremIO cluster size as follows:

VAAI
Be sure that VAAI is enabled. The following commands switch the three hardware acceleration settings on:
~ # esxcfg-advcfg -s 1 /DataMover/HardwareAcceleratedMove
~ # esxcfg-advcfg -s 1 /DataMover/HardwareAcceleratedInit
~ # esxcfg-advcfg -s 1 /VMFS3/HardwareAcceleratedLocking
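If you only want to check the current state without changing anything, swap -s 1 for -g (a value of 1 means the setting is enabled):

~ # esxcfg-advcfg -g /DataMover/HardwareAcceleratedMove
~ # esxcfg-advcfg -g /DataMover/HardwareAcceleratedInit
~ # esxcfg-advcfg -g /VMFS3/HardwareAcceleratedLocking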
Why? Let me show you the difference between deploying a Windows 2012 R2 template (15GB) with VAAI enabled and with it disabled:
VAAI enabled: 34 seconds
VAAI disabled: 68 seconds
Now imagine the difference when using bigger templates or clones!
Performance Testing
Before and after changing the settings, I ran some simple IOmeter tests with the following configuration:
4x Windows 2012 R2 VMs: 4 vCPUs, 8GB RAM, a 40GB vDisk for the OS and a 40GB vDisk connected to a Paravirtual SCSI adapter used for the IOmeter test file.
One of the VMs was used as the IOmeter manager/dynamo and the remaining IOmeter dynamo processes connected to the manager, all configured with 4 workers per dynamo process. The VMs were on the same ESXi host to make sure the results are comparable and no other influences could affect the tests.
Default ESXi Settings
Test Name                | IOPS  | MBps
Max Throughput-100%Read  | 67604 | 2107
RealLife-60%Rand-65%Read | 68803 | 528
Max Throughput-50%Read   | 44780 | 1392
Random-8k-70%Read        | 74179 | 574
Optimal ESXi Settings
Test Name                | IOPS   | MBps
Max Throughput-100%Read  | 93876  | 2924
RealLife-60%Rand-65%Read | 108679 | 841
Max Throughput-50%Read   | 39949  | 1240
Random-8k-70%Read        | 100129 | 773
Not a bad increase in IOPS and throughput, I must say! My advice? Apply the recommended settings! 🙂
Hi Marco,
Just stumbled onto this. Why is the latency so high in your IOmeter screenshot?
Also, are these results from a single VM or all 4 aggregated?
Hi Guido,
The results are aggregated from 4 managers with 4 workers each. And you’re right, the “Maximum I/O Response Time” is way too high for this setup; it may have something to do with the fact that while I was running the tests, the storage/network guys were also reconfiguring the setup. I didn’t run the tests for a baseline but to see if there were any differences on the throughput/IOPS front. So I also didn’t fine-tune the IOmeter configuration. Maybe when this project is almost finished, I will run a baseline test to see what this puppy can do.
Ok, sounds good. Will stay tuned 🙂