Sunday, September 27, 2015

ESXi 5.5 Upgrade to 6: Invalid argument when creating vfat file system

Today, when we tried to upgrade one of our ESXi hosts, we hit a strange error.




There was an issue with one of the partitions, and the ESXi installer was not able to reformat that partition for the upgrade, possibly because of an earlier misconfiguration of the partitions.

VMware KB says: "The upgrade attempts to reformat the scratch partition (partition #2) with a VFAT filesystem. This mostly likely fails, as the maximum size allowed for VFAT is 4GB, and most VMFS datastores are larger than 4GB. A 4GB datastore may potentially be erased."

Note: You can check this issue in the VMware KB: KB2015828

So we need to find out which type of partition this is and correct the problem before re-running the upgrade.

This can be done during the upgrade, by pressing Alt+F1 to go to the console, or by cancelling the upgrade and rebooting the server so that it starts normally.

Unfortunately, in this case a normal reboot was not possible: the upgrade restarted, and ESXi went into a loop and never reached a normal boot. That left two options: correct the problem during the upgrade, or roll back the upgrade and then start ESXi with a normal boot.

I decided to cancel and roll back the upgrade. To do this, I needed to start ESXi in recovery mode.

While ESXi was starting, I pressed Shift+R (see image).



After the restart in recovery mode, ESXi starts normally and we can begin troubleshooting.

Check VMware KB for recovery mode: KB1033604

I connected to the ESXi console (over SSH) and started checking the devices and partitions.

Since this was an upgrade from ESXi 5.5 to 6.0, I used the esxcfg-scsidevs command to find the device and partition named in the error (mpx.vmhba32:C0:T0:L0:2):
# esxcfg-scsidevs -c
Device UID                            Device Type      Console Device                                            Size      Multipath PluginDisplay Name
mpx.vmhba32:C0:T0:L0                  Direct-Access    /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0                  7600MB    NMP     Local USB Direct-Access (mpx.vmhba32:C0:T0:L0)
naa.600508b1001c4118e0e0f8b71eb9b654  Direct-Access    /vmfs/devices/disks/naa.600508b1001c4118e0e0f8b71eb9b654  1144609MB NMP     HP Serial Attached SCSI Disk (naa.600508b1001c4118e0e0f8b71eb9b654)
naa.60a980002d676739503f426f7675504d  Direct-Access    /vmfs/devices/disks/naa.60a980002d676739503f426f7675504d  768062MB  NMP     NETAPP iSCSI Disk (naa.60a980002d676739503f426f7675504d)
As the error image shows, the device preventing the upgrade is mpx.vmhba32:C0:T0:L0, so I need to check partition #2 on this device.

We can use partedUtil to check this:
# partedUtil getptbl /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0
gpt
968 255 63 15564800
1 64 8191 C12A7328F81F11D2BA4B00A0C93EC93B systemPartition 128
5 8224 520191 EBD0A0A2B9E5443387C068B6B72699C7 linuxNative 0
6 520224 1032191 EBD0A0A2B9E5443387C068B6B72699C7 linuxNative 0
7 1032224 1257471 9D27538040AD11DBBF97000C2911D1B8 vmkDiagnostic 0
8 1257504 1843199 EBD0A0A2B9E5443387C068B6B72699C7 linuxNative 0
2 15357952 15562751 9D27538040AD11DBBF97000C2911D1B8 vmkDiagnostic 0 
The problem is in partition #2, which is identified as a vmkDiagnostic partition. A vmkDiagnostic partition is a coredump partition, so we need to fix the coredump partition.
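
To get a feel for how large each partition is, the getptbl output can be turned into sizes. The following is only a minimal sketch: it assumes 512-byte sectors, and the function name part_sizes_mib is hypothetical, not part of partedUtil.

```shell
#!/bin/sh
# Sketch: print "<number> <type name> <size in MiB>" for each partition
# line of `partedUtil getptbl` output read from stdin.
# Assumption: sectors are 512 bytes (2048 sectors per MiB).
part_sizes_mib() {
    # Partition lines have 6 fields:
    # number, start sector, end sector, type GUID, type name, attribute.
    # The "gpt" line and the geometry line have fewer fields and are skipped.
    awk 'NF == 6 && $1 ~ /^[0-9]+$/ {
        sectors = $3 - $2 + 1
        # Round to the nearest whole MiB.
        printf "%s %s %d\n", $1, $5, int(sectors * 512 / 1048576 + 0.5)
    }'
}

# Sample input: the two vmkDiagnostic lines from the output above.
part_sizes_mib <<'EOF'
gpt
968 255 63 15564800
7 1032224 1257471 9D27538040AD11DBBF97000C2911D1B8 vmkDiagnostic 0
2 15357952 15562751 9D27538040AD11DBBF97000C2911D1B8 vmkDiagnostic 0
EOF
```

On a host, the same function could be fed directly with `partedUtil getptbl /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0 | part_sizes_mib`. Under the 512-byte assumption, partition #2 spans 204800 sectors, which is 100 MiB.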

To check the coredump partitions, run esxcli system coredump partition list:
# esxcli system coredump partition list

Name                    Path                                        Active  Configured
----------------------  ------------------------------------------  ------  ----------
mpx.vmhba32:C0:T0:L0:2  /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0:2    true        true
mpx.vmhba32:C0:T0:L0:7  /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0:7   false       false
This ESXi host has two coredump partitions: one active and one inactive.
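
If you want to pick out the active coredump partition in a script rather than by eye, the list output can be filtered on its Active column. This is a small sketch that assumes the four-column layout shown above; the function name active_coredump is hypothetical.

```shell
#!/bin/sh
# Sketch: print the Name of every coredump partition whose Active column
# is "true", reading `esxcli system coredump partition list` output from stdin.
# Assumption: the output has the four columns Name, Path, Active, Configured.
active_coredump() {
    # The header and the dashed separator never have "true" in column 3,
    # so they are skipped without special handling.
    awk 'NF == 4 && $3 == "true" { print $1 }'
}

# Sample input copied from the output above.
active_coredump <<'EOF'
Name                    Path                                        Active  Configured
----------------------  ------------------------------------------  ------  ----------
mpx.vmhba32:C0:T0:L0:2  /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0:2    true        true
mpx.vmhba32:C0:T0:L0:7  /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0:7   false       false
EOF
```

On a host this would be used as `esxcli system coredump partition list | active_coredump`.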

The previous ESXi installation was not configured very well.
So there is nothing to repair here: just delete the partition that is blocking the upgrade and re-run the upgrade.

To delete the partition, we use partedUtil again.

# partedUtil delete /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0 2
Error: Read-only file system during write on /dev/disks/mpx.vmhba32:C0:T0:L0
Unable to delete partition 2 from device /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0
Since this is the active coredump partition, it is read-only, and we cannot delete it until we disable the coredump.
# esxcli system coredump partition set --enable false
Let's check the coredump partitions again:
# esxcli system coredump partition list
Name                    Path                                        Active  Configured
----------------------  ------------------------------------------  ------  ----------
mpx.vmhba32:C0:T0:L0:2  /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0:2   false        true
mpx.vmhba32:C0:T0:L0:7  /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0:7   false       false
Now that the coredump partition is disabled, we can delete the partition.
# partedUtil delete /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0 2

Checking the partitions again, we can confirm that #2 was deleted.
# esxcli system coredump partition list

Since partition #7 was also set up as a coredump partition on this ESXi host, I deleted it as well. After the upgrade we can create a new one, or the upgrade itself will create one.
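
The disable-then-delete sequence used above can be collected into one small script. This is a sketch, not a tested tool: the run and remove_coredump_partition helpers are hypothetical names, and by default the script only prints the commands so they can be reviewed before anything is changed on the host.

```shell
#!/bin/sh
# Sketch: disable the coredump partition, delete a partition, then show
# the partition table again. Only commands used in this article appear here.
# By default nothing is executed; set RUN=1 to really run the commands.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Arguments: <disk device path> <partition number>
remove_coredump_partition() {
    disk="$1"
    part="$2"
    # The active coredump partition is read-only, so disable it first.
    run esxcli system coredump partition set --enable false
    run partedUtil delete "$disk" "$part"
    # Print the partition table to confirm the partition is gone.
    run partedUtil getptbl "$disk"
}

# Dry run for the device and partition from this article.
remove_coredump_partition /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0 2
```

Deleting a partition is destructive, so the dry-run default is deliberate: check that the device path and partition number match your own host (they may differ from this example) before setting RUN=1.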

Now we can re-run the upgrade, and it finishes without any issues.

I hope this article helps you fix this issue if you encounter it in your ESXi upgrade.



6 comments:

  1. Thanks, I deleted the second partition and that did it; after a reboot the installation continued without problems.

    Thanks for sharing.

  2. "Unknown" is correct.

    This article: http://en.community.dell.com/techcenter/b/techcenter/archive/2016/02/05/esxi-upgrade-fails-with-an-error-permission-denied was even simpler for my specific situation, and worked flawlessly. There doesn't seem to be a need to go back and recreate the deleted partition 2, or reconfigure the coredump partition.

    the comments in this article mention this: http://www.itxperience.net/en/esxi-upgrade-operation-failed-error-permission-denied/

    The only other thing I would add is that my original error referenced mpx.vmhba32, but when I looked, my coredump was actually on mpx.vmhba40 (I had no mpx.vmhba32), so I just worked with mpx.vmhba40, then re-did the upgrade and it worked fine.


    Replies
    1. Hi Hayboy,

      First, this blog is already closed; my blog is now www.provirtualzone.com.

      In the case I wrote about here, it was not possible to access the ESXi console, and ESXi was in a loop because of the bad upgrade, so a rollback was the only option: fix the issue first, then upgrade.

      You always need to assign a coredump partition, and if you have two you should delete one. You can also just disable one, but why would you have two partitions?

      The vmhbaXX number always depends on the number of devices you have in your ESXi host. vmhba32 is not the same for everyone; that was just my example. We always need to follow articles while adapting them to our own environment. We cannot write an example that is 100% accurate for all situations and environments.

      Thank You for your comments.

      Luciano Patrao

  3. I have the exact same problem. We have a Dell R630 on ESXi 5.5. I tried to upgrade to ESXi 6.0, and it failed with the same screen message you had here.
    So, if I delete one of those 2 partitions, will it rebuild itself? I also read that you can disable it? Thanks in advance!

    Replies
    1. Hi Klnyc,

      Thanks for your comment.

      First, this blog is already closed; my blog/site is now www.provirtualzone.com.

      Regarding your question, you should delete one and keep only one coredump partition.

      It will not be rebuilt; the host will enable and use the one that is active. So if a partition already exists, you need to activate one, and disable and delete the other. This is not a rebuild: you delete one and activate one that already exists.

      If you delete both, then you need to recreate one coredump partition from scratch.

      If you want, you can comment on the same article on my new blog/site.

      Thank You

      Luciano Patrao
