
It’s moving


The most important thing in a data center move, whether from OnPrem to/from the Public Cloud or from OnPrem to OnPrem, is the data, the applications, and their accessibility by internal and external users. A move is also an opportunity to modernize the data center: to make it more agile and faster so that it can host modern container-based applications alongside existing ones.

In this article, I will not cover applications running directly on physical servers; this is a scope I do not usually cover for my customers, and it represents only a small share of applications, less than 10%. Instead, I will focus on applications running in virtualized environments, which represent about 90% of the total. The majority of customers use vSphere hypervisors; some may still run Hyper-V or KVM, but that is quite rare.

One of the advantages of server virtualization is the ability to represent a physical machine and everything it contains (operating system, applications, configurations, data, …) as virtual machines (VMs) stored in files. Moving these VMs consists of copying those files to the new destination. To access the copied VMs, they need to be connected to a network, ideally without modifying the initial IP addressing plan: re-addressing is tedious and risky, as some applications were not designed to tolerate this type of change. To avoid it, the network must be virtualized, so that the destination can reproduce what existed on the initial site and you are spared from changing the IP addresses of all the servers and applications. Note that the network only needs to be virtualized at the destination, not at the source. Another approach would be to physically replicate the network at the destination; while this is possible in an OnPrem-to-OnPrem move, it cannot be done in a move to the Public Cloud. As you can see, applying virtualization to the network, just as was done for physical servers, decouples you completely from the physical infrastructure and leaves you free to choose your hardware providers and/or hyperscaler.


The first step of the move is to prepare a virtualized destination environment, at least the hypervisor and the network, to receive the VMs. Then the VMs have to be transferred: as we saw, VMs are files, so you just have to copy them to the destination. It is as simple as that. However, to get a consistent state of the data, the copy has to be done cold, i.e. with the VMs switched off, which makes the applications unavailable for the whole duration of the copy. Once the copy is complete and the VMs are attached to the right networks, they can be started at the destination. With large data volumes, the copies will be long, and so will the downtime. Another method is to use replication tools that replicate the VMs while they are running; only once the replication is complete do you stop the VMs at the source, replicate the final deltas, and then start the VMs at the destination. Both of these manual methods may be fine for small environments, but with hundreds or thousands of VMs they quickly become tedious. To simplify the process further, everything can be orchestrated with a tool such as VMware HCX, which I will describe and which helps answer some important questions: how do I move if my applications cannot tolerate downtime? How do I perform a sequential move? How do I prepare everything in advance and execute it at a specific time?
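At its core, the cold-copy method described above is just a recursive file copy. Here is a runnable sketch, with temporary directories standing in for the source and destination datastores; the VM name and file names are purely illustrative:

```shell
# Illustrative sketch of a "cold" VM move: a powered-off VM is just a set of
# files (.vmx configuration, .vmdk disks, ...) copied as-is to the destination.
# Real paths would look like /vmfs/volumes/<datastore>/<vm>/; here temporary
# directories stand in for the datastores so the sketch can actually run.
SRC="$(mktemp -d)/app-vm"   # stands in for the source VM folder
DST="$(mktemp -d)"          # stands in for the destination datastore
mkdir -p "$SRC"
touch "$SRC/app-vm.vmx" "$SRC/app-vm.vmdk"   # the VM's configuration and disk

cp -a "$SRC" "$DST"/        # the whole "move" is one recursive copy
ls "$DST/app-vm"            # lists app-vm.vmdk and app-vm.vmx
```

In real life the copy runs between two hosts or datastores (and takes as long as the data volume dictates), which is exactly why replication and orchestration tools exist.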

VMware HCX offers different types of migration:

  • Bulk Migration: for mass VM migrations, performed in parallel by batches.
  • Mobility Groups: to migrate VMs grouped by theme.
  • vMotion Migration: to move VMs individually, one at a time.
  • OS Assisted Migration: to migrate non-vSphere VMs, i.e. KVM or Hyper-V.

In addition to the types of migration, there are interesting network features in the context of migration:

  • Interconnect: deploys a secure virtual network interconnect between the HCX appliances on both sides.
  • WAN Optimization: reduces the amount of data traveling between sites through deduplication and compression mechanisms.
  • Network Extension: extends the Layer 2 network between sites, keeping the same addressing scheme on both sides.
  • Application Path Resiliency: multiplies the communication paths between sites and stops using those that have failed.
  • Mobility Optimized Networking (MON): lets applications that need to reach a different network use the nearest gateway (i.e. on the same site).

Other use cases are offered by VMware HCX:


  • Upgrade vSphere versions: migrate VMs hosted on one version of vSphere to a different version.
  • Workload Rebalancing: Move workloads between different Clouds based on resource availability and/or cost seasonality.
  • Business continuity and protection: protect workloads on another site while benefiting from the network features. It can also be coupled with SRM (Site Recovery Manager).


Let’s go back to the initial topic of the move, with another timely point: requests to move to the Public Cloud keep increasing, and they raise these questions: how do I move to a Public Cloud without having to transform my VMs? And how do I avoid lock-in if I decide tomorrow to change Public Cloud?

To be as independent as possible, you need a multi-cloud VM format, i.e. a VM format available in OnPrem data centers but also at almost all hyperscalers, such as Alibaba Cloud, Amazon Web Services, Google Cloud Platform, Microsoft Azure, Oracle Cloud Infrastructure, OVH, …, with a local and/or global geographical presence. This is the case for VMware’s VM format: hyperscalers have decided to offer their customers, alongside their own hypervisor, the hypervisor developed by VMware that customers are used to. This decision is not insignificant, because it drastically simplifies migrations by eliminating the need for transformation, allowing customers to consume hyperscaler resources more quickly. The largest hyperscalers even offer the full VMware SDDC, adding network and storage virtualization coupled with VMware HCX to accelerate the migration further. This gives customers more freedom to choose their cloud provider, whether OnPrem or Public.


In summary, the most important thing in a move is to have a destination that is as virtualized as possible (servers, network (including security) and storage), to have an orchestrator that simplifies the migration, and to choose a multi-cloud VM format to avoid being locked in with a hyperscaler.

Optimizing the cost of using GPUs

GPUs (Graphics Processing Units) were created in the 70s to speed up the creation and manipulation of images. They were quickly adopted by game console manufacturers to improve the fluidity of graphics. In the 2000s, GPUs began to be used for computing outside of graphics. Today the popular use of GPUs concerns AI (Artificial Intelligence) and ML (Machine Learning). More history on Wikipedia: Graphics processing unit – Wikipedia.

This is why GPUs are found in data center servers: AI and ML applications require many calculations executed in parallel, something a conventional processor (CPU) would struggle to do. A CPU is the central element of a server, there to execute lots of small sequential tasks, very fast and at low latency; for that it can count on about sixty cores per CPU (at the time of writing). A GPU, on the other hand, is made to perform thousands of tasks in parallel thanks to the thousands of cores that compose it.

The downside of a GPU is its cost: it can run into tens of thousands of dollars, so you have to make sure it is properly utilized all the time. The ideal is to share it so that it can be used simultaneously by several applications, getting as close as possible to consuming all of its resources. This is the credo of the Bitfusion solution, acquired by VMware in 2019 and now available as an add-on to vSphere. The GPUs are installed on the hypervisors and form a pool that is accessible by applications hosted directly on those hypervisors, or via the IP network if the applications are hosted on other hypervisors; the applications can even be hosted on physical servers or run in Kubernetes containers. Its use is reserved for Artificial Intelligence or Machine Learning applications using CUDA routines. CUDA was developed by NVIDIA to give non-graphics applications direct access to GPUs.
Thanks to Bitfusion, applications can consume one GPU, several GPUs, or just a portion of a GPU (in reality, it is the GPU’s memory that is shared). Once consumption is complete, the applications release the allocated GPU resources, which return to the pool for future requests.

From a technical point of view, Bitfusion requires the installation of components on both sides. On the hypervisors that have GPUs, a virtual appliance, downloadable from the VMware site, must be deployed on each one. On the clients (VMs, bare metal servers or Kubernetes containers/pods) that will consume GPU resources, a Bitfusion client must be installed; it intercepts the CUDA calls made by the application and forwards them to the Bitfusion appliances over the IP network. This is transparent to the application.

Since the exchanges between clients and Bitfusion appliances go through the IP network, it is preferable to have at least a 10 Gb/s network; as a rule of thumb, plan one 10 Gb/s link per 4 GPUs.

Customize an Ubuntu VM with Cloud-Init on vSphere

Cloud-Init seems to be the favorite customization tool for the major OSes intended to be installed on different cloud environments (AWS, Azure, GCP, vSphere, …). It is very powerful and works across clouds, but at the beginning it is difficult to understand how it works.

Customization information is stored in a file called user-data. This file is passed to Cloud-Init using a mechanism specific to each cloud; in our case, the VMware Tools transmit the user-data file, and once received, Cloud-Init executes it.

I wasted a tremendous amount of time finding the minimum steps needed to customize an Ubuntu OS with Cloud-Init in a vSphere environment. I was looking for a way to customize the OS when cloning from an OVF template.

Below is the procedure I use for an Ubuntu 20.04.2 LTS, freshly installed and after its first reboot. I kept the default values, except for the French keyboard and the OpenSSH server install option.

Cloud-Init must be told to retrieve the user-data customization file via the OVF parameters of the VM; there are steps to be done on the OS side and on the VM side.

OS side:

  • Delete the file that sets the datasource to None instead of OVF:
    • sudo rm /etc/cloud/cloud.cfg.d/99-installer.cfg
  • If you want the network configuration to be done, delete the file that disables it:
    • sudo rm /etc/cloud/cloud.cfg.d/subiquity-disable-cloudinit-networking.cfg
  • If you use DHCP, the VM will always get the same IP because it keeps the same machine ID. To avoid this, reset the identity of the OS:
    • sudo truncate -s 0 /etc/machine-id
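The three OS-side steps can be gathered into one small script. On the real guest you would run the commands above as root against /; in this sketch a ROOT variable points at a scratch tree (seeded with placeholder file contents) so the logic can be exercised safely outside the VM:

```shell
# Sketch of the OS-side preparation. ROOT defaults to a scratch tree so the
# script is safe to dry-run; on the actual guest, run the original commands
# as root. The seeded file contents below are placeholders, not the real ones.
ROOT="${ROOT:-$(mktemp -d)}"
mkdir -p "$ROOT/etc/cloud/cloud.cfg.d"
echo "placeholder" > "$ROOT/etc/cloud/cloud.cfg.d/99-installer.cfg"
echo "dummy-machine-id" > "$ROOT/etc/machine-id"

# Re-enable the OVF datasource:
rm -f "$ROOT/etc/cloud/cloud.cfg.d/99-installer.cfg"
# Re-enable Cloud-Init network configuration (if the file exists):
rm -f "$ROOT/etc/cloud/cloud.cfg.d/subiquity-disable-cloudinit-networking.cfg"
# Reset the machine identity so DHCP hands out a fresh lease after cloning:
truncate -s 0 "$ROOT/etc/machine-id"
```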

VM side:

  • Take your cloud-init user-data file and encode it with base64 (without line wrapping):
    • base64 -w0 user-data.yaml
  • From the vSphere client, with the VM powered off, enable the OVF properties:
    • Select the VM => Configure => vApp Options => vApp Options are disabled … Edit => check “Enable vApp options”
    • In the OVF Details tab => check the VMware Tools box
  • From the vSphere client, with the VM powered off, add the user-data field to the OVF properties:
    • Select the VM => Configure => vApp Options => Properties => ADD => enter the Label and set the Key ID to user-data.
  • From the vSphere client, with the VM powered off, set the value of the user-data field in the OVF properties:
    • Select the VM => Configure => vApp Options => Properties => SET VALUE; a popup appears, paste the base64 string generated from the user-data file in the earlier step
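For reference, here is a minimal user-data file and its base64 encoding. The cloud-config keys shown (hostname, password) are purely illustrative; any valid cloud-config works. The -w0 flag disables line wrapping so the output can be pasted as a single string into the OVF property:

```shell
# Write a minimal, illustrative cloud-config user-data file...
cat > user-data.yaml <<'EOF'
#cloud-config
hostname: demo-vm
password: changeme
chpasswd: { expire: false }
EOF

# ...and encode it as one unwrapped base64 string for the OVF property.
base64 -w0 user-data.yaml
```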

From this VM, you can now clone directly to create another VM or a template. If you want to change the user-data, simply replace the base64 value with the new one when deploying the VM.

Which platform for “Shared Nothing Architecture” applications?

Organizations are seeking more agility to accelerate their business growth. Developing applications for internal or external usage can directly or indirectly impact that growth, so it is important to give developers the agility to write these applications. That’s why public cloud services are attractive: developers can consume services right away by deploying a data service (e.g. a database) and connecting it to their applications. They don’t have to worry about the infrastructure and can focus solely on developing the application. Bringing that flexibility into the data center allows organizations to provide agility while maintaining security.

VMware Cloud Foundation with Tanzu (previously vSphere with Kubernetes, or Project Pacific) is a platform capable of hosting applications running in virtual machines and applications running in Kubernetes Pods (containers). It also provides networking, storage, registry, and backup and restore services for those applications. Now it also incorporates data services.

At the time of writing, two solutions have been added: MinIO and Cloudian, both object storage solutions compatible with the S3 API. Two others are currently being integrated: Dell EMC ObjectScale, an object store compatible with S3, and DataStax, a NoSQL database based on Cassandra. More integrations are to come.


How is it revolutionary?

Unlike the majority of traditional/classic/monolithic applications, modern applications, also called Cloud Native or scalable apps, do not rely on the infrastructure to optimize their performance and provide resiliency. They use their own mechanisms for availability and performance, regardless of the infrastructure they run on. Of course, the infrastructure is essential, but only to supply resources like processors, memory or I/O. These applications often follow an SNA (Shared Nothing Architecture): each instance of an application uses its own resources on a distinct server, and the application distributes the data between these servers. Reads and writes are distributed for better performance and resilience, taking into account the potential loss of a server or a site.

On a physical infrastructure (without virtualization), it’s easy: each instance has its own server and its own resources. However, this creates a financial issue, as the servers are dedicated to that usage. It is not optimal unless all the resources are constantly consumed, which is rarely the case.

On a virtual infrastructure, resources are shared, so unused resources can be used by other applications. Virtualization also eliminates hardware compatibility issues and brings its other usual benefits. Nevertheless, there is a constraint once SNA application instances are virtualized: we need to ensure those instances and the data they generate are distributed across different virtualized servers to survive a server failure.

VMware Cloud Foundation with Tanzu coupled with the vSAN Data Persistence platform (vDPp) is the answer to this problem. Partner software vendors can take advantage of the platform to provide “as a Service” solutions, by developing an operator that automates the installation and configuration and simplifies keeping the service operational.


The service is up and running in one click


Because vDPp is aware of the infrastructure, the application knows how to get the best performance and availability. The operator thereby distributes the required number of instances across different virtualized servers.

This vSAN storage policy ensures data protection and keeps each application instance and its data on the same virtualization host


During maintenance operations, the application is informed of the decommissioning of a virtualization server. vDPp also proactively notifies the application if disks start showing signs of failure.

Developers consume these services via APIs and stick to developing their application. They get a resilient and performant on-demand data service.


In conclusion,

The VMware Cloud Foundation with Tanzu platform coupled with vSAN Data Persistence provides great agility in keeping data services operational. Thanks to that, developers can focus solely on application development while continuing to use their traditional tools. They get a cloud platform just as it exists in the public cloud.

VMware Cloud Foundation with Tanzu should be seen as a complete platform designed for developing and hosting both traditional and modern applications, with integrated on-demand services.