It’s moving


The most important thing in a data center move, whether it is from OnPrem to or from the Public Cloud, or from one OnPrem site to another, is the data, the applications and their accessibility by internal and external users. A move is also an opportunity to modernize the target data center and make it more agile and faster, so it can host modern container-based applications alongside existing ones.

In this article, I will not cover applications running directly on physical servers; it is a scope I rarely deal with for my customers, and it represents only a small share of applications, less than 10%. I will instead focus on applications running in virtualized environments, which represent about 90% of the total. The majority of customers run vSphere hypervisors; some may still have Hyper-V or KVM, but that is quite rare.

One of the advantages of server virtualization is the ability to represent a physical machine and everything it contains (operating system, applications, configuration, data, …) as virtual machines (VMs) stored in files. Moving these VMs consists of copying those files to the new destination. To access the copied VMs, they must be connected to a network, ideally without modifying the initial IP addressing plan: re-addressing is tedious and risky work, and some applications were simply not designed to handle that kind of change. To make this simpler, the network itself must be virtualized, so that the networks of the source site can be reproduced at the destination and no server or application needs to change its IP address. Note that the network only needs to be virtualized at the destination, not at the source. The alternative would be to physically replicate the network at the destination; while that is possible in an OnPrem to OnPrem move, it is impossible when moving to the Public Cloud. As you can see, applying virtualization to the network, just as it was done for physical servers, decouples you completely from the physical infrastructure and leaves you free to choose your hardware providers and/or hyperscaler.
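
As a minimal sketch of that preparation step (not HCX or NSX code, and using made-up inventories), one could check that every source network has a matching segment at the destination before any VM is copied:

```python
# Hypothetical inventories: source port groups and the logical segments
# already created at the destination (names and subnets are illustrative).
source_networks = {
    "app-tier": "10.10.10.0/24",
    "db-tier": "10.10.20.0/24",
    "dmz": "192.168.1.0/24",
}

destination_segments = {
    "app-tier": "10.10.10.0/24",
    "db-tier": "10.10.20.0/24",
}

def missing_segments(src: dict, dst: dict) -> list:
    """Return source networks with no matching segment (same name and subnet) at destination."""
    return [(name, subnet) for name, subnet in src.items() if dst.get(name) != subnet]

for name, subnet in missing_segments(source_networks, destination_segments):
    print(f"Segment '{name}' ({subnet}) must be created at the destination")
```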


The first step of the move is to have a virtualized destination environment, at least the hypervisor and the network, ready to receive the VMs. Then the VMs have to be transferred: as we saw, VMs are files, so in principle you just copy them to the destination. It is as simple as that; however, to get a consistent state of the data, the copy has to be done cold, i.e. with the VMs powered off, which makes the applications unavailable for the whole duration of the copy. Once the copy is complete and the VMs are attached to the right networks, they can be started at the destination. Because the data volume is large, the copies will be long and so will the unavailability. Another method is to use replication tools that replicate the VMs while they are running; only once the replication is complete do you stop the VMs at the source, replicate the final delta and then start the VMs at the destination. Both of these manual methods may be fine for small environments, but not when there are hundreds or thousands of VMs: it quickly becomes tedious. To simplify the process further, everything can be orchestrated with a tool such as VMware HCX, which I will describe and which helps answer some important questions: how do I move if my applications cannot tolerate downtime? How do I perform a phased, sequential move? How do I prepare everything in advance and execute it at a specific time?
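
To make the difference between the two manual approaches concrete, here is an illustrative sketch with stubbed helpers (simple print statements rather than any real vSphere or HCX API); the point is only the ordering of the steps and where the downtime sits:

```python
def step(msg: str) -> None:
    print(msg)

def cold_migration(vm: str) -> None:
    step(f"power off {vm}")                        # downtime starts here
    step(f"copy {vm} files to the destination")    # long copy while the application is down
    step(f"attach {vm} to destination networks")
    step(f"power on {vm} at destination")          # downtime ends only after the full copy

def live_replication_migration(vm: str) -> None:
    step(f"replicate {vm} while it is running")    # no downtime during the initial sync
    step(f"power off {vm}")                        # short downtime window starts here
    step(f"replicate the final delta of {vm}")
    step(f"attach {vm} to destination networks")
    step(f"power on {vm} at destination")

cold_migration("app-vm-01")
live_replication_migration("db-vm-01")
```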

VMware HCX offers different types of migration:

  • Bulk Migration: for large-scale migrations; the VMs are moved in parallel, in batches (see the batching sketch after this list).
  • Mobility Groups: to migrate VMs grouped by theme.
  • vMotion Migration: to move VMs individually, one at a time.
  • OS Assisted Migration: to migrate non-vSphere VMs, i.e. KVM or Hyper-V.
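
As a minimal sketch (plain Python, not the HCX API, with made-up VM names and batch size), this is the idea behind mobility groups and bulk migration: group the inventory by application theme, then split each group into batches that are migrated in parallel:

```python
from itertools import islice

vms = [
    {"name": "web-01", "app": "webshop"},
    {"name": "web-02", "app": "webshop"},
    {"name": "db-01", "app": "webshop"},
    {"name": "erp-01", "app": "erp"},
    {"name": "erp-02", "app": "erp"},
]

def mobility_groups(inventory):
    """Group VMs by application so each group can be moved as one unit."""
    groups = {}
    for vm in inventory:
        groups.setdefault(vm["app"], []).append(vm["name"])
    return groups

def batches(names, batch_size=2):
    """Split a mobility group into fixed-size batches to be migrated in parallel."""
    it = iter(names)
    while chunk := list(islice(it, batch_size)):
        yield chunk

for app, names in mobility_groups(vms).items():
    for batch in batches(names):
        print(f"mobility group '{app}': migrate batch {batch} in parallel")
```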

In addition to these migration types, HCX provides network features that are particularly useful in the context of a move:

  • Interconnect: deploys a secure virtual tunnel between the HCX appliances on both sides.
  • WAN Optimization: reduces the amount of data travelling between sites through deduplication and compression mechanisms.
  • Network Extension: extends a Layer 2 network between sites so the same addressing scheme can be kept on both sides.
  • Application Path Resiliency: creates multiple communication paths between sites and stops using those that have failed.
  • Mobility Optimized Networking (MON): lets a migrated VM use the nearest gateway (i.e. on the same site) to reach a different network (a small routing sketch follows this list).
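
Here is a deliberately simplified model of the MON idea (not VMware code; site names and addresses are invented): after migration, traffic leaving the extended segment should exit through the gateway of the site where the VM now runs instead of hairpinning back to the source site:

```python
def pick_gateway(vm_site: str, gateways: dict) -> str:
    """Return the gateway local to the site where the VM currently runs."""
    return gateways[vm_site]

gateways = {"source-dc": "10.10.10.1", "destination-dc": "10.10.10.2"}

# Before migration the VM routes via the source gateway; once migrated, MON
# lets it use the local destination gateway for traffic to other networks.
print(pick_gateway("source-dc", gateways))       # 10.10.10.1
print(pick_gateway("destination-dc", gateways))  # 10.10.10.2
```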

VMware HCX also addresses other use cases:


  • Upgrade vSphere versions: migrate VMs hosted on one version of vSphere to a different version.
  • Workload Rebalancing: Move workloads between different Clouds based on resource availability and/or cost seasonality.
  • Business continuity and protection: to protect workloads on another site while benefiting from the network features described above. It can also be coupled with SRM (Site Recovery Manager).


Let’s go back to the initial topic of the move, with other important and very current points: requests to move to the Public Cloud keep increasing, and they raise the following questions: how do I move to a Public Cloud without having to transform my VMs? And how do I avoid being locked in if I decide tomorrow to change Public Cloud?

To be as independent as possible, you need a multi-cloud VM format, i.e. a VM format available in OnPrem data centers but also at almost all hyperscalers such as Alibaba Cloud, Amazon Web Services, Google Cloud Platform, Microsoft Azure, Oracle Cloud Infrastructure, OVH, …, with a local and/or global geographical presence. This is the case for the VM format used by VMware: hyperscalers have decided to offer their customers, alongside their own hypervisors, the VMware hypervisor that customers are already used to. This decision is not trivial, because it drastically simplifies migrations by eliminating the need to transform VMs, and therefore lets customers consume hyperscaler resources more quickly. The largest hyperscalers even offer the full VMware SDDC, adding network and storage virtualization coupled with VMware HCX to accelerate the migration even further. This gives customers more freedom to choose their cloud provider, whether OnPrem or Public.


In summary, the most important things in a move are to have a destination that is as virtualized as possible (servers, network including security, and storage), to have an orchestrator that simplifies the migration, and to choose a multi-cloud VM format so as not to be locked in with a single hyperscaler.

Optimizing the cost of using GPUs

GPUs (Graphics Processing Units) were created in the 1970s to speed up the creation and manipulation of images. They were quickly adopted by game console manufacturers to improve the fluidity of graphics, and in the 2000s GPUs started being used for computing outside of graphics. Today the most popular use of GPUs is AI (Artificial Intelligence) and ML (Machine Learning). More history on Wikipedia: Graphics processing unit – Wikipedia.

This is why GPUs are now found in data center servers: AI and ML applications require many calculations executed in parallel, something a conventional processor (CPU) struggles to do. A CPU is the central element of a server, built to execute lots of small sequential tasks very fast and at low latency, relying on roughly sixty cores per CPU (at the time of writing). A GPU, on the other hand, is made to perform thousands of tasks in parallel thanks to the thousands of cores it contains. The downside of a GPU is its cost, which can run into tens of thousands of dollars, so you have to make sure it is properly used all the time. The ideal is to share it so that several applications can use it simultaneously and consume as close to all of its resources as possible.

That is the credo of the Bitfusion solution, acquired by VMware in 2019 and now available as an add-on to vSphere. The GPUs are installed on the hypervisors and form a pool that is accessible to applications hosted directly on those hypervisors, or over the IP network if the applications are hosted on other hypervisors; the applications can even run on physical servers or in Kubernetes containers. The use is reserved for Artificial Intelligence or Machine Learning applications using CUDA routines (CUDA was developed by NVIDIA to give non-graphics applications direct access to GPUs). Thanks to Bitfusion, applications can consume one GPU, several GPUs, or just a portion of a GPU (in reality it is the GPU's memory that is shared). Once consumption is complete, the applications release the allocated GPU resources, which return to the pool for future requests.
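
To illustrate the pooling principle only (this is a toy model, not the Bitfusion API; GPU names and memory sizes are invented), here is how fractions of GPU memory could be allocated to several applications and then released back to the pool:

```python
class GpuPool:
    def __init__(self, gpu_memory_gb: dict):
        # e.g. {"gpu0": 16, "gpu1": 16} -> remaining free memory per GPU
        self.free = dict(gpu_memory_gb)

    def allocate(self, app: str, memory_gb: float) -> str:
        """Reserve a slice of the first GPU that has enough free memory."""
        for gpu, free in self.free.items():
            if free >= memory_gb:
                self.free[gpu] -= memory_gb
                print(f"{app}: allocated {memory_gb} GB on {gpu}")
                return gpu
        raise RuntimeError("no GPU has enough free memory")

    def release(self, gpu: str, memory_gb: float) -> None:
        """Return the slice to the pool for future requests."""
        self.free[gpu] += memory_gb

pool = GpuPool({"gpu0": 16, "gpu1": 16})
g = pool.allocate("training-job-a", 8)   # half of gpu0
pool.allocate("inference-job-b", 8)      # the other half of gpu0, shared
pool.release(g, 8)                       # resources go back to the pool
```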

From a technical point of view, Bitfusion requires the installation of components on both sides. On each hypervisor that hosts GPUs, a virtual appliance must be deployed, which can be downloaded from the VMware site. On the clients (VMs, bare-metal servers or Kubernetes containers/pods) that will consume GPU resources, a Bitfusion client must be installed; it intercepts the CUDA calls made by the application and forwards them to the Bitfusion appliances over the IP network. This is transparent to the application.

Since the exchanges between clients and Bitfusion appliances go through the IP network, it is preferable to have at least a 10 Gb/s network; as a rule of thumb, plan one 10 Gb/s link for every 4 GPUs.
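
A quick sizing helper based on that rule of thumb (one 10 Gb/s link per 4 pooled GPUs); purely illustrative, to be adjusted to your own workloads:

```python
import math

def links_needed(gpu_count: int, gpus_per_10g_link: int = 4) -> int:
    """Number of 10 Gb/s links suggested for a given number of pooled GPUs."""
    return math.ceil(gpu_count / gpus_per_10g_link)

for gpus in (4, 6, 16):
    print(f"{gpus} GPUs -> {links_needed(gpus)} x 10 Gb/s link(s)")
```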