Daisy-chaining networks over VMware Transit Connect with HCX
By Troy Lindsay
One of my favorite partners asked me last month: is it possible to extend a network to two separate VMware Cloud on AWS destination software‑defined data centers (SDDCs) with HCX (e.g., A to B to C), and can this be built over Transit Connect (aka vTGW) so that the network segment is stretched over the AWS backbone?
If you’re not familiar with the concept of network extension, this technology allows customers to bridge two separate layer 2 (L2) networks. Workloads at a different physical location can then operate in the same manner from a networking perspective, which is typically used to facilitate virtualized workload migration since the virtual machines (VMs) don’t require IP address changes (aka re‑IPing) to resume operation at the new site. This reduces complexity, downtime, & risk, which increases migration velocity. If you’ve planned and/or participated in large‑scale VM migrations, you probably know how painful re‑IPing can be, and the value that this technology provides.
Anyway, my partner noticed the following unsupported HCX source configuration entry in the documentation (Excerpt 1) stating that daisy‑chaining a network to three different destination sites isn’t supported, and was curious whether this meant that two destination sites are supported. Furthermore, he wanted to know if this would work in VMware Cloud on AWS for SDDCs deployed within the same region (intra‑region) as well as in different regions (inter‑region).
Unsupported Source Configurations
HCX Network Extension does not support the following source configurations:
- Daisy-chaining a single network to three separate destination sites is not supported. For example, A to B to C to D is not supported.
Excerpt 1: VMware HCX v4.3 limitations for network extension
My peers and I hadn’t come across this before, and I couldn’t find a definitive answer in the documentation or otherwise, which made me skeptical, but I was thoroughly intrigued. I decided to build it, and it turns out that I was able to get it working, with an interesting functional limitation. I won’t go so far as to say that it works™, because I didn’t do extensive testing, and I can’t think of a use case where I’d recommend this architecture for customers due to the risks that the complexity and dependencies create (which is why I’m posting this here instead of on the AWS Blog), but it’s pretty interesting from a technology perspective.
In this post, I’ll walk through how I built & conducted the experiment.

Build
Step 1: Deploy SDDCs
First, as shown in Figure 1, I deployed three 1‑node SDDCs across 2 AWS Regions in a VMware Cloud Organization (i.e., account) for testing both intra‑region & inter‑region, and configured each with a different management subnet, which is a requirement for peering (a quick overlap check is sketched after the configs below). It took ~2 minutes to configure each SDDC deployment, and 90‑120 minutes for the SDDCs to deploy in parallel.
SDDC configs
- Name: use1a
  - Region: us-east-1 (North Virginia)
  - Management subnet: 10.5.0.0/16
- Name: use1b
  - Region: us-east-1 (North Virginia)
  - Management subnet: 10.6.0.0/16
- Name: usw2a
  - Region: us-west-2 (Oregon)
  - Management subnet: 10.7.0.0/16
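That non-overlap requirement is easy to sanity-check up front. Here's a minimal Python sketch, using only the CIDRs from the configs above, that fails fast if any pair of management subnets overlaps:

```python
from ipaddress import ip_network
from itertools import combinations

# Management subnets from the SDDC configs above
mgmt_subnets = {
    "use1a": ip_network("10.5.0.0/16"),
    "use1b": ip_network("10.6.0.0/16"),
    "usw2a": ip_network("10.7.0.0/16"),
}

# Peering over Transit Connect requires unique, non-overlapping management CIDRs,
# so fail fast if any pair overlaps.
for (a, net_a), (b, net_b) in combinations(mgmt_subnets.items(), 2):
    if net_a.overlaps(net_b):
        raise ValueError(f"{a} ({net_a}) overlaps {b} ({net_b})")

print("No overlapping management subnets")
```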
Step 2: SDDC Group & peerings
Once all three SDDCs were online & available, I created an SDDC Group, added the SDDCs, and ticked the box to enable transitive routing between the 3 SDDCs via Transit Connect. This triggered the automatic deployment of two VMware‑managed AWS Transit Gateway instances (one per AWS Region), gateway attachments to the respective local SDDC(s), and the inter‑region peering between the two vTGWs. After ~15 minutes, traffic could be routed privately across the AWS backbone between the three SDDCs in two Regions, with only ~10 clicks needed to configure this part.

I’ve been working with this technology for a few years, but I still find it amazing that it only took a couple of hours to build a multi‑region, fully-peered base architecture by myself, and that this could be easily scaled to production with only a few minor changes. Imagine what it would take to build this experiment on‑premises.
Anyway, next I created an external peering for my bastion host for managing the environment once I switched to private management name resolution. I may dive into this in another post, but I’m not going to cover this here. Send me a message on LinkedIn or Twitter if you’re particularly interested.
Step 3: Deploy HCX
Afterward, I deployed HCX for each SDDC, which is a single‑click, fully‑automated infrastructure deployment in VMware Cloud on AWS.
Step 4: Add MGW firewall rules
Next, I configured Management Gateway (MGW) firewall rules permitting private connectivity between the following (a quick reachability check is sketched after the list):
- vCenter instances
- HCX instances
- My bastion host and each vCenter + HCX appliance
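To confirm the rules behaved as intended, a quick reachability check from the bastion host is handy. This is a minimal sketch rather than part of the original build, and the FQDNs are hypothetical placeholders for your own vCenter and HCX appliances:

```python
import socket

# Hypothetical private FQDNs; substitute your own vCenter/HCX addresses
endpoints = [
    "vcenter.sddc-use1a.example.com",
    "hcx.sddc-use1a.example.com",
    "vcenter.sddc-use1b.example.com",
    "hcx.sddc-use1b.example.com",
    "vcenter.sddc-usw2a.example.com",
    "hcx.sddc-usw2a.example.com",
]

for host in endpoints:
    try:
        # vCenter and HCX management UIs/APIs listen on TCP 443
        with socket.create_connection((host, 443), timeout=5):
            print(f"OK    {host}:443")
    except OSError as err:
        print(f"FAIL  {host}:443 ({err})")
```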
Step 5: Enable private name resolution
Then I configured the 3 vCenter & 3 HCX instances for private name resolution to force intercommunication to route over Transit Connect.
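A simple way to verify that the private name resolution took effect is to resolve each appliance FQDN from the bastion host and confirm it returns a private (RFC 1918) address rather than a public one. Another minimal sketch with hypothetical FQDNs:

```python
import socket
from ipaddress import ip_address

# Hypothetical FQDNs; substitute your own vCenter/HCX appliance names
fqdns = [
    "vcenter.sddc-use1a.example.com",
    "hcx.sddc-use1b.example.com",
    "vcenter.sddc-usw2a.example.com",
]

for fqdn in fqdns:
    # Collect the unique addresses each name resolves to
    addrs = {info[4][0] for info in socket.getaddrinfo(fqdn, 443, proto=socket.IPPROTO_TCP)}
    for addr in addrs:
        status = "private" if ip_address(addr).is_private else "PUBLIC"
        print(f"{fqdn} -> {addr} ({status})")
```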

Step 6: Configure HCX site pairings
Using the new private name resolution, I connected to each HCX Management Console from my bastion host and configured bi‑directional site pairings between each.
Step 7: Add HCX network profile IP pools
Next, I added unique IP address pools with gateways to each SDDC’s built-in HCX directConnectNetwork1 Network Profile to prep for building the HCX service mesh interconnects (the pool configs and a quick validation sketch follow below).
Wait what?! Why was I messing with the HCX network profile labeled for AWS Direct Connect (DX) traffic when I’m trying to extend my network segment over Transit Connect?
Well, when designing Transit Connect for VMware Cloud on AWS, VMware chose to reuse the NSX Tier 0 router’s virtual network interface (vNIC) that was originally allocated & labeled for traffic that would traverse a DX link. Customers that use Transit Connect today tend to connect their DX links to their vTGW through a Direct Connect Gateway (DXGW), so it kinda makes sense, but updating the associated labels would make this more intuitive for customers.
Custom directConnectNetwork1 IP pool configs
- use1a
  - Subnet: 192.168.5.0/24
  - Gateway: 192.168.5.1
- use1b
  - Subnet: 192.168.6.0/24
  - Gateway: 192.168.6.1
- usw2a
  - Subnet: 192.168.7.0/24
  - Gateway: 192.168.7.1
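Because these pools get advertised as routed subnets over Transit Connect, they need to be unique across the three SDDCs and shouldn't collide with the management CIDRs either (a general sanity check, not something this build ran into). Here's a minimal sketch, using the values from the tables above, that checks both:

```python
from ipaddress import ip_network
from itertools import combinations

# Management subnets and directConnectNetwork1 pools from the configs above
mgmt = {
    "use1a mgmt": ip_network("10.5.0.0/16"),
    "use1b mgmt": ip_network("10.6.0.0/16"),
    "usw2a mgmt": ip_network("10.7.0.0/16"),
}
pools = {
    "use1a directConnectNetwork1": ip_network("192.168.5.0/24"),
    "use1b directConnectNetwork1": ip_network("192.168.6.0/24"),
    "usw2a directConnectNetwork1": ip_network("192.168.7.0/24"),
}

# Every advertised subnet must be unique for the vTGW routes to be usable
for (a, net_a), (b, net_b) in combinations({**mgmt, **pools}.items(), 2):
    if net_a.overlaps(net_b):
        raise ValueError(f"{a} ({net_a}) overlaps {b} ({net_b})")

print("All management and uplink subnets are unique")
```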
Note: If you're interested in configuring network extension for high availability (HA), here's the documentation.
Step 8: Build HCX service mesh interconnects

HCX requires a minimum of one IP per service mesh interconnect endpoint per site for non‑redundant network extensions, which was fine for this experiment.
The “middle” site in the daisy‑chain (use1b) required 2 service mesh interconnect endpoints (1/ use1a↔use1b and 2/ use1b↔usw2a), and we need a gateway per IP pool so that we can route between sites over Transit Connect, so allocating a /30 per site should be enough if you decide to build this yourself. I allocated a /24 per site for this disposable experiment, but this would’ve been excessive if I were building this for a production use case.
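If you want to right-size the pools instead of burning a /24 per site like I did, Python's ipaddress module makes the arithmetic easy. Remember that the gateway comes out of the same subnet and that the middle site needs an uplink IP per service mesh; this snippet simply prints the usable addresses for a couple of candidate prefix lengths (example CIDRs only) so you can do the math for your own topology:

```python
from ipaddress import ip_network

# Candidate uplink subnet sizes for a single site (example prefixes only)
for cidr in ("192.168.6.0/30", "192.168.6.0/29"):
    net = ip_network(cidr)
    # hosts() excludes the network and broadcast addresses
    hosts = [str(h) for h in net.hosts()]
    print(f"{cidr}: {len(hosts)} usable addresses -> {hosts}")
```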
Once added, routes for each subnet appeared in each vTGW’s route tables.
Step 9: Create the network segment

Next, I created the routed network segment in the use1a SDDC that I stretched first to use1b, and then across the country to usw2a (the segment config follows, along with a hedged API sketch).
Network segment config
- Name: use1a
- Gateway: 192.168.5.1/24
- DHCP: true
- DHCP DNS: 8.8.8.8, 8.8.4.4
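For what it's worth, segment creation can also be scripted against the SDDC's NSX Policy API instead of clicking through the console. Treat the following as a hedged sketch rather than what I ran: the /policy/api/v1/infra/tier-1s/cgw/segments/ path and the csp-auth-token header reflect my understanding of the VMC NSX reverse proxy, every value is a placeholder, and you should verify the schema against the NSX Policy API reference for your SDDC version.

```python
import requests

# Assumptions (not from the build above): you already have the SDDC's NSX
# reverse-proxy URL (from the VMware Cloud console) and a CSP access token.
NSX_PROXY_URL = "https://<nsx-reverse-proxy-url>"  # placeholder
CSP_ACCESS_TOKEN = "<access-token>"                # placeholder

segment_id = "daisy-chain-demo"  # hypothetical segment name
body = {
    "display_name": segment_id,
    "type": "ROUTED",
    "subnets": [
        {
            "gateway_address": "192.168.100.1/24",               # placeholder gateway/prefix
            "dhcp_ranges": ["192.168.100.10-192.168.100.200"],   # placeholder DHCP pool
        }
    ],
    # DHCP DNS servers (e.g., the 8.8.8.8 / 8.8.4.4 in the config above) are set via
    # the segment's DHCP config; the exact schema varies by NSX version, so check the docs.
}

resp = requests.put(
    f"{NSX_PROXY_URL}/policy/api/v1/infra/tier-1s/cgw/segments/{segment_id}",
    headers={"csp-auth-token": CSP_ACCESS_TOKEN},
    json=body,
    timeout=30,
)
resp.raise_for_status()
print(resp.json().get("path"))
```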
Step 10: Add CGW firewall rules
After that, I created Compute Gateway (CGW) firewall rules permitting internet connectivity, built a quick VM template with my Ubuntu Server HashiCorp Packer template, deployed 3 VMs from it (vm1, vm2, & vm3), and then set vm1 to ping vm2 continuously, vm2→vm3, & vm3→vm1.
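The continuous pings were just simple loops on the VMs themselves. If you'd like something that also timestamps and counts drops during the migrations, here's an optional stand-in (my actual setup was plain ping; the target IP is a placeholder):

```python
import subprocess
import time
from datetime import datetime

TARGET = "192.0.2.12"  # placeholder (TEST-NET); replace with the peer VM's IP
lost = 0

while True:
    # One ICMP echo with a 1-second timeout (Linux iputils ping flags)
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", TARGET],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode != 0:
        lost += 1
        print(f"{datetime.now().isoformat()} lost packet #{lost} to {TARGET}")
    time.sleep(1)
```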
Step 11: Extend the network segment (1/2)

Next, I extended the network segment from use1a→use1b with the Mobility Optimized Networking (MON) feature enabled and an overlapping gateway IP.
If you're not familiar with MON (formerly proximity routing), this feature allows customers to configure which gateway to route through. It's typically used for minimizing network latency for extended network segments and can be configured at both the segment and vNIC levels. For example, without MON, VMs in use1b that are connected to the extended network need to route traffic through the gateway in use1a, even to connect to other network segments in use1b, which is suboptimal and potentially expensive. With MON, each VM's vNIC can be configured to route through the local gateway for optimal cost and latency.
Step 12: Migrate VMs (1/2)

Afterward, I migrated vm2 & vm3 to use1b via vMotion migrations with no downtime or packets lost.
Nothing unusual here: just a typical HCX VM migration, but this still amazes me too, as it's so powerful for customers that want to rapidly migrate to the cloud with little to no downtime.
Step 13: Extend the network segment (2/2)

And now we’ve finally reached the crux of my experiment: could I extend my network segment a second time?! I was thrilled to discover that the extended network was deemed eligible to extend over the second service mesh interconnect, so I configured it with MON enabled and the same overlapping gateway IP.


Step 14: Migrate VMs (2/2)

Afterward, I migrated vm3 from use1b to usw2a via another vMotion migration with no downtime and only 1 packet lost.
Test cases
Test 1: Connectivity
Looking good so far: all 3 VMs were still successfully pinging each other across the extended network, with only 1 packet lost among the 3 VMs.



Test 2: MON vNIC-level gateway updates
Next, I updated the MON target router location for vm2's vNIC in use1b and vm3's vNIC in usw2a from the respective source locations. Both jobs reported successful completions; however, the job for vm2 in use1b (the "middle" site) silently failed. I tried this a bunch of times, and every time, the job would silently fail and leave the target router set to the gateway in use1a.

Out of curiosity, I tried moving the target router location for vm3's vNIC in usw2a back to use1b, and that worked fine. Then I tried updating the target router location again for vm2's vNIC from the destination, use1b, and same deal: the attempts silently failed. I also tried archiving the migrations, but moving the target router location afterward still silently failed.

After that, I cloned vm2 to vm4 so that it was native to the use1b SDDC. Interestingly, the new VM was able to communicate with other VMs on the same network as well as with the internet, but vm4's vNIC status hung at Waiting for IP in the HCX Management Console for 10 or so minutes, until I got tired of waiting and tried updating the target router location to use1b. This silently failed again, but the status cleared afterward. I cloned another VM, vm5, from vm2 in use1b and saw the same behavior.

Last, I migrated vm3 back from usw2a to use1b, and its gateway was forced back to use1a. Next, I unextended the network from use1b to usw2a, and HCX automatically updated the vNICs of vm2, vm4, & vm5 to use the MON use1b gateway. Then I extended the network again, and the vNICs in use1b showed up in both network extensions in use1b, which I didn't recall seeing before. I migrated vm3 back to usw2a, and its vNIC's target router was automatically forced to use usw2a and listed as ineligible for MON. I did a few more similar tests, and saw the same results.
Conclusion

In conclusion, I was able to build a daisy‑chained extended network across 3 VMware Cloud on AWS SDDCs deployed in 2 AWS Regions over the AWS backbone via Transit Connect. No differences were observed in how the intra‑region & inter‑region service mesh interconnects operated. The only functional limitation of this architecture observed in the limited testing performed is that MON will only work for 2 of the 3 sites (or 1 of the 2 service mesh interconnects) at a time.
Again, I don’t recommend this architecture for customer use cases due to the risk that the complexity & dependencies create.
For example, per the data flow in Figure 15, communicating from vm1 in use1a to vm3 in usw2a requires transiting through use1b, so it's inefficient, and most administrators wouldn't expect the atypical potential points of failure between L2‑adjacent workloads. Unnecessary complexity like this tends to lead to extended outages when things fail.
I found this architecture & experiment interesting from a technology perspective though, and I hope that you did as well.
Next steps
Create an AWS account and then a VMware Cloud account with AWS billing, and try it yourself or build your own experiment.
Warning: It's possible to create a VMware Cloud account with VMware billing and a credit card payment method, but VMware requires a $2,000 USD deposit (reference).
- It should cost somewhere between $60 and $100 USD to duplicate this experiment based on current pricing (subject to change), primarily due to the on‑demand hourly cost of the bare‑metal i3.metal servers.
- If you do, check out two of my open source automation projects that can help you accelerate deployment of portions of the infrastructure:
- Please also reach out via LinkedIn or Twitter afterward and let me know how it went.
Download the architecture diagram and play around with it in diagrams.net (formerly draw.io).