Running E2E Tests On Windows Clusters
Jun 21, 2021
In the previous blog post, we introduced Sonobuoy support for Windows, showed how to set up a Kubernetes cluster which includes Windows nodes, and ran a custom Windows plugin.
In this post we would like to focus on the running the the Kubernetes end-to-end tests against your cluster. Instead of heavily modifying the e2e plugin itself in order to run on Windows, we’ve provided a new plugin (3 variants of a plugin really) which wrap all the necessary configuration together. This means that, ultimately, you just have to run
sonobuoy run -p NewPlugin.yaml
See the TL;DR; at the end of this post for a quick summary of the new plugins and the problems they solve for you.
But if you are curious what these plugins are doing differently and how Windows E2E tests are different upstream, keep reading. By the end of this post you should be able to run the e2e tests against a cluster which includes Windows node and understand how it is different from running them against a Linux-only cluster.
Note: For most of this blog we will discuss the logic of the commands/configuration but you will not need to run commands directly to follow along. This is because, when all is said and done, Sonobuoy has automated all of these steps.
Getting A Windows Cluster
If you would like to run Sonobuoy from a Windows node, you need to set up a Kubernetes cluster with Windows nodes. One of the easiest ways to get started is to use an Azure account and create an AKS cluster. It is easy but does require an account and costs based on resources used. Find more instructions here.
You can also set up a cluster using Vagrant and FriedrichWilken/KubernetesOnWindows. This way, you avoid setting up extra accounts and incurring fees.
For the purposes of this blog post, you can start with a 2 node cluster:
- 1 control plane, Linux node
- 1 Windows node for workloads
We will discuss using a cluster with Linux workloads later in this post.
The First Milestone: Running A Single Test
Before we run, we should walk. We will start simple and run a single test (that targets Windows behavior) on this simplified cluster. Later, we will run more tests and include more complicated cluster configurations.
Looking at the Kubernetes tests, let’s choose a random test that targets Windows and runs quickly, for instance
should not be able to create pods with unknown usernames.
With our test chosen, lets move onto the next hurdle.
Problem One: E2E Image Compatibility
If you try and run the existing e2e plugin on this cluster, the first problem you’ll run into is that the Sonobuoy aggregator will get placed onto the Windows node and run fine, but it will fail to launch the e2e plugin correctly since the only schedulable node is the Windows node (the Linux control plane node is tainted as NoSchedule by default) and the conformance image (which the e2e plugin uses) does not provide a Windows compatible image.
To get around this, we are going to specify that the e2e plugin run on a Linux node. In our simple, 2-node cluster, this means that we need to add a nodeSelector to the podSpec as well as tolerate the control-plane NoSchedule taint (no need to actually run these commands, we are just discussing the logic right now).
Problem 2: Windows Skipped By Default
Now if we were to run our chosen test with the tolerations/nodeSelectors we need, Sonobuoy will get placed on the Windows node and the e2e plugin will run from the Linux control-plane node. However, now you’ll run into another blocker: the tests are explicitly coded to skip any Windows test by default unless a flag is specified. Looking through the code, you can see that we just need to provide the
--node-os-distro=windows flag when invoking the tests.
If you include this value in the plugins environment variable
E2E_EXTRA_ARGS, combined with the other points discussed above, you can successfully run and pass a Windows specific test in your Kubernetes cluster.
But running just one test on a simple test cluster isn’t very useful. Let’s talk about how we can run more tests on more complex clusters.
To run the plugin with the changers we’ve talked about so far, run:
sonobuoy run \ --plugin-env=e2e.E2E_EXTRA_ARGS='--progress-report-url=http://localhost:8099/progress --node-os-distro=windows' \ --e2e-focus="should not be able to create pods with unknown usernames"
Note: The tolerations and nodeSelector discussed have already been included into the default e2e plugin, so we don’t need to manually set them.
Milestone Two: More Tests!
If you try and simply run all of the conformance tests (the default
E2E_FOCUS when running Sonobuoy) you will still hit some tests that don’t work well on Windows either by design or due to the early development stage of the Windows tests. As a goal, let’s copy what the upstream sig-windows tests execute and run the tests matching the focus/skip regular expressions:
Now you’re targeting the correct set of tests, but you’ll eventually get failures such as:
no matching manifest for windows/amd64 in the manifest list entries
This is because although we have a Sonobuoy Windows image and placed the e2e pod on a Linux node, the e2e tests themselves launch other workloads which use various images. The default images that upstream Kubernetes tests use are not all Windows compatible yet.
To override this, we have to tell the tests where to look up test images. This is the same process used to test air-gapped clusters. Typically, users have to download the YAML file which details where the various image repos are then set a command-line flag for Sonobuoy. The downside of this approach is that each time you want to share your plugin with someone you have to give them the plugin, the list of repos, and ensure they provide the right flag.
However, Sonobuoy v0.52.0 adds the ability to provide ConfigMap information into the plugin definition itself. This means the plugin will automatically handle the list with no extra work on your part and it can be transferred along with the plugin YAML. Luckily for us, Windows-testing provides a matrix of Windows/Kubernetes versions and their respective requirements.
Each one of those repo-lists will be incorporated into its own plugin so that you only have to point Sonobuoy to the corresponding plugin and the configuration is already correct.
Milestone Three: Mixed-Node Clusters
Now let’s use a more realistic cluster with both Linux and Windows nodes for workloads.
First, consider running a Windows specific test case: if it lands on the Linux node we will either get a skip or a failure (depending on the test itself), effectively getting false negatives in our signal. We can also get false positives if a test would fail on a Windows node, but passes on a Linux node so we don’t discover the failure.
As a result, per the windows-testing instructions, it is advisable to taint all of the Linux nodes with a NoSchedule taint to avoid test pods getting placed there. After all the tests complete, we’ll want to remove that NoSchedule taint to return our cluster to its original state.
kubectl taint node <node> sonobuoy:NoSchedule
Based on your Kubernetes and Windows versions, run the e2e tests for Windows via:
sonobuoy run --plugin $url
- Choose the right tests to run (and skip)
- Ensure the e2e pod runs on a Linux node (every cluster will have one, even if it is a control-plane node)
- Ensure the e2e pod knows we’re running on Windows nodes (by setting –node-os-distro=windows)
- Provide image registry configuration so we get all the correct Windows images
Note: If you have Linux nodes that process workloads, you will need to taint the nodes prior to running the plugin. Refer to the section above for those instructions.
We wanted to extend a special thank you to numerous people who helped us get to this point. SIG-Windows as a whole works hard on the tests and documentation that we rely heavily on as well as helping fill in gaps in our understanding during development. Thank you to all those involved, especially Jay Vyas, Peri Thompson, James Sturtevant, and Claudiu Belu.
How To Help
One of the easiest ways you can help contribute is to simply tell us what you want, for instance:
- What test frameworks or output formats are common in your Windows workflows?
- What kind of logs or system information do you routinely find yourself needing from the Windows machines?
- What part of Kubernetes do you routinely/specifically want to test? All of Conformance to see where Windows is not compatible? Just Windows features? Some other subset?
If telling us these things is good, helping us actually build or test them is even better. You can create an issue or pull request to sonobuoy or sonobuoy-plugins repos with complete code or even just ideas and we can help build tools to help the community together.
Join the Sonobuoy community:
Sonobuoy can now run on Windows nodes meaning faster, consistent testing on all of your clusters.
Get real-time progress updates from the long-running conformance tests.