Documentation

Frequently Asked Questions

Kubernetes Conformance and end-to-end testing

Why were so many tests skipped?

When running the e2e plugin on Sonobuoy, you will notice that a large number of tests are skipped by default. This is because the image used by Sonobuoy to run the Kubernetes conformance tests contains all the end-to-end tests for Kubernetes, but only a subset of those tests is required to check conformance. For example, the v1.16 Kubernetes test image contains over 4000 tests, but only 215 of those are conformance tests.

The default mode for the e2e plugin (non-disruptive-conformance) will run all tests which contain the tag [Conformance] and exclude those with the [Disruptive] tag. This is to help prevent you from accidentally running tests which may disrupt workloads on your cluster. To run all the conformance tests, use the certified-conformance mode.
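The tag-based selection described above can be sketched with plain text filtering. This is only an illustration of the idea, not how the test framework is implemented: the test names are hypothetical and the helper function name is an assumption made for this example.

```shell
# Hypothetical test names, for illustration only.
tests='[sig-example] first test [Conformance]
[sig-example] second test [Disruptive] [Conformance]
[sig-example] third test [Slow]'

# Keep tests tagged [Conformance], then drop any that are also
# tagged [Disruptive] (the idea behind non-disruptive-conformance).
select_non_disruptive_conformance() {
  grep -F '[Conformance]' | grep -vF '[Disruptive]'
}

printf '%s\n' "$tests" | select_non_disruptive_conformance
```

With these sample names, only the first test is selected. Dropping the second filter would also select the second test, which corresponds to the idea behind the certified-conformance mode.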

Please refer to our documentation for the e2e plugin for more details of the built-in configurations or our blog post on the Kubernetes test suite.

How do I determine why my tests failed?

Before debugging test failures, we recommend isolating any failures to verify that they are genuine and are not spurious or transient. Unfortunately, spurious failures can be common in complex, distributed systems. To isolate a failure, you can make use of the --e2e-focus flag when using the run command. This flag accepts a regex which will be used to find and run only the tests matching that regex. For example, you can provide the name of a test to run only that test:

sonobuoy run --e2e-focus "should update pod when spec was updated and update strategy is RollingUpdate"
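Because the focus value is treated as a regex, a test name that includes tags such as [Conformance] contains characters with special meaning in a regex. The following is a minimal sketch of a helper that escapes those characters before the name is passed to --e2e-focus. The function name and the test name are hypothetical.

```shell
# Escape characters that are special in a regex so that a test name
# is matched literally.
escape_regex() {
  printf '%s\n' "$1" | sed 's/[][\\.*^$()+?{}|]/\\&/g'
}

# Hypothetical test name, for illustration only.
name='[sig-example] widget should update [Conformance]'
focus=$(escape_regex "$name")
printf '%s\n' "$focus"

# The escaped value could then be used as:
#   sonobuoy run --e2e-focus "$focus"
```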

If the test continues to fail and it appears to be a genuine failure, the next step would be to read the logs to understand why the test failed. To read the logs for a test failure, you can find the log file within the results tarball from Sonobuoy (plugins/e2e/results/global/e2e.log) or you can use the results command to show details of test failures. For example, the following commands retrieve the results tarball and then use jq to return an object for each test failure with the failure message and the associated stdout.

outfile=$(sonobuoy retrieve) && \
  sonobuoy results --mode detailed --plugin e2e "$outfile" | jq '. | select(.status == "failed") | .details'

Carefully read the test logs to see if anything stands out which could be the cause of the failure. For example: Were there difficulties when contacting a particular service? Are there any commonalities in the failed tests due to a particular feature? Often, the test logs will provide enough detail to allow you to determine why a test failed.

If you need more information, Sonobuoy also queries the cluster upon completion of plugins. The details collected allow you to see the state of the cluster and whether there were any issues. For example: Did any of the nodes have memory pressure? Did the scheduler pod go down?

As a last resort, you can also read the upstream test code to determine what actions were being performed at the point when the test failed. If you decide to take this approach, you must ensure that you are reading the version of the test code that corresponds to your test image. You can verify which version of the test image was used by inspecting the plugin definition, which is available in the results tarball in plugins/e2e/definition.json under the key Definition.spec.image. For example, if the test image was k8s.gcr.io/conformance:v1.15.3, you should read the code at the corresponding v1.15.3 tag in GitHub. All the tests can be found within the test/e2e directory in the Kubernetes repository.
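As a small sketch of the version lookup described above, the following derives the tag from an image reference and prints where the matching test code would be found. The image value is the example from the text, hard-coded here as an assumption. The approach also assumes the image reference ends in a tag rather than a digest.

```shell
# Example value from the text. In practice, read it from the results
# tarball, for instance (assuming jq is installed and the tarball has
# been extracted):
#   jq -r '.Definition.spec.image' plugins/e2e/definition.json
image="k8s.gcr.io/conformance:v1.15.3"

# Remove everything up to and including the last colon to get the tag.
tag="${image##*:}"

# Location of the e2e tests for that tag in the Kubernetes repository.
url="https://github.com/kubernetes/kubernetes/tree/${tag}/test/e2e"
printf '%s\n' "$url"
```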

How can I run the E2E tests with certain test framework options set? What are the available options?

How you provide options to the E2E test framework, and which options you can set, depends on which version of Kubernetes you are testing.

To view the available options that you can set when running the tests, you can run the test executable for the conformance image you will be using as follows:

KUBE_VERSION=<Kubernetes version you are using>
docker run -it k8s.gcr.io/conformance:$KUBE_VERSION ./e2e.test --help

You can also view the definitions of these test framework flags in the Kubernetes repository.

Kubernetes v1.16.0 introduced a feature which makes it easier to specify your own options: arbitrary options can be specified when the tests are invoked. To use this with Kubernetes v1.16.0 or greater, you must ensure the environment variable E2E_USE_GO_RUNNER=true is set. This is the default behavior from Sonobuoy v0.16.1 in the CLI and only needs to be manually set if working with a Sonobuoy manifest generated by an earlier version. If this is enabled, then you can provide your options with the flag --plugin-env=e2e.E2E_EXTRA_ARGS. For example, the following allows you to set provider-specific flags for running on GCE:

sonobuoy run --plugin-env=e2e.E2E_USE_GO_RUNNER=true \
  --plugin-env=e2e.E2E_PROVIDER=gce \
  --plugin-env=e2e.E2E_EXTRA_ARGS="--gce-zone=foo --gce-region=bar"

Before Kubernetes v1.16.0, it was necessary to build your own custom image which could execute the tests with the desired options.

For details on the two different approaches that you can take, please refer to our blog post which describes in more detail how to use the new v1.16.0 Go test runner and how to build your own custom images.

Some of the registries required for the tests are blocked with my test infrastructure. Can I still run the tests?

Yes! Sonobuoy can be configured to use custom registries so that you can run the tests in airgapped environments.

For more information and details on how to configure your environment, please refer to our documentation for custom registries and air-gapped environments.

We have some nodes with custom taints in our cluster and the tests won’t start. How can I run the tests?

Although Sonobuoy plugins can be adapted to use custom Kubernetes PodSpecs where tolerations for custom taints can be specified, these settings do not apply to workloads started by the Kubernetes end-to-end testing framework as part of running the e2e plugin.

The end-to-end test framework checks the status of the cluster before beginning to run the tests. One of these checks verifies that all of the nodes are schedulable and ready to accept workloads. This check deems any node with a taint other than the master node taint (node-role.kubernetes.io/master) to be unschedulable. This means that any node with a different taint will not be considered ready for testing and will block the tests from starting.

From Kubernetes v1.17.0, you can provide a list of allowed node taints so that any node with an allowed taint is deemed schedulable as part of the pre-test checks. This ensures that these nodes do not block the tests from starting. If you are running Kubernetes v1.17.0 or greater, you can specify the taints to allow using the flag --non-blocking-taints, which takes a comma-separated list of taints. To find out how to set this flag via Sonobuoy, please refer to our previous answer on how to set test framework options.

This solution does not enable workloads created by the tests to run on these nodes. This is still an open issue in Kubernetes. The workloads created by the end-to-end tests will continue to run only on untainted nodes.
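As a sketch of how the allowed taints might be passed through Sonobuoy on v1.17.0 or greater, the following builds the comma-separated list and prints the resulting command rather than running it. The taint key example.com/dedicated is hypothetical. The sketch also assumes that the flag takes taint keys and that supplying it replaces the default list, which is why the master taint is listed explicitly. Check the output of ./e2e.test --help for your version before relying on this.

```shell
# Join all arguments with commas.
join_by_comma() (
  IFS=,
  printf '%s\n' "$*"
)

# Assumption: the list holds taint keys, and the master taint must be
# repeated because the flag replaces the default value.
taints=$(join_by_comma node-role.kubernetes.io/master example.com/dedicated)

# Print the command (dry run) instead of executing it.
printf 'sonobuoy run --plugin-env=e2e.E2E_EXTRA_ARGS="--non-blocking-taints=%s"\n' "$taints"
```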

For all versions of Kubernetes prior to v1.17.0, there are two approaches that you may be able to take to allow the tests to run.

The first is adjusting the number of nodes the test framework allows to be “not-ready”. By default, the test framework will wait for all nodes to be ready. However, if only a subset of your nodes are tainted and the rest are otherwise suitable for accepting test workloads, you could provide the test framework flag --allowed-not-ready-nodes, specifying the number of tainted nodes you have. By setting this, the test framework will allow your tainted nodes to be in a “not-ready” state. This does not guarantee that your tests will start, however, as a node in your cluster may not be ready for another reason. Also, this approach will only work if there are untainted nodes, as some will still need to be available for the tests to run on.
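To work out the value for --allowed-not-ready-nodes, you need the number of nodes that carry a custom taint. The following is a minimal sketch that counts them from a two-column listing of node names and taint keys. The function name, node names, and taint keys are hypothetical, and the kubectl command in the comment is an assumption about how such a listing could be produced.

```shell
# Count nodes that have at least one taint key other than the master
# taint. Expects "NAME TAINTS" lines on stdin, where TAINTS is a
# comma-separated list of keys or "<none>".
count_custom_tainted_nodes() {
  awk '{
    n = split($2, keys, ",")
    for (i = 1; i <= n; i++) {
      if (keys[i] != "<none>" && keys[i] != "node-role.kubernetes.io/master") {
        count++
        break
      }
    }
  } END { print count + 0 }'
}

# Hypothetical listing. A real one might come from:
#   kubectl get nodes --no-headers \
#     -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
count=$(count_custom_tainted_nodes <<'EOF'
node-a   <none>
node-b   example.com/dedicated
node-c   node-role.kubernetes.io/master
node-d   example.com/gpu,example.com/dedicated
EOF
)

printf -- '--allowed-not-ready-nodes=%s\n' "$count"
```

With this sample listing, two nodes (node-b and node-d) are counted.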

The only other approach is to untaint the nodes for the purposes of testing.

What tests can I run? How can I figure out what tests/tags I can select?

The e2e plugin has a number of preconfigured modes for running tests, with the default mode running all conformance tests which are non-disruptive. It is possible to configure the plugin to provide a specific set of E2E tests to run instead.

Which tests you can run depends on the version of Kubernetes you are testing as the list of tests changes with each release.

A list of the conformance tests is maintained in the Kubernetes repository. Within the GitHub UI, you can change the branch to the tag that matches your Kubernetes version to see all the tests for that version. This list provides each test name as well as where you can find the test in the repository. You can include these test names in the E2E_FOCUS or E2E_SKIP environment variables when running the plugin.

Although the default behavior is to run the Conformance tests, you can run any of the other Kubernetes E2E tests with Sonobuoy. These are not required for checking that your cluster is conformant and we only recommend running these if there is specific behavior you wish to check.

There are a large number of E2E tests available (over 4000 as of v1.16.0). Many of these tests have “tags” which show that they belong to a specific group or have a particular trait. There isn’t a definitive list of these tags, but some of the most commonly seen tags are listed below:

Conformance
NodeConformance
Slow
Serial
Disruptive
Flaky
LinuxOnly
Feature:* (there are numerous feature tags)

There are also specific tags for tests that belong to a particular Special Interest Group (SIG). The following SIG tags exist within the E2E tests:

[sig-api-machinery]
[sig-apps]
[sig-auth]
[sig-autoscaling]
[sig-cli]
[sig-cloud-provider]
[sig-cloud-provider-gcp]
[sig-cluster-lifecycle]
[sig-instrumentation]
[sig-network]
[sig-node]
[sig-scheduling]
[sig-service-catalog]
[sig-storage]
[sig-ui]
[sig-windows]
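Since tags appear in test names wrapped in square brackets, they need escaping when used in a focus or skip regex. The following is a minimal sketch of a helper that turns a list of tag names into an alternation that matches any of them. The function name is hypothetical.

```shell
# Build a regex that matches any of the given tags, escaping the
# square brackets so that they are matched literally.
tags_to_regex() {
  regex=""
  for tag in "$@"; do
    regex="${regex:+$regex|}\\[${tag}\\]"
  done
  printf '%s\n' "$regex"
}

skip=$(tags_to_regex Slow Serial Flaky)
printf '%s\n' "$skip"

# The result could then be used as:
#   sonobuoy run --e2e-focus "$(tags_to_regex sig-network)" --e2e-skip "$skip"
```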

The Sonobuoy aggregator won’t start on my Windows node. Why not?

If the Sonobuoy aggregator could be scheduled onto a Windows node, you need to add the --security-context-mode=none flag when invoking Sonobuoy. Windows nodes currently do not support fields such as runAsUser, which causes problems for the pod when it starts up: the node tries to start the pod and chown certain files, but that process fails on Windows, leaving the pod unable to start properly.

The information gathered on the cluster is useful for me, but do I have to run a plugin to obtain it?

No, you can run the cluster queries via the command sonobuoy query. Read more details about it here.
