Frequently Asked Questions
Kubernetes Conformance and end-to-end testing
Why were so many tests skipped?
When running the e2e plugin on Sonobuoy, you will notice that a large number of tests are skipped by default. This is
because the image Sonobuoy uses to run the Kubernetes conformance tests contains all the end-to-end tests for
Kubernetes, but only a subset of those tests is required to check conformance. For example, the v1.16 Kubernetes test
image contains over 4000 tests, but only 215 of those are conformance tests.
The default mode for the e2e plugin (non-disruptive-conformance) runs all tests which contain the [Conformance] tag
and excludes those with the [Disruptive] tag. This helps prevent you from accidentally running tests which may disrupt
workloads on your cluster. To run all the conformance tests, use the certified-conformance mode.
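The difference between the two modes can be sketched with grep. This is only an imitation of the tag-based selection described above, not how Sonobuoy or the test framework implements it, and the three test names are made up for illustration.

```shell
# Hypothetical test names, for illustration only.
sample_tests='[sig-apps] Deployment should roll out [Conformance]
[sig-apps] Daemon set should restart pods [Serial] [Disruptive] [Conformance]
[sig-storage] Volumes should mount a volume [Slow]'

# non-disruptive-conformance: keep [Conformance], drop [Disruptive].
non_disruptive() {
  printf '%s\n' "$sample_tests" | grep -F '[Conformance]' | grep -vF '[Disruptive]'
}

# certified-conformance: keep every test tagged [Conformance].
certified() {
  printf '%s\n' "$sample_tests" | grep -F '[Conformance]'
}

echo "non-disruptive-conformance would select:"
non_disruptive
echo "certified-conformance would select:"
certified
```

On a real cluster the mode is chosen with the --mode flag, for example sonobuoy run --mode certified-conformance.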
Please refer to our documentation for the e2e plugin for more details of the built-in configurations or our blog post
on the Kubernetes test suite.
How do I determine why my tests failed?
Before debugging test failures, we recommend isolating any failures to verify that they are genuine and are not spurious
or transient. Unfortunately, spurious failures can be common in complex, distributed systems. To isolate a failure, use
the --e2e-focus flag with the run command. This flag accepts a regex which will be used to find and run only the tests
matching that regex. For example, you can provide the name of a test to run only that test:
sonobuoy run --e2e-focus "should update pod when spec was updated and update strategy is RollingUpdate"
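Because the flag takes a regex, a test name that contains brackets or parentheses needs those characters escaped to match literally. The helper below is a minimal sketch: escape_focus is a hypothetical name, the sample test name is made up, and the set of escaped characters is an assumption about which metacharacters matter.

```shell
# Escape common regex metacharacters so a test name matches literally.
# escape_focus is a hypothetical helper, not part of Sonobuoy.
escape_focus() {
  printf '%s\n' "$1" | sed 's/[][\.*^$()+?{}|]/\\&/g'
}

# Made-up test name containing tags and parentheses.
name='[sig-apps] Deployment should scale (hypothetical name) [Conformance]'
focus=$(escape_focus "$name")

# Print the command for review rather than running it.
printf 'sonobuoy run --e2e-focus "%s"\n' "$focus"
```

Left unescaped, [Conformance] would be read as a character class rather than as the literal tag.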
If the test continues to fail and it appears to be a genuine failure, the next step is to read the logs to understand
why the test failed. To read the logs for a test failure, you can find the log file within the results tarball from
Sonobuoy (plugins/e2e/results/global/e2e.log) or you can use the results command to show details of test failures. For
example, the following commands retrieve the results tarball and then use jq to return an object for each test failure
with the failure message and the associated stdout.
outfile=$(sonobuoy retrieve) && \
sonobuoy results --mode detailed --plugin e2e "$outfile" | jq 'select(.status == "failed") | .details'
Carefully read the test logs to see if anything stands out which could be the cause of the failure. For example: Were there difficulties when contacting a particular service? Are there any commonalities in the failed tests due to a particular feature? Often, the test logs will provide enough detail to allow you to determine why a test failed.
If you need more information, Sonobuoy also queries the cluster upon completion of plugins. The details collected allow you to see the state of the cluster and whether there were any issues. For example: Did any of the nodes have memory pressure? Did the scheduler pod go down?
As a last resort, you can also read the upstream test code to determine what actions were being performed at the point
when the test failed. If you take this approach, you must ensure that you are reading the version of the test code that
corresponds to your test image. You can verify which version of the test image was used by inspecting the plugin
definition, which is available in the results tarball in plugins/e2e/definition.json under the key
Definition.spec.image. For example, if the test image was k8s.gcr.io/conformance:v1.15.3, you should read the code at
the corresponding v1.15.3 tag in GitHub. All the tests can be found within the test/e2e directory in the Kubernetes
repository.
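As a small sketch of that lookup, the tag can be taken from the image reference and turned into the matching source location. The helper name e2e_source_url is hypothetical, and the sketch assumes the image reference always carries an explicit tag after the last colon.

```shell
# Map a conformance image reference to the test/e2e directory at the same tag.
# e2e_source_url is a hypothetical helper; it assumes the image has an explicit tag.
e2e_source_url() (
  image="$1"
  tag="${image##*:}"   # text after the last colon, e.g. v1.15.3
  printf 'https://github.com/kubernetes/kubernetes/tree/%s/test/e2e\n' "$tag"
)

# Image value as it might appear under Definition.spec.image.
e2e_source_url "k8s.gcr.io/conformance:v1.15.3"
```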
How can I run the E2E tests with certain test framework options set? What are the available options?
How you provide options to the E2E test framework, and which options you can set, depend on which version of Kubernetes you are testing.
To view the available options that you can set when running the tests, you can run the test executable for the conformance image you will be using as follows:
KUBE_VERSION=<Kubernetes version you are using>
docker run -it k8s.gcr.io/conformance:$KUBE_VERSION ./e2e.test --help
You can also view the definitions of these test framework flags in the Kubernetes repository.
Kubernetes v1.16.0 introduced a feature that makes it easier to specify your own options: arbitrary options can be
passed when the tests are invoked. To use it on v1.16.0 or greater, you must ensure the environment variable
E2E_USE_GO_RUNNER=true is set. This is the default behavior in the CLI from Sonobuoy v0.16.1 and only needs to be set
manually if working with a Sonobuoy manifest generated by an earlier version. With this enabled, you can provide your
options with the flag --plugin-env=e2e.E2E_EXTRA_ARGS.
For example, the following allows you to set provider-specific flags for running on GCE:
sonobuoy run --plugin-env=e2e.E2E_USE_GO_RUNNER=true \
--plugin-env=e2e.E2E_PROVIDER=gce \
--plugin-env=e2e.E2E_EXTRA_ARGS="--gce-zone=foo --gce-region=bar"
Before this version, it was necessary to build your own custom image which could execute the tests with the desired options.
For details on the two different approaches that you can take, please refer to our blog post which describes in more detail how to use the new v1.16.0 Go test runner and how to build your own custom images.
Some of the registries required for the tests are blocked in my test infrastructure. Can I still run the tests?
Yes! Sonobuoy can be configured to use custom registries so that you can run the tests in air-gapped environments.
For more information and details on how to configure your environment, please refer to our documentation for custom registries and air-gapped environments.
We have some nodes with custom taints in our cluster and the tests won’t start. How can I run the tests?
Although Sonobuoy plugins can be adapted to use custom Kubernetes PodSpecs where tolerations for custom taints can be
specified, these settings do not apply to workloads started by the Kubernetes end-to-end testing framework as part of
running the e2e plugin.
The end-to-end test framework checks the status of the cluster before beginning to run the tests. One of these checks
verifies that all of the nodes are schedulable and ready to accept workloads. This check deems any node with a taint
other than the master node taint (node-role.kubernetes.io/master) to be unschedulable. This means that any node with a
different taint will not be considered ready for testing and will block the tests from starting.
As of Kubernetes v1.17.0, you can provide a list of allowed node taints so that any node with an allowed taint is
deemed schedulable as part of the pre-test checks. This ensures that these nodes do not block the tests from starting.
If you are running Kubernetes v1.17.0 or greater, you can specify the taints to allow using the flag
--non-blocking-taints, which takes a comma-separated list of taints. To find out how to set this flag via Sonobuoy,
please refer to our previous answer on how to set test framework options.
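Combining this flag with the E2E_EXTRA_ARGS mechanism from the earlier answer might look like the sketch below. The taint key example.com/dedicated and the helper join_taints are hypothetical. The master taint is listed explicitly on the assumption that the value you pass replaces the default list rather than extending it.

```shell
# Join taint keys with commas, the format --non-blocking-taints takes.
# join_taints is a hypothetical helper, not part of Sonobuoy.
join_taints() (
  IFS=,
  printf '%s\n' "$*"
)

# example.com/dedicated is a made-up custom taint key.
taints=$(join_taints node-role.kubernetes.io/master example.com/dedicated)

# Print the command for review rather than running it.
printf 'sonobuoy run --plugin-env=e2e.E2E_EXTRA_ARGS="--non-blocking-taints=%s"\n' "$taints"
```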
This solution does not enable workloads created by the tests to run on these nodes. This is still an open issue in Kubernetes. The workloads created by the end-to-end tests will continue to run only on untainted nodes.
For all versions of Kubernetes prior to v1.17.0, there are two approaches that you may be able to take to allow the tests to run.
The first is adjusting the number of nodes the test framework allows to be “not-ready”. By default, the test framework
will wait for all nodes to be ready. However, if only a subset of your nodes are tainted and the rest are otherwise
suitable for accepting test workloads, you could provide the test framework flag --allowed-not-ready-nodes specifying
the number of tainted nodes you have. With this set, the test framework will allow your tainted nodes to be in a
“not-ready” state. This does not guarantee that your tests will start, however, as a node in your cluster may not be
ready for another reason. Also, this approach only works if there are untainted nodes, as some will still need to be
available for the tests to run on.
The only other approach is to untaint the nodes for the purposes of testing.
What tests can I run? How can I figure out what tests/tags I can select?
The e2e plugin has a number of preconfigured modes for running tests, with the default mode running all conformance
tests which are non-disruptive. It is possible to configure the plugin to provide a specific set of E2E tests to run
instead.
Which tests you can run depends on the version of Kubernetes you are testing as the list of tests changes with each release.
A list of the conformance tests is maintained in the Kubernetes repository. Within the GitHub UI, you can change the
branch to the tag that matches your Kubernetes version to see all the tests for that version. This list provides each
test name as well as where you can find the test in the repository. You can include these test names in the E2E_FOCUS
or E2E_SKIP environment variables when running the plugin.
Although the default behavior is to run the Conformance tests, you can run any of the other Kubernetes E2E tests with Sonobuoy. These are not required for checking that your cluster is conformant and we only recommend running these if there is specific behavior you wish to check.
There are a large number of E2E tests available (over 4000 as of v1.16.0). Many of these tests have “tags” which show that they belong to a specific group, or have a particular trait. There isn’t a definitive list of these tags, but below are some of the most commonly seen:
- Conformance
- NodeConformance
- Slow
- Serial
- Disruptive
- Flaky
- LinuxOnly
- Feature:* (there are numerous feature tags)
There are also specific tags for tests that belong to a particular Special Interest Group (SIG). The following SIG tags exist within the E2E tests:
- [sig-api-machinery]
- [sig-apps]
- [sig-auth]
- [sig-autoscaling]
- [sig-cli]
- [sig-cloud-provider]
- [sig-cloud-provider-gcp]
- [sig-cluster-lifecycle]
- [sig-instrumentation]
- [sig-network]
- [sig-node]
- [sig-scheduling]
- [sig-service-catalog]
- [sig-storage]
- [sig-ui]
- [sig-windows]
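Tags can be combined into focus and skip expressions. The sketch below builds both from plain tag names; tags_to_regex is a hypothetical helper, and the particular choice of tags is only an example.

```shell
# Wrap each tag in escaped brackets and join the results with "|".
# tags_to_regex is a hypothetical helper, not part of Sonobuoy.
tags_to_regex() (
  out=""
  for tag in "$@"; do
    out="${out:+$out|}\\[$tag\\]"
  done
  printf '%s\n' "$out"
)

# Example selection: networking tests, minus slow, flaky and disruptive ones.
focus=$(tags_to_regex sig-network)
skip=$(tags_to_regex Slow Flaky Disruptive)

# Print the command for review rather than running it.
printf 'sonobuoy run --e2e-focus "%s" --e2e-skip "%s"\n' "$focus" "$skip"
```

The brackets are escaped because the focus and skip values are regexes, where an unescaped [Slow] would be a character class.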
The Sonobuoy aggregator won’t start on my Windows node. Why not?
If the Sonobuoy aggregator may be scheduled on a Windows node, you need to add the --security-context-mode=none flag
when invoking Sonobuoy. This is because Windows nodes currently do not support fields such as runAsUser, which causes
problems for the pod when it starts up. The node tries to start the pod and chown certain files, but that process
errors out on Windows, leaving the pod unable to start properly.
The information gathered on the cluster is useful for me, but do I have to run a plugin to obtain it?
No, you can run the cluster queries via the command sonobuoy query. Read more details about it here.