Container Resources Specification¶
When a container is active, it consumes resources: memory allocated by its running processes, and CPU time when those processes are computing rather than waiting for something to happen.
A RAIL cluster consists of a set of worker nodes, each with a given capacity to run containers. This capacity is expressed as how much memory and how many CPU cores are available on each worker node. When RAIL schedules containers to run together on these worker nodes, it might end up in a situation where a process in a container fails to allocate more memory. This causes the container to be terminated. Likewise, if too many containers compete for the available CPU time, they might run slower than required.
We want to avoid the situation where containers start up, run for a while, and are then abruptly terminated before they complete their job or while serving a client. It is better behavior for the cluster to report that it can't run the container right now and to wait for resources to become available before starting it.
For the cluster to achieve this behavior, users of RAIL are required to declare how much memory and how many CPUs their containers need to run. RAIL is shared infrastructure, and users are also encouraged to set limits on resource usage so that RAIL can throttle or kill a runaway process before it impacts other innocent users.
Kubernetes also allows admins to enforce quotas on the resources consumed by all the pods in a namespace, and RAIL will probably end up using this mechanism as well.
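As a sketch of that mechanism, a namespace-wide quota in Kubernetes is declared with a `ResourceQuota` object. The namespace name and quota values below are illustrative, not actual RAIL policy:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota      # illustrative name
  namespace: my-team       # hypothetical namespace
spec:
  hard:
    requests.cpu: "4"      # total CPU all pods in the namespace may request
    requests.memory: 8Gi   # total memory all pods may request
    limits.cpu: "8"        # total CPU limit across all pods
    limits.memory: 16Gi    # total memory limit across all pods
```

With such a quota in place, a Pod that would push the namespace past any of these totals is rejected at creation time rather than scheduled.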
The resource requests and limits are specified in a YAML object that looks like this:
```yaml
resources:
  requests:
    cpu: 250m
    memory: 10Mi
  limits:
    cpu: 500m
    memory: 20Mi
```
The customary unit for CPU is m for "milli" (i.e. 1/1000), so 250m means that this container requires a quarter of a single CPU core to run satisfactorily. Another way to say that is that this container will contribute a load of 0.25 on the worker node on average.
The customary unit for memory is `Mi` for "mebibytes" (1,048,576 bytes, i.e. 2^20), while the unit `M` is the pure metric "mega" (1,000,000 bytes). You can also use `Gi` for "gibibytes".
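The difference between the binary and metric units grows with the size. A small sketch of the arithmetic in plain Python (this assumes nothing about RAIL itself):

```python
# Binary (Mi) vs. metric (M) memory units, as Kubernetes defines them.
MI = 1024 ** 2  # 1 Mi = 1,048,576 bytes
M = 1000 ** 2   # 1 M  = 1,000,000 bytes

request_binary = 10 * MI  # the 10Mi request from the example above
request_metric = 10 * M   # what 10M would mean instead

print(request_binary)                   # 10485760
print(request_metric)                   # 10000000
print(request_binary - request_metric)  # 485760 bytes, roughly 4.9% more
```

For small requests the gap is negligible, but at `Gi` scale it is about 7%, so it pays to be consistent about which unit you mean.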
The `requests` specify the minimum amount of CPU and memory that a container needs to run. RAIL uses these values to determine which nodes can accommodate the Pod.
The `limits` specify the maximum amount of CPU and memory that a container is allowed to use. If a container tries to use more than its limit, the container runtime will throttle the container's CPU, or terminate the container if it exceeds the memory limit.
The key differences are:

- Requests guarantee the container will have at least that much CPU and memory available, while limits cap the maximum the container can use.
- If a container exceeds its requests, it can use more resources if available on the node, but it cannot exceed its limits.
- Pods are scheduled based on their requests, not their limits. Limits are enforced by RAIL to prevent resource hogging.
Setting requests and limits properly is important to ensure Pods are scheduled correctly and to prevent performance issues or terminations due to resource exhaustion.
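To show where the `resources` object sits in practice, here is a minimal Pod manifest sketch. The pod name and image are placeholders, not part of RAIL:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app                       # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0 # placeholder image
      resources:
        requests:
          cpu: 250m      # the scheduler places the Pod based on these
          memory: 10Mi
        limits:
          cpu: 500m      # CPU usage above this is throttled
          memory: 20Mi   # exceeding this gets the container terminated
```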
You can read more about resource limits from the Kubernetes documentation.
How to determine what to request¶
It's a chore to actually figure out what resources your containers require to run.
The lazy user will just put huge values in the resource request. Besides being anti-social, this might actually prevent your containers from running at all. If RAIL can't find a worker node with that much memory or that many CPU cores, then the container will simply not run; RAIL just waits for a sufficiently large worker node to join the cluster. One day that might actually happen: what's not reasonable today might become so in the future.
The anti-social part is that if this container actually runs, the huge resource request will prevent other containers from running on the node, even if the actual consumption of resources is well below the request. This causes the worker node to idle when there is useful work it could have done. To prevent anti-social behavior we might put strict quotas on namespaces, or even make users pay for usage based on the resources requested.
How can you then figure out the "right" request values? One way is to guess values that make your containers run, and then use the `kubectl top pod` command to make RAIL report on the resources actually consumed. When you find the steady state, use those values as requests and set the limits somewhat higher.
The metrics reported by `kubectl top pod` are sampled every 15 seconds, so the container needs to run for some time before any values show up.
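The workflow looks roughly like this; the pod name and numbers in the output are made up for illustration:

```shell
# Show actual CPU and memory consumption of pods in the current namespace.
# kubectl top relies on the cluster's metrics API being available.
kubectl top pod

# Illustrative output:
# NAME          CPU(cores)   MEMORY(bytes)
# example-app   180m         14Mi
```

If the observed steady state were 180m CPU and 14Mi memory, reasonable values might be requests around those numbers and limits with some headroom above them.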