The Art of Quotas in OpenShift/k8s
Using quotas in OpenShift (Kubernetes underneath) to control your application's resources is a great way to guardrail your containers. However, the art or craft or "magic" (PFM) of creating them and setting them up properly is still a bit of a PITA for my (software engineer) taste. I love the idea of them and how they work. I do not like the developer experience (DX) so far. So here are some notes from my recent work with OpenShift 3.x (with k8s underneath) on defining quotas, finding the right mix, setting the scope, and adding some YAML to your builds so they do not count against your quota.
So what is a Quota and why would I use them?
A quota in OpenShift is a defined range for resources such as CPU, memory, pods, and storage. You can set request minimums, limits, or both across CPU and memory. You can specify a maximum (including 0 Mi) for storage space, and you can specify the maximum number of pods allowed in your namespace. Quotas are important because they set request-level minimums of resources such as CPU and memory for your application, so you are guaranteed at least that minimum set of resources for your namespace. They also set limits on CPU, memory, and the number of pods so you do not have runaway processes, maxed-out nodes, "hostile takeovers", or internal "DoS" attacks. Quotas also help you use hardware more densely so you do not waste space or computing power in your on-premise, cloud, or hybrid "data center".
Kubernetes within OpenShift is charged with watching the quotas as well as figuring out resource loads and usage across all worker nodes for deployments. The quotas give guardrails on resources so you do not have one pod taking over all of a node's CPU and memory, or taking more storage than it is allowed. "Everything in moderation" as they say!
To do this properly you really should know the total memory and CPU cores of your worker nodes. With the oc command line interface you can run oc get nodes and then oc describe node xxxxxxxxx with the name of a node to see that node's CPU and memory capacity and how much the pods with requests and limits set are using (if you have permissions to do so). Or ask your server team, admin, or SRE if they know the number of worker nodes and the CPU cores and memory allocated to each of them. On true enterprise implementations you can also get this from your monitoring software. Get those numbers so you know whether your resource quotas even make sense. Then go on to what is next.
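For example, something like this (the node name is a placeholder and the exact output varies a bit by version):
oc get nodes
oc describe node xxxxxxxxx
In the describe output, look at the Capacity and Allocatable sections for cpu and memory, and the "Allocated resources" section to see how much of that the pods running there have already requested.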
Realize your requests can only go up to 100% of what the nodes actually have, but your limits can go past 100%. The limits allow pods to ramp up to complete a task, then release resources and drop back to their requested amount so other pods can ramp up and down as well. This allows your hardware to be used more densely across all your servers/VMs/instances.
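For reference, this is where requests and limits live on a container in a deployment; the values here are just placeholders:
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 750m
    memory: 512Mi
The quota tracks the sum of these values across every pod in the namespace, which is why pods with no requests or limits set become a problem once a quota is in place (more on that below).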
How do I know what to set them as?
There is the question! That is what I get presented with ALL THE TIME. What do you set them to? As with any hard question in the computer science realm, it will depend on a few things such as usage, code language, routines in the code, etc. It also depends on developers knowing what their code does, what resources it needs, and what parts of the code are the most intensive. If they do not know enough about that, be forewarned: YOU will have to find out and you will get pissed at them for not knowing their own architecture well enough. There are many developers I know that still struggle with the ports, protocols, and services their application requires. Developers in the OpenShift/k8s space need to get smarter on this and we need to help them get there. So suck it up, buttercup! (I have now put down my soapbox and stored it securely.)
I have spent some time testing things and have a good estimation (so far) of where to start on my projects for .NET Core, NodeJS, keycloak gatekeeper, etc. (And by working with smart guys like my co-worker Mike S. and brainstorming.) That does not help anyone else in the world at all unless we share that information. So that is what I am doing here. Sharing what I have to help someone else get to the crest of the learning curve far faster. Below is a sample quota for my simple PeopleAPI microservice.
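Here is roughly what such a quota looks like in YAML (the name and the numbers are illustrative, and the NotTerminating scope is explained further down):
apiVersion: v1
kind: ResourceQuota
metadata:
  name: peopleapi-quota
spec:
  hard:
    pods: "10"
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi
  scopes:
    - NotTerminating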
This is what I have found:
- For Java based applications I have used a request CPU of at least 500m (500 millicores, or 1/2 of a CPU core) and, depending on the application, a limit of 1 CPU (1000m) or 1500 millicores. For memory on Java applications I have started with 500Mi and gone up to 2Gi for an intense application or API. I have actually had to go higher than that when testing RStudio in a container, as it is memory and CPU intensive depending on what you are doing. But this is a good start. (And remember, not everything is made to run in a container just because it can.)
- For NodeJS I have used 250m and 750m for the CPU request and limit, and 250Mi and 500Mi for memory, for APIs in NodeJS. They are not as heavy.
- Something in Golang would require even less for an API or smaller application.
- I did find that for a .NET Core 2.1 API I could use the same 250m request and 750m limit on CPU millicores. However, I ran out of memory when running the API, so I had to go from a 250Mi request up to a 1Gi limit for memory. It tricked me! It would deploy fine, but certain API calls would make it run out of memory. So I adjusted and tested more to find the right amount. You will have to troubleshoot and narrow down your resources to what is required until you have some baselines to use for your types of projects.
Again, these are a starting point to experiment with, not a definite range, as each application will be unique in its requirements. You have to start somewhere, and you have to test and retest with quotas in place in a development environment and a test or staging environment (without development and build tools sharing your namespace). As you exercise your application by calling its routes and watching it operate, keep your eyes on a few things. Watch the latency, not just the first time you run something but on subsequent calls to your application. Use JMeter or something similar if you want to automate this and stress it out. Also check the deployment logs for OOM (out of memory) errors, which I saw with my .NET Core API early on. And make sure others know you are testing! Or you will have some angry co-workers, QA, and power users testing your application.
Your deployment strategy comes into play here as well. A 'Recreate' strategy means destroy the pod and then recreate it, so you will not be adding resources with this strategy as the old pod is dead before the new one takes over. However, a 'Rolling' strategy means whatever your resource usage is for that pod, you must DOUBLE the request and the limit or you will run into a resource quota issue, because the old and new pods overlap during the rollout. Your requirements will dictate Rolling or Recreate. More than likely a development project can allow Recreate, while production, staging, or testing will probably need a Rolling deployment. So you have to work that into your math for the resources required.
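To make that math concrete (illustrative numbers), here is where the strategy lives on a DeploymentConfig:
strategy:
  type: Rolling
If one pod requests 500m of CPU and 512Mi of memory, a Rolling deployment briefly runs the old and new pods side by side, so the quota needs headroom for 1 CPU and 1Gi of requests just to redeploy that single replica. With Recreate the old pod is gone before the new one starts, so 500m and 512Mi is enough.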
Also keep in mind: you can use base-2 numbers (multiples of 1024) for binary sizing or decimal (multiples of 1,000) sizing when specifying resources and quotas. I have no idea why this is the case. My guess is that marketecture (marketing architecture) started using 1000 MB for 1 GB and it somehow made its way in here, unfortunately. So when you specify things in Mi or Gi, that is the regular (in my mind) 1024-based sizing, versus M or G which is 1000-based. Use one type consistently, binary or decimal, and make sure your math adds up correctly! Or it will boggle your mind how you are out of resources when you are testing. Trust me…the face palm emoji would go well here.
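A quick illustration of the difference (both are valid in a pod spec or a quota):
memory: 512Mi   # 512 x 1024 x 1024 = 536,870,912 bytes
memory: 512M    # 512 x 1000 x 1000 = 512,000,000 bytes
That roughly 5% gap is small on one pod but adds up when you are summing requests and limits for a whole namespace against a quota.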
How do I apply them?
Once you have a good quota you want to start with, you can implement it by running oc create -f quota.yaml -n name-space-here with your oc CLI. That will apply the quota. You can run oc get quota to find any current quotas on your projects, or to make sure the one you just applied went to the correct project without failure. To remove a quota use oc delete quota name-of-quota. You may find yourself using these over and over as you tweak your quota to work correctly.
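For reference, the whole loop tends to look something like this (the namespace and quota names are placeholders):
oc create -f quota.yaml -n name-space-here
oc get quota -n name-space-here
oc describe quota name-of-quota -n name-space-here
oc delete quota name-of-quota -n name-space-here
The oc describe quota command is a nice addition to oc get quota because it shows used versus hard values side by side.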
Another catch I found: if you apply a change to quotas (whether creating one or deleting one) while a deployment is running with no requests or limits set, you must redeploy all your pods for that change to take effect. And if your redeployment runs into a quota limit and will not deploy, you may have to "scale to 0", redeploy, and then scale back up to 1. Not always, but sometimes. And yes. What a PITA! However, once you have the quota working correctly it does guardrail your project from running out of control and taking over a whole node or more! So it is worth figuring out. A good thing to note: if your pod was already deployed with requests and limits set, then it will show up under the quota as soon as you apply the quota.
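If you do have to do the "scale to 0" dance, on OpenShift 3.x with a DeploymentConfig it looks roughly like this (the dc name is a placeholder):
oc scale dc/my-app --replicas=0 -n name-space-here
oc rollout latest dc/my-app -n name-space-here
oc scale dc/my-app --replicas=1 -n name-space-here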
Another thing to keep in mind: if your memory or CPU settings are low, then the timeouts on your readiness and liveness probes may be affected. Your pod may come up more slowly, so your liveness and readiness probes may have to be adjusted, or your resources increased, for it to operate correctly. If you do not know what liveness and readiness probes are, do not make me smack you! Read up on them here.
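A rough sketch of what loosening those probes can look like on a container (the path, port, and timings are placeholders):
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 15
  timeoutSeconds: 3
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  timeoutSeconds: 3
If a pod is squeezed on CPU it will start more slowly, so initialDelaySeconds (and sometimes failureThreshold) is usually the first thing to bump up.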
All of this is why I work on quotas in my development namespace, and then remove the “Jenkins” CPU and memory requirements to find what my real project quota actually is. It can be frustrating until you see all the pieces in action. OpenShift (Kubernetes under the hood) is great for this however you need to set it, test it, adjust, test, adjust, and test under stress to see where your development, testing, staging, and production quotas should be.
Why are my builds not running because of Quotas?
The quota I have shown above has a scope of "NotTerminating". That means anything that specifies a completion deadline will not count against the quota, such as builds and pipelines that run for a bit and then die off. The "completionDeadlineSeconds" line below I put into my build configs (thanks to Mike S.) just under the spec: line (indented 2 spaces) so the build has a deadline and does not count against the quota.
spec:
  completionDeadlineSeconds: 3600
Ok, so now what?
Now, you try it! Use Minishift or Minikube or a k8s implementation in Vagrant or on AWS or somewhere to see how you can set up your quotas, how they work, how they stop you from going haywire on resource consumption, and how you can develop some methods and baselines for good resource usage for you and your team. It will help you in the long run in production for sure. It has helped me quite a bit.
A big shout out to Mike S. (he knows who he is) for his knowledge and help in getting me down this road on quotas! Especially on the YAML for not making builds count against your quota. That is a big deal if you are building inside your namespaces for sure.