Building a Serverless Playground in Kubernetes using Knative & KinD

Introduction
In the vast, ever-expanding cosmos of cloud computing, a new star has risen: Serverless. This revolutionary concept, also known as Function-as-a-Service (FaaS), empowers developers to build and deploy applications without the burden of managing underlying infrastructure. With Serverless, your application automatically scales in response to events, ensuring optimal performance, and you only pay for the compute time you actually consume, leaving behind the inefficiencies of idle servers.
Idle servers refer to servers or computing resources that are not actively being used to process any tasks or requests. In traditional server-based architectures, servers are provisioned to handle peak loads or expected workloads, resulting in periods of time when the servers are idle or underutilized. These idle servers consume power and resources without actively contributing to the processing of tasks, leading to inefficiencies in resource utilization and increased costs.
Most cloud providers these days offer their own serverless solutions like AWS with Lambda, GCP with Cloud Functions, and Azure with Azure Functions. What if you desire complete control over your serverless infrastructure? What if you want to build and shape it according to your specific needs? This is where Kubernetes and Knative come into play.
In this blog, we will delve into the world of serverless architecture, exploring the pivotal roles Kubernetes and Knative play in enabling serverless computing. Moreover, I will guide you through the process of building and deploying your own serverless applications on a local Kubernetes cluster using KinD, granting you full control and flexibility over your serverless playground.
So, fasten your seatbelts and prepare for an exhilarating journey into the realm of serverless computing, where you will gain a deep understanding of the architecture, discover the power of Kubernetes and Knative, and master the art of building and deploying serverless applications.
What is Serverless?

The term ‘Serverless’ is a misnomer and may be misleading as servers are still involved, but the trouble of managing & maintaining the servers is eliminated. With Serverless, developers can concentrate solely on writing code without concerning themselves with infrastructure, deployment, or configuration. Code is uploaded as Functions, which can be triggered as needed, and resources are allocated specifically for the runtime of each function. Once a function finishes executing, the resources are released, allowing for a seamless and hassle-free development experience. The serverless paradigm provides the following benefits if used correctly:
- Scalability
- Cost Efficiency
- Rapid Development
- Improved Resiliency
- Increased Focus on Business Logic
Serverless with Kubernetes?
In the realm of container orchestration, one platform stands tall — Kubernetes. Originally designed by Google, it’s an open-source system that automates the deployment, scaling, and management of containerized applications. Kubernetes organizes containers into ‘pods’ (the smallest deployable unit that can be created and managed), enabling smooth scaling and load balancing which are critical for running applications at scale.
However, while Kubernetes provides a robust and versatile framework for container orchestration, it doesn’t natively support the serverless paradigm. That’s where Knative steps in.
Knative to the rescue!
Knative, an open-source project also started by Google, is designed to extend Kubernetes to provide primitives for serverless platforms. It offers components for deploying, running, and managing serverless applications, thereby turning your Kubernetes cluster into a powerful serverless platform.
In this blog post, we will also use Kubernetes in Docker (KinD) — a tool that enables running Kubernetes clusters using Docker containers as ‘nodes’. KinD is particularly useful for developers, owing to its ease of setup and local deployment.
Introduction to Knative
Knative, often considered as the future of serverless on Kubernetes, was designed to simplify the complexity of deploying and running serverless applications on Kubernetes. It extends Kubernetes to provide a set of middleware components essential for building modern, source-centric, and container-based FaaS applications that can run anywhere.
Knative is composed of two main components: Serving and Eventing. Together, these components make Knative a powerful tool to transform your Kubernetes cluster into a serverless platform.
Serving
Knative Serving powers your applications with some awesome features like automatic scaling, revisions for each code change and the ability to roll back to a previous revision. It provides a worry-free serverless experience with its superior traffic routing and network programming capabilities.

The beauty of Knative Serving lies in its Custom Resource Definitions (CRDs) that help define and control your serverless workload’s behavior. Think of it as the ship’s crew navigating through the open sea.
- Services: Consider services as the “captain” of the ship. This resource manages the entire journey of your workload. It coordinates the creation of other objects, ensuring your app has a route, a configuration, and a new revision each time the service is updated. It can be directed to always route traffic to the latest revision or to a fixed revision, just like a captain choosing to follow the most recent map or sticking to a traditional route.
- Routes: The “helmsmen” of your application, they steer network endpoints to one or more revisions. They manage traffic flow like helmsmen guiding the ship’s course. This includes fractional traffic navigation and named routes.
- Configurations: The “navigational charts” of your application. This resource maintains the desired state for your deployment, providing a clear demarcation between code and configuration. Alterations to the configuration create a new revision, much like a change in course plotted on a navigational chart.
- Revisions: These are the “logbooks” of the ship, the immutable snapshots of the code and configuration for each modification made to the workload. They can be retained as long as needed, with Knative Serving revisions automatically scaled up and down in response to incoming traffic, just like the adjustments made by a ship in response to sea conditions. (A quick way to list each of these objects on a running cluster is shown right after this list.)
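If you'd like to see these resources concretely, each of them can be listed with standard kubectl commands once a Service is deployed (as we will do later in this post). The hello name refers to the sample Service we create shortly; the resource names below are the ones Knative Serving registers in the cluster:
kubectl get ksvc            # The Knative Service (short name: ksvc)
kubectl get routes          # The Route created and managed by the Service
kubectl get configurations  # The Configuration holding the desired state
kubectl get revisions       # The immutable Revisions created for each change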
Eventing
The other component, Knative Eventing, helps in creating, managing, and delivering events between decoupled applications, making it easier for developers to build and operate event-driven applications.
Knative Eventing is the backbone of an event-driven architecture in Kubernetes. It is centered around producers, events, and consumers (or sinks). In this blog post, we will be covering only the Serving aspect of Knative and not the Eventing feature that it provides.
Now that we understand the core concepts of Knative, let's jump into setting up the Kubernetes environment using KinD.
Setting up Knative on a Kubernetes Cluster Locally
Building an efficient, streamlined serverless playground using Kubernetes and Knative requires a few basic steps. We will be using KinD (Kubernetes in Docker) to create a local Kubernetes cluster and set up Knative on it. Let’s begin by understanding a bit about KinD and how to set it up.
Understanding and Setting up KinD
Kubernetes in Docker, or KinD for short, is a tool that allows you to run Kubernetes clusters locally using Docker containers as “nodes”. Think of it like building a model train set. Instead of a real train that needs a whole infrastructure to function, you can create and manipulate a model train set within your room. KinD is the “model train set” of Kubernetes. It lets you work on your Kubernetes cluster within the comfort of your local machine.
Setting up KinD is a straightforward task:
- Install Docker: Follow the instructions specific to your operating system to install Docker. You can find the installation guide on Docker’s official website.
- Install KinD: After Docker is up and running, the next step is to install KinD. You can easily do this using package managers like apt for Ubuntu or brew for macOS. Instructions for installing KinD can be found in KinD's official documentation.
- Create a Cluster: Once Docker and KinD are ready, you can create your very own Kubernetes cluster on your local machine!
kind create cluster --name kind-cluster
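If you want to confirm the cluster is up before moving on, note that KinD names the kubeconfig context kind-<cluster name>, so for the cluster above you can point kubectl at it directly:
kubectl cluster-info --context kind-kind-cluster
kubectl get nodes --context kind-kind-cluster
You should see the API server address and a single control-plane node in the Ready state.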
With your local Kubernetes cluster ready via KinD, you now have your own Kubernetes playground. It's time to bring Knative into the mix and create a truly serverless environment. We will be using their official Quickstart Guide. Since I'm using a macOS machine, the installation steps provided below are for that. Worry not, the guide also provides steps for other operating systems, and they are very easy to follow.
Install Knative CLI: For macOS, this can be done using the following command:
brew install knative/client/kn
Install Knative Quickstart Plugin:
brew install knative-sandbox/kn-plugins/quickstart
Run the Knative Quickstart Plugin:
kn quickstart kind
Keep in mind that we are using KinD in this guide, but you can use Minikube as well. Knative has a quickstart command for that too! Just replace kind with minikube in the above command.
Verify that the Cluster was created:
kind get clusters
You should see knative in the output.
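Optionally, you can also verify that the Knative Serving components themselves came up healthy. The quickstart installs them into the knative-serving namespace:
kubectl get pods -n knative-serving
All of the Pods listed there should eventually reach the Running state.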
Now that you have Knative up and running on your local Kubernetes cluster, it’s time to deploy your first serverless application.
Deploying Your First Serverless Application
With your local Kubernetes cluster powered by Knative, you are ready to deploy your first serverless application. But before we get into that, let's understand what we are aiming for here. To make things easier to follow, I'll explain the upcoming concepts using an analogy in which you are an author on the journey of publishing your first book.
Understanding the Deployment
In the context of Knative, a ‘Service’ represents your application — akin to the book you are planning to publish.
This ‘book’ (Service) includes your storyline (code), language (runtime), and context (environment variables). As with successive editions of a book, your application may need updates over time. In such a scenario, Knative automatically releases new ‘editions’ (Revisions) for each update and efficiently manages the ‘readership’ (traffic routing) among different editions.
Steps for creating and deploying a sample application
1. Drafting a Service Manifest: Your initial step in this process is to draft a Service manifest. This is like the outline of your book, containing everything about your application, such as the location of your Docker image (storyline), environment variables (context), and resource requests. The following is a sample Service manifest that uses a Docker image provided by Knative itself: an application written in Go that returns Hello World! when we hit the application's index route:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/window: "6s"
        autoscaling.knative.dev/scale-down-delay: "1s"
        autoscaling.knative.dev/scale-to-zero-pod-retention-period: "1s"
        autoscaling.knative.dev/scale-to-zero-grace-period: "1s"
    spec:
      containers:
        - image: ghcr.io/knative/helloworld-go:latest
          ports:
            - containerPort: 8080
          env:
            - name: TARGET
              value: "World"
The annotations I have provided make Knative's scale-down behavior much faster, so that we can see the results as soon as possible. Keep in mind that these settings are not ideal for a production environment.
2. Applying the Service Manifest: Once your ‘outline of the book’ (Service manifest) is ready, it needs to be published or ‘applied’. This is accomplished using the kubectl apply command, which puts your application into the Kubernetes environment, similar to releasing your book into the market.
kubectl apply -f hello-world.yaml
We could’ve used the Knative CLI to create the same service, but creating our own manifest not only gives us more control over what we want to achieve, it is also easier to track once we involve GitOps.
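For completeness, here is a rough sketch of the equivalent kn command for the same image, without the autoscaling annotations we set in the manifest (we will stick to the YAML approach for the rest of this post):
kn service create hello \
  --image ghcr.io/knative/helloworld-go:latest \
  --port 8080 \
  --env TARGET=World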
3. Reviewing the Service: After applying the manifest, it’s crucial to ensure your service (book) is running as expected. You can check the status of your Service using the following command:
kubectl get ksvc
or we can use the Knative CLI as well:
kn service list
The above commands give us the Service information. We can view the Pods as well by running the following command, akin to reviewing the sales and readership of your book post-launch:
kubectl get po -w
Testing our Knative Service
We can test our service by calling the function defined by the code, which in our case is the index route of the application we just deployed. To do this, we first need a URL that routes from our local network into the Kubernetes cluster. We can get it using the following command:
kn service describe hello -o url
The above command will give a URL like this: http://hello.default.127.0.0.1.sslip.io. Now we can use curl to hit that URL:
curl $(kn service describe hello -o url)
Here’s a video that showcases how the autoscaling in Knative works:
As you can see, initially there are no Pods, but as soon as we make a request, Pods are created and we get a response back. Then after some time, the Pods are scaled back down to zero (enabled by default) as there is no traffic.
Congratulations! You’ve successfully ‘published’ (deployed) your first serverless application using Knative on Kubernetes.
In the next section, we’ll delve into how you can make your application ‘scalable’, akin to reaching a broader readership in the world of publishing.
Making Serverless in Kubernetes Scalable
As an author, the dream isn’t just to publish a book, but to have it read by as many people as possible, scaling its reach to a wide audience. In the world of serverless applications, this dream is analogous to having our application effortlessly handle increasing user demand.
Understanding Scalability in Knative
In Kubernetes, scaling refers to adjusting the number of Pod instances to match the incoming load. As traffic increases, more instances are created to handle the requests, and as traffic decreases, instances are deleted to save resources — a process known as ‘autoscaling’.
Imagine having the power to print more copies of your book instantly as demand increases and cease printing when demand subsides. This is the level of responsiveness we’re aiming to achieve with our serverless application using Knative’s autoscaling feature.
By default, an autoscaler is enabled and, as we saw in the video, scale-to-zero is enabled as well. Ideally, though, you should configure the autoscaler according to your requirements.
Before we get into modifying our service manifest, we need to understand the two types of Autoscalers supported by Knative.
Knative Pod Autoscaler (KPA)
The KPA is an integral part of Knative Serving. It’s automatically turned on when you install Knative Serving. One of its main features is the ability to scale down to zero, which means it can shut down completely when it’s not being used. However, it lacks the capability to adjust based on the amount of CPU or memory being used. Within this type of Autoscaler, there are two types of metrics that we can use.
1. Concurrency: This metric is based on the number of concurrent requests that can be processed by each replica.
2. RPS or Requests per second: As the name suggests, this is based on the number of requests received by each replica per second; a small sketch of the corresponding annotations follows this list.
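As an illustration, switching a KPA-scaled service from the default concurrency metric to RPS only requires two annotations on the revision template; the target of 150 below is an arbitrary example value:
metadata:
  annotations:
    autoscaling.knative.dev/metric: "rps"   # Scale on requests per second instead of concurrency
    autoscaling.knative.dev/target: "150"   # Soft target of 150 requests per second per replica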
Horizontal Pod Autoscaler (HPA)
On the other hand, HPA doesn’t have the ability to scale down to zero. But it does have an advantage over KPA in that it can auto-adjust based on CPU or memory usage.
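For reference, opting into the HPA class is also done through annotations on the revision template. Keep in mind that, depending on your Knative version, HPA support may need to be installed as a separate Serving extension, and the exact meaning of the target value for the cpu metric can vary between versions, so treat the snippet below as an illustrative sketch rather than a drop-in configuration:
metadata:
  annotations:
    autoscaling.knative.dev/class: "hpa.autoscaling.knative.dev"  # Use the Horizontal Pod Autoscaler
    autoscaling.knative.dev/metric: "cpu"                         # Scale on CPU usage instead of requests
    autoscaling.knative.dev/target: "80"                          # Illustrative target value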
Now that we understand the two autoscaler types, let's modify our Service manifest and add some more annotations that configure the KPA for our service:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/window: "6s" # Stable window
        autoscaling.knative.dev/scale-down-delay: "1s"
        autoscaling.knative.dev/scale-to-zero-pod-retention-period: "1s"
        autoscaling.knative.dev/scale-to-zero-grace-period: "1s"
        # New annotations compared to the previous manifest
        autoscaling.knative.dev/class: kpa.autoscaling.knative.dev # Using the KPA class (default)
        autoscaling.knative.dev/metric: "concurrency" # Using the concurrency metric (default)
        autoscaling.knative.dev/target: "10" # Soft limit of 10 concurrent requests per replica
        autoscaling.knative.dev/target-utilization-percentage: "80" # Scale up once 80% of the target is reached
        autoscaling.knative.dev/min-scale: "2" # Keep a minimum of 2 Pods running to avoid the cold start problem
        autoscaling.knative.dev/panic-window-percentage: "20.0" # 20% of the stable window, in this case 1.2s
        autoscaling.knative.dev/panic-threshold-percentage: "150.0" # i.e. 1.5x of what a replica can handle
    spec:
      containers:
        - image: ghcr.io/knative/helloworld-go:latest
          ports:
            - containerPort: 8080
          env:
            - name: TARGET
              value: "World"
We can apply the new manifest by using the same apply command that we used before:
kubectl apply -f hello-world.yaml
Testing the Autoscaler
In order to test the autoscaling functionality, we can use a CLI tool called hey. It’s a program that can help us send concurrent requests to our service.
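If you don't already have hey on your machine, on macOS it can be installed via Homebrew:
brew install hey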
hey -z 10s -c 100 \
  $(kn service describe hello -o url) \
  && kubectl get po
In the above command, the -z flag specifies the duration of the load test and the -c flag specifies the number of concurrent workers sending requests.
Here’s a video demonstrating how the application autoscales like magic!
As you can see, the Pods are scaled up based on the configuration we provided when requests come in, and scaled back down when there is no traffic.
I hope you also noticed that two Pods are always up, since we set min-scale to 2 in our configuration. This helps solve the cold start problem that usually comes with serverless architectures.
In the next section, we’ll learn how we can keep our serverless infrastructure resilient!
Ensuring that your Knative Infrastructure is Resilient
Just as a well-planned book has back-up plots or twists to keep the reader engaged even if a certain storyline doesn’t land well, a robust serverless application is designed to handle failures gracefully and maintain high availability. This concept is known as resiliency, and it’s a critical aspect of any application deployed in a production environment.
Why is Resiliency Important?
Think of resiliency as a contingency plan in case a particular chapter of your book doesn’t meet the reader’s expectations. In serverless applications, if a component fails, the impact on the overall application should be minimal, and it should recover quickly.
Knative is designed with resiliency in mind. It offers several features that support building resilient applications:

- Autoscaling: As we saw in the demonstration earlier, autoscaling is provided by Knative out of the box and is enabled by default. Autoscaling is highly important for an infrastructure so that it keeps up with demand and scales down when traffic is low.
- Revision rollback: If a new version of your application introduces a bug, you can roll back to a previous, stable revision.
- Traffic splitting: You can direct a percentage of your traffic to different revisions. This lets you safely test new versions of your application with a subset of your users, much like an author might share a new chapter with a small test group before full distribution.
- Retry & timeout policies: Knative Eventing allows you to define retry policies and timeout durations for event delivery, ensuring your events don’t get lost in case of temporary issues.
In this blog, I will not cover hands-on implementations of revision rollback, traffic splitting, and retry & timeout policies as we did with autoscaling, since that would be a blog post in itself. You can learn how to implement these features from Knative's official documentation.
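That said, to give you a taste of what traffic splitting looks like, a Knative Service lets you pin percentages of traffic to specific revisions directly in its spec. The snippet below is a minimal sketch: the revision name is hypothetical, and in practice you would use a name generated by Knative (listed via kubectl get revisions):
spec:
  traffic:
    - revisionName: hello-00001   # A previous, known-good revision (hypothetical name)
      percent: 80
    - latestRevision: true        # The newest Revision receives the remaining traffic
      percent: 20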
Building a resilient serverless application in Knative involves strategically leveraging these features. Just like an author would consider feedback, revisit their drafts, and revise their work, ensuring resiliency in your serverless application is an ongoing process of assessment and improvement.
Conclusion
To sum it up, Knative is an impressive tool that simplifies the process of creating, deploying, and managing serverless workloads within a Kubernetes environment.
In this blog post, we learned about serverless architectures, how Knative enables serverless workloads in Kubernetes, and how to set up a local Kubernetes cluster using KinD. We also delved into Knative’s Serving and Eventing components, the pillars of its functionality, and built our own serverless application.
Whether you’re a startup aiming to optimize resources or a large organization looking to scale your applications, a serverless playground using Knative and Kubernetes offers the efficiency and flexibility your team needs.
Remember, practice and exploration are the keys to mastering any new technology. So go ahead, put your learnings into action, and let Knative unlock a world of serverless possibilities for you.
Author’s Note
In writing this guide, my hope was to present a comprehensive, accessible introduction to Knative and its role in serverless Kubernetes environments. It’s been a journey exploring this powerful platform, one that’s convinced me of its potential in transforming the way we approach application development and deployment.
However, the beauty of technology is in its constant evolution. The landscape of serverless computing is changing rapidly, with new innovations emerging regularly. My invitation to you, the reader, is to join in this journey of learning and discovery. Keep exploring, keep innovating, and keep pushing the boundaries of what’s possible.
I’d love to hear about your experiences with Knative, Kubernetes, and serverless architectures, so feel free to leave comments, share your stories, or ask any questions you might have. Here’s to building a future that’s serverless and full of potential. Happy coding!
Remember to follow me for more content around DevOps, Full Stack, and tech in general. Check out my website to know more about me or read my other blog articles: https://karanjagtiani.com!