Practical strategies for defending a Kubernetes cluster with Divyanshu Shukla

Host: Welcome to the final chapter of our Kubernetes Learning series. In part one we learned about Kubernetes and workloads in general, along with some open-source tools.

In part two we learned about approaches for attacking Kubernetes clusters from a red-teaming perspective, and also tried to understand the attacker's mindset while exploiting clusters.

Today we will dive deeper into practical strategies for defending a Kubernetes cluster, so that we get an idea of how to defend a cluster, along with some of the best practices to follow while defending clusters from attackers.

Divyanshu, welcome back to the show. Looking forward to the final part of the series.

Divyanshu: We have already covered RBAC.

We have already performed scans, so we are not going to look into all the protection strategies; we'll talk about some of them. We'll start with network policy, which is about how traffic is communicated between pods and how the flow of data happens across the cluster as required.

Then there's RBAC security, which we have seen again and again, so I won't talk about RBAC. Then secret management: we will look into secrets, how they are mounted, and how we can use Sealed Secrets to store a secret directly in the YAML. Then we'll look into Kyverno, because Kyverno is extremely simple to set up and it is easy to build validations and enforcement on top of it. And last would be Cilium.

Cilium is the network fabric which we have been using in our lab; we created our setup on top of Cilium itself. So we'll see how Cilium works. We'll also touch on service mesh, AppArmor, and the security context, but beyond that I don't think we can go, because those topics are extremely exhaustive.

If we talk about mesh, then we have to show how a mesh works, what we mean by mesh; that is pretty advanced. For now we can look into network policy, then secret management, then Kyverno and Cilium. These four things we'll look at. Let's get started.

Host: These four in themselves feel like a lot, so let's go through them one at a time.

Divyanshu: Yeah, so the easiest one is the network security policy, often called NSP. Based on the pod selector or label, and on how the traffic is flowing, whether it's ingress or egress traffic, we can write a network security policy based on source IP, destination IP, port number, and protocol.

So we can write allow and deny rules, but it is not as exhaustive as Cilium. Suppose I want to block an endpoint: I want to block a specific API from being accessed even within the cluster itself. In those cases I can't use a network security policy. Or if I want to block at a specific layer, or on something very specific, maybe a header, I can't use a network security policy either. That is where Cilium comes into the picture.

But for the basics we can always go with the network security policy. Cilium gives us basic network security policies plus all the advanced protection strategies we are looking for. It also helps us monitor the cluster traffic.

Host: Okay, so for a beginner, maybe network security policies make more sense. But as you get into advanced use cases, look at how Cilium works.

Divyanshu: So Cilium has a Hubble UI, which helps us monitor, or rather helps us understand how a flow is happening when a policy is created: where the traffic would flow from. It won't exactly show you real-time data; it is more about showing you, if I create a policy, say blocking traffic from pod one to pod two, whether that thing will actually work or not.

So for that we have Hubble. We can just write that YAML, paste it, and it will show whether the policy is working exactly the way we intended or not.

Host: Okay, so in a way it simulates it and shows you whether the policy is valid or not. That's interesting. Hubble UI.

Divyanshu: Let's get started with the network security policy.

So we'll be blocking ingress traffic based on the source pod label. Let me explain the objective. There are two namespaces, namespace-a and namespace-b. Each namespace houses an NGINX pod, named app-a and app-b.

Okay, we have labeled app-a in namespace-a and app-b in namespace-b, and both are running NGINX pods. We'll see that initially there is connectivity between app-a and app-b. Then, once we implement the network security policy in namespace-b, it will block ingress traffic from any pod with the label app-a. So if my pod has the label app-a, I won't be able to communicate with any pod in namespace-b.

Then we'll retry accessing the app-b pod from app-a and see whether we are able to reach it or not.

Host: Okay, so in a way we are trying to block connectivity between two pods. Correct. Okay.

Divyanshu: So this is the YAML which I wanted to show. Let me just go into the 7.1 directory.

So this is the demo policy, just for explanation. There is a resource of kind NetworkPolicy; deny-ns2 is the name; then we specify which namespace it operates in and what the policy type is, ingress; and where the traffic should match. So the namespaceSelector matchLabel is ns2, and the policy itself lives in ns1.

So it covers anything which is coming from NS2 into NS1, and the type is ingress, right? So I've written exactly that.

I'll just read it out loud: in the ingress part, `from` with a namespaceSelector matching label name: ns2 means incoming connections from any pod in a namespace that has the label ns2. However, as there are no specific ports defined, this effectively denies all incoming traffic from namespace NS2 to NS1.
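(For reference, a sketch reconstructed from the fields being read out; names and labels are assumptions. Strictly speaking, a from clause like this allow-lists the matched namespace, and the blocking of everything else comes from the implicit default-deny that applies once pods are selected by an ingress policy.)

    cat <<EOF | kubectl apply -f -
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: deny-ns2
      namespace: ns1             # the namespace being protected
    spec:
      podSelector: {}            # selects every pod in ns1
      policyTypes:
      - Ingress
      ingress:
      - from:
        - namespaceSelector:
            matchLabels:
              name: ns2          # matches namespaces labelled name=ns2
    EOF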

Host: Okay, so it's like a wildcard: deny all traffic. You are not denying a particular port, you are denying everything entirely. Correct.

Divyanshu: So I'll just create two namespaces very quickly.

Host: And if I need to, let's say, block a particular port, how would I do that? Do I need to add the port information there?

Divyanshu: Yes.

We need that information. If we don't have it, then we have to find where the communication happens, and based on that we add those details. Okay, so I'll just run the command and show you. If you see, I'm doing a curl, basically, from namespace-a: I'm trying to access app-b in namespace-b via its service name, and my source namespace is namespace-a.

So it is just doing a curl, and you'll see it shows me the default NGINX page, "Welcome to nginx!".
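(Roughly like this; pod, service, and namespace names are assumed from the walkthrough, and the image in the pod is assumed to have curl available.)

    kubectl exec -n namespace-a app-a -- \
      curl -s --max-time 5 http://app-b.namespace-b.svc.cluster.local
    # -> the default "Welcome to nginx!" page while no policy is in place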

So it shows that I'm able to access my application, the app-b pod. Then I implement the network policy. You'll see this network policy says exactly what we discussed: in the namespace-b spec, block traffic to app-b from app-a.

So if anything is coming from app-a to app-b and the type is ingress, then block it. I'll just use cat <<EOF to create and apply the policy directly.
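(A sketch of what such a policy can look like. NetworkPolicy has no explicit deny rule, so the blocking effect in demos like this comes from selecting app-b and not allowing the unwanted source; names and labels are assumptions from the walkthrough.)

    cat <<EOF | kubectl apply -f -
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: block-app-a          # hypothetical name
      namespace: namespace-b
    spec:
      podSelector:
        matchLabels:
          app: app-b             # the pods being protected
      policyTypes:
      - Ingress
      ingress: []                # no allow rules: all ingress to app-b is dropped,
                                 # which includes the curl coming from app-a
    EOF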

And now when I try to access it again, let's see whether I'm able to or not.

So you'll see, this time I'm not getting output; the request times out. So this shows an extremely simple network security policy.

And I'll delete this so that we can run our next lab. But before that, do you have any questions?

Host: Yeah, one thing: what are some of the use cases you have seen where these types of network restrictions are applied? And if you want to exclude a particular app, let's say you want to exclude five apps but allow one particular app, can you use network policies for that as well?

Divyanshu: We can, because apps have labels, not just IPs, and the policies have a similar structure, so we can do that. But usually I've seen people using Calico or Cilium most of the time, because companies want a cluster with an advanced network fabric from the initial stage itself, so that it's already there if they later want network security policies. I haven't seen folks using plain NSP, at least in my experience; if someone has implemented it, they can definitely start a discussion, or share a blog if they have written one. Most of the time, even when I started in Kubernetes security, the company had no network security policy in place, and the first thing they came up with was Cilium, because of this eBPF technology, which I'll explain. And with Calico, if you look, they have anomaly detection and so on in their enterprise offering, which is pretty advanced; it gives you monitoring.

And if I'm talking about anomaly detection, obviously it is monitoring all the traffic.

Correct? Those things are provided by these advanced network fabrics. So maybe people use NSP for a basic setup, or if they have something very limited, or if they are doing it for the first time.

But whoever wants to implement robust network security always uses some kind of advanced fabric.

Host: Okay, that makes sense.

Divyanshu: So, next is secrets. I'll quickly explain: a secret is anything which we want to keep secret.

In Kubernetes it is stored as Base64, and because it's Base64, it is very easy to get it and decode it; it is equivalent to plain text. So imagine I'm storing some secret in Kubernetes and then uploading the same YAML to GitHub.

It is very easy for an attacker to search GitHub for it, and if that secret is leaked, it is easy for anyone to decode the Base64 value and find out what the secret is. So I'll show you how basic secrets work and how they are mounted into the container, and we'll mention secret management tools like Vault.

But we'll primarily look into Sealed Secrets, a tool provided by Bitnami that is used to encrypt secrets.

Because the secret is encrypted, it is much safer to upload it to your GitHub, right?

So if the encrypted value is leaked, no one will be able to decrypt it unless they have access to your cluster, because the Bitnami controller pod holding the key runs in that cluster. And I'll explain how that tool works.

Host: Yeah, I have that question: if it is encrypted, then how will my pod get the decrypted value? But I'm sure you'll show that.

Divyanshu: So let me go into the secrets folder; I'll just close these tabs. Sure, yeah, this becomes like my browser, where I have thousands of tabs open.

Host: Yeah, everybody goes through the same pain. Everyone working in IT has this thing: thousands of tabs open in Chrome or Firefox.

Divyanshu: So I'll just create a directory this time because here we don't need anything else.

Whatever we need, we will create. So to keep things clean, I'm just creating a directory, and inside it two files, username.txt and password.txt, and echoing the admin and password values into them. That's it.

Host: So this sort of simulates, let's say, if you have a key and secret or something like that, those are stored in those files, correct?

Divyanshu: So if you see, password.txt has the password, and username.txt has admin, right? So I'll create the secret; the type is generic. There can be different types, but for now we'll focus only on generic, because otherwise we'll head off in a different direction.

So I just want to keep this sorted. I'll quickly create it with type generic, and this is the name of my secret, ks-user-pass. And with --from-file I'm passing the files I want to use. You'll see it has created the secret; just like with a service or a pod, you similarly get secret/<name-of-the-secret>. Then I'll do a kubectl get secrets, and it lists what kind of secret it is, and you can see the data as well: eight bytes and five bytes.
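(A sketch of those steps; the exact spelling of the secret name ks-user-pass is my guess from the audio.)

    mkdir secrets-demo && cd secrets-demo
    echo -n 'admin'    > username.txt      # -n avoids a trailing newline in the stored value
    echo -n 'password' > password.txt
    kubectl create secret generic ks-user-pass \
      --from-file=username.txt --from-file=password.txt
    kubectl get secrets                    # DATA shows 2 keys; sizes of ~5 and ~8 bytes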

Now, before I upload this, let me explain one more thing. If you see, the type shows as Opaque; that's another detail. If I want to get this secret, I'll do the same thing again: kubectl get secret with the name of the secret. Earlier I just did get secrets, which listed all the secrets present; here I'm getting the specific secret and saving it as YAML.

And when I save it as YAML, you will see password.txt and username.txt in Base64, right?

Host: So you can easily decode and see the value.

Divyanshu: So if I upload this YAML anywhere, the secret is stored right there, and anyone can decode it. Even if it is never uploaded, if it is present in my cluster, as an attacker I can dump it. Correct. So I'll just use a command to decode these values; I'll use echo to keep things a little clean, and it will show what I have decoded.

So I got the value of the secret, grabbed the password.txt output, and here I am just using base64 -d, passing the username value and the password value. First, the output goes into variables for the username and password.

And second, I take those values and pass them to the base64 -d command to decode them, which says the username is admin and the password is password.
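(Roughly, the decode step looks like this; variable names are illustrative.)

    # Pull the Base64-encoded fields straight out of the secret
    USER=$(kubectl get secret ks-user-pass -o jsonpath='{.data.username\.txt}')
    PASS=$(kubectl get secret ks-user-pass -o jsonpath='{.data.password\.txt}')
    echo "$USER" | base64 -d   # -> admin
    echo "$PASS" | base64 -d   # -> password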

Host: Imagine these being AWS secrets. So now you have access to someone's AWS account, in a way.

Divyanshu: Right. Because we are looking at fake credentials here, it might seem harmless. But in the real world these would be real keys, and if the infra team doesn't realize how easy it is to get at such a secret, they might be creating something exactly like this.

So now we'll create a DB secret YAML and see how we would use it in a real-world style setup.

So I'll just create this secret.

Earlier I created the secret from files, passing them on the command line. This time I'm creating it via YAML; that is the only difference, directly via a manifest. So I'll just do kubectl get secret.

Now you'll have another secret, with the name db-secret. This time I'll create an application which will use the secret value we have created. So I'll quickly create this web-app.yaml, where I am passing this value DB_PASSWORD.

Let me just show you; I'll do a kubectl apply -f to apply this YAML, and a sleep just so that the command completes and the pod has enough time to come up successfully.

Now I'll do a printenv to show the environment variables. You'll see there is this environment variable DB_PASSWORD, and its value, coming from my secret key, is my secret password.
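(A minimal sketch of what that manifest might look like; the secret name db-secret is from the demo, while the key name and image are assumptions.)

    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: web-app
    spec:
      containers:
      - name: web-app
        image: nginx            # any long-running image works for the demo
        env:
        - name: DB_PASSWORD     # shows up in printenv inside the container
          valueFrom:
            secretKeyRef:
              name: db-secret
              key: password     # hypothetical key inside the secret
    EOF
    kubectl exec web-app -- printenv | grep DB_PASSWORD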

Host: Which we had created earlier, using the manifest. So in a way, you are showing how to create a secret, how an application, as in a pod, gets access to that secret, and how it ends up in the environment. So if the pod is not secure, anybody can print the environment and see all the secrets. Okay, that is helpful.

Divyanshu: We'll go back to our lab. This is pretty much how a secret works and how we mount it; we'll delete things and perform a cleanup, and then we'll look into that Sealed Secrets binary I was talking about. The issue here was that the secret was only Base64, so I could dump it, or if it was stored somewhere, it was easy for me to get it back. But if I use Sealed Secrets, we'll see why it is difficult for an attacker to get the data. So, you'll see, I'm downloading this kubeseal binary.

I'll do a wget to download the binary, then I'll install it into /usr/local/bin.
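(Something like the following; the release version is an assumption, so check the sealed-secrets releases page for a current one.)

    # Fetch the kubeseal CLI from the bitnami-labs/sealed-secrets releases
    KUBESEAL_VERSION=0.24.0   # hypothetical pin
    wget "https://github.com/bitnami-labs/sealed-secrets/releases/download/v${KUBESEAL_VERSION}/kubeseal-${KUBESEAL_VERSION}-linux-amd64.tar.gz"
    tar -xzf "kubeseal-${KUBESEAL_VERSION}-linux-amd64.tar.gz" kubeseal
    sudo install -m 755 kubeseal /usr/local/bin/kubeseal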

Host: And this will be on my machine? Let's say as a developer, if I need to add a secret, then kubeseal has to be on my machine. Okay.

Divyanshu: So basically, the thing is, if you see, I am doing a wget and then applying a controller YAML, right? This controller is kind of an API server for the Sealed Secrets tool.

Let me just show you. I think it is getting downloaded.

Okay, what I'll do, I'll just do a wget here itself. Actually, I think I've already done that.

I'll simply show it from here itself, in 7.3-secrets. This should be much better; let me zoom in. So you'll see, this will create a service, and this will create a role.

It will create a cluster role, and it is creating the SealedSecret resources, with the verbs and the API groups. So it is creating everything, kind of an API server.

So this would be the brain of your Sealed Secrets setup.

Host: Okay, I see, these are custom extensions to Kubernetes so that they can support this.

Divyanshu: It is mentioned right here: custom resource definition.

So I am just adding an extension; it is very much like a Firefox or Chrome extension. I'm just adding another extension, and whatever I want is all defined here. This is a huge file, so I'm not going through it. Then I'll just apply it. Here, I think I've already applied it to the cluster. No, let me reset.

Yes, I have already applied the controller.yaml to the cluster. Now I'll create another secret namespace and try to do the same thing.

Host: Okay, so you don't have to define a different type of resource or anything like that?

Divyanshu: We do, actually; I'll show you. Right now I'm just creating a namespace. Then I'll again create the secret YAML, and I am passing the same value, test, in both cases, username and password. But this time you will see a command.

So I'm doing a cat on the secret, then piping it through the kubeseal command from my local machine, passing the controller namespace and the controller name, sealed-secrets-controller, and the format is YAML. And now I am generating a YAML.
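(Roughly like this; the controller namespace is an assumption, the controller name comes from the discussion.)

    # Turn a plain Secret manifest into an encrypted SealedSecret manifest
    cat secret.yaml | kubeseal \
      --controller-namespace kube-system \
      --controller-name sealed-secrets-controller \
      --format yaml > sealed-secret.yaml
    kubectl apply -f sealed-secret.yaml   # sealed-secret.yaml is safe to commit to Git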

Host: Okay, so it sort of reads from your secrets file, does the encryption, and then gives you a new file.

Divyanshu: Correct. So if you look now, the username and password are encrypted.

And this was my previous YAML; so it has created a new YAML for me. Okay, makes sense now.

Host: Okay, now I'm curious how a pod will access it.

Divyanshu: Let me show you that. So this time I have created a secret here again. And now I'll do a kubectl edit.

You will see I'm using kubectl to get what is running in that pod currently, and you will see it is in Base64, right? So this is the file my pod will see.

Host: Okay, so at runtime it is sort of the same thing, correct?

Divyanshu: Because I'm doing a kubectl command, let me just close this. When I do a kubectl edit, the request goes to our API server and from there to wherever it needs to go. And because we have the custom resource definition installed on the cluster, when it tries to read the secret in that namespace, the controller does the decryption for me and shows me the plain value. This is kind of a runtime view, because the controller is already running. So that is how our application would see it.

But if I only have the YAML, you will see it's all encrypted; this is the SealedSecret which I actually applied, right? So that is how we can store it. It solves the problem of storing these YAMLs with hard-coded values.

If organizations do not want to use Vault or any kind of key management system, then at the beginning they can use sealed secrets. Later, if they want to go to an advanced phase, they can have a highly available cluster with a highly available HashiCorp Vault or some other management system.

Host: KMS or something like that.

Divyanshu: Correct. But in those cases they have to maintain everything themselves, right? So organizations are a little hesitant in the beginning to run Vault; this is not a practice most startups or young companies follow.

If it is a very mature MNC or company, they definitely follow it. But if it's a startup or a newly created company, or suppose I am just creating a project, in those cases I might not be using Vault, right?

Because then I'd have to keep my Vault running all the time, correct? So that might not be practical. Or if I want to create an open-source project and upload it, or build something internal for the organization, and I don't want to run Vault, in those cases I can simply use sealed secrets: add these YAMLs, including the SealedSecret YAML, to the repo, and whenever someone deploys them, it deploys the controller and at the same time generates the values for me. And then I can even tell DevOps that I want to use Sealed Secrets and store my values in GitHub.

So it again depends on how the organization is structured, but this is just a way of storing the secret in an encrypted form.

Host: Yeah. So I see two advantages to it. One, which you clearly highlighted: even though you check it into your GitHub, nobody knows what the credentials actually are. The other part is the abstraction: as a user of the secret, I don't have to do anything, I'm just accessing it as if it were a plain-text Kubernetes secret.

One follow-up question I have is: now we are using sort of a third-party solution, even though it is open source.

What about the security of that, at least for kubeseal? Because it must be managing the private key and public key and all of that, right? How is that secured?

Divyanshu: I am not sure.

I have never implemented this in production.

I've used it for my personal stuff, for creating things, for learning. So I have scanned the YAML, and the YAML had only the permissions which were required. But I'm not sure about the binary or the image. I am hopeful, basically, because it is Bitnami, whose images are also widely used with Helm, I'm assuming. But I haven't performed any scanning of the image itself. So the image might be vulnerable if it is not updated; if they have published an updated controller image and you are on an older one, it might be vulnerable to attacks.

So that can happen. Right?

Host: Maybe this can be an exercise for our audience,

Divyanshu: so that they can play with it: run a scan on the Bitnami images and see if there are any vulnerabilities. For that matter, even kube-bench and kube-hunter are not scanned by most people, so maybe we have to scan those images as well, because they are highly starred and widely trusted. At least I haven't used these in production, because there I'm obviously using paid tools and getting the pen-test reports from the vendor itself. But for testing, or before production, if we are using any open-source tool, then we need to have a complete idea of what that tool is doing and whether it is vulnerable or not. So we have to perform the scan on the images as well.

And then we have to check whether a backdoor is there or not, because vulnerabilities, with kube-bench also, can always be there; it is not updated every day. But if it has some kind of backdoor or something malicious that gives, say, a reverse shell, that is extremely dangerous.

Host: Correct. Okay. No, it makes sense. Let's go to the next topic, I think.

Next one. Sure, go ahead.

Divyanshu: Yeah, so in cases where we have cluster monitoring set up with some anomaly detection, like I was saying about Calico, we would see external communication happening if we had a malicious pod. But if those things are not there, if it is a new cluster, then we should be extremely careful. And that is a very nice question, by the way, because I have done exactly that: whenever I install any open-source project in prod, or even in staging, I definitely go through a check, because I have not used many of them in production. This Sealed Secrets I've never used there; kube-bench and kube-hunter I've used after a basic sanity check: I checked the YAML, and I used Checkov to perform a scan on those YAMLs.

So I checked those values, I saw when the images were last updated, and I did a Trivy scan. And I keep an eye on the SHA-256 hash, so these things I'll match.

But if we want to use something properly in our environment, we can use Cosign, which is used to sign container images, and then push those images to a private repo, so that we don't have to rely on public repositories for docker run and all, correct?
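(For reference, signing and verifying an image with Cosign looks like this; the registry path and tag are hypothetical.)

    cosign generate-key-pair                       # creates cosign.key / cosign.pub
    cosign sign --key cosign.key registry.example.com/platform/sealed-secrets-controller:v0.24.0
    cosign verify --key cosign.pub registry.example.com/platform/sealed-secrets-controller:v0.24.0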

Host: Yeah, that makes sense.

Divyanshu: So next is the Kyverno admission controller. An admission controller is basically something that performs validation and checks on whatever pods, services, or other resources are getting created in the cluster. Suppose I want to reject deployments without a label, or require a policy with some particular name, or force everything to deploy into a specific namespace, or block deployments into the default namespace. All these things I can do with an admission controller. It inspects the request once it reaches the API server; there are custom CRDs installed, and from there either a validating webhook or a mutating webhook triggers, based on the policy I've created.

If I want to block something, then it performs the block; it enforces it. But if I want to change values: say, as an engineer I am creating a pod, and if my label is cloudnix, then the label should automatically change to cloudnix_environment or cloudnix_staging or cloudnix_prod. These mutations I can do via the admission controller. OPA Gatekeeper is one open-source admission controller, and Kyverno is another.

I prefer Kyverno because it has a huge library of ready-made policy YAMLs, so you can easily go and learn.

With OPA Gatekeeper you have to learn Rego, which is a little complicated if you are starting out for the first time. When I started learning OPA, it took me some time to pick up the policies, and then in production, when I had to write 50 or 60 policies, it was extremely difficult to create them, because I was just starting: I had to go through Rego, then test, and I was also trying to bypass those policies myself. For me it was a lot of work.

So I switched to Kyverno, and Kyverno already had the policies available; if you Google, you will find a library with all sorts of policies. It was easy for me to copy, paste, and then change the values I required.

Host: Okay, so in the case of OPA, you have to learn the Rego language and use that, and that becomes a hurdle.

Divyanshu: Correct, though OPA is much more powerful than Kyverno. Kyverno is mostly focused on Kubernetes; OPA can do many more things. If you look at AWS Labs and their workshops, they use OPA Gatekeeper for other things as well. OPA Gatekeeper is just the one part of OPA that provides the admission controller functionality.

Host: That's it. Right, okay, so makes sense.

Divyanshu: We'll talk about validation, mutation, and generation. I have already explained this flow, but I'll walk through it again: you will see HTTP API handling, then authentication and authorization in the cluster. Then the mutating admission webhook runs, if one is there; otherwise, object schema validation happens.

And after that, the validating admission webhook is checked, right? And then the result is stored in etcd: whether the pod was created successfully or not.

Kyverno has a similar structure, with a generate controller and a policy controller, and it deploys custom CRDs. Every such tool has to have some kind of pod running, right?

So there is some brain running which does the checks, and that is what runs the webhooks, which can be used to perform whatever mutations or validations we want.

This is pretty much it for Kyverno. I'm not going into the policies much. In very layman's language: whatever rule we have, it checks whether the resource matches or is excluded. And in the case of mutating, first it checks, then validates, and if validation is successful, it generates the updated value of whatever mutation we have written, which I'll show in the lab next.

So here in this part, I'll first deploy Kyverno via Helm. I'm not sure if Helm is installed... okay,

Host: maybe you're not in the right directory.

Divyanshu: No, actually, I hadn't opened a new tab. Okay, so now it should be working fine. I don't think I have Helm, so I'll also show how to install Helm. I have the steps ready. Yeah.

I'll quickly go into my lab, because I created this lab yesterday, so Helm might not be there. It is just a one-liner; you can also get it from the official Helm website, but I'm using it from my lab. So this is the script: I'm curling the install script from GitHub and running it via Bash. That is pretty much it.

And it installs Helm for me. So you'll see, it verifies the checksum and installs into /usr/local/bin.

Host: So you have Helm now, and we are installing Kyverno.

Divyanshu: So I'll just use the command above.

I'll use repo add to add the Kyverno repo. Okay.

And then I'll use the helm install command to install it. Yeah, it has been added; now I'll do a helm install. The replica count here is set to one, because this is a testing cluster, so we're just setting one pod. And you'll see a warning that setting the admission controller replica count below three means Kyverno is not running in high-availability mode.
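(The commands in question look roughly like this; the replica flag name varies between Kyverno chart versions, so treat it as an assumption.)

    # Install Helm via the official convenience script, then Kyverno via its chart
    curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
    helm repo add kyverno https://kyverno.github.io/kyverno/
    helm repo update
    helm install kyverno kyverno/kyverno \
      --namespace kyverno --create-namespace \
      --set admissionController.replicas=1   # fine for a test cluster; use 3 in prod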

Host: Right. But since it is a testing one, we are okay. In production, Kyverno recommends that you have at least three replicas.

Divyanshu: Correct. It recommends that, but we have to decide based on the traffic and the communication, right? If we have, suppose, thousands of deployments happening, then we might need more replicas.

This is true for any controller, for the Bitnami one also. Again, high availability is for when you have a lot of traffic, right?

Because we are on staging and just testing, we can do whatever we want. But in the real world we actually have to deploy all these things and test first: whether I am able to do it, whether the complete application works as intended through Kyverno, whether there is any latency difference, whether my application is taking too much time. All these things we need to check. This is mostly the DevOps part, but I have included it because I wanted everyone to know how this works.

Right. And as an attacker, if I have got access to the cluster, I can use Bash to install Helm, and then I can create my own charts and deploy them in the cluster. That is another way of deploying pods or deployments within the cluster.

So, next is the basics of how the policies are created. There are multiple types of policy. First is the Enforce policy, which checks whether a label is present; the label is mandatory.

Host: Okay. So any workload that you want to deploy has to have labels; otherwise it will fail, in a way.

Divyanshu: So you'll see, it is the validating webhook that does this work. First I'll get the validating webhooks, and you will see there are a couple of Kyverno webhooks: the resource validating webhook shows zero entries, policy validation is there, exception validation is there, and cleanup is there.

Okay, so now I'll see the mutating webhooks. This walkthrough is also about showing what kinds of webhooks are installed by default. The verify mutating webhook and the policy mutating webhook are present, right?

Host: Why does it say zero webhooks for one of them?

Divyanshu: That one might not be installed by default when we use Helm, because Helm has certain default settings. Okay, so now I'll just quickly explain this YAML, require_label. We'll skip some of it: the kind is ClusterPolicy, and then there are the annotations, which are like your definition or description.

Then comes the important part, the validationFailureAction, which here is Enforce. You'll see I'm enforcing it, background is true, and I have said that the label development is required; anything without that label is rejected. So it checks, and if the label is missing, the deployment fails. I'll quickly create this policy and apply it.
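(For reference, a policy of that shape, adapted from Kyverno's sample require-labels policy; the label name development comes from the demo, the message text is mine.)

    cat <<EOF | kubectl apply -f -
    apiVersion: kyverno.io/v1
    kind: ClusterPolicy
    metadata:
      name: require-label
    spec:
      validationFailureAction: Enforce   # reject non-compliant resources
      background: true
      rules:
      - name: check-for-label
        match:
          any:
          - resources:
              kinds:
              - Pod
        validate:
          message: "The label development is required."
          pattern:
            metadata:
              labels:
                development: "?*"        # any non-empty value
    EOF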

Host: And again, similar to the Bitnami one, this is a custom definition, so you have to apply it in your cluster so that it can do the enforcement at runtime.

Divyanshu: Correct. So now if you look at the cluster policy, we have require-label, background is true, and the validation action is Enforce. So we are not in audit mode; audit mode is for when we just want the logs and don't want to block anything.

So when I try to run a pod this time, let me just clear my screen first, you'll see it says validation error: the label development is required, and it was blocked due to the policy.

So now you can see, it is very easy. This time I'll pass the label; I'm not using YAML, I'm directly running the pod. You'll see I passed a label, development=yes. You can pass anything, even development=no; I just need development set to something, as we defined in our policy, because in the policy we were using the wildcard "?*".

If we had explicitly hard-coded the value yes, then it would accept only yes, correct.

Host: So in a way, this is very similar to how in our cloud infrastructure we say that you have to apply tags when you deploy resources. Here you are saying that you have to have a development label before you deploy. You can customize exactly what you want to enforce, but this is one of the examples. Makes sense.

Divyanshu: So next, I'll just delete this cluster policy, because I want to create the audit policy. Everything else stays almost the same; I'll explain what changes.

The validationFailureAction this time is Audit. So now I'm auditing: it won't block, it will just record it, and I can see it in the logs. And then the pattern says using the default namespace is not allowed; I'm checking that the namespace is not default. That's it; then there is the message and the pattern.

Host: Okay, so this is like a warning rather than an error.

Divyanshu: You will see the kinds I have mentioned: DaemonSet, Deployment, Job, and StatefulSet. These are the ways by which we deploy pods, right? So these are the kinds it matches, if I try to deploy via these methods.
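(This one closely follows Kyverno's sample disallow-default-namespace policy; a sketch.)

    cat <<EOF | kubectl apply -f -
    apiVersion: kyverno.io/v1
    kind: ClusterPolicy
    metadata:
      name: disallow-default-namespace
    spec:
      validationFailureAction: Audit    # log violations, don't block
      rules:
      - name: validate-namespace
        match:
          any:
          - resources:
              kinds:
              - Pod
        validate:
          message: "Using 'default' namespace is not allowed."
          pattern:
            metadata:
              namespace: "!default"
      - name: validate-podcontroller-namespace
        match:
          any:
          - resources:
              kinds:
              - DaemonSet
              - Deployment
              - Job
              - StatefulSet
        validate:
          message: "Using 'default' namespace is not allowed for pod controllers."
          pattern:
            metadata:
              namespace: "!default"
    EOF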

Host: So if something is deployed into the default namespace, it will fail? Not fail, sorry, it will show a warning. Or it won't even show the warning?

Divyanshu: It won't show us anything up front. Let me just show: I'll first apply the policy, then create the pod. So it is applied, and now I create a pod; you'll see it is created without any complaint. But to see the report for it, I'm installing yq, which is for parsing YAML, and I'll quickly check the policy name and details. I think there is no report yet.

It should be there.

Host: Sorry, we installed the policy, right? Or rather, applied the policy. Yeah, we applied it. We haven't created any pod or anything like that after that.

Divyanshu: I think it took some time, so it might not have shown up immediately; it took some time to set up those policies. See, it was installing in all the namespaces. The warning might be there; I had ignored it.

Host: So I see there are four warnings. Yeah, I see four warnings now.

So it's a fail.

Divyanshu: I can see in the report that a warning is not there.

Host: Oh yeah, a warning is not there.

Divyanshu: There are four failures, but zero warning.

Host: Do you want to run it again? Maybe just to see?

Divyanshu: So I think the data is there. You'll see the validation error: using the default namespace is not allowed.

So I think it has picked up all the pods which were running in default. You see, we did not delete the kube-bench pod, right? Correct. And the Kyverno NGINX pod. So, see, the nginx audit pod we ran: it is just showing results for all of them, and the result it reports is fail, not warning. So in your earlier report also there were four failures; I guess it matches. Yes.

Okay, now I'll delete this and go to the last one, which is the mutation part.

Host: Right, yeah. I'm interested in what type of use cases mutations are used for, because you are changing things at runtime, right? You don't have a trace of that in, let's say, your source code or anywhere; it's getting updated at runtime.

Divyanshu: This time there is no validationFailureAction; because we are not doing any kind of validation, there is no such field. I'm just adding a rule named add-label, and in the mutate section I am saying patchStrategicMerge, metadata, labels, foo: bar. In extremely simple language: wherever this matches, just add a foo=bar label. Okay.
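(A sketch of that mutation policy, adapted from the Kyverno policy library; the match kinds are an assumption.)

    cat <<EOF | kubectl apply -f -
    apiVersion: kyverno.io/v1
    kind: ClusterPolicy
    metadata:
      name: add-labels
    spec:
      rules:
      - name: add-label
        match:
          any:
          - resources:
              kinds:
              - Pod
        mutate:
          patchStrategicMerge:
            metadata:
              labels:
                foo: bar      # added to every matching pod at admission time
    EOF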

So that's basically it. Yeah, please.

Host: Sorry. In a way, let's say at the org level you have some labels that you enforce for all of your clusters or all of your workloads. Then you can define policies so that even if developers forget, those labels get applied. Okay, makes sense.

Divyanshu: At a broader level, we can match on something and then mutate the label too. This one I got from the Kyverno library itself; I just changed the label for the lab. We can have much more advanced and granular policies, where we check which pod it is, or, say, if the label is development, then mutate it to something else. Right? Okay.

Host: So you can use any selector that Kubernetes supports, right? Could be a label, could be pod-related, anything, and then apply your mutations on top. Correct.

Divyanshu: So, I forgot to create this add-label YAML. I've created it now, and I'll apply it; it runs for the first time and creates the new mutation policy for me.

And this time when I run, you will see that I'm running a Python pod, and this time I'm not adding any label. So when I do a grep for foo, just to show you quickly, you will see the label has been added, foo=bar, right? This was added because we had the mutation policy, correct?

Host: Yeah, makes sense.

Divyanshu: These are extremely simple examples; policies can be much more complicated. When I was creating the CTF lab, I blocked the default namespace, and I blocked the use of nodeName, because with nodeName you can say you want to deploy your pod onto the master node, right? That is also possible. I wanted to block all those things in the CTF; I wanted only a specific path to work. So for those cases I explicitly built these Kyverno YAMLs from the library I found, because I am not much of an expert in creating these YAMLs from scratch.

As I told you earlier, I was learning Rego, I was doing OPA, and then I had to shift to Kyverno immediately, so there were a couple of things going on. I saw that with the library it is easy to just change the name and the variables and then use a policy, for personal use or anywhere. Most of the policies are already present, so you can just change the values and start using them, and later you can move to more advanced policies, right? Once you have basic security in place, you can start using more advanced policies.

Host: Yeah. So for our audience, this is more like a quick start: how you can get started with Kyverno, what the benefits are, and how you can use it to defend your cluster.

Divyanshu: So even these simple policies are extremely important.

The default namespace not being allowed, that is one of the standard checks, right? And adding labels should be mandatory if we want to segregate our resources.

And Cilium actually works on labels; it uses labels heavily. Instead of using the IP address or the service, you can directly reference the label: for example, all pods with the label nginx should not access pods with the label python, or maybe apache.

So labels are absolutely required there. If you don't have labels, you won't be able to enforce these network policies via the fabric.

Host: Okay, so labels are like the foundation, in a way, which you should have.

Divyanshu: Correct. So now we'll jump to Cilium very quickly. Cilium is a tool which uses eBPF, the extended Berkeley Packet Filter. To put it in extremely layman's terms, it filters packets at the kernel level: it injects bytecode into the kernel, and it can change network packets, or monitor and check how system calls are made.

And because it is using eBPF at the kernel level, we can use it for almost any network policy or anything we want, because now we have the ability to monitor or capture the raw packet, right? Right, yeah.

Cilium works in a similar way. I won't go into the details of how Cilium works internally, but there is a Cilium daemon, which is the brain of Cilium, and it communicates with the Cilium CLI. Like we have kubectl for communicating with the kube API server, similarly we have the Cilium CLI.

If you remember, at the beginning I used cilium install to install Cilium into the cluster. So Cilium provides a CLI.

Then there are plugins, and the Cilium monitoring, which is the Hubble UI. And in the backend, you'll see the Cilium daemon is injecting bytecode at the ethernet level and at the container level, everywhere, because it is able to access the kernel. Since it has kernel access, it can do the injection at the very last point, and it can make changes or block the network traffic right there.

There is a Cilium agent running on each node, which is in charge of the rules and of maintaining the connection with the kube API server. Then there is an operator which manages tasks across the entire cluster. It won't take decisions about network policy; it is for the installation or upgrade of Cilium on each node, and for checking whether the Cilium API server is working as expected.

Host: So it's the management plane of Cilium.

Divyanshu: The CLI is what we use to communicate with Cilium. And the CNI plugin is basically the brain, the main part: it runs the Cilium API server, and when we use the Cilium CLI, this is where it connects. It is kind of like a plugin, or loosely you could say like a CRD, though not exactly; we install it on top of the cluster and it gives us the network. And until we have networking, Kubernetes effectively stops working.

Even if I create a pod, it won't reach the running state.

So that is why we call it a plugin: because it is configuring the network interfaces within the cluster.

As I mentioned, Hubble is the networking observability platform, and as I told you, you can see how the Cilium policies you have created are working. So you can use it.

Host: Sorry, Hubble will be hosted inside the cluster, right? Or do you have to use something like an enterprise offering?

Divyanshu: No, no, it is free of cost.

Host: Okay. All right. So it's part of the cluster itself, the way you have, say, the Kubernetes dashboard if you enable it. Okay, correct. All right.

Divyanshu: So then we'll directly move into the basics of Cilium, because that is a pretty big hands-on lab.

So I'll first explain the scenario in a simpler way. I'll try to be as simple as possible. So I'm just skipping all these things.

Yeah, I'm not explaining all of this; I'll jump directly in. So there's this thing: I am not much into Star Wars, so I don't know these analogies, but people who know it might be able to relate better. Students come from different backgrounds, and especially here, some people are not aware of Star Wars.

So in that case, I'll just explain that there are the Death Star, the Imperial TIE Fighter, and the Rebel X-Wing. These are the pods,

or I would say, these are the microservices which are running; let's assume that, and in this example they actually are microservices. Then come the labels. There is one label, org, whose value here is empire, and there is another label, class. So the Death Star is one microservice: all the pods running this microservice have org=empire, which is label one, and class=deathstar, which is label two.

Those are two labels. And then there is the Imperial TIE Fighter service, or you could say pod: all the pods within this microservice have org=empire, the same, but the class is tiefighter.

So they also have two labels; the org is the same for the TIE Fighter and the Death Star, but the class is different, because the microservice itself is different. Then there is the Rebel X-Wing.

For the X-Wing, they have org=alliance and class=xwing. So class says what kind of microservice it is, and org is where it belongs at the broader level. You'll see the Death Star and TIE Fighter are from the Empire, while the Rebel X-Wing is opposite to them, the Alliance. You can just assume that the X-Wing is, you know, an attacker with respect to the Death Star and the Imperial TIE Fighter, right.

It would be against both of them. The same is mentioned in the docs: the deployment also includes a deathstar service, which load-balances traffic to all pods with labels org=empire and class=deathstar, right, which we have discussed till now.

So I'll create this demo application first. Okay, let me see if I need something; I'll open a new terminal, yeah, because these are different directories. So it will create these deployments for me, you will see: it creates a deathstar service, a deployment, and a service for the Death Star, because the deathstar service is used to load-balance.

And then there is the tiefighter pod, and the xwing pod. Yeah.

And I think there was a folder as well; I'll just quickly move into it. Yeah, let's do that, because I've already applied those things. So I'll just check whether the Cilium DaemonSet is running or not.

So you'll see, the Cilium DaemonSet successfully rolled out. I have to roll it out first,

like I have to say that I'll be using Cilium from now onwards, okay, as a DaemonSet. That is why I am using this command and checking the status. Then I'll apply, or create, the demo application. This is the YAML provided by the Cilium team itself, so I'm not making any changes.

And then kubectl get pods and svc, just to check which pods and services are running.
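(The demo application is the one from Cilium's getting-started guide; applying it looks like this, with the ref pinned to whatever matches your Cilium version.)

    kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.14/examples/minikube/http-sw-app.yaml
    kubectl get pods,svc   # two deathstar pods, one tiefighter, one xwing, plus the deathstar service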

Yes. So you'll see, there are two deathstar pods, and single pods with the tiefighter and xwing names. We'll ignore the other pods for now; I haven't deleted those, which is why they are present. That is okay.

Host: Yeah, I see one Kyverno pod which is failing, but that was part of the previous exercise.

Divyanshu: Now, these are the Cilium endpoints getting created. Cilium endpoints are basically how Cilium connects to these things.

So you'll see, there is CoreDNS, then there is the xwing endpoint, and which port everything is running on.

So Cilium is creating a kind of mapping within the cluster. Yes, correct, within the cluster. Now, as we have seen, from the perspective of the deathstar service, only ships with the label org=empire are allowed to connect and request landing. So I'll go back to this.

So there are the TIE Fighter, the Rebel X-Wing, and the Death Star. Correct. Only pods with org=empire should be able to access the deathstar microservice; the Rebel X-Wing should not be allowed. Let's see what is happening. I'll do a curl, because they are all microservices.

We have just given them names. So I'll do a curl here, and you'll see it says "Ship landed".

Host: So you could connect to that.

Divyanshu: Correct, right. So the tiefighter pod is able to access the Death Star API microservice. Okay.

Because org was empire in the case of the TIE Fighter. Now I try to access from the X-Wing as well, and you can see that here too the ship has landed, which is not what we wanted; but because right now there is no network policy, it is possible to access the deathstar service. So an attacker can access it as well. Correct. I'm not going into the full description of that.
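(The two test calls, as they appear in the Cilium getting-started guide.)

    kubectl exec tiefighter -- curl -s -XPOST deathstar.default.svc.cluster.local/v1/request-landing
    # -> Ship landed
    kubectl exec xwing -- curl -s -XPOST deathstar.default.svc.cluster.local/v1/request-landing
    # -> Ship landed (no policy yet, so the rebel ship also gets through)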

As I've told you, we want to restrict the Alliance, these bad ships, the X-Wing, from accessing the Death Star. So we'll have something like: description, restrict Death Star access to Empire ships; and then match labels org=empire and class=deathstar.

So I'm saying that only ships with the empire label should be allowed to reach the class deathstar, which is the name of the microservice. I'll quickly show this policy, open it, and explain it as well. Sure.

Host: And I think this is similar to what we did earlier, right, where we looked at how we can restrict communication between pods.

Divyanshu: Yes, we are doing something similar: it is port 80 and the protocol is TCP, right? And the label is empire.

So I am targeting the label org=empire with class=deathstar, which is our microservice. There is a way of reading a network policy or a Cilium policy: whatever is below, from there the traffic goes to the part on top. So you'll see, I'm saying endpointSelector match labels org=empire and class=deathstar, which is our microservice, and then ingress, fromEndpoints, match label org=empire.

That means only services or pods with the label org=empire are able to do ingress, that is, to access the endpoints with org=empire and class=deathstar, right? Correct.

Host: Okay. That's like the destination, in a way, right? Correct. So that's the destination, and this is the source, in a way; that's how you are restricting it.

Divyanshu: Correct. And then toPorts is port 80 and TCP. So this is already applied. Okay.
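(This is essentially the L3/L4 rule from the Cilium Star Wars demo.)

    cat <<EOF | kubectl apply -f -
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "rule1"
    spec:
      description: "L3-L4 policy to restrict deathstar access to empire ships only"
      endpointSelector:
        matchLabels:
          org: empire
          class: deathstar
      ingress:
      - fromEndpoints:
        - matchLabels:
            org: empire
        toPorts:
        - ports:
          - port: "80"
            protocol: TCP
    EOF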

Host: So there are two types of restrictions we have applied, right? One is that anything which is not part of the Empire will not be able to access it. The other is that even if you are in the Empire but you are not using TCP on port 80, then also you will not be able to access it.

Divyanshu: Correct. Usually I give these things to students to test: if they change the port, whether they are still able to access it. So if you see here, this tiefighter is able to access because it had the org=empire label. Correct.

Now we'll try again via the xwing, which belongs to the Alliance, the bad side. You'll see, it gets stuck.

The curl won't work, so we have successfully blocked it. Now, the next part of this lab: there could be thousands of TIE Fighter planes, thousands of pods running with our tiefighter microservice, pods with org=empire.

Their class is not deathstar, but they can still access it. So suppose, as an attacker, I create a pod with basically the same labels. Yes, sorry.

With the tiefighter labels, right? The TIE Fighter had the two labels. Let me go to the top and re-explain it.

So you'll see, the TIE Fighter had org=empire and the class label.

These can be dynamic or ephemeral pods which anyone can deploy; let's suppose that. And the Death Star is our main microservice. Because anyone can deploy a TIE Fighter, as an attacker I can deploy a TIE Fighter with org=empire.

So now the next part will come. Okay.

I was able to access, I was able to land; that is fine, that is not an issue, because everyone with the Empire label is allowed, whether it is a spy or not. But there was a hidden API present in the Death Star.

That hidden API causes a denial of service; it crashes the microservice. So you'll see, when I did a curl on this exhaust-port endpoint, it says "Panic: deathstar exploded", which shows that our application failed, that someone attacked it.
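(That call, from the same demo.)

    kubectl exec tiefighter -- curl -s -XPUT deathstar.default.svc.cluster.local/v1/exhaust-port
    # -> Panic: deathstar exploded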

Right. So now, I don't want my TIE Fighter planes to access this specific endpoint; I only want them to request landing.

Host: So this is a more granular network policy, in a way.

Divyanshu: Correct. So now I'll specify the method, POST, and the path, /v1/request-landing. You'll see how simple the rule is: I'm just again mentioning the port, 80, the protocol, and what kind of rule I want. If I want to create another policy, I just replace these values with the labels and values I want. Right.
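(The L7 version of the rule, from the demo, extends the same policy with an HTTP section.)

    cat <<EOF | kubectl apply -f -
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "rule1"
    spec:
      description: "L7 policy to restrict access to specific HTTP call"
      endpointSelector:
        matchLabels:
          org: empire
          class: deathstar
      ingress:
      - fromEndpoints:
        - matchLabels:
            org: empire
        toPorts:
        - ports:
          - port: "80"
            protocol: TCP
          rules:
            http:
            - method: "POST"
              path: "/v1/request-landing"   # anything else, e.g. /v1/exhaust-port, is denied
    EOF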

Sorry, I think I copied it over; the old policy was already there. So I'll create the new policy, which we have just seen, and now I'll try to access that exhaust-port endpoint, which is restricted, and you'll see the access is denied.

So this completes our Cilium section. Now, as I always do, I'll delete these policies.

So this completes all the critical things: we talked about Cilium, we talked about Kyverno, and we covered secrets. And the last thing I'll talk about is the Istio service mesh.

Host: Before we go into Istio: I think what I saw with Cilium is very powerful, right? It gives you more granular ways to control traffic between services and between pods, based on different selectors; you can define rules even at the L4 and L7 level. So yeah, it makes a lot of sense, and I'm pretty sure our audience will find a lot of value in it.

Divyanshu: So, for the last part, I'll talk about service mesh. I won't go deep into this, but I'll explain the idea and why we need it, maybe for the folks who are getting started in security, Kubernetes security to be very specific, so they can explore more.

So when we talk about microservices, right,

I have seen people worried about how to make sure that pod-to-pod communication is authenticated, or that there is SSL between two running microservices. That is a major question coming from all sides, from whoever is running microservices: what can we do? If they build an authentication and authorization mechanism into the microservices themselves and put that code in the same microservice, then alongside their business logic they have to do more work to make sure the communication between the microservices is also secure.

And microservices will communicate a lot, right? There will be huge traffic. So what we can do, we can have this thing called a service mesh, or the proxy, which is a sidecar.

So when we did kubectl get pods, right… there was only one container running in the pod; you would see 1/1. So what we'll do, we'll say that I want to inject the sidecar. The sidecar is nothing but another container which runs alongside, in the same pod. So in this pod, you will now have two containers running, within the same pod.

And inside that, you will have one container for the microservice or whatever service you want to run, and there would be a proxy container as well. Okay, so this proxy container will help in the communication. This proxy container will help you with the SSL termination, the SSL handshake and everything.

This will help you with authenticating the communication which is happening between the microservices. So this sidecar will take care of all these things, and there is an Istio agent running in it which communicates to the istiod control plane. Again, the same concept: there is one istiod which has all the access, and these agents communicate with that control plane.

Host: So in a way you are abstracting all the work which is not core to the microservice into a proxy, correct? So that developers can focus only on implementing their own logic, right?

Divyanshu: Correct. So now, as a developer, they don't have to worry about the authentication. They'll create the application with the business logic, and DevOps, or we, can go and deploy the sidecar, and that will take care of all these things. So this is the next lab, where we are deploying Istio and then deploying the application. I'm not going into that, because there is not much to see.
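
For context, enabling Istio's automatic sidecar injection is usually just a label on the namespace. A minimal sketch, assuming the default namespace and an already installed Istio:

```yaml
# Labeling a namespace so Istio automatically injects the Envoy sidecar
# into every pod created in it afterwards (assumes Istio is installed).
apiVersion: v1
kind: Namespace
metadata:
  name: default
  labels:
    istio-injection: enabled
```

Any workload applied into this namespace after the label is set comes up with the extra proxy container he describes.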

Like you won't be seeing much, right. If you look at the output, you will only have something similar: the pod shows 2/2 containers, which tells you the sidecar is there, as I said. And we'll check; it will also help with the ingress gateway, how the traffic is communicated.

Because earlier the traffic would directly hit your microservices, right? But in this case, the traffic is now forwarded by Istio.

So we have to tell the Kubernetes cluster that this Istio ingress gateway is my main entry point, and you have to send everything to this Istio gateway.

Host: And then that would do the routing. Correct? That would do the routing for us.

Makes sense.
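
As a sketch of that entry point, an Istio Gateway resource might look like this; the name and the wide-open hosts entry are illustrative, not from the workshop:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: app-gateway            # hypothetical name
spec:
  selector:
    istio: ingressgateway      # bind to Istio's default ingress gateway pods
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"                      # accept any host; tighten this in real use
```

A VirtualService would then bind routes to this gateway, which is the piece that actually does the routing mentioned here.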

Divyanshu: So this is pretty much it. There was nothing more, so I'm not going inside that. I just want to show one last thing. Yes, again, I won't go inside; I'll explain this Kiali dashboard.

So, because we have installed Istio, there are addons, or the extensions, as we are always talking about. There is a similar set of extensions provided by Istio, which is Kiali, Prometheus, Grafana and tracing.

So all these things are there. They are present in the samples directory of the Istio download; they have the YAMLs for all these pods and services, everything is there. You just go and kubectl apply them.

If you want to deploy something specific, like Grafana, then you have to select only those YAMLs. So it depends, right? Because this is a staging or testing cluster, I'll deploy everything here. I am explicitly deploying the Kiali service.

So it is a UI which helps with monitoring. I keep talking about how we can do the monitoring; this is how. So once you have Istio, the sidecar is taking care of everything, all the traffic and networking, right?

So now we can use this dashboard to look at how the traffic is flowing. If there is a blocker in the traffic, say traffic is not going from microservice one to microservice two, then because it is managed by the sidecar, the sidecar immediately reports everything to this Kiali dashboard, and in the UI we will be able to see whether it is blocked, shown in red, or allowed…

Correct. So this was the lab; usually it would look something similar, and you will have a graph with the flow as well.

So if someone keeps hitting microservice one and it is communicating to microservice two internally, it will show all green. It will also show the idle state and all those things, like which pods are idle.

So all these things will come up in Kiali. So this brings us to the end of Kubernetes defense. I'm not going any deeper; this is a beginner-level workshop, and this is already more than that. So we'll stick…

Host: Yeah, this in itself has a lot to learn, correct? Starting from your secrets management to your network policies, all of that. I personally have learned quite a few things as part of this exercise. So thank you for doing…

Divyanshu: Even I get that, right; I get to think in a new way. Okay, like the policies, the network policy, that pattern.

So I knew this was working, but I never explained it that way. So for me also it was: okay, I can explain it in this way so that it becomes easy. Because if you are reading it from top to bottom, then for a first-timer it would be difficult to understand, right? Like from where to where the flow is going.

But if I explain that you have to read it from bottom to top, then the source-to-destination flow becomes much easier. So for me also, every time, I get a new way of explaining things. Okay, yeah, no, absolutely, that's understandable.

Host: So I have a few questions before we close the session. One is: last time, when we looked at the attack side, right, we had a remote code execution issue, and you could exploit it and take over the cluster.

So as, let's say, a blue team, or a team trying to defend it, what options do I have to close that gap? Is it code scanning? Is it cluster scanning? What can I do to address that gap?

Divyanshu: So there would be multiple ways of doing it. I'll start from the external point first, and once we reach the application, I'll talk about the code scanning part.

So from the external point, we should have a firewall in place so that it blocks the basic attacks. Like if we have some cloud-based firewall; I'm not naming vendors, because firewalls can be bypassed easily. But if you have some next-gen firewall, or specifically Cloudflare, then it will block more attacks. Correct, that is capable of doing much more.

So that would block most of the attacks: reverse shells, XSS, RCE, SSRF. If that is not there, bypasses can happen. If the attacker bypasses it, the next step: if you remember, I tried to get a reverse shell, tried to run commands, tried to check for different binaries, right?

Or I tried to do the env and those things. So if they are not required, we can explicitly block those commands. Like env is something we can't block much anyway, but access to /etc/passwd or the log files, or the cat binary, those things we can remove. Like we can have an image which is extremely hardened, right, where we don't have any redundant binary: curl and wget are not present, Python 3 is not present. We need to keep only the things which are required.

But anyway, it would leave a gap, because it would be running some language runtime; it can be Java, it can be Python, it can be Go, so that language would be there. Correct. Hardening it to the level that only that part of the JAR is running, in the case of Java, or only that folder, is extremely difficult. So what we can do instead is block the /tmp folder, where the attacker would try to create the shell and run that command, because they have to create a file and then try to run it with Python.

Or they'll try to run a command directly, even without saving a file. Also, in those cases we can block at the network level, like at the VPC level, because outbound is always open. I've seen most organizations have outbound open, because they think that is not an issue.

But in the case of reverse shells, for me that was easy. Whenever I would do an internal red team assessment, because outbound is open, I'd take a reverse shell back to my own machine, and then for me it is very easy, right? Correct. I'm not naming the services; if I name them, everyone will try to pen test those services, and doing that without permission is not healthy, because if you do it on production environments, anything can go wrong.

So those services, let's assume XYZ services, where you know that you can run commands. It can be any functionality, or something specific the DevOps team created for some key-adding purpose; teams create all sorts of different tools. So, right, if we have something similar and you are inside the container, outbound is always open in those cases.

So I'll create an EC2 instance and take a reverse shell there, because getting a public IP there is much easier than on my local machine; locally I have to port forward and everything, right? So there I'll directly use a public IP and get access on my personal EC2 instance.

So because outbound to 0.0.0.0/0 is open, we can block that, and then have only specific ports open, like 443 and 80, because applications would be running only on those specific ports. Although, exactly how the flow happens, the inbound and the outbound, can be via any port.
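
As a sketch with a plain Kubernetes NetworkPolicy (Cilium can express the same and more), a namespace-wide egress lockdown that only allows DNS and HTTPS might look like this; the namespace name is hypothetical:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-egress
  namespace: prod              # hypothetical namespace
spec:
  podSelector: {}              # applies to every pod in the namespace
  policyTypes:
  - Egress                     # everything not listed below is denied
  egress:
  - ports:                     # allow DNS lookups (needed for almost anything)
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP
  - ports:                     # allow outbound HTTPS only
    - port: 443
      protocol: TCP
```

With this in place, a reverse shell dialing out to an attacker's EC2 instance on an arbitrary port simply never connects.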

So we can't block everything, but we can have restrictions, right? It is not always practical to block and restrict everything, but some layer or other will eventually block the attack. Even if the attacker was able to get a reverse shell: if I remove the basic commands like curl, remove those binaries, and my pod is not running as root, in that case I don't have access to /tmp, because we have already blocked it.

Then my container is running as non-root, or it doesn't have capabilities.

Then I won't be able to run apt update, I won't be able to install anything, I won't be able to download anything, because curl and all are not there.

Even if I download something somehow, suppose by using base64, if I talk about obfuscation, I'll use base64 and copy-paste. But if the base64 binary itself is not there, I can't do anything.

Host: Correct. Exactly.

Divyanshu: Curl and wget are also not there. We might be using Python 3 or Python or Java, so in those cases that runtime is present, but it makes things difficult for a script kiddie. Any attacker would have to think; they would have to stay that much longer in that cluster or in that pod, right?

Which will make detection easier. If we have a SIEM solution, or we are logging EKS, then by that time it would be detected, right? Because they will try so many things in that cluster, by default they would be forced into getting detected. There is no way they stay undetected, because we have hardened it so much.

Plus, even if they have got access to one pod, we have to make sure that they don't have access to secrets or services or S3 buckets, and that only the specific permissions needed are attached to those containers. And for that also they need something: they need curl, or they need the AWS CLI, right?

They will need some way to connect. They will need Python 3, maybe, to install the AWS CLI, something more permanent…
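
Pulling those hardening ideas together, a pod spec along these lines is a minimal sketch; the image and names are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app                 # hypothetical name
spec:
  automountServiceAccountToken: false   # don't hand the attacker an API token
  containers:
  - name: app
    # hypothetical minimal image: no shell, no curl/wget, no package manager
    image: registry.example.com/app:distroless
    securityContext:
      runAsNonRoot: true                # refuse to start as root
      allowPrivilegeEscalation: false   # no setuid tricks
      readOnlyRootFilesystem: true      # blocks dropping payloads into /tmp etc.
      capabilities:
        drop: ["ALL"]                   # no kernel capabilities at all
```

Each line closes one of the doors discussed above: no token to steal, no writable filesystem to stage a shell in, no root and no capabilities to escalate with.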

Host: So the more restrictive your environment setup is, in terms of hardened images, or blocking paths, or not allowing processes, that helps from the outside in. What about the inside?

Let's say you have access to the code. What can you do about remote code execution?

Divyanshu: From the inside, I won't need RCE as much, because I already have access.

But in that case, also as an insider, what I'll do is try to exec into that pod, right? Because I have access; or even if I'm in one pod, I'll try to exec into another pod. So I can use Kyverno to block the exec command itself, like docker exec, or the kubectl exec command. As you saw when we were running the labs, we were using exec to run commands.

So if I block that command itself, it blocks everything, right? Even if I have access to container X, I won't be able to execute inside containers Y or Z. So it restricts things. And if you combine all of these, namespace blocking, exec blocking, then network blocking, you'll see how restrictive it all becomes together.

So at some level you can have Kyverno itself to protect against this exec, something like the sketch below. Or maybe the attacker doesn't have permission to deploy anything, or the user is a non-root user.
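
Modeled on the block-pod-exec samples in the Kyverno policy library (the exact matching syntax varies a bit across Kyverno versions), such a policy might look like this sketch; the protected namespace is hypothetical:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: deny-exec              # hypothetical name
spec:
  validationFailureAction: Enforce
  background: false
  rules:
  - name: deny-exec-in-prod
    match:
      any:
      - resources:
          kinds:
          - PodExecOptions     # the object a kubectl exec request creates
    preconditions:
      all:
      - key: "{{ request.operation }}"
        operator: Equals
        value: CONNECT         # exec arrives as a CONNECT operation
    validate:
      message: "Exec into pods in this namespace is forbidden."
      deny:
        conditions:
          all:
          - key: "{{ request.namespace }}"
            operator: Equals
            value: prod        # hypothetical namespace
```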

So the attacker can try to deploy another pod. So I can block pod deployment too, via kubectl and so on, for example if it is going into the default namespace. Because if that pod doesn't have permission to list namespaces, the attacker will try to brute-force which namespaces are present, or where they can deploy, like the system namespaces which exist by default. If we block those as well, then again it becomes more restrictive, right? This is something where you have to see how your cluster is built and what an attacker could do to bypass those things.
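
For the default-namespace case, a policy much like the Kyverno library's disallow-default-namespace sample would do; a minimal sketch:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-default-namespace
spec:
  validationFailureAction: Enforce
  rules:
  - name: validate-pod-namespace
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Using the default namespace is not allowed."
      pattern:
        metadata:
          namespace: "!default"   # any namespace except default passes
```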

Like, I looked at my own cluster and then started bypassing, exactly what you are asking. Then I created the Kyverno policy, implemented it in staging, and tried to bypass it again, up to the point where I was not able to bypass it anymore.

Correct. So this is how. And then the last part is your code scanning. For code scanning, we can definitely try any code scanner which will give us the basic idea; there are free ones, and the paid ones will also have flow, sink and source analysis, that is, how the data flows. But most of the vulnerable functions are very common, right?

os.system, open, popen. All these things, at least I know about Python, are very easy to figure out. Yeah, so they are easily detected.
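
Those sink functions are exactly what pattern-based scanners key on. A minimal Semgrep-style rule, as a sketch (the rule id and message are made up for illustration):

```yaml
rules:
- id: python-command-exec-sink       # hypothetical rule id
  languages: [python]
  severity: WARNING
  message: "Possible command injection: call to an OS command sink"
  pattern-either:
  - pattern: os.system(...)          # shell command from a string
  - pattern: os.popen(...)           # same, with captured output
  - pattern: subprocess.Popen(...)   # process spawn; dangerous with shell=True
```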

Host: So one thing that you highlighted is the permissions, right? So in the earlier attack episode, we saw that we had a policy defined where the permission was star, star, which is not recommended, right?

Again, even on the cloud infrastructure, that is not recommended. We always follow least privilege and all of that. Correct. How would you do that in a Kubernetes cluster setup?

Divyanshu: So I was coming to that. Once we have the code and everything sorted, the last part would be your infra deployment, because the infra is getting deployed too: your RBACs, your services and all those things, the YAML part.

And the second would be your Terraform part, which will basically create the AWS or GCP environment for running that cluster.

We have to make sure they are not vulnerable. Like, if I want to drop all those capabilities, we basically have to add those drop settings and non-root settings in the YAML itself; it won't happen at the container level by itself. So we have to scan those YAMLs and find this out automatically, via tfsec for Terraform and via Checkov, running automatically whenever code is pushed, via GitHub Actions, maybe.

And check what kind of misconfigurations are present in these YAMLs. The same goes for RBAC, but for RBAC we have to keep it specific: Checkov will scan everything, but we should make sure we are explicitly doing an audit of RBAC.
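
One way to wire that up, as a sketch where the action versions and repository paths are assumptions for illustration:

```yaml
# Hypothetical CI workflow: scan infrastructure code on every push
name: iac-scan
on: [push, pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - name: tfsec (Terraform misconfigurations)
      uses: aquasecurity/tfsec-action@v1.0.3
      with:
        working_directory: terraform/    # hypothetical path
    - name: Checkov (Kubernetes manifest misconfigurations)
      uses: bridgecrewio/checkov-action@v12
      with:
        directory: k8s/                  # hypothetical path
        framework: kubernetes
```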

Plus, some organizations use third-party tools like Rancher or something to manage these RBACs from the UI. So there also we need to check the tools that grant this access, whether a given user has permission or not. Whichever ways they are managing access, as an internal security engineer we first have to collect those ways, to find out everything about how cluster access is given. Is it certificate-based?

Or is everyone using the admin keys? Or is the kubeconfig shared between users, with no specific per-user identity? Or is it a machine user, like Ubuntu on an EC2 instance, or similar users which are not human users?

So we have to keep a check on the RBAC part, explicitly auditing that, plus the YAML part and the Terraform part, or whichever way you deploy; if Ansible is used, then check how that deployment is happening.

And then the last part is monitoring, which is also extremely important, though very costly. To monitor the logs, you need to generate the cluster's logs and then push them into a SIEM solution or an ELK stack, which is very costly.

So instead of that, we can have Falco or ThreatMapper or any runtime security tool, or a paid one if people want to go for that, to have those checks at runtime. It would run similarly to how Kyverno or Cilium were running within the cluster. If you don't want to run it as an agent or as a sidecar, then we can have one single pod running in the cluster which will scan and do all these kinds of activities.

And it will also keep an eye on things. It will give you all the data on how the communication is happening.

These tools also have some pods which, when created from YAML, will ask for more permissions. They will try to see the whole structure; they will basically try to get kernel-level access, because they inject runtime security, and they will block reads of /etc/passwd or those kinds of commands running within the cluster. But they need explicit permissions to do that. So you have to make sure you review the permissions these third-party tools ask for before deploying them, because memory issues can be one problem.

And the second is: if they have some external permission, and tomorrow the tool or that company itself is hacked, right, correct, or vulnerable, then you become vulnerable too. Exactly. Okay, that makes sense.
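
For a flavor of what these runtime tools enforce, a minimal Falco-style rule for the /etc/passwd case might look like this; a sketch, not one of Falco's shipped rules:

```yaml
# Hypothetical Falco rule: alert when a container process reads credential files.
# open_read and container are standard macros from Falco's default ruleset.
- rule: Read Sensitive File In Container
  desc: Detect reads of /etc/shadow or /etc/passwd from inside any container
  condition: open_read and container and fd.name in (/etc/shadow, /etc/passwd)
  output: "Sensitive file read (user=%user.name file=%fd.name container=%container.name)"
  priority: WARNING
```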

Host: So we covered three things, right? Yeah: from an outside perspective, things like firewalls and hardened images, all of those. Then code scanning and infra security, like RBAC scans. And then the monitoring aspect, runtime monitoring. So yeah, that's a great way to sort of end the episode.

Right?

We covered all the attack vectors, and now we covered a few of the defense vectors as well. So yeah, thank you so much, Divyanshu, for coming to the show and sharing your knowledge around Kubernetes, both from an attacker perspective and also from a defense perspective.

Divyanshu: Thank you. I'm stopping the screen share first. Yeah, thank you.

Host: And to our viewers, thank you for watching. Hope you learned something new. If you have any questions, share those at scaletozero.com and we'll get those answered by an expert in the security space.

Divyanshu: And please share. Thank you so much as well. Also, I may have been very fast; I know that, and I'm working on it, but due to the obvious time constraint I was moving at a faster pace. So if we have more time, we can maybe have more sessions.

And I just want to add: I'm giving one training, I'm not naming it here, and you guys can come to that training.

If someone is really interested, it is a three-day-long training, and there you can get a better idea of how these things work; we'll go through each of these functionalities and YAMLs separately.

And then it would be like a larger lab, where we'll check things, try to create things, do more, but we'll have more time. That is…

Host: Makes sense. So this is more of an introduction to Kubernetes and the attack and defense side, right? Yeah, the workshop absolutely makes a lot of sense, where folks can go into detail and you can guide them. Thank you so much for joining again, and thank you, everyone.

Divyanshu: Thank you, Purusottam, for having me. Bye bye.

Host: Thank you, Divyanshu, for this amazing workshop. There were quite a few things to learn for me. I hope our audience will love it and learn something from this workshop. With this, we conclude our workshop. If you have any questions around security, share those at scaletozero.com and we'll get those answered by an expert in the security space. See you in our next episode.

Thank you.

Divyanshu Shukla:  https://www.linkedin.com/in/iamdivyanshu/
