Kubernetes Security And Misconfigurations With Jimmy Mesta

Host: Hi, everyone. Thanks for tuning into our Scale to Zero show. I am Purusottam, Co-founder and CTO of Cloudanix. Scale to Zero is a forum where we invite security experts to learn about their journeys, discuss security topics, and get answers to questions we have received from curious security professionals. Our goal is to build a community where we learn about security together and leave no security questions unanswered. With that, let’s get into today’s episode, which is focused on Kubernetes and Kubernetes security in general.

Kubernetes is becoming the de facto container orchestration platform, and it spans many areas: code, containers, clusters, and cloud. With all of these components, it brings a unique set of challenges for security professionals.

To discuss this topic further, I’m super excited to welcome Jimmy Mesta to today’s episode. Jimmy is the co-founder and CTO of KSOC. Prior to KSOC, Jimmy held several senior leadership positions at a number of companies, including Signal Sciences, which was later acquired by Fastly, where he led a team of researchers and engineers. He has spent time on both the offensive and the defensive side of the industry, working to build modern, developer-friendly security solutions. Jimmy, it’s wonderful to have you on the show. For our viewers who may not know you, would you briefly share your journey?

Jimmy: Sure thing. Yeah, thanks for having me, Puru.

It’s a pleasure to be here. My security journey has been kind of a long and winding one. I’ve been working in application and infrastructure security for about 15 years, which ultimately led to founding KSOC, where we’re focused on Kubernetes security. I started working with containers a long time ago, in the early days of Docker and Kubernetes, and I’m still in the thick of it. I’ve been a pen tester, I’ve worked in compliance, and I’ve worked inside data centers, so a little bit of everything. It’s really exciting to be a founder, with all the things that come with that, so I’m sure we’ll discuss that today.

Host: Absolutely, yeah. I’m looking forward to the discussion. The way we do this is we have two sections. The first one focuses on security questions. The second one is the Rapid Fire section. So let’s start with the security questions.

For today’s users, there are two ways to deploy Kubernetes. Either you use a hyperscaler’s managed offering, such as EKS on AWS, GKE on Google Cloud, or AKS on Azure, or you self-host. Let’s start with the cloud provider’s managed offering.

When we think about cloud, all the clouds generally have a shared responsibility model, and that usually comes to mind for infrastructure. Does it apply to Kubernetes as well?

Jimmy: It does. It absolutely does. There’s lots of great documentation on the public cloud shared responsibility model.

It’s a little different for every public cloud, but at a high level, you’re responsible for what’s running inside the cloud, and they’re responsible for the cloud itself: everything from the underlying networking to hardware and physical security. That absolutely carries over to managed Kubernetes. Any of these managed platforms, EKS in AWS or AKS in Azure, look somewhat similar these days: the responsibility for bootstrapping a cluster and running the control plane, the components that make up the Kubernetes API server and etcd, where all the configuration data is stored, sits with the provider. Even for things like logging options from the Kubernetes API, the cloud gives you an interface to use, and you don’t have a lot of freedom to change certain configurations on the management interface, the control plane of that cluster. So, yes, the shared responsibility model does apply.

Your cloud is going to take care of that control plane to some degree, and it’s then up to you to run workloads in a way that meets or exceeds the security standards your organization has set. So it’s really not that different from running EC2 virtual machines, RDS databases, and things of that nature. You can still get yourself in trouble inside the managed offering from a security standpoint, but there are guardrails there to make things a little easier operationally.
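To make the customer side of that split concrete, here is a minimal Python sketch that inspects a few settings EKS leaves to you: whether the API endpoint is open to the whole Internet, whether control-plane audit logging is on, and whether envelope encryption of Secrets is configured. It assumes the input dict has the shape of the `cluster` object returned by boto3’s `describe_cluster`; the function name and the choice of checks are illustrative, not an official baseline.

```python
from typing import Any


def eks_customer_side_findings(cluster: dict[str, Any]) -> list[str]:
    """Flag customer-controlled settings in an EKS cluster description.

    Assumes `cluster` has the shape of boto3's
    eks_client.describe_cluster(name=...)["cluster"] response.
    """
    findings: list[str] = []

    # Is the Kubernetes API endpoint reachable from any IPv4 address?
    vpc = cluster.get("resourcesVpcConfig", {})
    if vpc.get("endpointPublicAccess", False) and "0.0.0.0/0" in vpc.get("publicAccessCidrs", []):
        findings.append("API server endpoint is public and open to 0.0.0.0/0")

    # Collect the control-plane log types that are switched on.
    enabled_logs: set[str] = set()
    for entry in cluster.get("logging", {}).get("clusterLogging", []):
        if entry.get("enabled"):
            enabled_logs.update(entry.get("types", []))
    if "audit" not in enabled_logs:
        findings.append("control-plane audit logging is not enabled")

    # Is envelope encryption of Kubernetes Secrets (via a KMS key) configured?
    if not any("secrets" in cfg.get("resources", []) for cfg in cluster.get("encryptionConfig", [])):
        findings.append("secrets envelope encryption is not configured")

    return findings


if __name__ == "__main__":
    # In practice the input would come from something like
    # boto3.client("eks").describe_cluster(name="my-cluster")["cluster"],
    # where "my-cluster" is a hypothetical name.
    sample = {"resourcesVpcConfig": {"endpointPublicAccess": True, "publicAccessCidrs": ["0.0.0.0/0"]}}
    for finding in eks_customer_side_findings(sample):
        print(finding)
```

Working on a plain dict keeps the checks testable without AWS credentials. The AWS call only has to happen at the edge.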

Host: Okay, that makes sense. A follow-up question: with the shared responsibility model, organizations are often unsure of what is covered and what is not, and they take certain things for granted.

So for Kubernetes and Kubernetes security, what should be kept in mind about what is covered and what is not? You touched on that a little: the control plane and etcd are out of your control because they are managed by the cloud provider. Anything else that should be kept in mind?

Jimmy: Yeah, it’s a good question. The easiest one is your workloads. The containers that make up a deployment, the underlying networking you’ve set up for those workloads, and secrets management are all on you. AWS offers features and products that help you with secrets management, but you have to choose to use them, and use them correctly.

The base images behind these workloads are on you to keep up to par with your security standards. Then there’s RBAC. Identity is really important: who has access to the cluster, and how are you provisioning that access? Role-based access control configurations determine the granularity of the verbs and objects that individual users and service accounts can use to manage the cluster and the components within it. Then there are the dashboards themselves and the logs: do you have them turned on, and are you using them fully? It even extends to some of the node configurations. Just because EKS gives you a nice template to create a cluster doesn’t mean the node or pool of nodes you created is in the right VPC. And maybe you don’t want your Kubernetes API exposed to the Internet. EKS will support that, but it’s up to you to turn it off or change it if that’s not part of your threat model.
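The RBAC granularity point can be checked mechanically. The sketch below assumes Role and ClusterRole manifests that have already been parsed into Python dicts, for example with a YAML loader. It flags rules that use `*` for verbs or resources, and rules that grant read access to Secrets. The function name and the particular checks are illustrative choices, not a complete RBAC review.

```python
from typing import Any

# Verbs that let a subject read object contents.
READ_VERBS = {"get", "list", "watch"}


def overly_broad_rules(role: dict[str, Any]) -> list[str]:
    """Flag wildcard or secret-reading rules in a parsed Role/ClusterRole manifest."""
    name = role.get("metadata", {}).get("name", "<unnamed>")
    findings: list[str] = []
    for i, rule in enumerate(role.get("rules", [])):
        verbs = set(rule.get("verbs", []))
        resources = set(rule.get("resources", []))
        if "*" in verbs:
            findings.append(f"{name} rule {i}: wildcard verbs")
        if "*" in resources:
            findings.append(f"{name} rule {i}: wildcard resources")
        # Reading Secrets exposes their contents (base64 is encoding, not encryption).
        if "secrets" in resources and verbs & READ_VERBS:
            findings.append(f"{name} rule {i}: can read secrets")
    return findings
```

Running this over every Role and ClusterRole in a cluster gives a quick list of the places where access is coarser than it needs to be.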

Host: So I think the responsibility lies much more with the team managing the cluster than with AWS, and that’s the misconception. An EKS cluster is not some hardened environment for running containers by default, right? Yeah. Those are some of the important areas you have highlighted: your workloads, secrets, RBAC, configuring the node pool properly, the network setup, and so on. That’s a good segue to the next area I want to talk about, which is misconfiguration.

Similar to cloud misconfigurations and their impact on security, Kubernetes is also vulnerable to misconfigurations.

What would you recommend so that users can avoid these misconfiguration issues?
What best practices should be followed so that configurations are set up as securely as possible?

Jimmy: Yeah, it’s hard, right? To set the stage: Kubernetes is highly ephemeral. A static scan of your cluster at a point in time is useful, but the cluster can change an instant after that scan has completed, so you already have a very dynamic and almost hostile environment. Things are coming and going, and lots of workloads are churning. So to reduce the number of misconfigurations, there are a few places where you should consider checking. Start with the manifest itself. We see CI/CD static analysis tools that look at your YAML manifests before they even touch a cluster to find some of the low-hanging fruit. That’s very useful, but it’s not enough. The next step is admission control: right before the workload or configuration is scheduled onto the cluster, ensure that it meets your baseline standard. I’ll talk about some options for that in a second. Then there’s runtime: after everything is scheduled, your Pod or your DaemonSet, whatever it may be, is up and running.

How is it interacting with the rest of the cluster? Is it still configured as you expected it to be? Because there are three stages, it’s really hard: if you pick one, you’ll have a little bit of coverage, but you really need all three. As for what to look for, we have some baselines. The CIS benchmark is probably the de facto one, and it looks for common misconfigurations across workloads and your Kubernetes nodes. Do the containers within your pods run as a particular user, like root? Are you mounting file systems that might lead to other types of vulnerabilities? Are you running privileged pods? Discovering things of that nature is an important part of the process. The NSA hardening guideline (https://www.nsa.gov/Press-Room/News-Highlights/Article/Article/2716980/nsa-cisa-release-kubernetes-hardening-guidance/) also exists to help guide operators toward running the most hardened version of a cluster they can. And the OWASP Top Ten for Kubernetes (https://owasp.org/www-project-kubernetes-top-ten/), which I’ve been grateful to work on with a few other people, is really the practitioner’s checklist: the high-level guide, if you will, to what to look for across your entire Kubernetes estate. It’s up to you to choose what to check for.
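The workload questions above (root users, host file system mounts, privileged pods) can be illustrated with a short Python sketch that scans a Pod manifest parsed into a dict. The selection of checks is an assumption made for illustration, and it covers only a small slice of what CIS, the NSA guide, or the OWASP Top Ten ask for. A container with no UID set and no `runAsNonRoot` guard is reported as "may run as root", because in that case the image’s default user applies.

```python
from typing import Any


def pod_misconfigurations(pod: dict[str, Any]) -> list[str]:
    """Flag a few common misconfigurations in a parsed Pod manifest."""
    spec = pod.get("spec", {})
    pod_ctx = spec.get("securityContext", {})
    findings: list[str] = []

    # Sharing host namespaces weakens isolation from the node.
    for flag in ("hostNetwork", "hostPID", "hostIPC"):
        if spec.get(flag):
            findings.append(f"pod sets {flag}: true")

    # hostPath volumes expose the node's file system to the container.
    for vol in spec.get("volumes", []):
        if "hostPath" in vol:
            findings.append(f"volume {vol.get('name')} mounts hostPath {vol['hostPath'].get('path')}")

    for c in spec.get("containers", []) + spec.get("initContainers", []):
        ctx = c.get("securityContext", {})
        name = c.get("name", "<unnamed>")
        if ctx.get("privileged"):
            findings.append(f"container {name} is privileged")
        # Container-level settings override pod-level ones.
        user = ctx.get("runAsUser", pod_ctx.get("runAsUser"))
        non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False))
        # With no UID and no runAsNonRoot guard, the image's default user (often root) applies.
        if user == 0 or (user is None and not non_root):
            findings.append(f"container {name} may run as root")
    return findings
```

The same function could run in CI against manifests or against live objects fetched from the cluster, which is one way to cover more than one of the stages described above.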

But I do recommend doing a little more than firing off a scan once a week and hoping for the best, because that won’t give you an accurate representation of your cluster’s current state.
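At the admission-control stage, a validating admission webhook receives an `AdmissionReview` request. It must answer with an `AdmissionReview` response that echoes the request `uid` and sets `allowed`. The Python sketch below builds that response from a parsed request, assuming a single hypothetical policy: no privileged containers. A real deployment would also need an HTTPS server and a `ValidatingWebhookConfiguration`. Projects such as OPA Gatekeeper, or the built-in Pod Security Admission, cover this ground without custom code.

```python
from typing import Any


def review_pod(admission_review: dict[str, Any]) -> dict[str, Any]:
    """Return an admission.k8s.io/v1 AdmissionReview response for a Pod request.

    Hypothetical policy: deny the Pod if any container sets
    securityContext.privileged: true.
    """
    request = admission_review["request"]
    spec = (request.get("object") or {}).get("spec", {})
    privileged = [
        c.get("name", "<unnamed>")
        for c in spec.get("containers", []) + spec.get("initContainers", [])
        if c.get("securityContext", {}).get("privileged")
    ]
    # The response must echo the request's uid so the API server can match them.
    response: dict[str, Any] = {"uid": request["uid"], "allowed": not privileged}
    if privileged:
        response["status"] = {
            "code": 403,
            "message": "privileged containers not allowed: " + ", ".join(privileged),
        }
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}
```

Because the check runs before scheduling, a rejected Pod never reaches the cluster. That complements, rather than replaces, the manifest and runtime checks.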

Host: Okay, makes sense. I like two things you highlighted. One is coverage. Similar to how we write unit tests and look for higher test coverage, this is security misconfiguration coverage for your cluster: the higher, the better. You cannot just say, “Hey, I covered one area and I’m good.” The second thing you highlighted was the checklist. A beginner may not know where to start or what to look for, so following CIS, the NSA guide, or the OWASP Top Ten, which you contributed to, would be a good place to start.

One thing we have seen with misconfiguration scans is that a lot of people use open-source tools; for example, kube-bench is an amazing tool for checking against the CIS benchmarks. But in a cloud-managed Kubernetes setup, some components cannot be scanned, like etcd or the control plane, because those are managed by the cloud provider. So in that case,

How should practitioners work with or around this? We also spoke about CIS, NSA, and the OWASP Top Ten, and there may be a few others as well. If we needed to order them, say CIS first and NSA next, what order would you recommend?

Jimmy: Yeah, that’s a totally valid point. If you followed the CIS benchmark line for line, it wouldn’t work. First, you couldn’t check some of those things in different types of cloud providers. Second, even if you were building your own clusters from scratch, getting 100% passing coverage of the CIS benchmark would be really hard, and maybe not even possible in some circumstances. With these frameworks, our first instinct in the security world is often, “Give me a tool; I just want to scan the thing and get the output.” It’s harder than that, especially with Kubernetes. If you don’t have a team with the underlying knowledge of what these checks are doing and why they’re important, you’re going to be chasing your tail over things that maybe don’t matter.

What I would recommend is to use your open-source tools. They’re fine when you want a really quick assessment of an individual cluster. The platform we’re building and have delivered to our customers today is really for the enterprise that says, “I have 50 clusters, three business units, and clusters running in four different locations.” That becomes even harder, because a CLI tool doesn’t give you a representation of your posture.

So I would recommend starting with the misconfigurations you care about, the ones your security team has deemed important: a collection of things you know could present problems, where you understand the impact and have a clear path to remediation. Strip out the rest of the noise and take it in bite-sized chunks. In my experience, trying to do the entire CIS benchmark just means you’ll probably do none of it. It’s an overwhelming endeavor.
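One way to take findings in bite-sized chunks across many clusters is to filter scanner output down to a short list of checks the team has chosen, then count what remains per cluster. In this Python sketch the finding format and the check IDs are hypothetical. They stand in for whatever a real tool reports and are not CIS numbering.

```python
from collections import Counter
from typing import Iterable

# Hypothetical check IDs the security team has decided to act on first.
PRIORITY_CHECKS = frozenset({"privileged-container", "run-as-root", "public-api-endpoint"})


def prioritized(findings: Iterable[dict], priority: frozenset[str] = PRIORITY_CHECKS) -> Counter:
    """Drop checks outside the chosen set and count the rest per (cluster, check)."""
    return Counter(
        (f["cluster"], f["check"]) for f in findings if f["check"] in priority
    )
```

Calling `prioritized(findings).most_common()` then gives a worklist, ordered by how often each chosen check fails in each cluster. Growing `PRIORITY_CHECKS` over time is the "bite-sized chunks" part.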

Host: Exactly, especially for a large enterprise. Makes sense. That’s very good advice, because it helps you look at the items from a threat modeling perspective and decide which ones are applicable. Not all of the items from CIS, OWASP, or NSA may apply to you. So that’s very valuable.

I want to pivot to the other way of using Kubernetes: rolling your own, or self-hosting. This might be useful when you are trying to learn about the different components of Kubernetes and how they work with each other.

In your understanding, how is it different from cloud-managed Kubernetes from a security perspective? We spoke about a few best practices for cloud-managed Kubernetes. What would you recommend for a self-hosted environment?

Jimmy: Yeah, it is pretty different. Right.

When I started using Kubernetes, there was no managed Kubernetes, so you did it yourself. Usually we had kops and some flavor of kubeadm, but at the end of the day, you were creating the nodes and dealing with the certificate authorities.

Every step was “the hard way,” as Kelsey Hightower put it. When you’re setting things up in a very bespoke manner like that, the opportunity for misconfigurations to go unnoticed is amplified. Think about EKS: it’s like the build-versus-buy model. EKS will not give you fine-grained access to and configurability of your control plane, but it will surface just enough for you to get your job done while keeping those guardrails in place.

So when you’re building your own cluster, which a lot of people do (we have government entities and customers that build and maintain very large, successful Kubernetes clusters),

you’re signing up for an internal skill set that you may not have needed with managed Kubernetes. You need a proper platform team. You need people who understand how to perform upgrades safely. You need to know the individual flags and configuration options, for each version of Kubernetes, that you must turn on or off to give you the best chance of being secure. On the flip side, you could probably save money, and you could probably scale in ways you couldn’t with managed Kubernetes.
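Knowing "the individual flags" is easiest to see with an example. This Python sketch checks a handful of kube-apiserver flags that hardening guides commonly call out: `--anonymous-auth`, `--authorization-mode`, `--audit-log-path`, and `--profiling`. It assumes you already have the server’s argument list, for example from the static Pod manifest on a kubeadm-built control plane. Unset flags are treated as having the upstream defaults. The list of checks is far from complete.

```python
def parse_flags(args: list[str]) -> dict[str, str]:
    """Parse "--name=value" and bare "--name" arguments into a dict.

    Space-separated values ("--name value") are not handled in this sketch.
    """
    flags: dict[str, str] = {}
    for arg in args:
        if arg.startswith("--"):
            key, sep, value = arg[2:].partition("=")
            flags[key] = value if sep else "true"
    return flags


def apiserver_findings(args: list[str]) -> list[str]:
    """Check a few kube-apiserver flags against common hardening guidance."""
    flags = parse_flags(args)
    findings: list[str] = []
    # Anonymous requests are allowed unless explicitly disabled.
    if flags.get("anonymous-auth", "true") != "false":
        findings.append("--anonymous-auth is not false")
    # Upstream default is AlwaysAllow, which authorizes every request.
    modes = flags.get("authorization-mode", "AlwaysAllow").split(",")
    if "AlwaysAllow" in modes or "RBAC" not in modes:
        findings.append("--authorization-mode should include RBAC and not AlwaysAllow")
    # Without an audit log path, API requests are not written to an audit log file.
    if "audit-log-path" not in flags:
        findings.append("--audit-log-path is not set")
    if flags.get("profiling", "true") != "false":
        findings.append("--profiling is not false")
    return findings
```

Flag names and defaults can shift between Kubernetes versions. Checks like these need to be kept in step with the version you actually run.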

You can run Kubernetes on IoT devices and do other things that managed Kubernetes isn’t great at. But you are signing up for a serious level of operational overhead, and that includes more security control than you may realize if you’re running your own.

Host: Makes sense. You’re not paying the cloud providers, but you have to take that on yourself from both a security and an operational perspective, and that brings a different set of challenges.

Jimmy: etcd, right? etcd has a number of configuration flags and all of that. It’s a whole world that, if you use managed Kubernetes, you don’t even know you’re using; etcd doesn’t matter to you. But it matters a lot when you’re building your own cluster.
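etcd’s own flags are a good example of that hidden world. This Python sketch assumes you have the etcd command-line arguments. It checks the TLS-related flags that hardening guides such as the CIS benchmark typically cover: server and peer certificates are set, client certificate authentication is required, and auto-generated self-signed certificates are off. It is an illustration, not a full etcd audit.

```python
def etcd_findings(args: list[str]) -> list[str]:
    """Check etcd TLS-related flags, given its command-line arguments."""
    flags: dict[str, str] = {}
    for arg in args:
        if arg.startswith("--"):
            key, sep, value = arg[2:].partition("=")
            flags[key] = value if sep else "true"

    findings: list[str] = []
    # Serve clients and peers over TLS with real certificates.
    for required in ("cert-file", "key-file", "peer-cert-file", "peer-key-file"):
        if not flags.get(required):
            findings.append(f"--{required} is not set")
    # Require client certificates from API servers and from other etcd members.
    for must_be_true in ("client-cert-auth", "peer-client-cert-auth"):
        if flags.get(must_be_true) != "true":
            findings.append(f"--{must_be_true} is not true")
    # Auto-generated self-signed certificates should not be used.
    for must_not_be_true in ("auto-tls", "peer-auto-tls"):
        if flags.get(must_not_be_true) == "true":
            findings.append(f"--{must_not_be_true} is enabled")
    return findings
```

Since etcd stores all cluster configuration and Secrets, anyone who can reach it without client-certificate authentication effectively controls the cluster.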

Host: And not just etcd. There’s your control plane, API servers, all of that. You have to take care of that.

Jimmy: All of it.

Host: Yeah, all of it. Makes sense. So, I want to talk about runtime security a little bit.

There was a recent study by ARMO which found that more than 70% of clusters use open-source tools for container and runtime security, and that organizations use more than three solutions on average. What’s your take on that? And do you have any recommendations on what an organization should look for in open-source tools, and where they should start?

Jimmy: Yeah, this statistic doesn’t surprise me. A lot of people use open source to scan clusters for various things, whether it’s an image scanner or Open Policy Agent Gatekeeper. There are lots of different tools out there to help you find bad things. My take is that I have yet to see a real enterprise program built exclusively on those tools, because the output is pretty noisy, and you need to do more than run an individual scan on one cluster in your environment. We have customers with over 150 production clusters. So you need central management of your security program: vulnerability management, triage, everything that feeds into the rest of a real security program. These tools can be automated in ways that help with that, but they can also produce a sea of noise that doesn’t add up to an enterprise security program.
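Turning several tools’ output into something a central program can triage starts with collapsing duplicates. This Python sketch assumes each tool’s findings have already been normalized into dicts with `tool`, `cluster`, `resource`, and `check` keys, which is a hypothetical format. It merges findings that point at the same problem and keeps track of which tools reported them.

```python
from typing import Iterable


def merge_findings(findings: Iterable[dict]) -> list[dict]:
    """Collapse findings that several tools report for the same cluster/resource/check.

    Assumes each tool's output was already normalized into dicts with
    "tool", "cluster", "resource", and "check" keys (hypothetical format).
    """
    merged: dict[tuple[str, str, str], dict] = {}
    for f in findings:
        key = (f["cluster"], f["resource"], f["check"])
        entry = merged.setdefault(
            key, {"cluster": key[0], "resource": key[1], "check": key[2], "tools": set()}
        )
        entry["tools"].add(f["tool"])
    # Sort tool names so the output is deterministic.
    return [{**e, "tools": sorted(e["tools"])} for e in merged.values()]
```

The normalization step is the hard part in practice, because every tool names its checks and resources differently. This sketch only handles what comes after it.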

My take is that people are desperate to secure Kubernetes and are looking for things to do. Before I would recommend any tool, I would actually recommend that more people learn Kubernetes more deeply before they start firing off tools, because it’s really hard to sift through the results if you don’t have the context. There’s a lot of training that should underpin these programs that’s missing, and the whole industry needs to level up in container technology in general before I can recommend an individual tool. Obviously, if you’re building big environments, I would say come talk to me and my team, but that’s a shameless plug.

Host: No, but you are spot on in your evaluation of open source. You get so much information, and if you don’t have good knowledge of Kubernetes and the security side of it, it becomes very overwhelming. You don’t know how to prioritize, or whether a finding even applies to you. So, yeah, that makes a lot of sense.

In general, as we have been discussing, Kubernetes is a very complex area.

Are there any learning resources you go to in order to keep yourself up to date, or any open-source resources folks should look at to better understand Kubernetes in general and Kubernetes security in particular?

Jimmy: Yeah. My role has evolved to be a little different as a founder, as you can imagine. When I’m learning about Kubernetes, it’s by building on top of it. What I mean is that we have hired an engineering team that is excellent at what they do and at how they think about Kubernetes. They’ve managed the biggest clusters in the world, built on top of Kubernetes, and dealt with real-world Kubernetes implementations, and I’m learning from them most of the time. That’s not a great recommendation, because they don’t have a book you can read, but the platform teams get this stuff right.

Sometimes in security we only talk to pen testers, and it’s great to attack Kubernetes, and I love that, but there’s a whole world of platform engineering, and those teams are dealing with Kubernetes in ways security teams won’t even touch. So I would say use it, play with it, break it. The OWASP Top Ten was made as a first entry point into learning; that’s the only reason it was created in the first place: what are the ten things you should care about right now? That’s going to lead you down a path to learn more. There’s also the OWASP Kubernetes Security Cheat Sheet, and in the KSOC Labs GitHub, which is public, we maintain an awesome Kubernetes security list of random stuff we find interesting.

Kubernetes.io will surface pretty much anything. It is vast and full of good security content; it’s not just docs. There’s a whole RBAC good-practices write-up now, and there’s tons of security content right in the docs.

You can also hit me up, and I can send things as well. But books are hard. I don’t typically have time to read Kubernetes books at the moment, and they go out of date in about five minutes. So I keep up on Twitter and the other social media channels and follow the right people. We have a list of people to follow in that awesome Kubernetes security repo on GitHub.

Host: Yeah, we’ll make sure to tag those when we publish this episode. But I love that you started with your team. From a learning perspective, you learn from your team, and that’s a very unique perspective. I really love that!

Jimmy: We’re at an engineering off-site in London right now, and my mind is just fried by the depth of a true platform team. We’ve hired the best Kubernetes folks on the planet, and it’s amazing to learn something every day.

Host: That’s a good segue to my last question. You are a founder in the security space, and in general, being a founder is a very difficult and lonely journey. So,

What are some of the lessons you’ve learned while building KSOC, and what advice do you have for future founders?

Jimmy: Yeah, that’s probably a whole other episode. At a high level: I was a security engineer, so I had certain skills I’d acquired over the years. Technically, I know what a good security program looks like. I would say that if you want to start a company, VC-backed, bootstrapped, whatever, you’d better learn to surround yourself with people who aren’t like you and who have different skills, because there’s nothing worse than an echo chamber telling you you’re doing the right thing when you’re not. You will be rejected 95% of the time for everything you do, and you just have to keep waking up the next day and doing it over again for the 5% that works. Some founders may say rejection isn’t part of it, but that can’t be true. You put “lonely” in your question, and it kind of is, because you’re faced with decisions and challenges that only you and your co-founder are dealing with at that moment.

So have a strong co-founder, surround yourself with great people, and, if you have one in life, a partner who’s okay with this journey.

Host: Yeah, I can totally relate to it. And thank you for sharing that with us. That’s a good way to wrap up our security questions section.

Summary:

Here are a few points which stood out for me.

  1. As part of the shared responsibility model, cloud providers take care of areas like the control plane and etcd, the key-value store. But there are areas, such as your own workloads, secrets, base images, RBAC, dashboards, and logs, that users need to be careful about.
  2. Kubernetes misconfiguration checks should be a continuous process rather than a one-time scan. Misconfigurations can be caught at several stages: your manifests (checked in your CI/CD pipeline), admission control, and runtime. Focus on coverage, using the CIS benchmarks, the NSA hardening guide, or the OWASP Top 10.
  3. Self-hosting brings many operational and security challenges. Cloud providers are good at abstracting these security aspects away in their managed offerings, so, if possible, avoid self-hosting and use a cloud-managed K8s offering.
  4. For Kubernetes security, instead of relying only on open-source tools, understand Kubernetes and Kubernetes security in detail, and follow a threat-modeling-based approach rather than a checklist-based approach.

Rapid Fire:

Host: Let’s continue with the Rapid Fire section. The first question is,

What advice would you give your 24-year-old self starting out in security, and why?

Jimmy: Yeah, I would tell my 24-year-old self to keep pursuing security. Sometimes you question whether you’re good enough, because you’re surrounded by people who are really smart. Security is a weird industry in a lot of ways. It’s full of great people, but it also has some very apparent gatekeeping problems, and it’s easy to have imposter syndrome. I would just say keep going; don’t quit.

Don’t go into a cabin in the woods and run away from technology quite yet. And keep an open mind, because there is no single path in security. You can do AppSec, you can do cryptography, you can be some obscure academic expert on TLS, and you could spend ten years of your life researching really cool corners of security. You’ll probably find something that interests you; it’s such a massive area. So that would be my recommendation: explore the area and find your calling in security.

Host: The second question is: if you were a cybersecurity superhero, which power would you choose to have?

Jimmy: I would choose empathy, because everyone’s usually trying their hardest, and everyone’s dealing with something, whether at work, at home, or personally. The security jerk trope isn’t useful. We’re all on the same team, building stuff together at your company, doing whatever you’re doing.

I’ve been fortunate to work with really open, authentic, and empathetic people, and when I encounter that security jerk, I don’t have time for that. Whether it’s your users or the developers you’re trying to work with, we all have stuff going on, and it’s good to stay open to that. And I think it’s not just in security; empathy applies to any domain or any business. So that’s a very cool point.

I don’t know if it’s a superhero power, but it feels like a superpower sometimes.

Host: Yeah, because most folks don’t practice that; not everybody does. Definitely. The last question is,

What’s the biggest lie you have heard in cybersecurity?

Jimmy: Yeah. That you’re not elite unless you’re dropping zero-days, putting out crazy malware, or being some reverse-engineering guru.

You probably have a place in security and can make a meaningful contribution. The biggest lie is that you need to be this tall to ride. Everyone starts somewhere, and security is no different from any other career. You’re not going to be perfect for a while, or maybe ever, but you don’t need to drop zero-days and put on this mask to be in security. It’s just silly. There’s a lot of space for different skills and people.

Host: Yeah, I love that, because most folks think from that perspective: maybe I need to learn more; I need to be perfect before I can get out into the world, especially in security. That’s a great point.

Thank you so much, Jimmy. This was very insightful for me, at least. I hope our viewers will learn something from this as well. So for folks who might have more questions or want to connect with you, what’s the best way to reach out to you?

Jimmy: Yeah, I’m still hanging out on Twitter. I don’t know if everyone’s leaving, but I’m Jim Mesta on Twitter. You can find me on LinkedIn really easily; I’m usually pretty responsive there. Or just email Jimmy@ksoc.com. I’m fairly easy to find. LinkedIn has become my de facto channel, and I don’t know how I feel about that, but that’s where I am right now on my journey.

Host: We’ll make sure to add those details so that folks can reach out to you if they have any questions. So thank you so much for coming to the show.

Jimmy: Anytime. Thanks for having me.

Host: Thank you. And to our viewers, thanks for watching. We hope you have learned something new. If you have any questions about security, share them at scaletozero.com, and we’ll get them answered by an expert in the security space.

See you in the next episode. Thank you.

Jimmy: Bye.
