IAM, AI & Cloud Security: Unlocking Scale & Battling New Threats with Stephen Kuenzli

TL;DR

  • One big misconception in security, particularly in IAM, is that folks see security as a dedicated resource when it's often a shared resource in the organization. The other is treating least privilege as something standard and boilerplate, when it's actually very nuanced and depends on the organization.
  • Security organizations should enable teams by providing self-serve playbooks and setting up processes around them, with guardrails on top, like a data perimeter in AWS.
  • AI brings reasoning capability, which has been missing from software tooling for decades. So leverage AI, MCP servers, and agents to improve not only research but also decision making, and hence gain productivity.

Transcript

Host: Hi everyone, this is Purusottam, and thanks for tuning into the ScaleToZero podcast. Today's episode is with Stephen Kuenzli. Stephen leads K9 Security, which helps cloud teams scale access governance within the workflows they already use. He also wrote Effective IAM for AWS to help cloud engineers understand the challenges of AWS IAM and how to use IAM well at scale with modern practices like continuous delivery.

He loves great coffee, loves to run and loves to nerd out about simplifying complex things so that people can actually use them to achieve their goals.

Stephen, thank you so much for joining me today in the podcast.

Stephen: Yeah, well, thanks for inviting me, Puru. Appreciate the opportunity to chat with your audience and you.

Host: Absolutely. Looking forward to it. But before we start, anything you want to add to your journey? I'm not sure if I did justice to your journey.

Stephen: Yeah, just a little more background: I spent several years leading cloud migrations. You know, Docker and infrastructure as code and continuous delivery and all of that; 2015, 2016 is the timeframe when I was really starting in on that.

And the hardest part of it was, well, nobody knew who had access to what, or Jenkins could delete the production database. So I thought that was a useful problem to go solve for cloud teams. So that's what brought me to this space.

Host: Interesting. Yeah, it is for sure one of the interesting ideas. So hopefully we'll touch on that today. Before we kick it off though into the security world, one of the things that I ask all of our guests and we get unique answers is, what does a day in your life look like?

Stephen: Yeah. So, when the workday starts, you have to keep in mind I'm a mostly bootstrapped founder, so I do a lot of things throughout the day. The first thing I do is plan my day in 30-minute blocks. It usually starts with some morning communications, checking on the SaaS operations with the dashboards and so forth. Then we do a daily sales-operations sort of standup thing. Then I dive into deep work for my primary focus of the day, have lunch, and then do the secondary priorities for the day.

And on any given day, the focus could be product, it could be marketing, it could be sales. And then I finally wrap up by checking and responding to email again, and then capturing my insights from throughout the day, from all the different functions I'm working in.

And I found that that's really helpful. By the time Friday rolls around, I usually can't remember what was happening on Monday or Tuesday; it feels like last week, right? But having a record of these things has really been valuable for going back, focusing on the good stuff, capturing insights, rolling them forward to the next week, and doing continuous improvement.

Host: So one of the things that I have heard, and of course there are debates about it: some folks say that you should make the plan for tomorrow tonight, and some folks say, I plan my day at the beginning of the day. I'm not sure how you look at that.

Stephen: Right. Yeah, for me, I plan my day out on paper. I of course have an electronic calendar, but at the beginning of the day, I actually block out my day. I write it down, and that helps me with a few things: literally making space for the tasks that I need to accomplish, and recognizing it's a bin packing problem. You're like, well, this won't fit in the bin.

And that leads to, well, I'm going to push that to the next day sometimes. Or at the end of the day, I don't plan my day, but I will make a note like, need to solve XYZ. I'll capture notes and dump some context there or in code. Sometimes I'll leave code broken intentionally so I'm coming right back to the same place.

Host: Okay, interesting. So today's focus is around the challenges and solutions of scaling IAM security in clouds like AWS, Azure, and GCP, and how AI and agent-assisted workflows plus MCP servers may change how security is done. So let's dive in!

So for many organizations, IAM is often seen as a foundational element. However, we cannot ignore the fact that scaling it effectively across large and dynamic cloud environments is still a challenge. So I want to understand from you: in your view, what are the top two or three misconceptions that organizations have when it comes to IAM, and what is stopping them from scaling it the right way?

Stephen: Yeah. So I agree that organizations continue to struggle with this. And I think the first misconception is that centralized security teams can be in the path of delivery teams without blocking those teams. The scale factors are all wrong. There are like 50 application engineers to one cloud security specialist, who often, or even generally, isn't in the security team proper.

And the second misconception, and the most important one, is that there's a strategic error in being too literal about least privilege. And I think this often comes from folks who aren't into policy writing, who aren't practically responsible for writing policies.

You can be very literal about removing every last permission, like we're going to go play code golf, and not really recognize the scale at which you need to manage policies. If there are hundreds of principals in your organization that you care about, and hundreds, maybe, of data sources that you need to care about, are you going to go artisanally craft policies for each of them? How will we do that?

And so, you know, we can talk about some solutions, but basically it's like, let's back up a little bit and really think about how we're going to scale this from a process perspective and with what people, and then let's go identify tools to help us do that.

Host: Both the points you highlighted were spot on. Yeah, with shared security resources, it's often a challenge. Sometimes DevOps teams are owning security, and you do not have dedicated security folks. So that's another challenge.

The literal usage of least privilege is a huge challenge I see as well. When you talk to security leaders, they're like, yeah, we want to do least privilege. But it's not a generic implementation. It has to be custom to your own organization, based on the scale you are at and things like that.

So now that you touched on both of these misconceptions, what's your recommendation? How do I fix it?

Stephen: Going back to the first misconception, I think it makes sense to recognize that the security team, or the lowercase security team, whoever the security-minded folks are in the org, need to come together and figure out: what is our process going to look like for integrating security into the delivery process in a way that is primarily self-serve? Like, 95% of security changes should essentially be self-serve, from components that are built and vetted by the organization or for the organization.

So the security specialists are not in the operational path of reviewing changes before they go into production. Because what they're going to find is that if they are on that path, they're now responsible for two, three, five changes per day. And if they don't respond in a reasonable amount of time, like four business hours, they're blocking delivery, they're going to be stressed, and the organization is going to be slowed down.

And it's also not necessary because most of these changes are going to be pretty sort of standard things. Like we're going to provision access for an application's data to be accessible by the application and some administrators, right? So let's codify these practices.

And when it comes to least privilege, it's like, okay: when leaders think least privilege, are they thinking about assigning privileges at the granularity of an individual permission, of which there are, what is it, 16, 17,000 now? I forget what AWS is up to; all the major cloud providers are above 10,000 individual permissions at this point.

Or are they more interested in interacting with an auditor-level language here, which is: can the principal administer the resource? Can they read its configuration? Can they read data, write data, delete data? So what is it that we mean by least privilege? It used to be that some people would say, well, least privilege is this: we only allow access to production for certain principals, and that's least privilege. Okay, that's certainly one definition, and maybe that's a good place to start.

But you've got to define what you mean by least privilege, and then make an appropriate business decision about it. Maybe we should control access with words like administer resource, read data, and write data, and have policy generators that use that sort of language to do what people mean, and raise the level of the conversation.

Because otherwise, it is very difficult to get out of least-privilege golf. If you can get to 90% of the solution in terms of reducing risk with a coarser abstraction, maybe do that, and then take the time you've saved and go play code golf with your most critical resources.
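
For illustration, a minimal sketch of what that coarser, auditor-level vocabulary might look like in code. The capability names and abbreviated action lists here are hypothetical and chosen for this example; a real generator would maintain complete, vetted action lists per service.

```python
# A hypothetical capability vocabulary for S3, mapping auditor-level
# language ("read data", "write data") onto concrete IAM actions.
# Action lists are abbreviated for illustration only.
S3_CAPABILITIES = {
    "administer-resource": ["s3:PutBucketPolicy", "s3:PutEncryptionConfiguration", "s3:DeleteBucket"],
    "read-config":         ["s3:GetBucketPolicy", "s3:GetEncryptionConfiguration", "s3:ListBucket"],
    "read-data":           ["s3:GetObject"],
    "write-data":          ["s3:PutObject"],
    "delete-data":         ["s3:DeleteObject"],
}

def actions_for(capabilities: list[str]) -> list[str]:
    """Expand auditor-level capability names into IAM actions."""
    return sorted({a for c in capabilities for a in S3_CAPABILITIES[c]})

# An application role typically needs data access, not administration:
print(actions_for(["read-data", "write-data"]))
```

The point of the abstraction is that the conversation happens in the five capability words, while the thousands of underlying permissions stay encapsulated in the generator.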

Host: Yeah, I like to give the example that maybe in production we'll restrict access to a few principals. That is one way of looking at least privilege. It's not humanly possible to do it at a permissions level, right? As you highlighted, there are over 10,000 permissions.

Even if you are looking at a particular service, there could be hundreds of permissions, right? So it would be difficult for the security expert in your team to do it, and do it at scale. If the person has to cater to 100 engineers, how will you get to it, right?

The first thing that you mentioned was self-serve. Do you have an example that you can share? Are you thinking of giving department or team owners some permissions so that they can manage them for their team and don't always have to go to that central security team? If you can give an example of how to do self-serve.

Stephen: Yeah, for sure. So one of the things we've done is help teams define reference architectures built on sound security patterns, where we're helping you design security into your architecture with, say, a data perimeter. We'll help you implement a data perimeter for your application data, where each application gets a KMS key and is able to encrypt all the data related to that application with that key.

And then you can control access to that data via the key policy. So even when you have multiple applications running in a single account, application A can't read application B's data just because it has s3:GetObject on star, because each application's data is encrypted with its own key.
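
To make the pattern concrete, here is a minimal sketch of the kind of key policy that enforces such a data perimeter. The account ID and role names are hypothetical placeholders, and a production-grade policy, like the generated ones Stephen mentions, handles many more cases: key administration, grants, service principals, and explicit denies.

```python
import json

# Hypothetical identifiers for illustration only.
ACCOUNT = "111122223333"
APP_A_ROLE = f"arn:aws:iam::{ACCOUNT}:role/application-a"
ADMIN_ROLE = f"arn:aws:iam::{ACCOUNT}:role/data-admin"

# Application A's KMS key policy: only app A and its administrators may
# use the key, so app B's s3:GetObject on * is useless against A's data.
# NOTE: a real key policy must also preserve the account's administrative
# access to avoid locking yourself out; generated policies handle that.
key_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAppAToUseKeyForData",
            "Effect": "Allow",
            "Principal": {"AWS": APP_A_ROLE},
            "Action": ["kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey"],
            "Resource": "*",
        },
        {
            "Sid": "AllowAdminsToManageKey",
            "Effect": "Allow",
            "Principal": {"AWS": ADMIN_ROLE},
            "Action": ["kms:DescribeKey", "kms:EnableKeyRotation", "kms:PutKeyPolicy"],
            "Resource": "*",
        },
    ],
}

print(json.dumps(key_policy, indent=2))
```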

And we found that what customers do is build these principles into their application frameworks and so forth, so that when an application team is developing, they don't even really need to think about it. They just say, well, we want S3 and we want DynamoDB, and it just so happens that under the covers those data resources are both encrypted with the application's key, and it all fits together.

And in this case, they're using one of our policy generators, which are freely available on GitHub for CDK and Terraform, to manage that key policy, because it took an expert weeks to get that policy right. And it's done; the way that policy generator works doesn't need to change.

And this isn't something that you should be reimplementing with every application. It's just: here's a proven component. So one of the principles here is to take proven components and build reference architectures out of them, so that application teams can just adopt that stuff and go.

So this is stuff that we've talked about in the DevOps and cloud space for at least a decade. So we're just like, yeah, we're doing it with security.

Host: That's a good example. One of our past guests, Kushagra, who works at booking.com, mentioned something similar. They are heavy AWS users, so I'm using AWS terms, but this applies to all the other clouds as well. They had an organization, organizational units under that, and individual applications under those, and they had delegated some of the rights to the application owners, the OU owners, with guardrails put in, like the data perimeter you highlighted, so that the individual owners can expedite or unblock their own teams while building resources, doing it in a secure way. Only when it gets too complex, or it's not part of the reference architecture as you're highlighting, does it go to the central security team, which then defines a reference architecture.

So that way it's a good balance between having boundaries and at the same time enabling your teams to move fast.

Stephen: Yeah, there's another practice that we're sort of skirting around here. In the book, the final chapter is about basically building and scaling a security practice, or a program, across your organization. And one of the potential practices you may want to adopt is creating a guild for security, where the practitioners can collaborate, discuss problems, gaps, solutions, etc., and provide help to others.

And that also helps scale the security team by building knowledge. Like a center of excellence. Yeah, a center of excellence. Right, because there will be gaps, of course.

And one of the nice things is that if your patterns are codified as code, not only can you select things off the list, Lego-building-block style, but when something isn't available, you know straight away from a planning perspective that this isn't a two-hour thing; it's probably closer to a two-week thing. And that's really important from a planning point of view.

And it gives the security specialists the ability to say: that's a good idea and we're interested in doing it, while managing expectations that it's not going to be here tomorrow.

Host: We have received a question from Rowan Udell around IAM security, and I want to hear your thoughts on it. What is the first thing that you check when working with clients to improve and build out their IAM security? And what are the biggest red flags that you see?

Stephen: So the first thing we check is who has IAM administrative access in an account. And when we say IAM admin, it's: who can change policies, detach policies, create roles, etc. And the reason we do that is that we find a lot of folks have three to five excess IAM administrators, even in production. And it's not necessarily a malicious thing, like they've been breached or something.

It's that there was an incident a while ago, somebody provisioned additional permissions to get things working again, they got it working, and then they never removed those permissions. Or they were just accidentally applied in the first place. And now you've got application A, which is running out on the edge taking internet traffic, and it's an admin, and it's running on an ECS cluster with application B. Well, now you have a real potential issue where, if app A gets breached, it can pivot and get all kinds of other data if that other application's data isn't protected.
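
As a starting point for that first check, a sketch like the following can enumerate who has the AWS-managed AdministratorAccess policy attached. It's deliberately incomplete: real IAM admin access can also come from inline policies or customer-managed policies granting iam:* actions, which is why dedicated analyzers exist.

```python
import boto3

iam = boto3.client("iam")

# Find users, groups, and roles with the AWS-managed AdministratorAccess
# policy attached. This catches only the most obvious admins; paginate
# and analyze inline/customer-managed policies for a fuller picture.
resp = iam.list_entities_for_policy(
    PolicyArn="arn:aws:iam::aws:policy/AdministratorAccess"
)

for user in resp["PolicyUsers"]:
    print(f"user:  {user['UserName']}")
for group in resp["PolicyGroups"]:
    print(f"group: {group['GroupName']}")
for role in resp["PolicyRoles"]:
    print(f"role:  {role['RoleName']}")
```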

And I would say the biggest red flag that we see is the common one: a whole bunch of stale API access keys that have been around. And they lead to breaches. You want to do two things. One, try to move those to roles, whether that's a person coming in through an identity provider and SSO or, particularly for applications, IAM roles running on whatever the compute instances are. And if you can't do that, you should really get into rotating them.
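
And a quick way to surface that red flag: a sketch that lists active access keys older than a rotation threshold, along with when each was last used. The 90-day cutoff is an arbitrary example value.

```python
from datetime import datetime, timedelta, timezone

import boto3

iam = boto3.client("iam")
threshold = datetime.now(timezone.utc) - timedelta(days=90)  # example cutoff

# Walk every user's access keys and flag stale ones, noting last use.
for page in iam.get_paginator("list_users").paginate():
    for user in page["Users"]:
        keys = iam.list_access_keys(UserName=user["UserName"])
        for key in keys["AccessKeyMetadata"]:
            if key["Status"] == "Active" and key["CreateDate"] < threshold:
                last = iam.get_access_key_last_used(
                    AccessKeyId=key["AccessKeyId"]
                )["AccessKeyLastUsed"].get("LastUsedDate", "never")
                print(f"{user['UserName']}: {key['AccessKeyId']} "
                      f"created {key['CreateDate']:%Y-%m-%d}, last used {last}")
```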

Host: Yeah. On the first point, I love that you highlighted that it's not with bad intentions that there are multiple admins. Maybe someone was given admin, they were trying to figure something out, and then nobody noticed that it has been lying there. There are a lot of JIT platforms, just-in-time access platforms, which manage not only elevating but also revoking some of these permissions. Yeah, if only everyone would use a JIT solution.

I think AWS has an open source one, I think they call it TEAM or something, which gives you a just-in-time framework that can be leveraged as well. Yeah, there is something in the Solutions Library now, I think, to do JIT.

Stephen: And just-in-time access is definitely a great practice that people are adopting in the industry. It's pretty hot right now. The thing we're looking forward to is all of those JIT systems having knowledge of what the access entitlements are in the account, so that when somebody, or something, because it could be good automation, is reviewing the access request, the review process is aware of what level of permissions the principal is going to have and can make a more informed decision.

Like, yeah, this is granting admin in production, or the ability to read the credit application data, or whatever it is, right?

Host: So one question that comes to my mind: you touched on least privilege and how it is not a generic implementation, and we spoke about common misunderstandings and how to interpret it.

What would you recommend to organizations to get better at least privilege? Is it more around understanding your own environment, understanding your IAM needs? How would you go about it?

Stephen: Yeah, so I think there are two sides to this. One is, as you mentioned, first understand what is happening in your environment right now. What is the current state? Because even if you have infrastructure code, that is what the state of the system should be; it's not necessarily reflecting what it is. Especially because in many organizations, probably the default, not everything is automated, especially people's access. I think it's more likely that you've got automation for application delivery pipelines.

But I think the first thing is to just get a sense of who has access: who are your privileged users, who are your admins in your production accounts, and who has access to your most critical data sources in your production accounts. And usually that will surface enough immediately actionable things that you'll go off and work on resolving them for a while.

And then I think you'll find that you're in a more useful place to discuss: well, how do we bake great security into our application architectures so that we don't have to do remediation? So that we can design it into the delivery process, versus trying to fix things after deployment.

And especially with applications, you want your security policies to be delivered through the SDLC, through the application delivery process, because your security policies are just as much a part of the application definition in the cloud as the code is. You want to test with the permissions that you are going to have in production.

And I point that out because it's actually a lot different, not just a little bit different, from how we provision access for people. For applications, you want exactly the permissions that you're going to run with in production in dev, test, whatever stages you have, every stage.

But with people, we typically give different kinds of permissions: more permissions in development, and then maybe you get production-like permissions in stage. But maybe it's easier to go and do one thing with, hopefully, just-in-time access, or maybe you have standing privileges.

And then in production, you have even fewer privileges typically, or that's the target. So it's important to understand what's happening in production and then work backwards to: how do we make this an efficient delivery and management process?

Host: So let's say I go through least privilege and define some of these practices: in development maybe you have elevated privileges, staging is somewhat in between, and production is very limited. Are there any security tools which can help me do this, or which can help me monitor it? I know that cloud providers offer some native services. Just to take an example, AWS provides an access analyzer to see what roles or users have been doing in the last 90 days or something like that.

Similarly, can you think of any tools that cloud providers offer which can be leveraged while you are scaling your IAM security?

Stephen: For defining your intended permissions, I think AWS Identity Center is a useful tool, with the ability to manage permission sets, which are essentially a managed policy that gets projected out into individual roles for a job function like application engineer. And you can vary the permission set that's applied to a group by account.
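
A sketch of what driving that with Identity Center's API might look like. All the names, ARNs, and IDs here are hypothetical placeholders, and in practice teams usually manage this through Terraform or CloudFormation rather than raw API calls.

```python
import boto3

sso = boto3.client("sso-admin")

# Hypothetical placeholders; discover real values via list_instances
# and your identity store.
INSTANCE_ARN = "arn:aws:sso:::instance/ssoins-EXAMPLE"
DEV_ACCOUNT = "111122223333"
APP_ENG_GROUP_ID = "example-group-id"

# One permission set per job function, e.g. application engineer.
ps_arn = sso.create_permission_set(
    Name="ApplicationEngineer",
    InstanceArn=INSTANCE_ARN,
    SessionDuration="PT8H",
)["PermissionSet"]["PermissionSetArn"]

sso.attach_managed_policy_to_permission_set(
    InstanceArn=INSTANCE_ARN,
    PermissionSetArn=ps_arn,
    ManagedPolicyArn="arn:aws:iam::aws:policy/PowerUserAccess",
)

# Project the permission set into the dev account as a role for the group.
sso.create_account_assignment(
    InstanceArn=INSTANCE_ARN,
    TargetId=DEV_ACCOUNT,
    TargetType="AWS_ACCOUNT",
    PermissionSetArn=ps_arn,
    PrincipalType="GROUP",
    PrincipalId=APP_ENG_GROUP_ID,
)
```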

So I think that's useful in terms of scaling people's access. In terms of understanding effective permissions, I think it's been pretty tough. Which is why we built an access analyzer. Remember, the original driving question for me was: who can delete the production database? Because that shouldn't even be possible, right?

We don't want that to be possible. And you can prevent that with, say, service control policies. But how do you really know that you can't do it? We want something to give us positive confirmation, and thus confidence, that the database isn't deletable.
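
For example, a minimal service control policy sketch that denies database deletion account-wide, with a hypothetical break-glass role exempted. Note that an SCP constrains what's grantable; it doesn't tell you what the effective permissions are, which is the analyzer problem Stephen describes next.

```python
import json

# Hypothetical break-glass role allowed to bypass the deny.
BREAK_GLASS = "arn:aws:iam::111122223333:role/break-glass"

scp = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyDatabaseDeletion",
        "Effect": "Deny",
        "Action": [
            "rds:DeleteDBInstance",
            "rds:DeleteDBCluster",
            "dynamodb:DeleteTable",
        ],
        "Resource": "*",
        # Everyone except the break-glass role is denied.
        "Condition": {
            "ArnNotLike": {"aws:PrincipalArn": BREAK_GLASS}
        },
    }],
}
print(json.dumps(scp, indent=2))
```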

If you want that sort of view, you need an access analyzer that tells you what the effective permissions are, which is itself a very difficult problem. But yeah, you mentioned the used-access analyzer. That can be useful for sure in terms of generating scoped-down policies. But there's a nuance between understanding the effective permissions that policies grant and the ones that are used.

The used-access analyzer is going to tell you the ones that have been used recently. It'll tell you whether you have deleted a database recently, not whether you could delete a database, right?

Host: True, true. So it's after the fact, in a way. Based on your behavior, it is telling you not what can be done, but what has been done.

Stephen: That's right. Yeah. As is any tool that analyzes an audit log, right? So both answers are useful in different contexts, but I think it's important to understand the nature of the answer that you're getting. Overall, when you have hyperscalers giving you 10,000-plus permissions, people need really good tools to help them understand the state of the system and be able to navigate efficiently, right? We've got a lot to do.

And so I think of what we do as basically providing usability. Our product is usability for cloud systems. Early in my career, I read Don Norman's The Design of Everyday Things.

One of my driving questions here is: how do we make security usable as an everyday thing, and not like that scary remote control that has 45 buttons and you know how to use one?

Because when I have a goal of locking down access to, say, a data source, we have to understand what's available: how do I actually apply different permissions? We have to establish a mental model of how it's going to work overall. And then we have to go do that.

And then we want to be able to verify: did I reach my goal? Did I get there? Is the policy right? So the way I think about this is, if we're going to scale this process of managing permissions, application engineers don't necessarily have time to understand that, as I found recently, Amazon Bedrock has 188 permissions that apply to 29 different resource types, and how to dial those knobs.

Now, it's cool that all these knobs are available, but it's not necessarily the right abstraction for application engineers to be working with. They want to be able to express: this application, application A, is going to have a RAG knowledge base, and it should be able to read and write data from that knowledge base, and everybody else shouldn't, and stuff like that.

So, in short, what we want to do is give the application engineers a component to be able to say: this application should be able to read data and write data from this thing over here. And something else takes care of the details, so that we can do that reliably and quickly.
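
A toy sketch of that kind of component: the engineer declares read/write intent on a knowledge base, and a generator emits the policy statement. The Bedrock actions shown are a small subset chosen for illustration, and the ARN is a hypothetical placeholder; a real generator would maintain complete, vetted action lists.

```python
# A toy policy generator: the application engineer expresses intent in
# capability words; the generator knows which Bedrock actions those mean.
# Action lists are illustrative subsets, not authoritative.
KNOWLEDGE_BASE_CAPABILITIES = {
    "read-data":  ["bedrock:Retrieve"],
    "write-data": ["bedrock:StartIngestionJob"],
}

def knowledge_base_statement(app_name: str, kb_arn: str,
                             capabilities: list[str]) -> dict:
    """Build an IAM policy statement from capability-level intent."""
    actions = sorted(
        {a for c in capabilities for a in KNOWLEDGE_BASE_CAPABILITIES[c]}
    )
    return {
        "Sid": f"{app_name}KnowledgeBaseAccess",
        "Effect": "Allow",
        "Action": actions,
        "Resource": kb_arn,
    }

# Application A reads and writes its own RAG knowledge base.
print(knowledge_base_statement(
    "AppA",
    "arn:aws:bedrock:us-east-1:111122223333:knowledge-base/EXAMPLEKB",
    ["read-data", "write-data"],
))
```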

Host: Yeah, I love how you connected this back to the first answer you gave about the two misconceptions around least privilege, and the other one, around how you do not become a roadblock as a security team, and rather enable your application owners or OU owners so that they can self-serve and get unblocked themselves.

Stephen: Precisely. And just to build on that a little bit: a lot of organizations are trying to adopt AI, and they're doing POCs if they haven't already, right?

One of the things that you want to pull out of that POC is: okay, what did our usage end up being here? Presumably in your POC you've got some specialists involved, and they're going to go do something. And the goal is to experiment and learn; it's not necessarily to ship a production application, right?

But out of that experience, you should be able to say: well, this is what we actually ended up doing, and these are the words we want other application engineers to be able to say. And let's provide a simplified wrapper around that, basically as an infrastructure code library or whatever, that provisions that knowledge base and does the policy management around it.

Then give that to somewhere between one and three teams to go try, and develop and iterate on that component with their feedback. And this is where the security guild comes back in: they should be talking about this stuff, right? And there's feedback. And pretty soon we have something that's available and ready for 10 teams. Let's iterate on it, right?

Host: So we have been speaking about IAM, how to scale it, and things like that. The second area that we wanted to touch on is AI, and how it impacts security and, more importantly, IAM.

So what are some of the emerging trends and best practices you are noticing, with the adoption of AI skyrocketing? What are you noticing?

Stephen: Well, I think the promise I see is that AI can help perform the analysis and research required to connect the dots between what are essentially siloed tools. It can take information from Security Hub and Jira and your CSPM and so forth and join it together into a cohesive and relevant story, so that you can then filter the most important stuff to the top in a consistent way, route it to the relevant team, and then that team can prioritize that issue or those issues along with their existing work.

And you can probably make the agents aware of your solutions library and so forth, so it can maybe assist with recommendations like: it looks like this policy generator might lock down access to that knowledge base, or that S3 bucket, or whatever. And I think one of the key points here is that with the advent of MCP servers, Model Context Protocol servers, you can pull in private information from Security Hub, from Jira, whatever.

And this all gets pulled together into one workflow. So instead of dumping this data into spreadsheets and merging it in your head, we now have the medium by which an agent can do much of the analysis and research work, and people can focus on consuming that information, digging deeper if necessary, and then making decisions.

You know, I think what AI agents are enabling is a new world of decision support.

Host: Another question that comes to mind since we're talking about AI and security: it definitely helps you speed up some of the work. What are some of the possible implications that you see for security teams? I'll just mention one thing: there is a lot of vibe coding happening nowadays, which means there is a lot of code getting generated. So similarly, what are other implications that you see security teams having to go through in the new AI world?

Stephen: The vibe coding thing is an interesting use case. For me, I think there's probably still an expectation that code goes through the same delivery processes, whether that's code review and all the different security checks and so forth. Of course, the volume may go up.

And so this is going to put additional pressure on any manual processes, or any processes with limited scalability. So I do think that security teams should examine the security processes that their teams, that the organization, are supposed to be using and ask: what happens if we try to push two times, three times, five times as many changes through this? What's going to happen?

So they can get ahead of that and, if necessary, start redesigning the processes so that they scale better, are more self-serve, and security specialists are not in the operations path. And one obvious thing is: well, if people are out vibe coding software, why don't we have agents that are reviewing these change requests and so forth?

And I suspect what is going to happen there is that there'll be a wave of agents that are okay-ish at finding issues. The big challenge is always signal to noise, right, in automated analysis solutions. So I would definitely look to roll some of those out, but I would also POC them carefully.

One of the things that we see happen, unfortunately, is security teams will say: okay, we're going to roll out this new tool and it's going to go find all these issues. And then the issues get stuffed in Jira, or comments on the pull requests, or blocks, or whatever, and now it's up to the application engineer to go do something with that. So I think that with the adoption of any new tool, one of the responsible ways to do it is for the security specialists to be in on it, to feel the pain of a low signal, and to build some empathy there.

And then start with one team, roll it out to three, then 10, etc., just like we did with the... That's how you scale. Yeah, the infrastructure library. But ultimately, I think these AI tools are efficiency increases in the general case. And so it is super useful to think about: what processes are they making more efficient? What are we going to run more work through? And do we need something on the other side to be checking these additional PRs?

Host: So, you should not take shortcuts. To summarize it: you should follow the same SDLC processes. Let's say a lot of code is getting generated; you still follow the same processes. The basics should continue as-is, so that guardrails are in place and you are not pushing vulnerable code to production and things like that.

Stephen: I mean, the organization's overall goals and policies for secure delivery didn't change.

Host: Yeah. We spoke about AI, and we spoke about vibe coding a little bit as well, but we cannot end the episode without talking about MCP, because that's the craze nowadays, right? And recently you shared a video of an MCP server interacting with Security Hub.

You mentioned how you built it and things like that. What specific challenges or limitations were you trying to address by building this MCP server? So that folks get an idea of why they should build an MCP server, and what are some of the common use cases maybe that they should look at, from a security perspective of course, right?

Stephen: Yeah, the thing that led me to start with Security Hub and experimenting with MCP there is that we've been talking with consultants and security practitioners. And one of the things we've heard from people using Security Hub is that they want to use it. They know there's valuable information in there, but overall it's a mess. So what they want is help filtering out the important issues, and then sometimes just closing the rest for now, so that they can establish a baseline and maintain it.

So what I was interested in was seeing, one, how quickly could I build a simple little MCP server that pulls issues out of Security Hub? And then I asked Claude, in this case, to identify the most important issues, the critical issues, I forget exactly the term I used, but: what should I focus on to go fix? And it did a pretty good job. A very good job for like four hours of tinkering around.

I do think that the MCP world is definitely still maturing. Most of that four hours was actually spent just getting configurations right. The code in this MCP server is actually trivial, which is really interesting. But getting all the dots connected, getting the MCP server integrated with Claude and credentials and stuff like that, that took a bit. But I thought the result and the interaction were awesome.
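
For readers who want to try the same experiment, here is a minimal sketch of that kind of MCP server using the Python MCP SDK's FastMCP and boto3. The server name and tool shape are this article's own illustration, not Stephen's actual code, and it assumes AWS credentials with securityhub:GetFindings are already configured.

```python
# A minimal MCP server exposing one tool that pulls active CRITICAL
# findings from AWS Security Hub, so an assistant like Claude can reason
# over them. Requires the `mcp` and `boto3` packages plus AWS credentials.
import boto3
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("securityhub")  # illustrative server name

@mcp.tool()
def critical_findings(max_results: int = 50) -> list[dict]:
    """Return active CRITICAL Security Hub findings, summarized."""
    client = boto3.client("securityhub")
    resp = client.get_findings(
        Filters={
            "SeverityLabel": [{"Value": "CRITICAL", "Comparison": "EQUALS"}],
            "RecordState": [{"Value": "ACTIVE", "Comparison": "EQUALS"}],
        },
        MaxResults=max_results,
    )
    # Keep the payload small: title, severity, and affected resources.
    return [
        {
            "title": f["Title"],
            "severity": f["Severity"]["Label"],
            "resources": [r["Id"] for r in f["Resources"]],
        }
        for f in resp["Findings"]
    ]

if __name__ == "__main__":
    mcp.run()  # stdio transport by default, for use from an MCP client
```

As Stephen notes, the tool code is the trivial part; most of the effort goes into wiring the server into the client and credentials.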

Host: I have a similar feeling as well. Since we're at the very early stage of MCP, agents, and agentic AI, it takes a little more time to do the setup right, and then the actual writing of the tools in the MCP servers is way less time-consuming. But hopefully, in the longer term, the configuration and setting up of things will get much easier.

Now, a follow-up question that comes to my mind: as more and more organizations start using AI and agent-assisted workflows with MCP servers and things like that, how do you see it impacting or helping traditional security operations and security programs?

Stephen: Well, I think it can probably help tremendously. We can tie together, we can integrate, previously unintegrated data sources. I mentioned the CSPM, Security Hub, Jira triad. We can pull that information into one context and analyze it in standard ways: you have standard prompts or standard tooling running against these things. Of course you can; I was reviewing an agent framework yesterday that can pull from MCPs.

And so we can build little automation loops that automate much of the research and analysis of all these issues or findings coming into Security Hub or whatever: go dedupe them in Security Hub itself or in Jira, go sync context up. So I think there's a tremendous opportunity to eliminate essentially undifferentiated heavy lifting and prepare a cohesive package of data and story for a person, an analyst, to actually look at and say: yeah, that actually is really important. Or: can you add data about the entitlement information to this, because I'm not sure this is really a problem?

And then it finds out that, oh yeah, this principal only has access to some other data, so the credit application data is not actually at risk. OK, we fine-tuned this, but a huge chunk of the research was already done. And if you have MCPs providing your data sources, you can just add them in in a very quick way.

I think that's hugely valuable in terms of supporting good decision making. And I'm excited; we want to help build that, and I'm excited to start doing some more of it. We've definitely heard from larger customers that they want to do a better job around their security issues, and particularly this routing: validating that there's a real issue and then routing it to the right person is a big problem.

And the way work moves around in an organization is very organization-specific. I don't think you can solve it with general-purpose SaaS, general-purpose software. You actually need something that can reason, that can apply rules without spending a whole lot of time developing and maintaining the rules.

Host: I think the key term that you used is reason, right? Reasoning. So far, with the technology we have been working with, that was the key piece that was missing. Now, with these LLMs, there is a reasoning capability that we can add. We can give it all the data sets, and it can help us build the context that is needed. It can do, as you mentioned, the verification of whether something is actually an issue or not, and add context, so that when the analysts are looking at, let's say, the ticket, they have all the context they need. They know that it is important, and also why, so they know how to fix it. Hopefully, in the longer term, the fix will also be provided by some of these reasoning models.

But yeah, I mean, it's a huge productivity gain, right?

Stephen: Yeah. And to provide just a very simple example: if an agent has access to Jira and understands that security problems in this account have been resolved recently by this team or this person, then when a new issue comes in for that account, minimally it can do things like forward it to that person. And it doesn't have to be perfect. The bar is really low right now.

Host: Right, because at this point, it's all manual.

Stephen: Yeah, yeah. And so we're not trying to achieve perfect. We're trying to achieve useful. Yeah, yeah.

Host: So we spoke about the benefits and all. Since we are running a security podcast, we cannot leave without talking about the challenges you see with the whole MCP and agentic AI space. Do you see any challenges that security practitioners or security leads should be aware of?

Stephen: I think the biggest change here is that the cost of analyzing incoming security data is going to drop precipitously. So now we can afford to research, analyze, aggregate, filter, prioritize, and route. One of the goals should be to automate as much of that as possible, so that the humans, the people, can focus on making decisions and fixing problems.

And then to build visibility into how well these issue triage and routing workflows are working. One part of the process is getting a valid issue to the right person.

And then second, measuring the effectiveness of the tools and what the practitioners need to go resolve those issues, which should already be being done. These are actually two parts of the process that should already be measured in some way. But we want to understand the latency and the throughput of both of those: validating and routing issues, and then the actual fixing. And what is that doing for our security posture, or not? How does that actually relate to risk reduction and, hopefully, quantified risk?

So one of the other cool things I think these agents might be able to do is help you quantify the risk in a meaningful way, because, again, they have access to private information. And some of that might just be your firmographics: what industry you're in, what your revenue is in general, not at this moment, but last year. To be able to quantify this risk, why not? And this would be very helpful in making the literal business case for prioritizing the fix.

Host: So earlier this week, I recorded an episode with Joseph Haske from Pipedrive, and we were discussing the topic of qualitative versus quantitative risk assessment. One of the things he was highlighting was that quantitative risk assessment is still very mathematical; the statistics are very difficult, and you need expertise for it.

And the way you highlighted it, maybe in the future some of these agents can do that risk quantification for us, looking at our infrastructure, our context, and things like that. That would help organizations start adopting quantification more and more, and use it as a metric to prioritize. And hopefully that helps security practitioners, SOC analysts, and folks in the security community reduce the stress that they go through.

So yeah, I'm looking forward to the next couple of months, because with the way innovation is happening, things are changing every month. I'm looking forward to what and how we can make security practitioners' lives easier with AI.

Stephen: Yeah. And quantitative risk is getting some traction right now. I was at a GRC conference recently, and there were multiple tools there that are quantifying risk and helping people do that in a straightforward way. And even if you don't go so far as quantifying the risk, having a consistent application of the existing standards around severity, like applying a severity label based on the context available, is a win in and of itself: not having to be at the whims of a particular analyst, right?

Of course you can have an analyst come along and say, well, we need to adjust that, I don't think it's right. OK. But I think there may be a perception of improved fairness within the system if you have some form of automation doing it, as long as you've vetted that process: well, we classified 100 of these, and it made a good classification more than 90% of the time. When you're introducing that into the organization, you probably want to show some data to show your teams that this is going to be a high-quality, high-signal process.

Host: Yeah. I think one of the things that you mentioned during the recording today is that we're not going for perfection, right? We're going for usefulness. As you add more and more useful scenarios, that helps organizations break complex things down to make them simple.

So yeah, I mean, that's a great way to end the podcast today.

But before I let you go, one last question. Do you have any learning recommendation for our audience?

Stephen: Yeah. So we spent a good amount of time talking about AI today, and I think it's an important aspect of the technology landscape. I've been getting a ton of value out of the Latent Space podcast for AI engineering. It's really helping me keep up on the latest developments, because leaders from leading products, you know, at Anthropic, at Gemini, and so forth, are coming on. The product managers and team leaders that built MCP came on and explained MCP: why they built it, how they built it, and the design influences. So that's super useful, because I'm trying to speed-run the recent history and keep up to date in an obviously very quickly changing field.

Host: Sounds great, thank you for the recommendation. When we publish the episode, we'll add it to the show notes. And again, thank you so much, Stephen, for coming on the podcast. I had a blast talking about everything from IAM to AI and MCP and vibe coding and things like that. Yeah, thank you so much.

Stephen: Yeah, likewise, thank you for inviting me, and I hope the audience finds it interesting. And if you ever want to chat about stuff like this, feel free to hit me up on LinkedIn. I'm Stephen Kuenzli there.