Kubernetes Security Mastery: Shifting Mindsets for Ephemeral Environments with Dinis Cruz
TLDR;
- A strong engineering culture is key to getting the basics right for ephemeral environments and their security, because these environments are dynamic in nature, unlike static infrastructure.
- To get the best value out of GenAI, ingest and enrich with good-quality data; this determines the accuracy of the output from GenAI platforms.
- In the GenAI world, Threat Modeling is paramount. It ensures security is integrated, the right way, into each layer of the AI application stack, including models, data, apps, and others.
 
Transcript
Host: Hi, everyone. This is Purusottam, and thanks for tuning into the ScaleToZero podcast. Today's episode is with Dinis Cruz. Dinis is the CEO and founder of the Cyber Boardroom and MyFeedAI startups, along with two other startups in stealth mode. He brings a unique blend of security and engineering expertise, with over 20 years in cybersecurity and software development. Dinis, thank you so much for joining me on the podcast today.
Dinis: Thank you. Great to be here. It's going to be a very interesting conversation.
Host: So before we kick off, do you want to add anything about your journey? I just summarized it in two bullet points, but feel free to add anything on top of that.
Dinis: Yeah. So the thing I really like is what you guys always talk about, this sort of scale to zero, which is something that I really adopted in my startup, but also in a lot of the work I've been doing for the last, I would say, decade. I started to become very allergic to this big infrastructure, big databases. I almost equated things that have to run all the time with engineering gaps, right? So I started a bit of a journey of: what is the minimum stuff required to run what needs to be run? And then I realized that there was a huge amount of advantages from security, performance, reliability, etc. But even engineering, right? So I really like this whole scale to zero. Very glad to expand on that.
Host: Awesome. And yeah, today we are going to talk about that in particular, like Kubernetes security, which is more about building and managing ephemeral environments. And we'll also touch on GenAI and its impact on security as well. So let's get started, right?
So you have been in the industry for a while. And you have seen everything from traditional data centers all the way to dynamic cloud environments. And even today with GenAI,
Dinis: Yeah.
Host: It's all ephemeral in a way, right? And from a strategic point of view, for someone who has come from, let's say, data center world, if you're talking to a security leader, it often needs a mindset shift, right? Because you are always thinking about your own servers, right? Server racks and things like that. Now it's all serverless, where you have no idea where the server is. How do you help them change their mindset from moving from static servers to ephemeral Kubernetes workloads?
Dinis: Yeah. I think the first thing is that, you know, and I'm a little bit controversial when I say this, right? But I think that most security teams have to also be engineering teams, right? You have to have developers on your team. You have to have people that understand, you know, CI pipelines, deployment, et cetera, right?
Because like you said, when you go from an environment that is, in a way, stale, right? Like it doesn't change a lot. It's pretty well defined, and releases happen at a very low cadence, right? You kind of get away with a little bit, right? But when you move into the cloud, and especially as the cloud got more and more adopted, this brings a lot of challenges, right?
And like you said, when you look at some of the environments, they are ephemeral, which basically means that the environment grows and shrinks, you know, and if you don't have an engineering approach, there are a lot of conversations that your security team is going to find hard to be involved in. But also there are a lot of times when you do need to build a bit of tooling, right?
You do need to customize. You do need to take things into account, because the technology is not perfect. And at the end of the day, the other thing I've learned is that most security problems are actually engineering problems, or workflow problems, or process problems, right? Security is just a side effect, right? I think we have a very important role to play, but the move from one world to the other, you know, brought a lot of advantages, but also brought a lot of new challenges.
Host: Yeah, and it's funny that you mention good security is good engineering. In our previous episode, recorded with Dakota Riley, he started with the same thing: if you want to have good security in an organization, you have to have good engineering practices as well. So that's a one-to-one mapping with what Dakota said.
Dinis: The best security teams that I've seen were always the best engineering teams. Because for them, security is a property, right? Of course they're going to validate their inputs. Of course they do patch management. Of course they do identity and access management. Of course they manage their deployments. And there was an interesting moment when that shift occurred where, I guess from a security point of view, we were like, this could backfire, because now vulnerabilities can be shipped much faster, right?
So there was the idea that, like, you know, before we had a bit of time, now we have no time. But the irony is that the teams that can ship very fast are also teams that can fix very fast. So you actually end up with a situation where those teams have, in a way, a much lower number of issues, a lower number of problems and concerns, because they can make changes much faster than the teams that can barely touch the environment, where the build takes two days or five hours and they deploy once a week, if you're lucky, right?
Host: Yeah, so you touched on a few challenges. So what are some of the common challenges that you see when leaders are shifting their mindset from traditional data centers or traditional server-based approach to a serverless approach in a way or with Kubernetes?
Dinis: Well, I think one of the challenges is that now you move much faster. I think, like you said, the ephemeral nature of the situation is a problem. Because I remember being involved in an incident where the container that caused the problem was long gone.
How do you investigate that? And I still think that most organizations don't do a very good job with logging, monitoring, and real-time information. Also capturing a lot of information, which is why we end up creating a SIEM, if you think about it, right? A SIEM only exists because the dev team and the engineering team don't have those things, right? And that also goes to the heart of why a lot of teams still struggle with getting funding for non-functional requirements, right?
Which I think AI might change a bit; maybe we'll touch on that a little later. But historically, a lot of teams have struggled with getting funding for non-functional requirements.
Yeah, basically all the other stuff outside of what the product or the solution does, et cetera, right? Because they get very top-down-driven sponsorship, et cetera. And again, monitoring and logging, and in a way security, tend to almost be this thing that they have to do a bit of.
And we sometimes have to fight for that 10% or 20% that they can squeeze into the sprint, right? What I always try to do, a lot more recently, especially since I was a CISO for a while, is to make the business case that by adding security, we're actually going to help with the development. We're going to help with engineering. We're going to, in a way, make the business case for that better CI pipeline, for that better monitoring, or help with the incidents that then overload the security and development teams. But it's definitely the pace.
And also Kubernetes itself. I've worked as a CTO for a while, so I know Kubernetes very deeply, and it can be quite challenging. If you're not careful, Kubernetes also becomes a gigantic beast. It's kind of like we go from a monolith that was this size, and now we have a monolith that's kind of distributed. But it's still kind of a monolith, because you only have one or two or three instances of your Kubernetes cluster, which can be a problem.
Host: Yeah, yeah. So there are multiple threads that we can open from this. The first one I want to start with is that when you have dedicated servers versus ephemeral machines, there are several challenges that you touched on. One of the questions that we got from Digvijay Singh, who is a common connection, is: how does security fit into an organization?
Like generally when it's a startup, you are sort of fighting between profitability and business growth versus any other initiatives, right? Whether it's security, optimization, non-functional requirements and things like that. So how do you make sure that while focusing on business and profitability first, security or these non-functional requirements are also a priority? How do you find that balance?
Dinis: So this is very interesting, because I've actually experienced this now very deeply from both angles: as the client, right, as the one consuming a lot of products and materials, and now running my own companies. Like I said, I'm running four companies, right? And everything I do is open source. And one of the things that I did, a lot of it driven by my experience of seeing other startups, is, I think if you're not careful, technical debt is what will kill you, in a lot of companies, right?
So the first major bit of time that I spent, when I was, you know, 100% on developing, was on the build environment and the CI pipeline. That's almost the first thing I built, right? And the reason I did that is because I realized that without a very effective way of shipping code, of testing code, of basically having all that pipeline in place, everything else you do becomes technical debt, right? And you can't make easy changes, et cetera.
And I've seen that with other companies. I've seen all these startups, and after a while I realized: you guys are behaving just like the ones you complain about. You brought a bit of innovation, but it's almost like the bigger they get, the slower they become. But what I've experienced is that when you do good engineering and you keep focusing on those non-functional requirements and the testing, et cetera, the more you have, the faster you go. And that's kind of where I am now. I got to a point where I have crazy speed because I built all these modules.
And baking that security in, baking those practices in from the beginning, is very important. And I'll give you a very good example. I think a lot of startups and a lot of companies get into a position where they're solving the wrong problem.
And the reason they're solving the wrong problem is because they have customers, which are great, but they are cursed because the customer comes along and says, oh, I want to use your product, but I need this and I need that.
And these companies spend a huge amount of time doing stuff that is not core to what they're supposed to be doing, right? I remember working for a company that was doing static analysis, and I looked at their roadmap and went: dude, what the hell is this? Like, who wants this, right? And they're like, we need Oracle integration because of that customer, because of that. And I'm like, your product barely works, man. We should be making the product work, not building all this stuff. And a lot of it is engineering.
So for example, one of the first challenges that I solved, and it was harder than I thought it would be, was, again on the theme of the podcast, scaling to zero. So I said: if I'm building a service like the Cyber Boardroom, for example, which fundamentally started as a GPT wrapper, right? A simple website that wraps it all; all the hard stuff was done by an LLM. So fundamentally, it was a simple web app, right? And it gets very complicated very fast if you're not careful. But my premise was: I want to be able to deploy this everywhere, right? From my laptop, to a serverless function, to EC2, to Kubernetes, to Fargate, to running offline, to VMware, to an AMI, etc. All of it. And to AWS, GCP, Oracle, you know, Azure, IBM Cloud, whatever, right?
And that was interesting because that discipline forced me to clean up so many things, right? And in a way, I have a crazy scalable solution, and I solved that problem first. Now, what this gives me is, for example, a lot of security. So when I have a customer that says, oh, I want to make sure that my data is private and there's no possible leak of data, or I want to consume some confidential data, we basically say: cool, look, we can deploy you in an isolated environment, we can deploy in a VPC, you can deploy on your own, on a dedicated AWS account, on your own AWS account, whatever you want.
And then suddenly, there are a lot of things that we don't have to develop, simply because we engineered it well, simply because we said: look, you have Kubernetes, here's the deployment script, bang. Yeah. But that, yeah.
Host: Right. In a way, the foundation is set for the business to thrive.
Dinis: Yeah, so in a weird way that allowed me a lot of interesting business opportunities that before were just not possible, right? So the ability to deploy on any Kubernetes cluster is very important, because then they can bring their own cluster. I give them the Helm charts, I give them the installation, I give them the AMIs or whatever their environment needs as images, and then it deploys, right? And that's crazy powerful, right? And in a way it's an engineering problem that I solved, and it's all open source, by the way; all that stuff is in the pipeline. But it allowed me a lot of business advantages, and it also allowed me not to have to develop a lot of features, you know, and not to fall into that trap where, as you get customers, they ask you for stuff, and I go: no, I'm not developing that. You run it in your environment, or you do it there, or I'm leveraging this, I'm leveraging that, right? So that makes a massive difference.
Host: Yeah, absolutely. So another thread that I wanted to get into is around security and cost, right? When it comes to no idle servers, it's a win for cost, because you are not paying for your machines being idle. You only pay when they are used. But that's a challenge for security, right?
In that case, how do you maintain security and compliance visibility when your workloads are spinning up and spinning down? How do you maintain it from a security standpoint?
Dinis: So yes, I think there are multiple elements there, right? One element is that the more you get into that world, in a way, the easier it is to, for example, create very secure builds, to create environments that are secure by default, because you are always spinning up new things. So it's always easy to experiment, right? It's easy to lock it down. It's much harder to do that when you have one server that you can't touch, right? So that environment allows you to start to really zoom in on: what is the exact stuff I need in that environment? So in a way, you can create a very secure environment.
And I know we're going to touch on identity and access management a bit. And that, again, becomes a very key part of it. So I'm a big fan of the small environment, this shift-left mindset. But then you really need to double down on visibility, understanding, architecture visualization, et cetera. Because in a way, you should have all the logs. It's almost like,
If you design this effectively, nobody should touch production, right? There should be zero cases, in any scenario, where you should ever need to touch production, because you should have all the data, all the logs, all the information available. Maybe not loaded in a system, but you should have access to it, so then you can go and load it up, right?
So the fact that you need to dial in to a particular environment is already a sign that something is not right, right? And that's a good indicator that your environment is not properly set up, right? So.
If you're really capturing and logging a lot of information... and by the way, these days GenAI actually introduces an element that maybe we can cover quickly, which is one of the craziest things that MCP introduces. Which is good; I know why they've done it, right? But MCP makes this even more crazy, because not only do you now have an infrastructure that can go up and down, right?
The MCP server itself, i.e. the one that tells the LLM what's available, right, the way you communicate with GenAI, can change its definition dynamically, right? So imagine answering an incident where not only is the server that was supposed to be there not there, but the endpoints that the server was providing could have been different, would have been different, in the hour or the minute of the incident than later on, right?
So that's why, see, I think GenAI is going to, in a weird way, allow the technical teams to make a lot more business cases and get a lot more investment in non-functional requirements, like logging, for example, because the business now will understand it. Because the business is now going into vibe coding, into all sorts of stuff, and they're going: yeah, we really need to get a better understanding of what just happened or what's going on.
So I think you're going to get an interesting dynamic where there's a lot more appetite to build really robust systems and logging and monitoring, like you said. And, for example, most security tools suck at this, right? Like, they're really bad. I remember once talking, on a friendly basis, to a vendor, and I said: okay, what's the licensing model, and blah, blah, blah. And then we're like: dude, you have to understand that we spin up a thousand containers, five thousand containers a day, right? So we have, theoretically, 5,000 VMs a day.
How do you handle that? You can't just load that up in your UI. Your UI, your technology, your graphs have to take into account that we don't only have one element, we have thousands of them a day. The interesting thing is that you don't have 5,000 different ones. You have 5,000 that should only be 10 or 20 or 30 different ones, right? The rest is just copies that go up and down, you know, depending on your traffic.
But I think logging and monitoring is literally the thing. Actually, I think identity and access management is the key for all of us; maybe we can get into that. I think we got away with a lot in the past. I think we really never had good control of it. We overprivileged everything. We have keys with massive amounts of privileges. And it's not the developers who are to blame for it; it's just that the system rewarded that. Our infrastructure, even the cloud infrastructure, was never very good at allowing that granularity. But now, especially with GenAI agents in the middle and the blast radius they bring, I think we have an opportunity to really say: look, the only way that we can really contain what the hell a particular service can do is via the identity and access management that is used on that request. So the tighter that is, the more we can control the blast radius.
Host: Right, right. So we will come to identity and access management in a second. One last question: you touched on logging, right? And the more ephemeral your environment is, the more noise it creates and the more logs it generates as well, right? So one of the questions that we got from Esteban Hernandez is: when it comes to logging, more data is believed to provide better detection results. However, the noise and cost become too much, right?
How do you choose what to analyze in real time versus what to analyze asynchronously when it comes to security?
Dinis: Yeah. So I want to be critical of our industry. I think we fall into a trap where most of the solutions to do with logging and monitoring make money, or are rewarded, by ingesting the data. So they all design the systems with the idea of: give us all your data, and we'll make some sense out of it.
And of course, that never scales, because it causes a lot of problems. That's why teams sometimes spend lots of time just getting the SIEM up and running, or a logging solution up and running, and they can barely keep it up.
So my approach to logging is very different. What I do with logging, and what I'm doing with my startups, I actually call LETS, which is Load, Extract, Transform, and Save. So basically the logic is that instead of taking the source data and sending it to a database, to a logging solution, whatever that is. Which, by the way, means you create a pet, right? You now have a system where the only copy of that data lives, right?
My approach is: you take whatever the log data is, and ideally this is where you want to go as granular as you can, and you save it into cloud storage, right? You basically use cloud storage as your database. And then what you do is create a series of transformations, which is what I mean by: you load it up, you extract it, you transform it, and you save it. So you go from one cloud storage to another cloud storage to another cloud storage.
And every time you do this, you clean up the data a bit, right? And then in the end, you send it to your platform, right? So the power of this means that you actually then focus on sending the least amount of data to your logging platform. So you make sure that it's only cleaned up. You now have transformations in place.
So you can basically say, I'm going to take those raw logs and extract this information from it. But you always have access to the original data, right? So the logic here is you have a CI pipeline that builds your log stuff.
You always have the original data. You always have the maximum granularity available, but it's put somewhere. And again, storage is super cheap these days. And what happens, and I've seen this done in different formats, is that your normal logging doesn't have a lot of granularity, but you can then say: right, for this day, or for this period, or for this hour, load me up all the data that you have.
So then you get all the data, but only for that specific bit. So you get aggregated data, which is much cheaper to store and manipulate, et cetera. But then when you want to zoom in on a particular thing, you reload the data. You just need to make sure you have the raw data available. So the way you deal with the problem is by not having that problem in the first place, which is: you don't send all your data into your big infrastructure.
In fact, I even say that your log infrastructure should be ephemeral. I should be able to delete the entire logging infrastructure.
Completely, on a Friday, right, or a Sunday, whatever, right? And then you should have enough data to rebuild it easily. And this is a good example of how we don't apply the same engineering principles to even our own security solutions, right? Because if we did, we would have much cleaner solutions, right? So that's the power of this model, and I've seen variations of this. Like I said, some people transform this, and some have these big ELT, you know, extract-transform sort of pipelines, right?
And it's always the idea that you store everything, but you only send a small subset to your main logging platforms, and then, on demand, you load what you need, right? And that actually keeps the cost down, because the cost of storage is literally orders of magnitude lower than whatever the cost is of the platform or the database that you have running there.
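The LETS flow described above can be sketched roughly as follows. This is a minimal illustration, assuming local directories standing in for cloud storage buckets; the bucket names (`lets/raw`, `lets/clean`) and the event shape are invented for the example:

```python
import json
from pathlib import Path

# Local directories stand in for cloud storage buckets (hypothetical names).
RAW_BUCKET = Path("lets/raw")
CLEAN_BUCKET = Path("lets/clean")

def save_raw(bucket: Path, name: str, events: list) -> Path:
    """Land the raw logs untouched; this copy keeps full granularity."""
    bucket.mkdir(parents=True, exist_ok=True)
    path = bucket / f"{name}.json"
    path.write_text(json.dumps(events))
    return path

def lets_step(src: Path, dst_bucket: Path, transform) -> Path:
    """One Load-Extract-Transform-Save hop from one bucket to the next."""
    events = json.loads(src.read_text())                     # Load
    kept = [t for t in (transform(e) for e in events) if t]  # Extract/Transform
    dst_bucket.mkdir(parents=True, exist_ok=True)
    dst = dst_bucket / src.name
    dst.write_text(json.dumps(kept))                         # Save
    return dst

def drop_debug(event: dict):
    """Example transform: keep only what the logging platform needs."""
    if event.get("level") == "DEBUG":
        return None
    return {"ts": event["ts"], "level": event["level"], "msg": event["msg"]}

raw = save_raw(RAW_BUCKET, "2024-01-01", [
    {"ts": 1, "level": "DEBUG", "msg": "probe", "pod": "api-1"},
    {"ts": 2, "level": "ERROR", "msg": "timeout", "pod": "api-1"},
])
clean = lets_step(raw, CLEAN_BUCKET, drop_debug)
shipped = json.loads(clean.read_text())
# Only `shipped` goes to the logging platform; RAW_BUCKET keeps everything
# for on-demand reloads during an incident.
```

The design point is that each hop only ever reads one bucket and writes the next, so the whole pipeline, and even the downstream logging platform, can be rebuilt from the raw bucket at any time.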
Host: Right, makes sense, makes sense. Yeah, that's a good balance to strike. Now, the next thread that I want to touch on is identity, right? We slightly touched on that. Earlier, in the data center world, it was said that the network is the perimeter, right, when it comes to your security.
Now in the cloud native world, we say identity is the new perimeter. So when it comes to these ephemeral environments, serverless environments or containers, Kubernetes. How do you manage and audit the identity and access when your workloads have very short life cycles?
Dinis: Well, I think the truth is that very few people do it right at the moment. Very few people actually end up doing it. And it's not their fault. It's just that the systems that we're building today, I still don't think, are very fit for purpose, because they overprivilege everything.
So if you think about it, if you have an ephemeral environment, you should have ephemeral identity. That's how it should be. In a way, what we should be able to do is: if I have an environment that spins up and spins down and does stuff like that, I should be able to give that environment the exact privileges that it needs for that particular request. In fact, you should go all the way down to the request.
You should go all the way down to: for this particular task, this is what you're going to do, so this is the credential that I'm going to give you. And when you talk to other services, you pass the identity.
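A minimal sketch of that per-task, ephemeral credential idea might look like this. The scope names and the toy issuer are invented for illustration; a real system would use something like STS-issued short-lived credentials:

```python
import secrets
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class Credential:
    """An ephemeral, task-scoped credential: it dies with the task."""
    token: str
    scopes: frozenset
    expires_at: float

def mint_for_task(scopes, ttl_seconds=30.0) -> Credential:
    """Issue a credential valid only for this task's scopes and lifetime."""
    return Credential(
        token=secrets.token_hex(16),
        scopes=frozenset(scopes),
        expires_at=time.time() + ttl_seconds,
    )

def authorize(cred: Credential, action: str) -> bool:
    """Downstream services check the passed credential, not ambient privilege."""
    return action in cred.scopes and time.time() < cred.expires_at

# The task that edits the calendar gets exactly that capability, nothing else.
cred = mint_for_task({"calendar:write"})
can_edit = authorize(cred, "calendar:write")
can_mail = authorize(cred, "email:send")
```

The point is that the blast radius of any one task is bounded by what was minted for it, rather than by whatever the host environment happens to be allowed to do.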
The problem is that it's actually a tough problem to solve unless you understand how everything connects. And this is why I think GenAI is very powerful, because it's the first technology that will actually allow us to understand what the hell is going on, right? And you know this when you do a threat model and you say: let's do a threat model, let's grab a pen and start drawing it, right?
It basically means that they don't have an up-to-date diagram. Now, assume a world where we have a very good understanding of the code, of the structure, of the permissions, of the architecture, of the flow. And GenAI allows us to do that, if we let it help with the documentation, right?
We can now create very fine-grained definitions of the identity that is needed to do that. And, for example, Code Access Security, for the ones who are old enough to remember .NET, or the Java Security Manager, had a really cool concept: when something is going to do a particular action on an asset, it will do a demand to ask, does the caller of this have the right privileges? Because if you look at it from a design point of view, what happens in most applications is that you lose your identity the further down the application you go.
So you might even have a user identity, then eventually you have a container identity. And every request further down is made on behalf of that identity. Which basically means that by the time you get to the asset, you have something making a request with almost full power, right, or access to all the data. And then you depend on application logic to protect it.
The idea would be that the database, or the asset, or whoever is doing something would be able to say: hold on, who's making this request? Is it coming from a user, or from a user pretending to be an admin, or from somewhere unexpected, or from an injection, or from a GenAI agent out of control? Where is it coming from? And I think identities are key to doing this, because the identity, in a way, allows you to define who you are, and also what you're authorized to do and what we expect you to do.
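The "demand" concept from Code Access Security and the Java Security Manager can be sketched roughly like this. The principal names and permission strings are invented for the example:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Principal:
    name: str
    permissions: frozenset

class AccessDenied(Exception):
    pass

def demand(call_chain, permission):
    """The asset walks the whole call chain: every hop must hold the
    permission, so a low-privilege origin (a user request, an agent gone
    rogue) cannot ride on a downstream service's broader rights."""
    for principal in call_chain:
        if permission not in principal.permissions:
            raise AccessDenied(f"{principal.name} lacks {permission}")

user = Principal("web-user", frozenset({"orders:read"}))
service = Principal("order-service", frozenset({"orders:read", "orders:write"}))

demand([user, service], "orders:read")  # fine: every hop may read
try:
    demand([user, service], "orders:write")  # refused at the originating user
    denied = False
except AccessDenied:
    denied = True
```

This inverts the usual pattern Dinis criticizes: instead of the asset trusting whatever powerful service identity reaches it, the asset refuses anything the original caller was never entitled to.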
And it would be even more interesting to have identities that take into account sequences of events, right? I think of these like graphs, right, semantic graphs and structures. So even the identity should be like: let's say you're going to send an email after you edit the calendar, after you book something.
The first task should not have send-email capabilities, right? And the email-sending task should not have privileges in the calendar. And I should not be able to call this service from here, right? Again, if you look at most Kubernetes environments, internally there's very little security, right? That's kind of the thing that nobody talks about, because it means that this service can talk to that service, can talk to that service, can talk to that service. Why?
Because they don't have strong identity, and they don't have a strong way to pass the user identity and to validate, you know, who is actually making the request. But I think GenAI, not from the point of view of being inline, but from the point of view of allowing us to understand and keep the environment up to date, is a key piece of the puzzle, right?
You can now go to a GenAI agent and say: here's my Terraform script, freaking map this into a graph, right? Here are my new changes; see what changed. Now here are my Kubernetes Helm charts, right?
Here's my definition. So you can actually start to map this and have this in a way that before was just impossible, unless you spend, as people have, a lot of money to build a lot of internal tooling, right? But that doesn't scale.
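Even without an LLM in the loop, the core of that mapping, turning already-parsed deployment definitions into a graph and diffing two versions, can be sketched. The manifest shape below (`name`, `talks_to`) is a simplified stand-in, not real Helm or Terraform output:

```python
def build_graph(manifests):
    """Edges are (service, dependency) pairs declared in each manifest."""
    edges = set()
    for m in manifests:
        for dep in m.get("talks_to", []):
            edges.add((m["name"], dep))
    return edges

def diff_graphs(old, new):
    """What a new chart or plan changed, expressed as added/removed edges."""
    return {"added": new - old, "removed": old - new}

v1 = build_graph([
    {"name": "frontend", "talks_to": ["api"]},
    {"name": "api", "talks_to": ["db"]},
])
v2 = build_graph([
    {"name": "frontend", "talks_to": ["api"]},
    {"name": "api", "talks_to": ["db", "payments"]},  # new dependency
])
changes = diff_graphs(v1, v2)
# A new edge such as ("api", "payments") is exactly the kind of change
# you would want flagged in a threat-model or access review.
```

In the framing above, the LLM's job would be the parsing step: getting from real Terraform or Helm source into something structured like `manifests`, so the graph and the diff stay continuously up to date.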
Host: Right. Yeah, makes sense. And I wanted to touch on GenAI as a second topic for today. So one of the things is, when it comes to GenAI, you can use it for improving your own security posture, or you can also use it to defend, in a way, right? So when it comes to the use of GenAI from a security team standpoint,
What suggestions do you have? What can a security team build to mitigate risks around, say, data leakage, code hallucination, prompt injection attacks, things like that? How can security teams utilize GenAI to build security, maybe internal tooling, to work around some of these challenges?
Dinis: Yeah, yeah, if they're engineers, right? That's why I said it. So I think there are two very big elements here. A: I think GenAI is going to be the most dangerous thing that security teams and organizations have ever seen from the inside, right? Like, just imagine an API. By the way, GenAI is an API, right, because it all has an endpoint. Imagine one of your Kubernetes nodes, now with GenAI capabilities, being used to attack the other nodes.
Can you imagine how crazy that is going to get? So GenAI is going to create a set of attack agents that are something we've never really seen before: very intelligent and capable bits of infrastructure in the middle of your ecosystem. So we're going to have to contain that thing, because in the past, if a node popped, unless you had root, and even then, it was very complicated; you needed to do a lot of stuff. Even if you had remote code execution, it's not like you could just go to the box and say: hey, dude, can you just do a scan internally and figure it out as you go along? Now you can. So I think that GenAI is going to basically break something that we had in the past, which is that we never really had a lot of problems with insider threats. We had a bit, but never at a big scale.
But I also think that, from a security point of view, the thing I wish to tell my team is that our job is not to protect everything. You know, the whole thing of the attacker only has to get it right once and you have to protect everything, and all that stuff? I mean, that's nonsense, right? The attacker has to do 10 things in a sequence for it to work, right? Our job is to reduce that, to break that chain, right?
But the problem is that we always struggle to understand the infrastructure. We always struggle to get visibility. Because the biggest advantage we have over an attacker is that we should know what good looks like, right?
So if you think about it, the attacker will make a mistake. And the mistake is when the attacker triggers a call that was not supposed to happen, or does an action that was not supposed to happen, something malicious, or at least exploratory, right? The idea is that if we know what good looks like, we will catch those.
We will start to see where the attacker is, we will know what they're doing, and we will understand the blast radius much more effectively, right? But to do that, we need to understand the system.
So I actually think the biggest opportunity security has now is that we can really understand how things work together. Because the reality is that most businesses don't. Most businesses are fragmented: multiple teams, multiple stacks. The people who wrote the thing sometimes aren't even there anymore, right? People move on.
There's all sorts of stuff, right? But in security, we are the only team in the entire company that can have access to all the data, all the information, all the documents, everything. We just need to ask, right? Especially during an incident; then it's a motorway, right?
So I think our opportunity in this case, from a defense point of view, is to really understand what the hell is going on, right? In the application, in the environment, in the ecosystem, going back to the cost, right? Like, why do you have these systems running all the time, right? Why do you have systems that could go up and down?
The reason is that a lot of teams are afraid to make changes, right? They've learned that, hey, you know, it's better to keep that thing up, because if you go too aggressive, then freaking shit might go wrong, and then we get blamed, right? So it's almost a perverse model: they get rewarded for keeping more stuff up and calling it resilience, because they don't always have the right information to say it's okay to scale down to small.
It's okay to have an ephemeral system. So I think we in security need that in order to defend. When you have an incident, like I said, we need the data, we need the context, we need to be able to go back in time.
We need all that information. And I think GenAI, some of the models, allow us to do that, because they allow us to understand data, right? They allow us to say: I can now consume five different ways of writing a Helm chart, because each team did it differently. It doesn't matter, because I can go to a GenAI model and say, hey, transform that into this, translate that primitive into this primitive. So for me, that's the opportunity.
Host: Sorry to interrupt, but one question comes to my mind: GenAI is generally seen as non-deterministic, right? The example that you gave, you go to the GenAI and say, write this Helm chart for me. Each time you ask, it will give you something different.
Dinis: No, no, no. Okay, that's different. I agree, right? But the way I use GenAI is I use it to translate, and that's very different. The first thing, even from a security point of view, the first question you want to understand when you look at a GenAI solution is: where is the data? If the data comes from the GenAI itself, remember the GenAI model is a compressed model, right? It's like a zip, you know, a very effective zip, because it learned the data.
And also, it's a world model that doesn't really understand where the truth begins and ends, and that's fine. That's the thing. But what it's very good at is: if you give it data and you say, translate this, the GenAI is very good at that. It doesn't tend to hallucinate.
Especially if you put, for example, three in parallel and ask all three to do the same thing; if they agree, you have a high degree of determinism. And on top of that, what I do, most of the time, is create semantic graphs. I create ontologies and taxonomies on top of the data.
So the point here is that if you have a GenAI model and you say, here's the data you're going to operate on, what I want you to do is translate from here to there, what I want you to do is create objects and graphs from that data, then you start to be very deterministic, right? And that's how I operate in my startups. When I talk about provenance and determinism, that's how I do it, so I can connect the dots, right?
Now, the problem is, if you ask the LLM something and you don't give it the raw data, there's a high probability it's going to make stuff up, because it doesn't have the raw data. The other thing that's important from a security point of view: the idea that you can train the model on a lot of data and then rely on it to control who sees what is absolutely crazy. And a lot of people are finding that out the wrong way, right?
So if an LLM has both our data, we have to assume that one of us will be able to access the other's data, because you can't rely on the LLM to enforce that. So the way I handle it is: don't create the problem in the first place.
So basically, use normal AppSec, use normal infrastructure, to make sure the LLM only gets the data for me, only gets the data for you, only gets the data for that particular user. So this is what I'm saying: you don't get the LLM to write the Helm charts. But you can go to the LLM and say, here's a Helm chart, right?
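The "don't create the problem in the first place" pattern is ordinary application code: authorization happens before the model is ever called. A minimal sketch, with a plain dict standing in for whatever per-tenant store you actually have:

```python
def build_scoped_prompt(user_id: str, question: str,
                        store: dict[str, list[str]]) -> str:
    """Fetch only this user's records, then build the prompt from them.
    The LLM physically cannot leak another tenant's data, because it
    never receives it; access control lives in normal app code."""
    records = store.get(user_id, [])
    context = "\n".join(records)
    return (
        "Answer using ONLY the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )
```

The point is that the isolation boundary is the prompt builder, not the model.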
Describe to me what it is, right? Even something as simple as: for every commit on a Helm chart, map out whether there are security implications. What is changing? What is the context of what's changing? Because I guarantee you that even security experts, looking at freaking Helm charts, a lot of the time there's no way you'll pick up the nuances of what's actually happening in there. They're complex, they have flows, you don't have the context. But the LLM is actually very good at that, right?
LLMs are very good at understanding code, because they learned on code, right? So I do things where, for example, I say: here's my original code, and here's the analysis. Then I say: here's the diff of my changes, can you update my analysis? And then of course you review it, right? And eventually you realize it's actually doing a really good job, and you pick up some stuff.
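The per-commit Helm chart review can be sketched as a small function: the diff is the raw data, and the model is asked to describe, not to generate. The `ask_llm` callable and the list of red flags in the prompt are illustrative assumptions, not a specific product's API:

```python
def review_chart_diff(diff_text: str, ask_llm) -> str:
    """Build a fixed review prompt for one Helm chart commit and hand it
    to whatever LLM callable you use. The model only describes the
    supplied diff, which keeps it on the translate/analyze side rather
    than the generate side."""
    prompt = (
        "You are reviewing a change to a Helm chart.\n"
        "Describe what is changing and flag security implications such as\n"
        "privilege escalation, hostPath mounts, widened RBAC, or disabled\n"
        "probes. Diff:\n"
        f"{diff_text}"
    )
    return ask_llm(prompt)
```

Wiring this into CI so every commit to a chart gets a description is then just a loop over `git log`.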
But that capability, from a development team, we never had that, right? We used to ask teams to do threat modeling as a way to social-engineer the dev teams into telling us what the hell is going on, right? We had to go, hey, give me your network diagram, and then we'd start there and go, dude, do you realize that this is wrong, and that's also there, and that's also there? Now we have that ability directly.
We can go to a team and say, give me your firewall rules. And they go, but that's 10,000 rules. And I go, that's fine, I can process them, right? I can take those rules, map them out, and go, hey, do you realize that this rule and that rule make the whole thing freaking redundant?
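Once the rules are normalized into a common shape (which is where the LLM translation helps), finding redundancy is a deterministic check. A minimal sketch with a simplified rule model of (action, source CIDR, port range):

```python
import ipaddress

def find_shadowed(rules):
    """rules: ordered list of (action, source_cidr, (low_port, high_port)).
    A rule is redundant ("shadowed") when an earlier rule with the same
    action already covers its whole source network and port range.
    Returns the indices of the shadowed rules."""
    parsed = [(a, ipaddress.ip_network(c), p) for a, c, p in rules]
    shadowed = []
    for i, (act, net, (lo, hi)) in enumerate(parsed):
        for act2, net2, (lo2, hi2) in parsed[:i]:
            if act2 == act and net.subnet_of(net2) and lo2 <= lo and hi <= hi2:
                shadowed.append(i)
                break
    return shadowed
```

Real firewall semantics (first-match deny/allow interleaving, protocols, destinations) are more involved; this only shows that the graph-then-analyze split keeps the analysis itself deterministic.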
Because you can now go through all of it. So that's what I mean: I don't use GenAI to create stuff, I use GenAI to transform, to translate, and in a way to create these graphs that I can then analyze with a lot more determinism.
Host: Okay, that's smart. That's what I was trying to understand: when you work with these GenAI models, which are non-deterministic, how do you work around that. One of the things you touched on is that with GenAI adoption, running them in, let's say, a Kubernetes environment, if it gets attacked, imagine the impact it can have. It can talk to your entire infrastructure, it can delete, it can exfiltrate data and all of that.
So how do you anticipate what attackers might do using GenAI to target your Kubernetes environments, and how do you defend against those things?
Dinis: Well, you know, I'm old enough to remember the early days of AppSec, right? The early days of SQL injection, cross-site scripting, server-side request forgery. It was kind of the same thing. People did it. Then we came along from security going, hey guys, this is a bad idea. And they're like, yeah.
And then we came along with crazy exploits. And they're like, yeah, it's a bad idea. Maybe let's do something about it, right? And then you come up with solutions, right?
But in a weird way, we spent 20 years in security trying to separate code from data, right? Let's make sure we really understand what is code and what is data. And then of course the LLMs come along and freaking everything is code. So I don't think some of these things have a solution.
And I think that's why a lot of GenAI projects are failing. The thing that's sometimes hard to visualize is what happens whenever you have an LLM, especially when it does multiple steps, right?
Especially the agents, which are a massive shit show, right? There are some cool use cases, and some of the stuff people are doing with swarms is very interesting. But in a lot of cases, you have to understand that as your workflow goes on, your LLM is building these massive prompts, massive, containing not just the latest step but everything else in the middle. Now, everything you put in there has the potential for prompt injection, right? And it could be a DNS entry, it could be a text, it could be a title, it could be a tool description, it could be a result from another tool, it could be a web page with stuff hidden in its structure.
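Why the accumulated prompt is the attack surface can be shown with a deliberately naive agent step. This sketch demonstrates the problem, not a defence; the callables are stand-ins:

```python
def run_agent_step(history: list[str], tool_output: str, llm) -> str:
    """One step of a naive agent loop: the tool output is appended
    verbatim, so the next prompt contains everything so far, including
    any instruction an attacker smuggled into a DNS record, a page
    title, or a tool result."""
    history.append(tool_output)        # untrusted text enters the prompt
    prompt = "\n".join(history)
    reply = llm(prompt)
    history.append(reply)
    return reply
```

Every append widens the injection surface, which is why each source feeding the prompt has to be treated as hostile input.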
And I don't even think we have discovered how to do buffer overflows in the LLMs yet, which we will, right? At the moment, it's still very basic examples. We're still in the Yoda moment, right? Hey, this is not what you're thinking. Hey, by the way, don't look at this. Those kinds of mind tricks, right? And they work, right? That's the crazy thing.
They still work, but I feel we're going to keep discovering all sorts of crazy stuff. My point is that we need to start looking at every one of those modules, your pods, as radioactive stuff. We should look at those and go: that should be a big red freaking dot. And we should ask the question: what happens if that thing starts attacking all the other nodes? The reality is that most environments today won't survive that.
They don't have enough isolation. They don't have enough controls. Look, I've done enough Kubernetes development that even when we were not doing malicious stuff, we would blow up a Kubernetes cluster. Literally. What Kubernetes clusters taught me is how fragile they actually are. You've seen enough of those.
In fact, my solution for scaling Kubernetes was to run one node per server. That's where scale to zero started for me: the more I looked into scaling Kubernetes, the more I realized there's a moment, once you go beyond 10, 20, 30, 50, 100, 500 nodes, where everything breaks. They just take turns breaking.
So I went the other way around. I said, instead of Kubernetes with lots of nodes, what's the smallest I can go? I ended up with 200 one-node instances. And it worked perfectly. You load-balance them, you auto-scale them. It forced some engineering decisions, but you get a much better solution. In that case I scale to one. I scale Kubernetes to one.
And then, of course, the control plane is very light. But I think this is what people need to do. The way I would protect this: I'm not going to put those agents in a freaking massive Kubernetes cluster with everything on it.
I will run them on a dedicated Kubernetes cluster that runs one node only, right? Its job is to quickly contain that thing once it goes south. Imagine you find yourself in the middle of a Kubernetes cluster with a hundred nodes, massive definitions, tens of thousands of services you can hit, versus you go in there and there's nothing: it's just you and a couple of things. Much nicer, right?
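Beyond the dedicated cluster, the same containment idea can be expressed inside Kubernetes itself with a deny-by-default NetworkPolicy on the agent's namespace. A sketch that builds the manifest as a plain dict (you would apply it with `kubectl` or a client library; the namespace name is illustrative):

```python
def default_deny_policy(namespace: str) -> dict:
    """Kubernetes NetworkPolicy manifest (as a plain dict) that denies
    all ingress and egress for every pod in the namespace. With this
    applied, an agent pod can only reach whatever later allow-rules
    explicitly open up."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # Empty podSelector matches every pod in the namespace.
            "podSelector": {},
            # Listing both types with no allow rules denies both directions.
            "policyTypes": ["Ingress", "Egress"],
        },
    }
```

This only works if the cluster's network plugin enforces NetworkPolicy, which is worth verifying before relying on it.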
Host: Yeah, so it's like an air-gapped, not exactly air-gapped, but an air-gapped-style environment with limited scope, right? And what I'm hearing is, the more threat modeling you do at the pod level, the more understanding you have, and you control what the pod can or can't do.
Dinis: Yeah, exactly. Correct. And the point is, even from a normal performance point of view, I'm sure you've seen examples of one pod out of control causing a huge amount of damage, right? And this is the irony: if we do what we've been talking about here, most of the time, 99% of the time, you're going to pick up on engineering problems.
You're going to pick up on bugs. You're going to pick up on the marketing team running a campaign at 8 o'clock at night without telling anybody, and suddenly your traffic spikes like there's no tomorrow because everybody saw the thing on TV. That happened to me, right? But also other things: some deployment that went out that isn't very good, or a new API that suddenly consumes another 20%, and that 20% starts to bring things down.
Or, I was working on this project that built a new, modern infrastructure, but it was so fast that it would literally destroy everything else around it, because it could scale and nobody else could, right?
So suddenly you have this ridiculously powerful pod, right? And every time it hits a .NET or Java or any other pod, it destroys them, literally, because it can send 5,000 requests in parallel and the others can't handle it. So again, you need to constrain. And that goes back to the whole scale to zero. I actually think you guys could add to that: it's not just scale to zero, it's also constraints, because constraints are so important.
And in fact, on that topic, one of the things I'm adding to my pipeline and my infrastructure is billing and constraints and budgets at the heart of every single service. I'm basically moving to a model where every service I create, every capability, every API, everything, gets given a budget, right?
And that budget and those keys allow me again to control any problems that might occur, but also from an engineering point of view, it forces you to solve lots of interesting things. So in a way, the constraint actually forces good engineering solutions and actually gives you really good security, right? Because the billing is one of the best ways to know what the hell happened in an environment.
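The budget-per-service idea can be sketched as a tiny guard object. This is an illustrative sketch, not Dinis's actual implementation; in practice the limit would come from your billing system and the costs from metered usage:

```python
class BudgetExceeded(RuntimeError):
    pass

class ServiceBudget:
    """Hard spend budget attached to one service or API key. Every call
    books its cost up front, and the service stops with an error instead
    of silently running up the bill; the running total doubles as an
    audit trail of what actually happened."""
    def __init__(self, limit: float):
        self.limit = limit
        self.spent = 0.0

    def charge(self, cost: float, what: str = "") -> None:
        if self.spent + cost > self.limit:
            raise BudgetExceeded(
                f"{what}: {self.spent + cost:.2f} over limit {self.limit:.2f}"
            )
        self.spent += cost
```

The constraint is the point: a service that cannot exceed its budget is both cheaper and easier to reason about during an incident.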
Host: Mm-hmm. Right, right. Yeah. Whether you are actually ephemeral, or you have long-running things affecting your budget, and so on. Yeah, makes sense. So, one aspect of GenAI that is talked about a lot: will GenAI get rid of all the jobs?
And at the same time, we hear about the human in the loop, the human in the feedback loop, and things like that. As we increase adoption of GenAI, particularly for automating security tasks, SOC analyst work and the like, how will the role of security engineers evolve?
Dinis: Yeah, so first of all, the only people who say GenAI will replace all the jobs are the people who sell GenAI solutions and the companies who like to fire people as soon as they can, right? I actually think we're going to need more developers, more engineers, than ever before.
But the role will change a little bit, right? What's going to happen, and this is where I think in Wardley Maps and evolution, is that we have commoditized the ability to create environments. So every business function is going to become what a full-blown DevOps development team looked like 10 years ago.
So you're going to have HR shipping code, finance shipping code, procurement, stores, marketing, project management. Every function in a business will start to ship code. Why?
Because they're already doing it, but it's called Excel. Everybody has a freaking Excel document, because that's what they had. And by the way, the business runs on Excel most of the time.
What's going to happen is those teams are discovering vibe coding. I like to call it no-code development, because you code, but you don't look at the code. But they're going to need a developer next to them. And what is the developer doing? The developer is doing the non-functional requirements.
Because what those folks are finding is: it breaks, and I need to deploy it, and I need it to be consistent, and I need logging, and I need security. They can now do all that stuff except the non-functional requirements, and then there's scalability. That's the reason we came up with Kubernetes in the first place, right?
So I think what's actually going to happen is the role will change a little bit. Ironically, I think it changed more from data center to cloud than it will from cloud to GenAI. The move from data center to cloud, and that's why that generation really struggled, was that they were used to things you build, you ship, you put there, and they last.
The idea of an ephemeral environment never really came into play. When teams moved to the cloud, they suddenly understood elasticity of compute. That's the pattern. And every team is now going to need this.
So in a way, every team is going to need their own dedicated cloud environment with their own dedicated stuff, with their own security, with their own stuff. So they don't cause a lot of damage. So I actually think that we're going to need more engineers, not less engineers. But it's like everything, right?
If the engineer is not curious, if he's happy to just, you know, check a box, then sorry, right? That's like the people who were on typewriters and the people who were riding horses. Technology changes, right? Every time technology changes, there are a number of professions that get a little bit stuck.
But I think, again, we can use GenAI to learn much more effectively. So I'm positive, right? And I also think there are a lot of interesting moves toward decentralization, local models, sustainability, smaller things and code. It doesn't need to be centralized, top-down.
So I think we could be looking at a much more interesting future that allows a lot more really good business value. And in that case, we need good engineers. Good engineering is never going to go away. Look, there's a reason why, in my CISO roles, when we were hiring a new security engineering team, we would literally hire developers and make them security engineers.
Because we realized that if you bring a developer, the developer already understands how to ship code. They know GitHub, they know development, they know Python, they know whatever language they're coding, right? They understand processes, right?
Security is just a variation of that, right? The security skills were easier to teach: throw them into a couple of incidents and they learn very fast, right? But the development mindset is something that's a bit different. And sometimes I would speak to these business analysts and people from the business and say, you are a developer. They're like, no, I'm not. You are, because what you do in that tool is exactly what we do in development, except that you think development is about writing code, and it's not.
Development is about figuring out the problem, solving it, creating artifacts, working with abstractions, right? That's what development is. So I think we're going to need a lot more developers. In fact, we're going to need a lot more engineers who know how to build resilient systems, who know how to build the non-functional requirements, while the business goes crazy building apps.
Literally, it's going to go... Look, one of my startups, I have some of my business partners, they go to town. I basically create an environment where they provide coding every day and then they just accept everything. And then I basically come behind.
And I connect the stuff and I pick the right things. And even when I do a bit of coding, I have two modes, which is the mode where you don't look at the code, which is what they do. And I do that, but I air gap it and I bring it into the main official code. So I don't have any Gen. AI in my development, but I use Gen.
AI every day. Right. And it's ridiculously powerful. My workflows are so efficient. I give it like 20,000 lines of code and say, I want to work here, here, here. And yeah, the quality is just off the charts. Good. Right.
But you can't let it loose, right? That's the thing. So again, you need to be a good engineer. Exactly, and you need to learn. If you're curious and you learn and you're interested, then it's almost: where do you want to be? Do you want to be close to the customer, or close to the metal? We need good people everywhere.
Host: Yeah, yeah, you need the guardrails. Yeah, makes sense. And I liked how you connected it back to how we started, right? That good security comes from good engineering, good engineering practices. And I agree with you that there will be a need for more engineers in the future, because somebody has to build those apps, the agentic AI apps, the LLM apps. Somebody has to build them. So as long as your engineering fundamentals are set properly, you will thrive even in the new AI world. So yeah, makes sense. And that's a great way to end the podcast as well.
But before we end, I have one last question for you, which is, you have any learning recommendation for our audience? It could be a blog or a book or a podcast or anything.
Dinis: I think the stuff the agentic foundation is doing. There's a lot of good stuff: regular calls, they're very active on LinkedIn, there are a couple of WhatsApp channels you can get into. I think they're doing really cool stuff with swarms. It's a bit of a different approach, but again, it's where we're going, and they're one of the few I've seen doing what I would call genuinely agentic stuff.
Everything else is just workflow. Most people doing agentic stuff, it's a shit show. But these guys, because they're using swarms, literally have lots of agents, 100, sometimes even 200, working in parallel. And what they create is quite high quality. But again, it's important to understand that it's very good for experimentation, very good for some particular topics.
They've even done some crazy stuff, like creating customized models for a specific role, right? That's interesting. So if you want to explore that, the community is very friendly, you can ask questions, it's cool, and most of the stuff is open source.
By the way, most of my stuff is open source too, right? So if you want some really cool foundational stuff, check out the things I'm doing. And great questions, man. I always love talking about scaling to zero, things ephemeral, and connecting that to good engineering and good security.
Host: Yeah, makes sense. And thank you for sharing the recommendation. When we publish the episode, we'll add them to the show notes so our audience can go and learn from them. It was a fun conversation. Thank you so much for coming on the podcast. There was a lot of learning around scale to zero, ephemeral environments, GenAI, things like that. And to our audience, thank you so much for watching. See you in the next episode.
Dinis: Perfect. Cool. Thank you. Thanks for having me.