Test Case Scenario

Rethinking AI’s Role in Leadership, Governance, and Productivity

Sauce Labs



AI is reshaping development, but is it meeting expectations?

In this episode of Test Case Scenario, Jason Baum and Marcus Merrell explore the evolving role of AI in software development, drawing insights from recent industry reports. They discuss whether AI tools are living up to their promise of reducing burnout and boosting productivity while examining the complexities of debugging, security risks, and governance gaps.

Join the conversation as they share fresh perspectives on how AI tools are reshaping workflows, the real challenges facing engineering leaders, and the steps developers can take to use AI more effectively and responsibly.

Join us as we discuss:
(00:00) Introduction

(01:32) Exploring the DORA Report and AI in software development

(03:34) The impact of AI on developer productivity and burnout

(07:28) Governance challenges in adopting AI-generated code

(10:23) Leadership strategies for rolling out AI tools effectively

(14:16) The evolution of AI and the hype vs. reality debate

(17:39) Reflecting on AI’s future role in software development

We’d love to hear from you! Share your thoughts in the comments below or at community-hub@saucelabs.com.

SUBSCRIBE and visit us at https://saucelabs.com/community to dig into the power of testing in software development.

Sign up for a free account and start running tests today at https://saucelabs.com/. 

▶ Sauce YouTube channel:  / saucelabs  

💡 LinkedIn:  / sauce-labs  

🐦 X: / saucelabs

Jason Baum [00:00:00]:

This is Test Case Scenario with me, your host, Jason Baum. This podcast is the definitive hub for knowledge and stories in the software testing and development communities. If you're new to the channel, hit the subscribe button and let's dive straight into the episode.


Jason Baum [00:00:25]:

Hey, everybody. Welcome back to another episode of Test Case Scenario. I'm your host, Jason Baum and with me as usual, co-host Marcus Merrell. Hey, Marcus.


Marcus Merrell [00:00:35]:

Hey, how you doing?


Jason Baum [00:00:36]:

Good, good. How are you doing?


Marcus Merrell [00:00:38]:

By my count, I'm like seven or eight weeks in a row with pretty much the same predictable background. I'm in the same place, man.


Jason Baum [00:00:44]:

Predictable background. We're going to need you to go on the road for this podcast. We're going to have to start sending you somewhere. We'll need budget for that.


Marcus Merrell [00:00:53]:

I'll be in Detroit this time next week, so maybe, maybe.


Jason Baum [00:00:56]:

Alright, alright.


Marcus Merrell [00:00:57]:

I'll be there.


Jason Baum [00:00:58]:

Oh, there you go. We'll have to do a show from Detroit. Well, we've got an exciting topic today, and it's more of the same from last week's episode, if you've been paying attention. We've been talking about the DORA Report, the DORA Accelerate State of DevOps report that came out not so long ago, and we spoke to Titus Fortner about that last week. We talked about AI's role in software development and some promises that were made. Were the promises kept? I don't know. You'll have to go back and listen to that episode. Marcus and I saw an article by Mike Vizard that touched on a similar report done by Harness. The article was titled "Developers Excited by Promise of AI to Combat Burnout, but Security and Governance Gaps Persist." That's the name of the article, not the report. The report Harness released is the State of Software Delivery Report, "Beyond CodeGen: The Role of AI in the SDLC."


Jason Baum [00:02:13]:

And the report highlights AI's potential to significantly reduce developer burnout and improve productivity, while also addressing challenges organizations face in securely and effectively managing AI-generated code. The report surveyed 500 engineering leaders and developers, and it put the cost at $8 million in lost productivity per 250 developers annually, with 78% of developers spending at least 30% of their time on manual, repetitive tasks instead of delivering on innovation. Key findings of the report: 95% of engineering leaders and 98% of developers believe AI tools could greatly reduce burnout. However, 92% of developers report AI increases the blast radius of bad code reaching production. 67% of developers said they spent more time debugging AI-generated code, and 68% spent more time resolving AI-related security vulnerabilities. 59% of developers experience deployment errors at least half the time when using AI tools.


Marcus Merrell [00:03:34]:

As a tester, it is hard not to have a tiny bit of schadenfreude about that. I mean, we absolutely saw this coming, and I know it's early days and this is the worst it's ever going to be, all those cliches that you throw out there. But this stuff was supposed to have already started paying off by now. They've already fired everyone, so why isn't it working? Right. One thing that I think is amazing, I heard this pointed out at some conference I was at: it's supposed to be a 30% developer productivity boost.


Marcus Merrell [00:04:01]:

The assumption is that that 30% applies to the time developers spend writing code, which, according to, I know it's an old reference at this point, Peopleware by DeMarco and Lister, is already only 20% of their day. So you're giving them a 30% productivity boost on 20% of their time, which means you go from 20 to 26%. And now the data coming out in two separate reports says that we're actually whittling away at even that. What we need AI to do is get us out of some of these meetings we keep having. We're in so many meetings. AI has not yet helped get me more time at the keyboard, and that's what I need in order to do my job better, write better features, and do more innovation. Get me to the keyboard, and AI hasn't helped with that.
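
A quick back-of-the-envelope sketch of that arithmetic, assuming the 20% coding-time share from Peopleware and the claimed 30% boost:

```python
# Back-of-the-envelope: how much of a developer's day an AI coding boost actually touches.
coding_share = 0.20   # assumed share of the day spent writing code (Peopleware estimate)
claimed_boost = 0.30  # assumed AI productivity gain on coding time (the vendor claim)

effective_share = coding_share * (1 + claimed_boost)
overall_gain = effective_share - coding_share

print(f"coding time: {coding_share:.0%} -> {effective_share:.0%} of the day")  # 20% -> 26%
print(f"overall productivity gain: {overall_gain:.0%}")                        # 6%
```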


Jason Baum [00:04:43]:

So what you're saying is we need a clone. We need an AI clone of ourselves to sit in the meetings and handle the mundane stuff.


Marcus Merrell [00:04:54]:

Right. Give me a four-sentence ChatGPT summary of the action items that I got out of that meeting, which is probably only ever four sentences' worth of material anyway.


Jason Baum [00:05:03]:

Well, I mean do you have an AI note taker, Marcus?


Marcus Merrell [00:05:05]:

I don't, actually. I don't think we've got that approved yet.


Jason Baum [00:05:09]:

That's probably a good use of AI.


Marcus Merrell [00:05:12]:

Spot on. It's really, really good at summarizing things. It's really good at, you know, making things crisp and putting them into bullet points and stuff like that. What it is not good at is synthesizing new things. How in the world is that going to increase productivity? I can tell you how it should be used, which is, I think, what a lot of people who are getting productivity out of it are doing: give me this YAML file in this format that will work against this other integration or service that I'm trying to use. It's magnificent at that kind of stuff. But what I think managers want it to do is replace the hardest part of the developer's job, when they should be replacing the easiest part of the developer's job.


Jason Baum [00:05:50]:

Besides your initial take of "yeah, we knew this was going to happen," which, by the way, the more I'm around testers, the more I feel like you say that about everything, you could have always mitigated it, yeah, I saw this coming. When I throw out a number like 92% of developers reporting that AI increases the blast radius from bad code reaching production, or 67% saying they spend more time debugging AI-generated code, where do you go with that? Could you imagine, if you were testing right now, if you were in the field, how would you feel about AI-generated code? And how would you feel about the process that's sort of in place right now, where we have so much of that?


Marcus Merrell [00:06:39]:

Personally, how I would handle this is maybe a little different from how some people would, but I think it's actually decent advice, which is: I would climb to the top of this mountain. I would become the expert's expert on how our developers were using AI, so that I could get ahead of any problems that might be created from it. That's how I would do it. I wouldn't sit back and wait for, you know, the prediction to come true. I would make sure that I am at the forefront of helping people, making sure they're able to anticipate and help solve these problems and warn about them, and, you know, putting the risk factors in front of my management to make sure they understand this isn't just going to be a thing where you put coins in and out comes code.


Jason Baum [00:07:19]:

Now, that's advice for the tester. What about the engineering leader? What about those in leadership positions? What are you going to tell them?


Marcus Merrell [00:07:28]:

I would definitely try to roll it out slowly and have a policy before I start rolling it out at all. And this is something that I don't think you can really enforce, but you can do culturally: I would want to make sure that when a developer of mine uses a snippet of code that's been generated by one of these tools, they put in a comment, "here's the prompt I used with this tool, and here are the changes I had to make in order to make sure it worked," right? I think we should strive, and I'm just going to make up a term, if it isn't already made up, in social media, in presentations, in thought leadership, in code, in everything, for prompt transparency. What version and what prompt did you use to create this output? I've started to do that in all of my presentations now. Whenever I use an AI-generated image, like "give me a picture of three robots that are punching each other like the Three Stooges," I'll copy that prompt and paste it into the speaker notes along with the GPT model I actually used to produce it. And I think that kind of trail of breadcrumbs would help us understand which models are doing better, which ones are doing worse, and where they can be helped. That's what I would say to my leadership. I don't want people to just vaguely use the tools.
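
A minimal sketch of what that prompt-transparency comment might look like on an AI-generated snippet; the model name, prompt, and function below are hypothetical, for illustration only:

```python
# --- AI-generated snippet: prompt transparency header (hypothetical example) ---
# Model:  gpt-4o (assumed; record whichever model and version you actually used)
# Prompt: "Write a Python function that retries an HTTP GET up to 3 times
#          with exponential backoff and returns the parsed JSON body."
# Edits:  added a request timeout and narrowed the exception handling before merging.
import time

import requests


def fetch_json(url: str, retries: int = 3, timeout: float = 5.0) -> dict:
    """Fetch JSON from `url`, retrying failed requests with exponential backoff."""
    for attempt in range(retries):
        try:
            response = requests.get(url, timeout=timeout)
            response.raise_for_status()
            return response.json()
        except requests.RequestException:
            if attempt == retries - 1:
                raise  # out of retries: surface the original error
            time.sleep(2 ** attempt)  # back off 1s, 2s, 4s, ...
    raise RuntimeError("unreachable")
```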


Marcus Merrell [00:08:35]:

I want to know: where is it falling down? When is it inhibiting their time? And I would not roll it out to everyone at once. I would roll it out to a small group of people with mixed backgrounds and mixed experience levels, to make sure we understand what people are getting out of it and where the risks are. I don't want to create a whole bunch of overhead for those folks, but if I've introduced AI to them, I've already introduced a lot of overhead, it turns out. Let's use that for productivity. Let's use that in a productive, constructive way.


Jason Baum [00:09:05]:

But what we're seeing from these reports is that perhaps we ran before we could walk. The promise seemed great, right? We're going to reduce all this time, it's going to be amazing, and you'll never have to do the mundane again. It's like, I don't know, I was always taught: if it sounds too good to be true, it probably is. And this isn't the fault of ChatGPT. This isn't the fault of all the various AI tools that are out there, not to call out one. This isn't the fault of OpenAI. This is the fault of us, right, as leadership, saying, hey, we're going to use this tool and we're going to get there faster. Bottom line, right?


Marcus Merrell [00:09:50]:

Like we talked about in the New Year's Resolutions episode, I'm trying to be more deliberately positive in my life and trying not to be quite so skeptical about everything. But, like, you gotta ask some questions, and you gotta start to think: is this a con? Is this a grift? Is there a financial interest from the person who's trying to sell me this product that goes beyond just getting me to sign up for a contract? Like, I simply cannot believe that in the year 2025 we are still falling for the hype to this degree. I mean, luckily it'll never happen again. We've learned our lesson this time, right?


Jason Baum [00:10:23]:

Yeah. I think the point I'm trying to make is that it's hype over reality, even though obviously there are tons of tools out there that are, like you said, the snake oil, just latching onto a trend. For sure. We saw it with Web3. However you may feel about Web3 and crypto, there certainly are a lot of scams, right? In addition to perhaps some real companies alongside them.


Jason Baum [00:10:54]:

I don't think that's all of the issue here. I think a lot of the issue is just the state the technology is actually at versus where we hoped it would be, and those two didn't mesh, right?


Marcus Merrell [00:11:11]:

Is it possible that we are just in denial, thinking these problems are easily solvable? Software problems are fundamentally both the same as and different from what they were 30 years ago when I first got into the industry. It's almost like they're the same thing, but faster. We're having the same problems we ever had: communication, siloed teams. We just have them faster, because everything is going faster. Instead of having those problems over a nine-month release cycle, we now have the same level of problems over a two-week release cycle. And management is desperate, both to protect their jobs and to find the silver bullet.


Marcus Merrell [00:11:51]:

And there's a good chance that by the time the chickens come home to roost on a decision you made now, you're not going to be there anymore to absorb the accountability. The next person gets to be a hero and clean up your mess, while you're off being a hero cleaning up someone else's mess. So it doesn't matter. The average tenure in the tech ecosystem is something like two years. So this is just a carousel that we all ride. We want a silver bullet that is not forthcoming.


Jason Baum [00:12:16]:

Are you saying this is a human issue, not an AI issue?


Marcus Merrell [00:12:20]:

Yeah, shockingly, I'm proposing that this is a problem that people have and that people need to solve.


Jason Baum [00:12:26]:

I'm going to ask ChatGPT how to solve it now. We've gone full circle.


Marcus Merrell [00:12:31]:

Can't wait to hear what it says, because my bet is ChatGPT is going to agree and still offer a solution that doesn't make a whole lot of sense.


Jason Baum [00:12:42]:

Yeah, well, when you look at all the issues of bad code, is that the fault of the AI or is that the fault of the developer who's like not catching it?


Marcus Merrell [00:12:54]:

Well, certainly the second. The developer should always be accountable for the code that they put in. Whether or not they generated the code, they put it into production, they entered it into the ledger. That may change, but at this point it's still true. But I also think people are assuming that ChatGPT, or whatever GPT copilot we're using, and I don't mean to keep throwing OpenAI into this, it's not just them, we do a lot of assuming that it knows code. It doesn't know code; it knows probability and the data it was trained upon. And if you're like most developers I know, who believe that every other line of code every other developer ever wrote was garbage, well, that's the data it was trained on, man. It's going to be buggy.


Jason Baum [00:13:33]:

The other piece of what this survey went into was the governance and compliance alarms that are going off. Oh, wow. Yeah, some of these stats are pretty nuts. That is probably the biggest issue, actually. Only 48% of developers use approved AI tools, so the majority are not. 60% of organizations lack a formal process for assessing AI-generated code for vulnerabilities or errors. 58% of companies do not provide clear guidance on which use cases are low risk for AI adoption.


Jason Baum [00:14:16]:

And 60% of engineering leaders and developers stated they do not currently evaluate the effectiveness of the AI coding tools, making productivity gains unclear.


Marcus Merrell [00:14:26]:

Because a year and a half ago we were all eager to play with these things, until we realized the limits of their capabilities.


Jason Baum [00:14:32]:

Remember back in, like, 2022 when this came out? The biggest worry, the biggest concern I remember reading about, was teachers saying all the kids are going to cheat on the tests. It's almost like, hey, it's the calculator, everybody's bringing the calculator to the math test kind of thing. And now it's like, no, the biggest threat is only 48% being compliant.


Marcus Merrell [00:14:58]:

Yeah. What was that figure? That was interesting, putting a dollar amount on it.


Jason Baum [00:15:01]:

Oh yeah. So the dollar amount that they put on it was $8 million in lost productivity per 250 developers annually.


Marcus Merrell [00:15:11]:

And not only that, but that's not including the fact that they pay for the privilege of having an enterprise license that lets them lose an extra $8 million. Then you add that to my subprime-crisis hypothesis, and the prices are going to start going up and up and up. So we're going to be paying more and more for the privilege of costing ourselves $8 million per. Now, who knows if that's accurate, or whether it will stay accurate. I can hear the AI people in the audience already screaming and pulling their hair out, saying, "this is the worst it's ever going to be. This is the worst it's ever going to be. It's only going to get better from here." First of all, prove it.


Marcus Merrell [00:15:47]:

It took 50 times the amount of electricity and training data to train GPT-4 over GPT-3.5, and the improvement was only marginal. Fight me. So I don't believe it anymore, the hypothesis that this is only going to get better. Because generative AI, I called it, well, I didn't call it, I regurgitated somebody else calling it that, has already peaked in its capabilities. What's going to improve is the ergonomics around it, the stuff that cleans up what it outputs and makes it better. That's what's going to improve things: agentic workflows and all that. All those things. That's what's going to improve it.


Marcus Merrell [00:16:21]:

At this point, let's stop pretending that the generative part itself is going to make a night-and-day difference no matter how much data we throw at it.


Jason Baum [00:16:28]:

Well, let me take that and throw another spin on it. I don't know if the argument is "this is the worst it's going to be; it's only going to get better from here." I know, I've heard that too, and that's what you see out there. I don't think we have a choice. Once you let the cat out of the bag, so to speak, you know, we're in the future of AI now.


Jason Baum [00:16:51]:

Like, every year that goes by, every day, there are only going to be more versions of it, and it's just going to keep coming. So how we utilize it, I think, is going to change. I think we're going to learn, we have to learn, appropriate uses of it. We have to learn how to talk to it. We have to learn how to digest the information it's giving us in a way where we're not losing productivity, but where it becomes a net positive. So I think it will get better, but I think it's an evolutionary process. We're at the very beginning.


Marcus Merrell [00:17:33]:

We're at the dawn of civilization here. We have a long way to go.


Marcus Merrell [00:17:39]:

I would hypothesize that for you, me, and Titus, all of us, it has been a net positive, because we think critically and we don't overhype. We understand what it's good at, and we don't use it for things that we know it's not good at. And I'm asking for that kind of thinking at the executive level, not just the practitioner level. So, I mean, I agree with you. There are improvements coming, and this thing will be kind of amazing. It's just that right now, the hype cycle we're in and the grift that we're being sold are so far out of proportion with what we're seeing in reality. I just want people to shut the hell up for a little while and let us actually try to figure out what it's good for.


Jason Baum [00:18:16]:

I agree with you. So we'll have more on this topic, I am sure. This is not going to be the last time we talk about it. This is definitely not going to be the last time we talk about AI on this show. I think we started this show talking about AI, so it's interesting to see, now that it's been prevalent for the past few years, with all the promises made and all the leadership betting significantly on it, what the reports say now that it's been out. So we'll see where it goes, and we'll be here to talk about it.


Jason Baum [00:18:50]:

So, Marcus, thanks so much as usual. Looking forward to doing this again next week with you and thank you for listening and we will see you next time for another episode of Test Case Scenario. Thank you for joining us on Test Case Scenario. Share your thoughts in the comments. We'll make sure to respond to each and every single one. Don't forget to subscribe and hit that notification bell to keep in touch. If you missed our last episode, it's popping up on your screen right now. Go click it. Until next time on Test Case Scenario.


