Studying the Effect of AI Code Generators on Supporting Novice Learners in Introductory Programming

In this episode I unpack Kazemitabaar et al.’s (2023) publication titled “Studying the effect of AI code generators on supporting novice learners in introductory programming,” which found that students who had access to AI code generators while learning how to code outperformed students who did not have access, even when engaging in manual coding exercises.

Quote: “Our results show that learners who had access to the AI code generator (the Codex group) were able to successfully generate code and showed evidence of understanding the generated code during the training phase. They performed significantly better on code-authoring tasks (1.15x increased progress, 0.59x less errors, 1.8x higher correctness) and performance on the following manual code-modification tasks, in which both groups performed similarly. Furthermore, during the evaluation phase, on the immediate post-test, learners from the Codex group were able to perform similar to the baseline group despite not having access to the AI code generator. In the retention test, which was conducted one week later, learners from the Codex group performed slightly better on coding tasks and multiple-choice questions, although these results did not reach statistical significance. Finally, our analysis indicates that learners with more prior programming competency may benefit from AI code generators.” End quote. That's from page 2 of the publication titled “Studying the effect of AI code generators on supporting novice learners in introductory programming.” This paper is written by Majeed Kazemitabaar, Justin Chow, Carl Ka To Ma, Barbara J. Ericson, David Weintrop, and Tovi Grossman; apologies if I mispronounced any names.

Here's the abstract for this paper. Quote: “AI code generators like OpenAI Codex have the potential to assist novice programmers by generating code from natural language descriptions; however, over-reliance might negatively impact learning and retention. To explore the implications that AI code generators have on introductory programming, we conducted a controlled experiment with 69 novices (ages 10 through 17). Learners worked on 45 Python code-authoring tasks, for which half of the learners had access to Codex, each followed by a code-modification task. Our results show that using Codex significantly increased code-authoring performance (1.15x increased completion rate and 1.8x higher scores) while not decreasing performance on manual code-modification tasks. Additionally, learners with access to Codex during the training phase performed slightly better on the evaluation post-test conducted one week later, although this difference did not reach statistical significance. Of interest, learners with higher Scratch pre-test scores performed significantly better on retention post-tests if they had prior access to Codex.” End quote.

To summarize the study in a single sentence, I'd say that this study found that students who had access to AI code generators while learning how to code outperformed students who did not have access, even when engaging in manual coding exercises. And that right there is a fascinating little finding that we're going to unpack in today's episode of the CSK8 podcast.

Now, if you don't know who I am, my name is Jared O'Leary. I've worked with everyone from kindergarten through doctoral students in a variety of contexts, from music education to computer science education classes. You can find my CV on my website, jaredoleary.com. While you're there, you'll also find over 180 other podcast episodes, as well as some interviews with some awesome guests and solo episodes like this, where I unpack scholarship in relation to computer science education.

Now, this week's episode kind of builds off of last week's episode, which was talking about plagiarism and ChatGPT. This study is actually going to look at, well, what happens if we use code generation when learning how to code? There's an idea called distributed cognition where, as an example, you can take a tool and it will allow you to focus on higher-level processes rather than the mundane things. Like a calculator: instead of focusing on multiplying and carrying the one, etc., you can instead focus on using the calculator to give you the result that you're looking for, and then apply that result to, say, your construction project or whatever. So: thinking about the bigger picture rather than focusing on the mundane task of actually doing the mathematics.

Generative AI can do the same thing. Instead of focusing on writing out your for loop, you can instead write out a simple prompt that fills in your for loop, which, I don't know, maybe is 20 lines of code long for an entire function, and you can just copy and paste that into your application and write it out in, I don't know, let's say half the amount of time. That is a potential benefit when you are using generative AI to help you write code, but there can be some drawbacks with that. What happens if you don't have access to that tool? Let's say the server is shut down for ChatGPT for a day, but you have a deadline that you have to meet when authoring that code, and you can't do that because you might not know how to do it; you are 100% reliant on AI writing it for you. Or what if the AI is wrong and it creates an error and you can't fix that error? So while on one hand it provides some affordances, using it also creates some constraints, and so we need to kind of consider this and explore it.
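To make that for-loop example concrete, here's a small hypothetical sketch of the kind of exchange described above. The prompt and the function are my own illustration, not from the study; the idea is that the learner writes the natural-language description, and the AI code generator fills in the loop mechanics:

```python
# Hypothetical prompt a learner might give an AI code generator:
# "Write a function that counts how many numbers in a list are even."

def count_evens(numbers):
    """Count the even numbers in a list."""
    count = 0
    for n in numbers:  # the for loop the learner no longer writes by hand
        if n % 2 == 0:
            count += 1
    return count

print(count_evens([1, 2, 3, 4, 5, 6]))  # prints 3
```

The distributed-cognition point is that the learner reasons about what the function should do (the prompt), while the tool handles the mundane loop syntax.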

So this study is actually going to do that by exploring five different research questions. This is from page two of the PDF. Quote: “RQ1: Are novices able to utilize AI code generators to solve introductory programming tasks? RQ2: How do learners' task-performance measures (e.g., correctness score, completion time, and error rate) differ with and without AI code generators? RQ3: How does learners' ability to manually modify code differ with and without the use of AI code generators? RQ4: What are the effects on learning performance and retention from using an AI code generator versus not? RQ5: What is the relationship between existing programming competency and the effectiveness of using AI code generators in introductory programming?” End quote.

Alright, so the next section is on the related work. They have subsections in here on natural language programming, on AI coding assistants, introductory programming, etc. So if you're interested in learning more about some scholarship that explores those areas, then make sure you check out that section. The following section talks about the AI-assisted learning environment, so if you want to learn more about how to create your own version of this, and what they did when considering it, like the implementation, data instrumentation, programming task design, quality of AI-generated code, etc., check out that particular section.

Let's talk about, okay, well, what was this particular study? This is under the user study section. There are three phases to the study. The first phase was like a two-hour-long introduction to how to code, specifically with Scratch. The second phase was a training phase; this is where the learners were in two different groups: one group was just learning how to code without AI code generators, and the other was actually using that to help them code, going through 45 programming tasks and completing 40 different multiple-choice questions. And then the third phase, which is the evaluation phase, was kind of like a post-test. In this phase nobody was allowed to use the AI code generators or the Python documentation or receive any feedback on the coding task assignments, and they basically had to do a post-test to see whether or not they still remembered the information or the things that they learned during the training.

Now, in the study there were 69 learners: 21 of them were female, 48 were male, and they were all in the age range of 10 to 17, and they had a range of different backgrounds, demographics, etc., which you can check out on PDF page six in the bottom right corner. If you're interested, you can also check out the data collection discussion and data analysis, but I'm going to skip that because I don't think that'd be interesting for this particular podcast. So let's get nerdy and get into the results, starting on page seven.

In the training phase, overall, the completion data showed that the people who were able to use Codex (which is the AI code generator) were able to finish more of the tasks than the baseline group. The Codex group had a mean of 90.9% completion, whereas the baseline group that did not have the code generator only completed 79%. Their correctness score was also much higher: the group that had the AI was able to get 80.1 percent for the mean, as compared to 44.4 percent for the baseline group. And they spent less time doing it: the mean for this was 210 seconds for the AI group, whereas it was 361 seconds for the baseline group. So that is a huge difference in all these different categories. And the Codex group actually used less documentation than the other group: they used it 22.1 percent of the time, as compared to the baseline group, which used it 54.3 percent of the time. They had fewer errors overall; most of the errors were syntax errors, and in general these errors were fewer than the baseline group's, but they had roughly the same amount of semantic errors across both groups: 0.01 and 0.03.

Now, at the top of the page there's a figure with statistics of tasks in which the AI code generator was used, broken down by topic. The topics were basics, data types, conditionals, loops, and arrays, and the figure shows what percentage of the time students actually used the AI for the different ones. For the basic stuff they used it 48% of the time, for data types 61% of the time, for conditionals 75%, for loops 84%, and for arrays 85 percent. So as it got into more complex topics, it seems like the students were using the AI code generator more than they were at the beginning. This figure also breaks down the number of multiple usages for each one of those, as compared to a single use, as well as the number of tasks that were 100% completed by the AI generator, and then the percentage of the time that they just copied the output directly from the code generator into the answer. And you'll notice, as it got more complex, they did this more often.

But a question that I have is: were they running out of time, or was it because they just found it was easier than actually typing it out, etc.? Was it causation, in that as it got more complex these students relied upon the AI more, or is it more of just correlation, in that the complexity was kind of unrelated to it, and really it was just the time at which something was introduced? So, like, as they got later on into this particular study, they're running out of time, and they're like, “Ah, let's just get through the arrays and the loops, let's just copy and paste this, it'll save me some time.” I'm not sure about that. But the authors do note that even though these percentages are interesting, the patterns were not consistent: some students used the AI code generator a lot, while others who had access to it barely used it at all. So there's this large continuum, this large spread in usage; it wasn't consistent among the different students or participants, which is a great point that the authors made.

Now, one of the things that I personally would be afraid of with students learning how to code using generative AI is: well, maybe they're not actually going to learn the concepts; they're just going to rely on this tool doing it for them. Which is what I've often heard some people say with mathematics concepts: “Well, you need to write it out by hand, otherwise you're going to rely on your calculator and you're not going to understand it.” Well, in this particular study they found, quote, “although learners in the Codex group used the documentation less and relied heavily on AI-generated code for the authoring tasks, they still did just as well, and in some cases better, in manual code-modification tasks,” end quote, from page 10. That is such a key finding right there. We really need more studies to follow up on this and figure out whether this is something that is found in other areas, with other platforms, etc., because if so, that is a fascinating result, and honestly one that I wouldn't have predicted. What it's basically saying is: hey, we could use these code generators that allow students to learn something faster, and they're actually scoring better than if they hadn't used them.

Right now I'm looking at these findings and going, what's the downside here? Why would anyone not do this? Especially when we look at the evaluation phase, which is like the post-test, and you find that both groups performed similarly on all three different tasks that were measured: the authoring tasks, the modifying tasks, and the multiple-choice tasks. And in some of those tests, the students who used Codex, the AI code generator, actually performed better than the students who did not. That is really interesting. Now, before we sing our praises to AI code generators: there were more errors from the students who used Codex when they were doing the manual coding. They had a mean of 1.58 errors, compared to a mean of 0.99 when there was no starter code provided. So they're making slightly more errors, but not enough to make me go, “Yeah, we shouldn't do this.” This one did have some statistical significance. However, they also had slightly more errors when modifying tasks, but that did not have statistical significance. So, again, more studies would help to figure out why this is happening and whether or not this is generalizable outside of this population.

Now, the authors also had some qualitative feedback: they asked students what their perspectives were around this, which is great. Looking only at how well students performed is something that a lot of researchers end up doing, and that kind of leaves out a larger piece of the puzzle when it comes to educational psychology. Like, yeah, a student may have performed better on this, but did they walk away from the experience going, “Wow, I really hate that subject area”? Because if so, that's something we need to know and figure out why. So it's great that they were not only looking at what was learned, and the retention of that learning, but also what students actually thought while doing this. More researchers should do this kind of research, where it's looking at this broader, more holistic approach to learners and learning. So kudos to the authors for this.

Here's a quote from page 11 that's interesting. Quote: “Both groups felt they learned about Python programming and its concepts during the training phase. However, on stress and discouragement, learners in the Codex group felt slightly less stress (U = 390.5 and a p-value of 0.056). Some learners from the Codex group explicitly attributed their reduced stress to using the AI code generator. For example, participant 26 reported, quote, ‘Using the code generator helped me save time and reduce pressure,’ end quote.” And they've got more quotations from the different participants and whatnot that kind of reinforce this. So, in general, not only did the students perform better, but they actually liked it better. Some small percentage of students said, “Hey, I'd actually prefer to do this on my own rather than use a tool that does it for me.” I might be that kind of person: I kind of like the struggle, because I feel like I'll learn more through the process. But at least with these participants, that might be more of an outlier than the norm.

Alright, so the discussion section is really neatly organized around the different research questions. The first research question is around whether or not novices can use AI code generators; the answer is yes. The second research question, which is how do learners' task performances differ with and without AI code generators: well, they performed better with the code generators. The third question is how does learners' ability to manually modify code differ with and without AI code generators: the students who used the code generators were able to perform just as well, if not better than, the students who did not, when they were doing the post-test that did not allow anyone to use the code generators. That's an interesting finding.

The fourth research question was: what are the effects on learning performance and retention from using AI code generators versus not? They found that this doesn't impede the learning results, and in fact it might lead to better learning results than not using it. And the final research question, RQ5, is: what is the relationship between existing programming competency and the effectiveness of using AI code generators in introductory programming? What they found is that if you came in with a higher pre-test score, you are likely going to benefit even more from using the code generators than if you had a lower score. So if you have prior experience with learning how to code, and then you go in and add an AI code generator on top of this, it might make it so you can excel, learn faster at a better rate, and get through this faster than if you did not have that prior experience and were using the AI code generator. So this may be an accelerant for those kinds of students.

Now, the authors mention the use-modify-create framework, which, if you're unfamiliar with it, check out episode 26 (that was well over 150 episodes ago), which is titled “Computational thinking for youth in practice.” So if we think of use-modify-create with modding video games, so making a video game do something different: the “use” would be just playing the video game; the “modify” would be actually modding the game, where you are changing the code that exists; and “creating” would be like, “Hey, I have this blank IDE page, I'm going to create a brand-new game from scratch.” So there's a continuum of playing, versus modifying (changing a little bit), versus creating something from nothing, and everything in between.

The authors talk about how AI code generators can be a form of this. It could be like a crutch: students are able to use an AI code generator, eventually become able to modify the code that it creates, and then eventually get to a point where they can create code without it. But a question that I have is: if it ends up saving you time, and you get the same if not better understanding, why would you even get to the modify and create stages, when you could just go with the use stage and learn how to create code in collaboration with the AI, rather than modifying what's given (because the code might not work very well) or creating it from scratch on your own? Like, I have a drum kit behind me. I could learn how to whittle my own drumsticks, or I could, you know, just buy some from the store and then focus on making music with them. It's the same thing with programming: instead of focusing on whittling away by writing out lines of code, maybe instead I want to be able to very quickly create this thing, and then use that program to do something, like to play a game or whatever.

So here's an interesting quote from page 13. Quote: “The benefit of AI code generators for novice learners could be explained by the effective employment of the use-modify-create pedagogical strategy often used in introductory programming. Although learners in the baseline group used the documentation more frequently, they had to start each task by creating a new program from scratch before getting to the modify portion of the activity, and thus encountered more errors. However, learners from the Codex group had the advantage of using code that was generated specifically for an intended behavior that they wrote. This meant that the AI coding assistant turned the create task into a use-modify task, as they were provided with functional pieces of code in response to the written description, which they could then modify. Therefore, they were able to spend more time to trace, test, and learn about the structure of the generated code before moving on to modifying it in the next task.” End quote. That's a great point. So if you're able to spend more time creating and thinking, rather than actually writing out lines of code, is that a win? Especially if it doesn't lead to learning loss, but instead potential learning gain, compared to not doing that approach.

So on page 14 the authors talk about, okay, well, what are some of the potential implications for design? They talk about supporting complete beginners, where AI assistance could be used to help them out. They talk about controlling for over-utilization of these tools, so if you are over-reliant upon an AI code generator, they provide some suggestions in there. And they also talk about how you can use this for creating some writing prompts with the classes that you work with. So if students are unfamiliar with how to prompt an AI to create something for them, you could generate some prompts for students to work with, or create a list of prompts that students can use to start their line of thinking about the things they want to create in collaboration with generative AI. If you want to learn more about those ideas, because I just kind of teased them, make sure you check out page 14.
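The use-modify-create progression described in that quote can be pictured with a small hypothetical example (my own illustration, not the paper's): the first function stands in for code an AI generator might return in response to a written description, which the learner uses and traces; the second shows the kind of small edit a follow-up modification task might ask for:

```python
# "Use": code as an AI generator might return it for the prompt
# "return the squares of the numbers 1 through limit".
def squares(limit):
    result = []
    for n in range(1, limit + 1):
        result.append(n ** 2)
    return result

# "Modify": the learner edits the generated loop for a follow-up task,
# e.g. "now only keep the squares of the even numbers".
def even_squares(limit):
    result = []
    for n in range(1, limit + 1):
        if n % 2 == 0:  # the one-line change the learner makes
            result.append(n ** 2)
    return result

print(squares(5))       # [1, 4, 9, 16, 25]
print(even_squares(5))  # [4, 16]
```

The point the authors make is that starting from working generated code, as in the first function, lets the learner spend their time tracing and modifying rather than fighting syntax errors on a blank page.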

Now, at the end of these unpacking scholarship episodes I like to share some lingering questions and thoughts. One of them is: how do the findings for this study compare with studies on students using Stack Overflow? As I mentioned last week, Stack Overflow is like a repository of questions that are related to programming: people will say, “Hey, I can't figure out how to get my program to do blah blah blah,” and people respond with code, or with a description of how to solve that problem. Professional programmers often say that they frequently go to places like Stack Overflow, and they will often copy and paste and then modify slightly to make it so that the code works in their particular program. That's something that I've done on different projects that I've worked on, etc. But I imagine that there are some studies out there on whether or not people learn something when they're using Stack Overflow, so it'd be really interesting to compare those studies, specifically on copying and pasting from Stack Overflow, with studies like this one that are actually looking at generative AI and the kind of collaboration that could go on in a classroom.

Last week's episode shared some more questions and lingering thoughts that I had about using generative AI and plagiarism when it came to coding. So if you want to hear more about that, make sure you check out last week's episode, which was episode 187, and again, it was titled “Will ChatGPT get you caught? Rethinking of plagiarism detection.” If you enjoyed this particular discussion on generative AI, there are many more podcast episodes, including some April Fools ones if you're interested in those. And if you check out the show notes at jaredoleary.com, there are multiple episodes that are specifically related to AI, like episode 13, “AI4ALL, curriculum development, and gender discourse” with Sarah Judd; episode 142, “Teaching AI in elementary school” with Charlotte Dungan; episode 173, “Empathetic listening in computer science” with Josh Sheldon; episode 176, “The end of programming”; and last week's episode that I already mentioned.

If you enjoyed this episode, consider sharing it with somebody else, leaving a review, or just simply pressing the like button or putting a comment on the YouTube video that you may be listening to this on. Stay tuned next week for another episode. Until then, I hope you're all staying safe and are having a wonderful week.

Article

Kazemitabaar, M., Chow, J., Ka To Ma, C., Ericson, B. J., Weintrop, D., & Grossman, T. (2023). Studying the effect of AI code generators on supporting novice learners in introductory programming. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems - CHI '23, 1-23.


Abstract

“AI code generators like OpenAI Codex have the potential to assist novice programmers by generating code from natural language descriptions, however, over-reliance might negatively impact learning and retention. To explore the implications that AI code generators have on introductory programming, we conducted a controlled experiment with 69 novices (ages 10-17). Learners worked on 45 Python code-authoring tasks, for which half of the learners had access to Codex, each followed by a code-modification task. Our results show that using Codex significantly increased code-authoring performance (1.15x increased completion rate and 1.8x higher scores) while not decreasing performance on manual code-modification tasks. Additionally, learners with access to Codex during the training phase performed slightly better on the evaluation post-tests conducted one week later, although this difference did not reach statistical significance. Of interest, learners with higher Scratch pre-test scores performed significantly better on retention post-tests, if they had prior access to Codex.”


Author Keywords

Large Language Models, Generative Models, AI Coding Assistants, AI-Assisted Pair-Programming, OpenAI Codex, Introductory Programming, K-12 Computer Science Education, GPT-3


My One Sentence Summary

This study found that students who had access to AI code generators while learning how to code outperformed students who did not have access, even when engaging in manual coding exercises.


Some Of My Lingering Questions/Thoughts

  • How do the findings for this study compare with studies on students using Stack Overflow?


Resources/Links Relevant to This Episode

  • Other podcast episodes that were mentioned or are relevant to this episode

    • AI4ALL, Curriculum Development, and Gender Discourse with Sarah Judd

      • In this interview with Sarah Judd, we discuss what Sarah learned both in the classroom and as a CS curriculum writer, the curriculum Sarah continues to develop for AI4ALL, advice and philosophies that can guide facilitating a class and designing curriculum, some of our concerns with discourse on gender in CS, my recommended approach to sustainable professional development, and much more.

    • Empathetic Listening in Computer Science with Josh Sheldon

      • In this interview with Josh Sheldon, we discuss computational action, designing exploratory professional development experiences, learning how to listen to and empathize with students, applying SEL with teachers, the future of teaching and learning, the problems with external influences on CS education, and so much more.

    • Teaching AI in Elementary School with Charlotte Dungan

      • In this interview with Charlotte Dungan, we discuss Charlotte’s holistic approach to education, remotely teaching CS to rural communities, why Charlotte believes teaching is harder than working in industry, teaching AI in elementary school, the influence of money on research and practice, the future of work, and much more.

    • The End of Programming

      • In this episode I unpack Welsh’s (2023) publication titled “The end of programming,” which asks when generative AI will replace the need for knowing how to program.

    • Will ChatGPT Get You Caught? Rethinking of Plagiarism Detection

      • In this episode I unpack Khalil & Er’s (2023) publication titled “Will ChatGPT get you caught? Rethinking of plagiarism detection,” which explores how likely it is for plagiarism software to detect whether an essay was written by generative AI.

    • More episodes related to AI

    • All other episodes


