Will ChatGPT Get You Caught? Rethinking of Plagiarism Detection
In this episode I unpack Khalil & Er’s (2023) publication titled “Will ChatGPT get you caught? Rethinking of plagiarism detection,” which explores how likely it is for plagiarism software to detect whether an essay was written by generative AI.
-
With the relatively new increase in
popularity for generative AI one of the
interesting questions that revolves
around education and the education
system is what is the impact of
generative AI on plagiarism and
plagiarism detection we're actually
going to explore that topic in today's
paper which is titled will chat GPT get
you caught rethinking of plagiarism
detection this was written by Mohammad
Khalil and Erkan Er apologies if I
mispronounce any names here's the
abstract for the paper quote the rise of
artificial intelligence AI technology
and its impact on education has been a
topic of growing concern in recent years
the new generation AI systems such as
chatbots have become more accessible on
the internet and stronger in terms of
capabilities the use of chatbots
particularly chat GPT for generating
academic essays at schools and colleges
has sparked fears among scholars this
study aims to explore the originality of
contents produced by one of the most
popular AI chatbots chat GPT to this end
two popular plagiarism detection tools
were used to evaluate the originality of
various topics our results manifest that
chat GPT has a great potential to
generate sophisticated text outputs
without being well caught by the
plagiarism check software in other words
chat GPT can create content on many
topics with high originality as if they
are created by someone these findings
align with recent concerns about
students using chatbots for an easy
shortcut to success with minimal or no
effort moreover chat GPT was asked to
verify if the essays were generated by
itself as an additional measure of
plagiarism check and it showed superior
performance compared to the traditional
plagiarism detection tools the paper
discusses the need for institutions to
consider appropriate measures to
mitigate potential plagiarism issues and
advise on the ongoing debate surrounding
the impact of AI technology on education
further implications are discussed in
the paper end quote if I were to summarize this
paper into a single sentence I'd say that
this paper explores how likely it is for
plagiarism software to detect whether an
essay was written by generative AI now
this topic is broad enough that it's
relatively interesting for many
educators in many different domains
however I'm going to talk about this
specifically in relation to computer
science and computer science education
in particular with code generation now
if you don't know who I am my name is
Jared O'Leary and I have a background
working with every single grade level
kindergarten through doctoral student in
music education and computer science
education contexts if you want to find
out more about my background just go to
my website it has my curriculum vitae
which speaking of there are hundreds
if not thousands of free resources on my
website jaredoleary.com which has over
science educators and a bunch of other
content on there for gamers and drummers
Etc because I like to create content so
check it out if you haven't already now
this paper is relatively short we're
going to get through it fairly quickly
however it raises some interesting
questions that I think Educators should
explore especially computer science
Educators so in the background of this
paper the authors talk about what are
chatbots in education what specifically
is chat GPT how do we engage in
detecting cheating and proctoring etc
and then what are plagiarism
checks if you're interested in learning
more about that
feel free to check out this paper it is
available for free and I do link to it
in the show notes at jaredoleary.com but
in the second section of the methodology
the authors talk about how they
generated 50 essays on different topics
and so these were 500 word essays on a
particular topic these were generated by
chat GPT the essays were then sent to
two different plagiarism software tools so one
of them is Turnitin and the other one is
iThenticate 25 to the first one 25 to
the other one however they also took the
essays and they sent them to chat GPT and
asked hey was this created by AI or was
this written by a person and they did
that for all 50 of the essays which was
a great thing to test whether or not
generative AI could actually detect if
something else was created by generative
AI alright so let's talk about the
results here so the very first one is
results for the iThenticate software
again there are 25 essays that were
submitted to iThenticate so the results
here are based on how likely it was
to detect whether or not these were
created by generative AI or rather
whether or not it was marked as
plagiarism here's a quote from page 8 of
the PDF quote the majority of the essays
n equals 17 68 percent were found to have a high
originality as they were barely similar
to other content less than 10 percent
some of the essays n equals 5 20 percent had an
acceptable level of similarity ranging
from 10 to 20 percent only three essays
were reported to have very high
similarity 20 to 40 percent with other
content and none of the articles were
found to have similarity score above 40
percent the average similarity score
across all essays was 8.76 percent from the
first result set it is clear that the
essays generated by chat GPT contained
highly original content and would not
face plagiarism issues if they were
student submissions for an assignment
end quote okay so they basically found
that the majority of the essays were not
similar to other forms of writing that
were detected by this particular system
so in other words it would bring it back
and say hey it's likely that whoever
submitted this did not plagiarize this
or lift this from somewhere else but
what about the second plagiarism
software so this one was a little bit
higher so this one is the software called
Turnitin quote at first glance it is
evident that the similarity scores were
relatively higher among the second group
of essays to begin with nearly half of
the essays n equals 12 had a similarity
score of less than 10 percent and six
essays exhibited an acceptable level of
similarity with scores ranging from 10
to 20 percent in comparison to the first
result set where only three essays had
similarity scores between 20 to 40
percent a significant increase in
instances of lack of originality was
observed in the second set with six
essays displaying problematic similarity
scores additionally a striking case of
plagiarism was identified in one of the
essays as it displayed a high similarity
score of over 40 percent with other
existing content the average similarity
score among all essays was found to be
initial results set 8.76 end quote
and that's also from page eight now while
this is like certainly better than the
iThenticate software for these 50
different essays it's still likely to
pass the plagiarism test so students who
use chat GPT could have submitted these
undetected by both of these software tools
okay but what about when it was
reverse engineered with chat GPT itself
so asking it hey was this text generated
by a chatbot quote with an accuracy of
over 92 percent the chat GPT was able to
detect if the written essays was
generated by itself out of 50 essays
chat GPT identified 46 as being
plagiarized with four remaining
undetected as instances of plagiarism
end quote so in other words 46 out of
the 50 were labeled as plagiarized or as
written by a generative AI so it
performs significantly better than the
plagiarism checks plagiarism is the idea
that you are lifting something from
somebody else without giving credit to
them sometimes it's in the form of an
overt I am literally copying and pasting
what somebody else wrote and just
putting it in here without citing it
other times it might be I'm paraphrasing
these ideas and I'm putting them into my
own words but I'm not necessarily
pointing back to the original source to
say hey here's where I got this idea
from it appears as though chat GPT is
not directly lifting from other sources
by literally just copying pasting ideas
from one source and putting it into
another without actually citing it
instead what it is likely doing is
paraphrasing from other forms of text a
lot of different forms of text and kind
of synthesizing and summarizing and
putting it into something new where this
becomes a problem is it's obviously not
pointing back to where it got this from
so instead of embedding like in-text
citations or things like that into it it
might be making it as though the person
who submits this is coming up with this
on their own these are our original
ideas but this leads to a question of is
this a form of plagiarism or just kind
of like the new version of getting
someone else to write the essay for you
let's say that the next version of chat
GPT like five or even six or whatever is
able to add in references and citations
is that actually a form of plagiarism or
is it an entirely different question
that's more related to what people
have previously done where they will
literally pay someone to write an essay
for them having taught at higher
education institutions I know that there
are several professors that I've worked
with who have talked about how they
would sometimes have students like
submit in just like terrible work
terrible work terrible work terrible
work they get to the final end and it's
like holy cow like this could be
submitted for publication immediately it
is phenomenal and because they had a
long history of just really bad work and
then suddenly it was amazing it was like
okay clearly you hired somebody to write
this because when they asked the student to
come in and explain like what did you
mean here by this particular very dense
paragraph the student's not able to
actually articulate it okay clearly you
don't understand what you even wrote or
claimed to have written so odds are you
paid somebody else to do that so let's
talk about academic integrity and how
that might impact whether or not you
actually finish this degree this might
be a new form of that in that students
like if they are able to add in
citations into the generated text now
it's no longer a form of plagiarism but
it's still a form of academic dishonesty
and a lack of Integrity so it's
something that we as Educators really
need to kind of think through and
prepare for but speaking of the authors
give on page 12 some recommendations for
educators as well as recommendations for
students and institutions so the first
recommendation that they give to
teachers is to go beyond the basics and
actually have active engagement that
requires critical thinking I totally
agree we shouldn't be wasting our time
all doing the exact same thing and
having the same outcome instead we
should be engaging in projects that
actually have an impact on our community
or the world outside of the classroom
instead of like everybody writing the
same essay on the same topic and the
teacher's just like grading for grammar
and whatnot that sounds terrible from a
computer science perspective it's the
same thing like we shouldn't all be
creating the exact same like calculator
app how about instead we create
different games and apps and other
things that are actually interesting to
the students that are working with it
instead of everybody recreating the
exact same thing that has one single
solution another thing that the authors
recommend is that teachers could
actually talk to some of the students
about the limitations of chat GPT
and some of the consequences of
actually relying on it these are consequences
in terms of like not actually knowing
the material like if you're studying
this let's say for higher education or
something and you're trying to get a
degree in a particular topic and you
rely entirely on generative AI to figure
out things when you get to a job
interview and they ask like about your
subject area expertise and you're not
able to articulate that it doesn't
matter how good your like academic
qualifications are on your CV they're
going to look at you and go you clearly
don't know what you're talking about
sorry you're not going to get hired but
there's also the issue of again academic
Integrity like if you actually go to an
institution and submit plagiarized work
or get somebody else to do the work for
you or are just dishonest in some way
then you could very much so get kicked
out of the institution I've had to have
that conversation with a couple of
students where it's like hey you know if
I were to actually go through the list
of steps I'm supposed to take when you
do this very specific thing you'd be
kicked out of this University so
students need to be aware of like yeah
this could save you time and it'll save
even more time in the long run because
you won't have any homework to do
because you won't be allowed back in the
institution and maybe that's something
they actually want but probably not so
it's good for students to know that
which relates to the third thing which
is like you really gotta focus on
academic Integrity with the students
that you're working with and that's
something that you can put in the
syllabi that you create okay now for
students and whatnot so one of the
things that the authors recommend is to
take advantage of this technology to
improve your own learning your own
competencies Etc but don't use this as a
substitute for original thinking writing
creating Etc use it to maybe generate
some ideas of potential essays or
programs that you could create and then
actually you know do the work to do that
and then they advise students to again
focus on academic integrity now from
an institutional standpoint the authors
recommend that institutions actually get
more familiar with the potentials for
large language models being used
generative AI Etc this is something that
teachers K through doctoral should
really start focusing on now because
this is now a thing so they recommend
that institutions create and Implement
very clear policies and guidelines
because this is now a part of our
reality in the education system and one
thing that you could also do is to offer
training for like various students
faculty staff Etc about not only
academic Integrity but also like how to
use AI responsibly in the classroom like
as a tool for education or for learning
which we'll actually talk about in next
week's episode so there's a little
teaser for you now at the end of these
unpacking scholarship episodes I'd like
to share some of my lingering questions
and thoughts when reading through these
particular papers so one of the
questions that I have is how are you
using or planning on using generative AI
to help you teach or to help students
learn there are so many content creators
out there right now talking about how
they're using generative AI in their
real estate portfolio or to find a great
vacation destination and things that you
could do in that destination or to learn
something new or to help them get out of
a parking ticket whatever like there's
many different uses that people are
talking about online some of which are
related to education like some educators
are actually using generative AI to come
up with writing prompts or lesson plans
Etc so if you're currently using it I'm
genuinely curious how and if you're not
using it but are planning on doing
it what are your plans you can respond
on the show notes at jaredoleary.com or
even in the YouTube comments be really
interesting to see how people are using
AI or planning on using it in the
education system but another question
that I have is what concerns do you have
about generative AI in education and
what excites you about it there's a lot
of really cool things that can be done
with generative AI in education but
there's also some pretty scary things
that some educators are pretty concerned
about again you can post on the show
notes or you can post on the website
jaredoleary.com or you can post in the
YouTube comment section and I'm happy to
talk about these and kind of share some
of the comments that you leave in those
different places in upcoming episodes
because we're going to be talking about
AI for a little bit for the next couple
of episodes but a little bit more of a
deeper question that I have is where is
the line for you with plagiarism when
using generative AI versus other sources
like Stack Overflow when it comes to
programming so for those of you who
are unfamiliar with it Stack Overflow
is kind of like an Ask Jeeves for
programming so you can go to this
website you can ask a very specific
programming question maybe even put in
your own code and then other people
respond to it and give like here's how
to do X Y or Z or here's how to fix that
problem that you're having with X Y and
Z so often programmers including
myself would go there whenever they're
stuck using a new IDE or a new language
or whatever and are trying to figure out
how do I do this very specific thing you
go into Stack Overflow and you can look
at it and just copy it and then paste it
into your program maybe change a few of
the parameters a few of the variables
some of the function names Etc but in
general you'll leave a large portion of
that intact and then voila you have a
functioning program or you just completely
broke it and made things even worse but hey
version control so when it comes to
something like that specifically with
programming at what point do you think
that it's plagiarism when you are using
Stack Overflow or when you're using
generative AI or is it both or neither
is that just simply part of the process
and what you consider to be part of
being a programmer I certainly know that
there's a lot of people who feel that
way I mean there's a ton of memes about
it but another question that I have
that's a follow-up is how does that
compare with your thoughts on other
forms of plagiarism so plagiarizing
an essay or a song so for example if
somebody were to directly lift like an
entire paragraph from an essay or from a
paper or whatever and then put it into
their own that's clearly a form of
plagiarism if you're not citing that
work but when you take an entire
function or like set of code could be
pages long and put it into your own
program that's generally considered
acceptable because it solves a
particular problem there might only be
one way to solve that with that
particular language in that particular
IDE depends on how specific and complex
the problem is so even if you hadn't
copy and pasted from somebody else you
would have ended up with very similar or
the exact same code so again is that
plagiarism for you now when answering
these questions were you focusing on the
process or the product and how does
thinking of either of those actually
impact your answer now that I've kind of
pointed it out the process versus
product dilemma that is often discussed
in education if I ask you to focus on
the process does that change your answer
versus if I ask you to focus on the
product does that change your answer so
for example if you focus on process in
general which is something that I
certainly focused on in the K-8 coding
classes that I worked with do you care
more about the effort expended to create
a product but if you focus on the
product do you care more about how
efficiently time was spent creating that
thing from an entrepreneurial standpoint
we might look at this and go this
student who used chat GPT is being more
efficient with their time and they're
potentially going to be able to start
several businesses or whatever or create
several different apps in the amount of
time it would take somebody to from
beginning to end with a blank slate
create an entire app a single one as
opposed to several using generative AI
in a school context we might punish the
person who used generative AI and reward
the person who spent all their time
starting with a blank page from start to
finish and created everything from
scratch without using any kind of hints
but on the entrepreneurial side of things
from a business side we would likely
reward the person who creates several
different apps the odds of them being
able to make more money off of the
several different apps versus one app is
something that we need to consider so
how we set up our classes to either
reward or punish one form of Engagement
over another can make it so that some
students are going to really thrive in a
classroom context or really Thrive
outside of the classroom context in
other words are they likely going to
want to leave the formal education
system because they know they will be
rewarded outside of that system if they
were to actually just go out on their
own and start their own business
ventures these are things we have to
consider as Educators so we can't just
focus on like what students are learning
but also the social context outside of
the classroom it doesn't mean that you
have to change your practices in order
to match what is going on in a
capitalistic culture but instead to be
able to acknowledge it and discuss it
and talk about well even though you
understand how that will be rewarded
outside of the classroom students need
to be able to understand why it is
important to engage in this process in
another way but if you don't have that
dialogue the student's going to look at
this and go what's the point this is
wasting my time this is busy work why
would I focus all my time doing this one
thing when I can literally like get
rewarded for this outside of this
formalized context I should just quit
this class and go do this thing and
start my own company venture or whatever
these are just my own ramblings on
generative AI and whatnot but we're
going to talk about this more in the
upcoming episodes and it's something
that we've actually talked about in
other episodes for example in episode 13
AI4ALL curriculum development and
gender discourse with Sarah Judd in
episode 142 teaching AI in elementary
school with Charlotte Dungan episode 173
empathetic listening in computer
science with Josh Sheldon and in episode
these in the show notes at
jaredoleary.com so if you haven't
listened to those episodes check out
this interview and there's a bunch more
that are really cool as well as a bunch
more of these unpacking scholarship
episodes if you enjoyed this episode
consider sharing with somebody else or
leaving a review or simply a comment or
a like on YouTube it just helps more
people find it stay tuned next week for
another episode that kind of dives into
this topic a little bit more with a
study that kind of Compares well what
happens when students use generative AI
to learn how to code compared to
students who do not use it do they learn
as much or did they learn less or is it
the same you'll find out next week until
then I hope you're all staying safe and
are having a wonderful week
Article
Khalil, M. & Er, E. (2023). Will ChatGPT get you caught? Rethinking of plagiarism detection. arXiv.
Abstract
“The rise of Artificial Intelligence (AI) technology and its impact on education has been a topic of growing concern in recent years. The new generation AI systems such as chatbots have become more accessible on the Internet and stronger in terms of capabilities. The use of chatbots, particularly ChatGPT, for generating academic essays at schools and colleges has sparked fears among scholars. This study aims to explore the originality of contents produced by one of the most popular AI chatbots, ChatGPT. To this end, two popular plagiarism detection tools were used to evaluate the originality of 50 essays generated by ChatGPT on various topics. Our results manifest that ChatGPT has a great potential to generate sophisticated text outputs without being well caught by the plagiarism check software. In other words, ChatGPT can create content on many topics with high originality as if they were written by someone. These findings align with the recent concerns about students using chatbots for an easy shortcut to success with minimal or no effort. Moreover, ChatGPT was asked to verify if the essays were generated by itself, as an additional measure of plagiarism check, and it showed superior performance compared to the traditional plagiarism-detection tools. The paper discusses the need for institutions to consider appropriate measures to mitigate potential plagiarism issues and advise on the ongoing debate surrounding the impact of AI technology on education. Further implications are discussed in the paper.”
Author Keywords
Education, Chatbots, AI, ChatGPT, Plagiarism, Essays, Cheating
My One Sentence Summary
This paper explores how likely it is for plagiarism software to detect whether an essay was written by generative AI.
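As a rough check on the numbers quoted in the episode, here is a minimal Python sketch that tallies the similarity-score bands read aloud from page 8 of the paper. The dictionary names, band labels, and helper functions are illustrative assumptions, not anything from the paper. The count of 5 for the 10 to 20 percent iThenticate band is inferred from the quoted 20 percent of 25 essays, and treating scores above 20 percent as "flagged" is an assumption based on the bands the quote describes as very high or problematic.

```python
# Essay counts per similarity band, as quoted in the episode.
# Band labels and variable names are hypothetical, chosen for illustration.
ithenticate = {"<10%": 17, "10-20%": 5, "20-40%": 3, ">40%": 0}
turnitin = {"<10%": 12, "10-20%": 6, "20-40%": 6, ">40%": 1}


def shares(counts):
    """Return each band's share of the essays as a percentage."""
    total = sum(counts.values())
    return {band: round(100 * n / total, 1) for band, n in counts.items()}


def flagged_share(counts, bands=("20-40%", ">40%")):
    """Percentage of essays in the bands assumed to count as flagged."""
    total = sum(counts.values())
    return round(100 * sum(counts[b] for b in bands) / total, 1)


# ChatGPT's self-check: 46 of the 50 essays identified as AI-generated.
chatgpt_self_check = round(100 * 46 / 50, 1)

if __name__ == "__main__":
    print("iThenticate shares:", shares(ithenticate))
    print("Turnitin shares:", shares(turnitin))
    print("iThenticate flagged:", flagged_share(ithenticate))
    print("Turnitin flagged:", flagged_share(turnitin))
    print("ChatGPT self-check:", chatgpt_self_check)
```

Under these assumptions, both tools flag well under a third of their 25 essays, while the self-check identifies most of the 50, which is the contrast the episode draws.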
Some Of My Lingering Questions/Thoughts
Is generative AI a form of plagiarism or a new form of getting someone else to write the essay for you?
How are you using (or planning on using) generative AI to help you teach or to help students learn?
What concerns do you have about generative AI in education?
What excites you about it?
Where is the line for you with plagiarism when using generative AI vs other sources (e.g., Stack Overflow) with programming?
How does that compare with your thoughts on other forms of plagiarism (e.g., plagiarising an essay or a song)?
When answering these questions, were you focusing on the process or the product?
How does thinking of either of those impact your answer?
Resources/Links Relevant to This Episode
Other podcast episodes that were mentioned or are relevant to this episode
AI4ALL, Curriculum Development, and Gender Discourse with Sarah Judd
In this interview with Sarah Judd, we discuss what Sarah learned both in the classroom and as a CS curriculum writer, the curriculum Sarah continues to develop for AI4ALL, advice and philosophies that can guide facilitating a class and designing curriculum, some of our concerns with discourse on gender in CS, my recommended approach to sustainable professional development, and much more.
Empathetic Listening in Computer Science with Josh Sheldon
In this interview with Josh Sheldon, we discuss computational action, designing exploratory professional development experiences, learning how to listen to and empathize with students, applying SEL with teachers, the future of teaching and learning, the problems with external influences on CS education, and so much more.
Teaching AI in Elementary School with Charlotte Dungan
In this interview with Charlotte Dungan, we discuss Charlotte’s holistic approach to education, remotely teaching CS to rural communities, why Charlotte believes teaching is harder than working in industry, teaching AI in elementary school, the influence of money on research and practice, the future of work, and much more.
In this episode I unpack Welsh’s (2023) publication titled “The end of programming,” which asks when generative AI will replace the need for knowing how to program.
Find other CS educators and resources by using the #CSK8 hashtag on Twitter