Will ChatGPT Get You Caught? Rethinking of Plagiarism Detection

In this episode I unpack Khalil & Er’s (2023) publication titled “Will ChatGPT get you caught? Rethinking of plagiarism detection,” which explores how likely it is for plagiarism software to detect whether an essay was written by generative AI.

  • With the relatively new increase in

    popularity for generative AI one of the

    interesting questions that revolves

    around education and the education

    system is what is the impact of

    generative AI on plagiarism and

    plagiarism detection we're actually

    going to explore that topic in today's

    paper which is titled Will ChatGPT Get

    You Caught? Rethinking of Plagiarism

    Detection this was written by Mohammad

    Khalil and Erkan Er apologies if I

    mispronounce any names here's the

    abstract for the paper quote the rise of

    artificial intelligence AI technology

    and its impact on education has been a

    topic of growing concern in recent years

    the new generation AI systems such as

    chatbots have become more accessible on

    the internet and stronger in terms of

    capabilities the use of chat Bots

    particularly chat GPT for generating

    academic essays at schools and colleges

    has sparked fears among Scholars this

    study aims to explore the originality of

    contents produced by one of the most

    popular AI chatbots chat GPT to this end

    two popular plagiarism detection tools

    were used to evaluate the originality of

    50 essays generated by ChatGPT on

    various topics our results manifest that

    chat GPT has a great potential to

    generate sophisticated text outputs

    without being well caught by the

    plagiarism check software in other words

    chat GPT can create content on many

    topics with high originality as if they

    were written by someone these findings

    align with recent concerns about

    students using chat Bots for an easy

    shortcut to success with minimal or no

    effort moreover chatgpt was asked to

    verify if the essays were generated by

    itself as an additional measure of

    plagiarism check and it showed Superior

    performance compared to the traditional

    plagiarism detection tools the paper

    discusses the need for institutions to

    consider appropriate measures to

    mitigate potential plagiarism issues and

    advise on the ongoing debate surrounding

    the impact of AI technology on education

    further implications are discussed in

    the paper end quote if I were to summarize this

    paper into a single sentence I'd say that

    this paper explores How likely it is for

    plagiarism software to detect whether an

    essay was written by generative AI now

    this topic is broad enough that it's

    relatively interesting for many

    Educators in many different domains

    however I'm going to talk about this

    specifically in relation to computer

    science and computer science education

    in particular with code generation now

    if you don't know who I am my name is

    Jared O'Leary and I have a background

    working with every single grade level

    kindergarten through doctoral student in

    music education and computer science

    education contexts if you want to find

    out more about my background just go to

    my website it has my curriculum vitae

    which speaking of there are hundreds

    if not thousands of free resources on my

    website jaredoleary.com which has over

    science Educators and a bunch of other

    content on there for gamers and drummers

    Etc because I like to create content so

    check it out if you haven't already now

    this paper is relatively short we're

    going to get through it fairly quickly

    however it raises some interesting

    questions that I think Educators should

    explore especially computer science

    Educators so in the background of this

    paper the authors talk about what are

    chat bots in education What specifically

    is chat GPT how do we engage in

    detecting cheating and proctoring Etc

    and then what plagiarism checks

    are if you're interested in learning

    more about that

    feel free to check out this paper it is

    available for free and I do link to it

    in the show notes at jaredoleary.com but

    in the second section of the methodology

    the authors talk about how they

    generated 50 essays on different topics

    and so these were 500 word essays on a

    particular topic these were generated by

    chat GPT the essays were then sent to

    two different plagiarism tools so one

    of them is Turnitin and the other one is

    iThenticate 25 to the first one 25 to

    the other one however they also took the

    essays and they sent them to ChatGPT and

    asked hey was this created by AI or was

    this written by a person and they did

    that for all 50 of the essays which was

    a great thing to test whether or not

    generative AI could actually detect if

    something else was created by generative

    AI alright so let's talk about the

    results here so the very first one is

    results for the iThenticate software

    again there are 25 essays that were

    submitted to iThenticate so the results

    here are based off of How likely it was

    to detect whether or not these were

    created by generative AI or rather

    whether or not it was marked as

    plagiarism here's a quote from page 8 of

    the PDF quote the majority of the essays

    n equals 17, 68 percent were found to have a high

    originality as they were barely similar

    to other content less than 10 percent

    some of the essays n equals 5, 20 percent had an

    acceptable level of similarity ranging

    from 10 to 20 percent only three essays

    were reported to have very high

    similarity 20 to 40 percent with other

    content and none of the Articles were

    found to have similarity score above 40

    percent the average similarity score

    across all essays was 8.76 from the

    first result set it is clear that the

    essays generated by chat GPT contained

    highly original content and would not

    face plagiarism issues if they were

    student submissions for an assignment

    end quote okay so they basically found

    that the majority of the essays were not

    similar to other forms of writing that

    were detected by this particular system

    so in other words it would bring it back

    and say hey it's likely that whoever

    submitted this did not plagiarize this

    or lift this from somewhere else but

    what about the second plagiarism

    software so this one was a little bit

    higher so this one is the software called

    Turnitin quote at first glance it is

    evident that the similarity scores were

    relatively higher among the second group

    of essays to begin with nearly half of

    the essays n equals 12 had a similarity

    score of less than 10 percent and six

    essays exhibited an acceptable level of

    similarity with scores ranging from 10

    to 20 percent in comparison to the first

    result set where only three essays had

    similarity scores between 20 to 40

    percent a significant increase in

    instances of lack of originality was

    observed in the second set with six

    essays displaying problematic similarity

    scores additionally a striking case of

    plagiarism was identified in one of the

    essays as it displayed a high similarity

    score of over 40 percent with other

    existing content the average similarity

    score among all essays was found to be

    initial results set 8.76 end quote

    and that's also from page eight now while

    this is like certainly better than the

    iThenticate software for these 50

    different essays it's still likely to

    pass the plagiarism test so students who

    use ChatGPT could have submitted these

    undetected by both of these tools

    okay but what about when it was

    reverse engineered with ChatGPT itself

    so asking it hey was this text generated

    by a chatbot quote with an accuracy of

    over 92 percent the chat GPT was able to

    detect if the written essays was

    generated by itself out of 50 essays

    chat GPT identified 46 as being

    plagiarized with four remaining

    undetected as instances of plagiarism

    end quote so in other words 46 out of

    the 50 were labeled as plagiarized or as

    written by a generative AI so it

    performs significantly better than the

    plagiarism checks plagiarism is the idea

    that you are lifting something from

    somebody else without giving credit to

    them sometimes it's in the form of an

    overt I am literally copying and pasting

    what somebody else wrote and just

    putting it in here without citing it

    other times it might be I'm paraphrasing

    these ideas and I'm putting them into my

    own words but I'm not necessarily

    pointing back to the original source to

    say hey here's where I got this idea

    from it appears as though chat GPT is

    not directly lifting from other sources

    by literally just copying pasting ideas

    from one source and putting it into

    another without actually citing it

    instead what it is likely doing is

    paraphrasing from other forms of text a

    lot of different forms of text and kind

    of synthesizing and summarizing and

    putting it into something new where this

    becomes a problem is it's obviously not

    pointing back to where it got this from

    so instead of embedding like in-text

    citations or things like that into it it

    might be making it as though the person

    who submits this is coming up with this

    on their own these are our original

    ideas but this leads to a question of is

    this a form of plagiarism or just kind

    of like the new version of getting

    someone else to write the essay for you

    let's say that the next version of chat

    GPT like five or even six or whatever is

    able to add in references and citations

    is that actually a form of plagiarism or

    is it an entirely different question

    that's more related to what people

    have previously done where they will

    literally pay someone to write an essay

    for them having taught at higher

    education institutions I know that there

    are several professors that I've worked

    with who have talked about how they

    would sometimes have students like

    submit in just like terrible work

    terrible work terrible work terrible

    work they get to the final end and it's

    like holy cow like this could be

    submitted for publication immediately it

    is phenomenal and because they had a

    long history of just really bad work and

    then suddenly it was amazing it was like

    okay clearly you hired somebody to write

    this because when they asked the student

    come in and explain like what did you

    mean here by this particular very dense

    paragraph the student's not able to

    actually articulate it okay clearly you

    don't understand what you even wrote or

    claimed to have written so odds are you

    paid somebody else to do that so let's

    talk about academic integrity and how

    that might impact whether or not you

    actually finish this degree this might

    be a new form of that in that students

    like if they are able to add in

    citations into the generated text now

    it's no longer a form of plagiarism but

    it's still a form of academic dishonesty

    and a lack of Integrity so it's

    something that we as Educators really

    need to kind of think through and

    prepare for but speaking of which the authors

    give on page 12 some recommendations for

    educators as well as recommendations for

    students and institutions so the first

    recommendation that they give to

    teachers is to go beyond the basics and

    actually have active engagement that

    requires critical thinking I totally

    agree we shouldn't be wasting our time

    all doing the exact same thing and

    having the same outcome instead we

    should be engaging in projects that

    actually have an impact on our community

    or the world outside of the classroom

    instead of like everybody writing the

    same essay on the same topic and the

    teacher's just like grading for grammar

    and whatnot that sounds terrible from a

    computer science perspective it's the

    same thing like we shouldn't all be

    creating the exact same like calculator

    app how about instead we create

    different games and apps and other

    things that are actually interesting to

    the students that are working with it

    instead of everybody recreating the

    exact same thing that has one single

    solution another thing that the authors

    recommend is that teachers could

    actually talk to some of the students

    about the limitations of ChatGPT and

    some of the consequences of

    actually relying on it the consequences

    in terms of like not actually knowing

    the material like if you're studying

    this let's say for higher education or

    something and you're trying to get a

    degree in a particular topic and you

    rely entirely on generative AI to figure

    out things when you get to a job

    interview and they ask like about your

    subject area expertise and you're not

    able to articulate that it doesn't

    matter how good your like academic

    qualifications are on your CV they're

    going to look at you and go you clearly

    don't know what you're talking about

    sorry you're not going to get hired but

    there's also the issue of again academic

    Integrity like if you actually go to an

    institution and submit plagiarized work

    or get somebody else to do the work for

    you or are just dishonest in some way

    then you could very much so get kicked

    out of the institution I've had to have

    that conversation with a couple of

    students where it's like hey you know if

    I were to actually go through the list

    of steps I'm supposed to take when you

    do this very specific thing you'd be

    kicked out of this University so

    students need to be aware of like yeah

    this could save you time and it'll save

    even more time in the long run because

    you won't have any homework to do

    because you won't be allowed back in the

    institution and maybe that's something

    they actually want but probably not so

    it's good for students to know that

    which relates to the third thing which

    is like you really gotta focus on

    academic Integrity with the students

    that you're working with and that's

    something that you can put in the

    syllabi that you create okay now for

    students and whatnot so one of the

    things that the authors recommend is to

    take advantage of this technology to

    improve your own learning your own

    competencies Etc but don't use this as a

    substitute for original thinking writing

    creating Etc use it to maybe generate

    some ideas of potential essays or

    programs that you could create and then

    actually you know do the work to do that

    and then they advise students to again

    focus on academic integrity now from

    an Institutional standpoint the authors

    recommend that institutions actually get

    more familiar with the potentials for

    large language models being used

    generative AI Etc this is something that

    teachers K through doctoral should

    really start focusing on now because

    this is now a thing so they recommend

    that institutions create and Implement

    very clear policies and guidelines

    because this is now a part of our

    reality in the education system and one

    thing that you could also do is to offer

    training for like various students

    faculty staff Etc about not only

    academic Integrity but also like how to

    use AI responsibly in the classroom like

    as a tool for education or for learning

    which we'll actually talk about in next

    week's episode so there's a little

    teaser for you now at the end of these

    unpacking scholarship episodes I'd like

    to share some of my lingering questions

    and thoughts when reading through these

    particular papers so one of the

    questions that I have is how are you

    using or planning on using generative AI

    to help you teach or to help students

    learn there are so many content creators

    out there right now talking about how

    they're using generative AI in their

    real estate portfolio or to find a great

    vacation destination and things that you

    could do in that destination or to learn

    something new or to help them get out of

    a parking ticket whatever like there's

    many different uses that people are

    talking about online some of which are

    related to education like some educators

    are actually using generative AI to come

    up with writing prompts or lesson plans

    Etc so if you're currently using it I'm

    genuinely curious how and if you're not

    using it but are planning on doing

    it what are your plans you can respond

    on the show notes at jaredoleary.com or

    even in the YouTube comments be really

    interesting to see how people are using

    AI or planning on using it in the

    education system but another question

    that I have is what concerns do you have

    about generative AI in education and

    what excites you about it there's a lot

    of really cool things that can be done

    with generative AI in education but

    there's also some pretty scary things

    that some educators are pretty concerned

    about again you can post on the show

    notes or you can post on the website

    jaredoleary.com or you can post in the

    YouTube comment section and I'm happy to

    talk about these and kind of share some

    of the comments that you leave in those

    different places in upcoming episodes

    because we're going to be talking about

    AI for a little bit for the next couple

    of episodes but a little bit more of a

    deeper question that I have is where is

    the line for you with plagiarism when

    using generative AI versus other sources

    like Stack Overflow when it comes to

    programming so for those of you who

    aren't familiar with it Stack Overflow

    is kind of like an Ask Jeeves for

    programming so you can go to this

    website you can ask a very specific

    programming question maybe even putting

    your own code and then other people

    respond to it and give like here's how

    to do X Y or Z or here's how to fix that

    problem that you're having with X Y and

    Z so often programmers and including

    myself would go there whenever they're

    stuck using a new IDE or a new language

    or whatever and are trying to figure out

    how do I do this very specific thing you

    go into stack Overflow and you can look

    at it and just copy it and then paste it

    into your program maybe change a few of

    the parameters a few of the variables

    some of the function names Etc but in

    general I'll leave a large portion of

    that intact and then voila you have a

    functioning program or just completely

    broken made things even worse but hey

    Version Control so when it comes to

    something like that specifically with

    programming at what point do you think

    that it's plagiarism when you are using

    stack Overflow or when you're using

    generative AI or is it both or neither

    is that just simply part of the process

    and what you consider to be part of

    being a programmer I certainly know that

    there's a lot of people who feel that

    way I mean there's a ton of memes about

    it but another question that I have

    that's a follow-up is how does that

    compare with your thoughts on other

    forms of plagiarism so plagiarizing

    an essay or a song so for example if

    somebody were to directly lift like an

    entire paragraph from an essay or from a

    paper or whatever and then put it into

    their own that's clearly a form of

    plagiarism if you're not citing that

    work but when you take an entire

    function or like set of code could be

    pages long and put it into your own

    program that's generally considered

    acceptable because it solves a

    particular problem there might only be

    one way to solve that with that

    particular language in that particular

    IDE depends on how specific and complex

    the problem is so even if you hadn't

    copied and pasted from somebody else you

    would have ended up with very similar or

    the exact same code so again is that

    plagiarism for you now when answering

    these questions were you focusing on the

    process or the product and how does

    thinking of either of those actually

    impact your answer now that I've kind of

    pointed out the process versus

    product dilemma that is often discussed

    in education if I ask you to focus on

    the process does that change your answer

    versus if I ask you to focus on the

    product does that change your answer so

    for example if you focus on process in

    general which is something that I

    certainly focused on in the K-8 coding

    classes that I worked with do you care

    more about the effort expended to create

    a product but if you focus on the

    product do you care more about how

    efficiently time was spent creating that

    thing from an entrepreneurial standpoint

    we might look at this and go this

    student who used chat GPT is being more

    efficient with their time and they're

    potentially going to be able to start

    several businesses or whatever or create

    several different apps in the amount of

    time it would take somebody to from

    beginning to end with a blank slate

    create an entire app a single one as

    opposed to several using generative AI

    in a school context we might punish the

    person who used generative Ai and reward

    the person who spent all their time

    starting with a blank page from start to

    finish and created everything from

    scratch without using any kind of hints

    but on the entrepreneurial side of things

    from a business side we would likely

    reward the person who creates several

    different apps the odds of them being

    able to make more money off of the

    several different apps versus one app is

    something that we need to consider so

    how we set up our classes to either

    reward or punish one form of Engagement

    over another can make it so that some

    students are going to really thrive in a

    classroom context or really Thrive

    outside of the classroom context in

    other words are they likely going to

    want to leave the formal education

    system because they know they will be

    rewarded outside of that system if they

    were to actually just go out on their

    own and start their own business

    ventures these are things we have to

    consider as Educators so we can't just

    focus on like what students are learning

    but also the social context outside of

    the classroom it doesn't mean that you

    have to change your practices in order

    to match what is going on in a

    capitalistic culture but instead to be

    able to acknowledge it and discuss it

    and talk about well even though you

    understand how that will be rewarded

    outside of the classroom students need

    to be able to understand why it is

    important to engage in this process in

    another way but if you don't have that

    dialogue the student's going to look at

    this and go what's the point this is

    wasting my time this is busy work why

    would I focus all my time doing this one

    thing when I can literally like get

    rewarded for this outside of this

    formalized context I should just quit

    this class and go do this thing and

    start my own company venture or whatever

    these are just my own ramblings on

    generative Ai and whatnot but we're

    going to talk about this more in the

    upcoming episodes and it's something

    that we've actually talked about in

    other episodes for example on episode 13

    AI4ALL curriculum development and

    gender discourse with Sarah Judd in

    episode 142 teaching AI in elementary

    school with Charlotte Dungan episode 173

    empathetic listening and computer

    science with Josh Sheldon and in episode

    these in the show notes at

    jaredoleary.com so if you haven't

    listened to those episodes check out

    this interview and there's a bunch more

    that are really cool as well as a bunch

    more of these unpacking scholarship

    episodes if you enjoyed this episode

    consider sharing with somebody else or

    leaving a review or simply a comment or

    a like on YouTube it just helps more

    people find it stay tuned next week for

    another episode that kind of dives into

    this topic a little bit more with a

    study that kind of Compares well what

    happens when students use generative AI

    to learn how to code compared to

    students who do not use it do they learn

    as much or did they learn less or is it

    the same you'll find out next week until

    then I hope you're all staying safe and

    are having a wonderful week
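
The counts quoted in the transcript can be checked with a few lines of arithmetic. The sketch below is only an illustration: the band labels, variable names, and helper functions are assumptions made for this example rather than anything from the paper, and the 10 to 20 percent count for the first set (5) is inferred from the quoted 20 percent of 25 essays.

```python
# Similarity-band counts as quoted from page 8 of the paper.
# All names here are hypothetical and chosen for this example only.
BANDS = ("<10%", "10-20%", "20-40%", ">40%")
FIRST_SET = (17, 5, 3, 0)    # first 25 essays; the 5 is inferred (20% of 25)
SECOND_SET = (12, 6, 6, 1)   # second 25 essays, checked with Turnitin


def band_shares(counts):
    """Return each similarity band's share of the essays, in percent."""
    total = sum(counts)
    return {band: 100 * n / total for band, n in zip(BANDS, counts)}


def share_at_or_above_20(counts):
    """Percent of essays whose similarity score was 20 percent or higher."""
    return 100 * sum(counts[2:]) / sum(counts)


# ChatGPT's self-check: 46 of the 50 essays were identified as generated.
CHATGPT_SELF_CHECK = 100 * 46 / 50

if __name__ == "__main__":
    for name, counts in (("first set", FIRST_SET), ("second set", SECOND_SET)):
        print(name, band_shares(counts))
        print(name, "at or above 20% similarity:", share_at_or_above_20(counts))
    print("ChatGPT self-check:", CHATGPT_SELF_CHECK)
```

Run as a script, this prints each band's share for both sets along with the share of essays at or above 20 percent similarity, which makes it easier to compare the two plagiarism checkers against ChatGPT's 46-of-50 self-check.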


Abstract

“The rise of Artificial Intelligence (AI) technology and its impact on education has been a topic of growing concern in recent years. The new generation AI systems such as chatbots have become more accessible on the Internet and stronger in terms of capabilities. The use of chatbots, particularly ChatGPT, for generating academic essays at schools and colleges has sparked fears among scholars. This study aims to explore the originality of contents produced by one of the most popular AI chatbots, ChatGPT. To this end, two popular plagiarism detection tools were used to evaluate the originality of 50 essays generated by ChatGPT on various topics. Our results manifest that ChatGPT has a great potential to generate sophisticated text outputs without being well caught by the plagiarism check software. In other words, ChatGPT can create content on many topics with high originality as if they were written by someone. These findings align with the recent concerns about students using chatbots for an easy shortcut to success with minimal or no effort. Moreover, ChatGPT was asked to verify if the essays were generated by itself, as an additional measure of plagiarism check, and it showed superior performance compared to the traditional plagiarism-detection tools. The paper discusses the need for institutions to consider appropriate measures to mitigate potential plagiarism issues and advise on the ongoing debate surrounding the impact of AI technology on education. Further implications are discussed in the paper.”


Author Keywords

Education, Chatbots, AI, ChatGPT, Plagiarism, Essays, Cheating


My One Sentence Summary

This paper explores how likely it is for plagiarism software to detect whether an essay was written by generative AI.


Some Of My Lingering Questions/Thoughts

  • Is generative AI a form of plagiarism or a new form of getting someone else to write the essay for you?

  • How are you using (or planning on using) generative AI to help you teach or to help students learn?

  • What concerns do you have about generative AI in education?

    • What excites you about it?

  • Where is the line for you with plagiarism when using generative AI vs other sources (e.g., Stack Overflow) with programming?

    • How does that compare with your thoughts on other forms of plagiarism (e.g., plagiarising an essay or a song)?

    • When answering these questions, were you focusing on the process or the product?

    • How does thinking of either of those impact your answer?


Resources/Links Relevant to This Episode


