Analyzing GPT's Reference to External Data Sets

Key Takeaways

The plagiarism detected in an essay and in code generated by ChatGPT was relatively low, indicating that it doesn’t typically copy such content from online sources.
The plagiarism rate was somewhat higher for a mathematical solution. This might be because mathematical reasoning is usually similar across sources, so the responses can match other materials.
ChatGPT’s responses to questions about information publicly available on blogs showed higher plagiarism percentages.

While ChatGPT can answer most questions you may have, some users wonder whether its responses contain plagiarism. To investigate this, we generated four different types of text using ChatGPT and then evaluated their originality using various plagiarism detection tools.

What Exactly Is Plagiarism, and How Do LLMs Work?

To determine if ChatGPT is guilty of plagiarism, you should first understand what constitutes plagiarism. Plagiarism involves using another person’s words, ideas, or work without proper attribution. This includes directly copying text from a source without citation or closely paraphrasing someone else’s ideas without acknowledgment.

ChatGPT, like other Large Language Models (LLMs), is trained on large datasets, mostly from publicly available content. However, collecting such vast amounts of data raises ethical questions, as the original creators haven’t consented to their work being used in training the LLMs. This leads to debates about the ethics and legality of such practices.

Although ChatGPT generates responses based on the prompts it receives, the issue lies in the broader context of how OpenAI (ChatGPT’s developer) obtained the data used to train it, which involves using content without proper consent. Many see this as plagiarism and, for many websites, content theft. However, pinpointing the exact sources of plagiarism is difficult.

For the remainder of this article, we’ll concentrate on whether ChatGPT plagiarizes its output from other sources without delving into the specifics of where its responses come from. Let’s check the originality of ChatGPT’s responses using various plagiarism detection tools to see whether the chatbot uses text from online sources directly.

Does ChatGPT Plagiarize Essays?

In this first example, we tasked ChatGPT with composing a 300-word essay on mental health issues.

Following that, we used various plagiarism detection tools to assess the originality of the essay generated by the chatbot. These tools included the Quetext plagiarism checker, Microsoft Word’s built-in plagiarism checker, Grammarly’s plagiarism checker, and the Duplichecker plagiarism scanner.

Microsoft Word’s built-in similarity checker reported zero percent similarity with online sources. The levels of plagiarism detected by the other tools were also minimal: Grammarly’s plagiarism detector found four percent, Quetext’s plagiarism detector found five percent, and Duplichecker’s plagiarism scanner showed zero percent.

Considering the small percentage of detected plagiarism, it appears that ChatGPT does not directly copy essays from existing sources.
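To make these percentages a little more concrete, here is a minimal sketch of one common way a similarity score can be computed: counting how many of a text's three-word sequences also appear in a source. We don't know how each of the tools above calculates its score, so this is an illustration under that assumption rather than a description of any of them. The function names and sample sentences are made up for the example.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of lowercase word n-grams found in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_percent(candidate, source, n=3):
    """Percentage of the candidate's n-grams that also occur in the source."""
    candidate_grams = word_ngrams(candidate, n)
    if not candidate_grams:
        # Text shorter than n words has nothing to compare.
        return 0.0
    source_grams = word_ngrams(source, n)
    shared = candidate_grams & source_grams
    return 100.0 * len(shared) / len(candidate_grams)


# Hypothetical sample texts, not taken from our tests.
generated = "Regular exercise can improve your mood and reduce stress."
online = "Studies show regular exercise can improve your mood over time."

print(f"{overlap_percent(generated, online):.1f} percent of three-word phrases match")
```

A real checker compares a text against a huge index of web pages rather than a single source, which is one reason different tools can report different numbers for the same text.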

Does ChatGPT Plagiarize Code?

To assess whether ChatGPT plagiarizes code, we tasked the chatbot with writing code for a calculator in Python.

Following this, we conducted a plagiarism check on the code using a specialized programming plagiarism checker called Dolos, which detected zero percent similarity. When we checked the code using the general text plagiarism detection tools mentioned above, the results were consistent: almost none of the tools detected more than four percent plagiarism.

When we prompted ChatGPT to generate code for a calculator from different accounts, the responses were different each time. This observation and the results from the plagiarism checks indicate that ChatGPT doesn’t simply replicate code from online sources. Instead, it draws upon the dataset it was trained on to generate code independently.
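If you want to compare two generated responses yourself, a rough check is possible with Python's standard library. The sketch below uses difflib's SequenceMatcher for a character-level comparison. This is a stand-in for illustration and not how Dolos works, and the two snippets are hypothetical examples, not the code ChatGPT produced in our test.

```python
from difflib import SequenceMatcher


def similarity_percent(first, second):
    """Character-level similarity of two strings, as a percentage."""
    return 100.0 * SequenceMatcher(None, first, second).ratio()


# Hypothetical responses to the same prompt from two different accounts.
response_a = "def add(a, b):\n    return a + b\n"
response_b = (
    "def add_numbers(x, y):\n"
    "    result = x + y\n"
    "    return result\n"
)

print(f"{similarity_percent(response_a, response_b):.1f} percent similar")
```

A character-level score like this is sensitive to renamed variables and reordered lines, so a low number doesn't prove the logic is different. Dedicated code checkers are built to look past such surface changes.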

Does ChatGPT Plagiarize Mathematical Solutions?

For the third test, we asked the chatbot to solve a mathematical problem and provide detailed reasoning for each step.

To check the originality of the response, we tested its output using several academic-specific plagiarism detection tools, including the PapersOwl plagiarism checker and the AI-powered Trinka plagiarism scanner, as well as general plagiarism checkers such as Grammarly, Duplichecker, and Quetext.

PapersOwl’s plagiarism detector indicated a nearly 46 percent similarity between the chatbot’s generated reasoning and online sources. The Trinka plagiarism detector reported more than 10 percent similarity. Additionally, Grammarly’s plagiarism detector found 14 percent similarity, Quetext found 17 percent, and Duplichecker showed seven percent.

The higher plagiarism scores for this response don’t necessarily mean that the chatbot directly copies reasoning for mathematical questions from online sources. Solutions and reasoning for math problems are often standard and widely available online, so matching wording is to be expected.

So, even though ChatGPT comes up with its own responses, finding the same answers and reasoning online is possible, which might have added to the high plagiarism percentages.

Does ChatGPT Use Content From Blogs?

To check whether ChatGPT uses content from online blogs, we asked the chatbot to provide tips for maintaining laptop battery health.

Microsoft Word detected 10 percent plagiarism in the generated text. Duplichecker showed four percent and Grammarly’s plagiarism checker indicated 14 percent, but Quetext found 58 percent plagiarism in the text. Upon digging further, we found that some of the text in the chatbot’s response matched the content on some blogs.

To check whether the high plagiarism detection was just a coincidence, we asked the chatbot a few more questions about information that is easily available online. The plagiarism percentage in the generated responses was much higher. Based on our testing, it appears that the chatbot sometimes uses phrases and text from online sources, which is quite surprising.
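Putting the figures reported for the essay and blog tests side by side shows how much the tools disagree. The short sketch below uses only the percentages quoted in this article and summarizes each test with its lowest, middle, and highest score. The dictionary layout and function name are our own choices for the example.

```python
from statistics import median

# Percentages reported in this article for the essay and blog tests.
reported = {
    "essay": {"Microsoft Word": 0, "Grammarly": 4, "Quetext": 5, "Duplichecker": 0},
    "blog tips": {"Microsoft Word": 10, "Duplichecker": 4, "Grammarly": 14, "Quetext": 58},
}


def summarize(scores):
    """Return the lowest, median, and highest score across tools."""
    values = sorted(scores.values())
    return {"min": values[0], "median": median(values), "max": values[-1]}


for test_name, scores in reported.items():
    summary = summarize(scores)
    print(f"{test_name}: min {summary['min']}, median {summary['median']}, max {summary['max']}")
```

The median is less affected by a single outlier than the highest score, which matters here because one tool reported a much larger figure for the blog test than the other three.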

Should You Use ChatGPT for Work or School?

Although the free online plagiarism checkers we tried mostly didn’t detect major plagiarism in ChatGPT’s responses, you should be careful about using it for academic or professional purposes.

If you’re a student, don’t use ChatGPT for your school assignments. Instructors can use tools such as GPTZero and Turnitin’s AI writing detector to spot AI-generated content. If your work gets flagged as AI-generated by such tools, you could fail the assignment or even be expelled from school. Even though many GPT-detection tools specifically state that they shouldn’t be used for this purpose, they are, and it could land you in trouble. Not to mention, you’re really only cheating yourself by not studying the topic properly.

Can you use the chatbot to improve your work performance? It depends. If you want to improve your writing flow in emails or other forms of text, using AI can save you time and effort. However, you should only use it as a tool to assist you in your tasks rather than relying on it to do the entire job for you.

However, if your job prohibits using such tools, as some professional writing roles do, you should avoid using ChatGPT or any similar tool altogether.

Hopefully, our testing has given you insight into the extent to which ChatGPT may draw from resources available on the web. However, it’s important to note that we’ve used free plagiarism tools and tested only a limited dataset. So, while our findings might be helpful, they shouldn’t be taken as absolute facts.