
How to Protect your Website and Intellectual Property from Article Web Scrapers and Plagiarism



No matter how it happens, it will happen. Your website and its content will get noticed by the wrong people on the Internet. Those wrong people will want to use the intellectual property on your website, namely your articles, without your permission. This has been happening to one of the websites that I run for over a year now. I want to tell you what I did about it, what I didn’t do, and how you can combat this situation if you ever encounter it.

What is Web Scraping?

Web scraping is the process of using bots to extract content and data from a website.

Web scraping extracts underlying HTML code and, with it, data stored in a database. The scraper can then replicate entire website content elsewhere.

Web scraping is also used for illegal purposes, including…the theft of copyrighted content.

What is Plagiarism?

Plagiarism is the practice of taking someone else’s work or ideas and passing them off as one’s own.

What is a Spinner?

Article spinning is a writing technique used in search engine optimization (SEO), and other applications, which creates what appears to be new content from what already exists. Content spinning works by replacing specific words, phrases, sentences, or even entire paragraphs with any number of alternate versions to provide a slightly different variation with each spin – also known as Rogeting.

Scraping and Plagiarism Examples

Now that we have established what web scraping (scrapers or being scraped), plagiarism, and a spinner are, here are five examples of content from one of my websites being scraped by other websites:

1.) Revue de télévision: OUTLANDER: Saison 5, épisode 7: La ballade de Roger Mac [Starz]

2.) Revue de télévision: TITANS: Saison 2, épisode 1: Trigon – Une trahison de la confiance, une insulte et un gaspillage de ce qui est arrivé avant [DC Universe]

3.) CW Releases 2019/2020 TV Shows Lineup: Batwoman, Arrow, The Flash, Supergirl And Others

4.) Blacklist: Season 8, Episode 19: Balthazar “Vino” Baker Plot Overview and Broadcast Date [NBC]

5.) Film Review: LOVE AND MONSTERS (2020): Michael Matthews’ Movie Lacks Love and Memorable Monsters | Gossip News

The first two examples belong to the same website. They were so brazen they copied everything, including the title, header image of the article, and all of the links, including the internal links (back-links) that I placed into the article. Every time I posted an article that mentioned “Game of Thrones” in the body of the article, it showed up on this website.

The third and fourth examples belong to the same website. This website was smart enough to take out some of my back-linking in some of their articles, but they left in the H2 and H3 tags and various other formatting from the original article. The title of the article, however, is virtually the same. They used a spinner to change certain words in the article automatically, but it is still a 99% match to my original article.

The fifth example scrapes everything, verbatim, from my site. Unabashedly so: internal links, external links, everything. It’s ridiculous. No shame.
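If you want to put a rough number on how close a suspect article is to your original, a word-level comparison is one way to do it. The sketch below uses Python's standard `difflib` module. It is not the method behind the 99% figure above, and the `similarity` function and sample sentences are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(original: str, suspect: str) -> float:
    """Return a 0.0-1.0 score for how closely two texts match, word by word."""
    a = original.lower().split()
    b = suspect.lower().split()
    # autojunk=False stops difflib from ignoring very common words
    # once the texts grow past a couple hundred words.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


# Hypothetical sample text: one word swapped, the way a spinner might do it.
original = "The season premiere wastes everything that came before it"
spun = "The season premiere squanders everything that came before it"

print(f"{similarity(original, spun):.0%} match")
```

A verbatim copy scores 1.0, and a spun copy that only swaps a handful of words stays close to it. Treat the score as a hint that a page deserves a closer look, not as proof of plagiarism.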

How did I find out I was being Scraped?

The physical morons scraping one of my websites linked back to me FROM THE ARTICLES THEY STOLE. Idiots.
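Once you have the HTML of a suspect page saved, you can check it for links back to your own domain the same way. This is a minimal sketch using only Python's standard library. The `BacklinkFinder` class, the `find_backlinks` helper, and the `example.com` domain are placeholders I made up, and the sketch does not fetch pages for you.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class BacklinkFinder(HTMLParser):
    """Collect every <a href> on a page that points at my_domain."""

    def __init__(self, my_domain):
        super().__init__()
        self.my_domain = my_domain.lower()
        self.backlinks = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        host = (urlparse(href).hostname or "").lower()
        # Match the bare domain and any subdomain (www., blog., etc.).
        if host == self.my_domain or host.endswith("." + self.my_domain):
            self.backlinks.append(href)


def find_backlinks(html, my_domain):
    finder = BacklinkFinder(my_domain)
    finder.feed(html)
    return finder.backlinks


# Hypothetical scraped page with one link back to the original site.
page = (
    '<p>See <a href="https://example.com/review-1/">the review</a> '
    'and <a href="https://other.test/">this</a>.</p>'
)
print(find_backlinks(page, "example.com"))
```

Any link it returns is an internal link the scraper forgot to strip, which is exactly the kind of evidence worth saving.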

What I did about the Scrapers and Intellectual Property Thieves

The first two websites, comprising the first four examples, are no longer online. I had nothing to do with it. Some of their social media, however, can still be viewed. The third website is still online and is consistently plagiarizing my work and the work of my writers (I know. Don’t ask.).


How to Protect Yourself and your Website from Article Scrapers and Plagiarism

1. Use the Yoast SEO WordPress plugin feature that adds the original article URL and a back-link to your website to the footer of each article in your RSS feed. If someone completely copies your article, you get credit and a back-link to your site (a no-follow back-link). When you first install the Yoast SEO plugin on your WordPress website, one of the features already activated under “Search Appearance,” then “RSS feed settings,” then “Content to put after each post in the feed” is:

The post %%POSTLINK%% appeared first on %%BLOGLINK%%.

Leaving this as is places your original article URL and a link to your home page at the bottom of every article in your RSS feed. Most scrapers scrape from a site’s RSS feed, so when your site is scraped, the person reading the plagiarized article will know where the original article came from and who first published it. You’ll get credit (and possibly a click-back) from the scraped work.

The Yoast SEO WordPress plugin is free to download, install, and use.

2. Let site visitors know in the footer of your website, and on a separate page, that the content of your site is copyrighted and permission needs to be granted in writing in order for the content of your website to be replicated. You can also just forbid reproduction of your work entirely in the footer of your site and on a separate linked page. Show the date of your copyright as well (the current year). That will let the reader know that your copyright is valid, active, and up-to-date.

3. Form an LLC, a limited liability company. It will cost you a couple hundred dollars to set up and a hundred or so a year to maintain (paperwork, etc.), but it is worth it tax-wise and legal protection-wise. Regarding copyrights,

The LLC itself would own the copyright in the articles, posts, and other content created by its employees (if any) in the course of their jobs.

That means that if you have writers and you decide to sue a copyright infringer (a plagiarizer), the LLC speaks for you (and your writers) in that lawsuit with one voice. Your name and your personal information may never see the light of day.

After the LLC is formed, a scraper will no longer be stealing an individual’s intellectual property in the eyes of the law. They will be stealing a company’s intellectual property, a company with potentially deep litigious pockets. Once you display your LLC on your website (footer, About page, etc.), the scraper may think twice before scraping your website’s content.

4. Send the scraper website a cease and desist email. List the scraped articles in the email. Take pictures of them (like I did). Take videos of them. Place some of those in the email as well. If the scraper website doesn’t desist and doesn’t remove your stolen intellectual property from their site by the deadline in your email, move on to Step 5.

5. Take your pictures and videos, sit down with an intellectual property attorney in your state, and work out the best course of action. It may cost you a little money upfront, but you will be given your best legal options.
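To make Step 1 a little more concrete, here is a rough sketch of what the RSS footer template turns into once its two placeholders are filled in. This is not Yoast's code. The `render_footer` and `link` helpers, the `rel="nofollow"` markup, and the example URLs are my own assumptions, based only on the template text and the no-follow behavior described in Step 1.

```python
from html import escape

# The default template text quoted in Step 1.
TEMPLATE = "The post %%POSTLINK%% appeared first on %%BLOGLINK%%."


def link(url, text):
    """Build a no-follow HTML link (markup assumed for illustration)."""
    return f'<a rel="nofollow" href="{escape(url, quote=True)}">{escape(text)}</a>'


def render_footer(template, post_url, post_title, blog_url, blog_name):
    """Swap each placeholder for a link to the post or to the blog."""
    return (
        template
        .replace("%%POSTLINK%%", link(post_url, post_title))
        .replace("%%BLOGLINK%%", link(blog_url, blog_name))
    )


# Hypothetical post and site details.
footer = render_footer(
    TEMPLATE,
    post_url="https://example.com/my-review/",
    post_title="My Review",
    blog_url="https://example.com/",
    blog_name="Example Blog",
)
print(footer)
```

A scraper that republishes the feed item without cleaning it up carries this footer along, which is how the copied article ends up pointing readers back to the original.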

In Closing

Scrapers are going to scrape and spinners are going to spin. Protect yourself.

Leave your thoughts on this article below in the comments section. Want up-to-the-minute notification of newly published articles? ProMovieBlogger publishes articles by Email, Twitter, Facebook, Pinterest, Reddit, and Tumblr.

About the author

Rollo Tomasi

Rollo Tomasi is a Connecticut-based film critic, TV show critic, and news and editorial writer. He will have an MFA in Creative Writing from Columbia University in 2025. Rollo has written over 700 film, TV show, short film, Blu-ray, and 4K-Ultra reviews. His reviews are published in IMDb's External Reviews and in Google News. Previously you could find his work at Empire Movies, Blogcritics, and AltFilmGuide. Now you can find his work at FilmBook, ProMovieBlogger, and TrendingAwards.
