
Web Scraping and How It Affects Your Digital Marketing Campaigns

October 22, 2020

You did it. Your writers have put together a series of killer blog posts and articles that reach your target audience perfectly. They are informative and unique, interesting while still helping to sell your product, and they have perfect SEO: the keywords you spent hours researching appear often enough to rank highly on search engines without feeling clunky or forced.

You finally publish, the content goes live, and… nothing. There was a flurry of interest at first, but when you type a mock search into Google a couple of days later, you can’t even find your content, let alone see it on the first page. You are at a loss, scratching your head. Surely you didn’t get everything this wrong? Surely you are better at your job than this?

The good news is that you probably didn’t get anything wrong. In fact, you are probably so good at your job that you have become the victim of web scraping.

The bad news is that web scraping can be devastating for any digital marketing campaign, and if you don’t deal with it fast, you could end up with less traffic and a lower conversion rate than before the campaign started.


What Actually Is Web Scraping?

Content web scraping is a process in which fraudsters use bots to “scrape” (steal) your high-quality, keyword-rich content and publish it on their own websites, and it is a growing problem. A quick Google search will show even the most computer-illiterate thief how to steal your intellectual property. Web scraping is no longer reserved for people who can program malicious bots themselves; it is a crime accessible to all.

After running one of these scripts on your website, the fraudster has access to all of the content you spent untold hours putting together, to do with as they please. This does not just include the words on the page, either. A sophisticated scraper bot will also collect any images, formatting, and links on the page.

Once they have all this information, they generally use it in one of two ways. Either they edit it slightly and repost it on their own genuine website to steal your traffic, or, more commonly, they post exact copies of it many times across a collection of junk websites that exist solely to duplicate other people’s content.

Why Is It So Bad?

It is fairly self-explanatory why the first use of your stolen content hurts you. Only so many people search for each topic, so every reader a content thief captures is one fewer available to you. This translates to fewer clicks, fewer leads, and ultimately fewer conversions. In this way, content thieves take money directly out of your pocket and harm your campaign.

Figure: Web scraping attack (Source: Imperva)

The second way is a little more complicated, however, and understanding it requires some basic knowledge of how search engines work and what Search Engine Optimization (SEO) actually is. The internet holds a vast amount of information, and search engines attempt to organize and categorize all of it using programs called web crawlers.

These crawlers constantly visit pages across the internet so that search engines can index them by many factors, such as content, speed, and usability. Ultimately, the goal for search engines is to return the most useful pages when people search for a particular word or phrase, so that users achieve their goal as quickly as possible and therefore keep using the search engine.

Search Engine Optimization (SEO), then, is the process of optimizing a website or page to make it as easy as possible for these crawlers to deem its content ‘useful’ for relevant searches. The most obvious way to do this, and one you are no doubt aware of, is to make sure that keywords related to a particular topic appear in the content of that page.

There are dozens of other ways search engines rank results, however, and an important one is uniqueness. Because search engines don’t want to give a user page after page of repeated information, they filter out duplicate content. “Great!” you may be thinking. “That means they will disregard all of those nasty duplicates and leave my page in its rightful place in the results.” Unfortunately, it is not that simple.
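Search engines do not publish exactly how they detect duplicate content, so the following is only a minimal sketch of one well-known technique, word shingling with Jaccard similarity, and not a description of how Google or any other engine actually does it. It breaks each text into overlapping three-word sequences (“shingles”) and measures what fraction of those sequences two pages share. The function names, sample texts, and the 0.5 threshold are illustrative assumptions.

```python
import re

def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word sequences) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_near_duplicate(text_a: str, text_b: str, threshold: float = 0.5, k: int = 3) -> bool:
    """Hypothetical duplicate filter: flag pages whose shingle overlap meets the threshold."""
    return jaccard(shingles(text_a, k), shingles(text_b, k)) >= threshold

# Illustrative sample texts (assumptions, not real pages).
ORIGINAL = ("Content web scraping is a process whereby fraudsters use bots to scrape "
            "your content and publish it on their own websites")
EDITED = ORIGINAL.replace("fraudsters", "criminals")  # a lightly edited copy
UNRELATED = "The quick brown fox jumps over the lazy dog"

if __name__ == "__main__":
    for label, other in [("exact copy", ORIGINAL), ("edited copy", EDITED), ("unrelated", UNRELATED)]:
        score = jaccard(shingles(ORIGINAL), shingles(other))
        print(label, round(score, 2), is_near_duplicate(ORIGINAL, other))
```

In this sketch the exact copy scores 1.0 and the copy with one word swapped still scores about 0.73, so a filter like this would treat both as duplicates of the original while the unrelated text scores 0.0.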

Crawlers often struggle to identify which copy is the original and which is stolen. In the worst case, they conclude that your page is the copy and one of the scraper’s pages is the original, and they punish you for it. More often, they simply cannot tell the difference, so they punish all of the pages equally and send them all tumbling down the results into obscurity. This lets your competitors appear higher in the results by default, taking away business and damaging your brand.

Both of these attacks can do irreparable damage to your campaign, but even more insulting and destructive are the fraudsters who use web scraping to carry out both at once. By altering your content just enough that their version appears unique, while the copies they spread make your original blogs look duplicated, fraudulent companies can eliminate their competition and profit from all of your research and work at the same time. This can be truly devastating for all but the most firmly established companies.

What Can I Do About It?

Luckily, you and your campaign are not powerless against these scraper bots, and there are legitimate and effective ways to fight back. You can apply many of these techniques yourself: with some of the methods listed in this article, you can genuinely limit how much of your content people can scrape and how easily, and improve your defenses.

For example, adding CAPTCHAs makes it much more difficult for bots to copy your website, because all but the most sophisticated bots are unable to solve them. Unfortunately, while tools like these are effective at stopping the fraud, they will also likely cost you valuable leads, as genuine visitors choose another website over completing these laborious tasks. Meanwhile, other solutions, such as embedding internal links in your content, may reduce the damage slightly but ultimately will not protect your marketing campaign.
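One way to apply the internal-links idea is to make sure every link in a post is an absolute URL on your own domain, so that a verbatim copy on a scraper site still points readers back to you. The sketch below is a minimal illustration under stated assumptions: the function name and the example.com URLs are hypothetical, and the simple regular expression only handles double-quoted href attributes, so a real site would more likely do this in its CMS or templates with a proper HTML parser. It also only helps against careless scrapers, since a bot can strip or rewrite links.

```python
import re
from urllib.parse import urljoin

# Matches double-quoted href attributes only (a simplifying assumption).
HREF_PATTERN = re.compile(r'href="([^"]*)"')

def absolutize_links(html: str, base_url: str) -> str:
    """Hypothetical helper: rewrite each double-quoted href as an absolute URL resolved against base_url."""
    def to_absolute(match: re.Match) -> str:
        # urljoin leaves already-absolute URLs unchanged and resolves relative ones.
        return f'href="{urljoin(base_url, match.group(1))}"'
    return HREF_PATTERN.sub(to_absolute, html)

if __name__ == "__main__":
    post = (
        '<p>See our <a href="/pricing">pricing</a> and '
        '<a href="related-post">related post</a>.</p>'
    )
    print(absolutize_links(post, "https://www.example.com/blog/web-scraping"))
```

Running it turns `/pricing` into `https://www.example.com/pricing` and `related-post` into `https://www.example.com/blog/related-post`, while links that are already absolute are left as they are.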

As such, the only complete answer is a professional ad fraud solution. Here at Anura, this is what we specialize in. Using pattern and behavioral recognition, we can identify malicious bots far more accurately than even the most sophisticated CAPTCHAs, and we can do it without any of the time-consuming customer interactions that can cost you conversions. This allows us to intercept web scraping scripts before they are even able to copy your content, so you can sit back and watch your marketing work its magic.

Anura does not use confusing numerical scoring systems that create false positives, just a black-and-white indicator of whether web traffic is real or fraudulent. When Anura says it’s bad, it’s bad.

Don’t let yourself become a victim of fraud. Request a trial today.
