An Introduction to Data Science SEO

By Antonio Blago, February 19, 2024

Data Science, SEO

Since the beginning of 2023, I have been working on Data Science SEO and programmatic solutions with Python, as SaaS tools like Ahrefs and SEMrush cost hundreds of dollars per month. I was pleasantly surprised by what is possible with Python and web crawling. With Data Science for SEO, you save a lot of money and time conducting SEO analyses and can take your SEO to the next level. For this tutorial, basic knowledge of Python is required. Everything I have learned so far is reflected here in the tutorial. Let's get started!



The original article was published by me on Medium on May 9, 2023, in English: https://python.plainenglish.io/a-full-guide-on-programmatic-seo-in-python-for-2023-e2b82d227383

A Brief Introduction to SEO

SEO (Search Engine Optimization) is the process of optimizing a website for Google search with the goal of achieving higher web traffic and improving the website's visibility. The main focus of SEO is on two key areas:

  1. On-Page Factors: These are factors controlled by the website owner and include things like page titles, meta tags, and keyword density.
  2. Off-Page Factors: These are factors not controlled by the website owner but play a role in how Google views the website, such as inbound links and social media signals.

SEO is an ongoing process, not a one-time intervention. To maintain high rankings, monitor your website regularly and keep improving your on-page and off-page SEO signals.

Read More about the basics of SEO here.

A Brief Introduction to Data Science SEO with Python

Python (a well-known programming language) offers a range of modules and libraries that can help with SEO tasks such as web scraping, log file analysis, and sitemap creation. In this guide, I will introduce some of the most popular tools and show you how you can use them for your SEO.

Data Science SEO is the process of using code to automate a variety of tasks related to analysis in search engine optimization. These tasks can include:

  • Crawling websites for data
  • Analyzing this data
  • Creating custom reports

By automating these tasks, SEOs can quickly and efficiently optimize their websites for better search engine rankings. Additionally, Data Science SEO can help you identify areas for improving your website's ranking and make informed decisions regarding your future optimizations.

Why Python?

Python is a powerful programming language that allows developers to quickly and efficiently create sophisticated applications. Python is also an excellent choice for automated SEO tasks due to its easily readable syntax and extensive standard library.

I will cover the basics of Data Science SEO in Python, including methods for crawling websites for data, processing and analyzing this data, and finally, how to generate custom reports. By the end of this guide, you will have a solid understanding of how to use Python for your SEO tasks.

Setup

In the following steps, I will explain what is important for the setup. Please let me know in the comments if you have any difficulties.

Set Up Your Development Environment

PyCharm is a suitable IDE for Python, but there are many other good options. Check out this guide, where I explain how to set up PyCharm properly.

I am using Python 3.10 on Windows 11.

Set Up Your Google Custom Search API

This is Google's programmatic search API. To connect to the Google Custom Search API and retrieve Google search results with Python, you need four things:

  • A Google account, if you don't already have one.
  • An account in the Google Developer Console.
  • An API key for the Custom Search API.
  • Your Custom Search Engine ID: copy the code after "cx".

Figure: API key for the Custom Search API
Figure: Custom Search Engine ID
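To make clear where the two values end up, here is a minimal sketch of how a request to the Custom Search JSON API is put together. The function name build_search_url and the placeholder credentials are my own assumptions for illustration; advertools, which is used later in this tutorial, builds these requests for you.

```python
from urllib.parse import urlencode

# Public endpoint of the Custom Search JSON API.
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query: str, api_key: str, cse_id: str, start: int = 1) -> str:
    """Return the request URL for one page (up to 10 results) of search results.

    `start` is the 1-based index of the first result on the page.
    """
    params = {"key": api_key, "cx": cse_id, "q": query, "start": start}
    return f"{API_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials (assumptions): replace them with your own values.
print(build_search_url("dentist", "YOUR_API_KEY", "YOUR_CSE_ID"))
```

The API key goes into the key parameter and the Custom Search Engine ID into cx, which is why the ID is the value shown after "cx".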

Python Setup

Let's get started. Install the advertools library. Here is the link to the documentation: https://advertools.readthedocs.io/en/master/

pip install advertools
# OR:
pip3 install advertools

Also, create an environment file for your keys. This keeps them out of your source code and makes them easier to manage.

.env
google_api_key = <Custom Search API Key>
google_cse_id  = <Custom Search Engine ID>
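One way to read these values in Python is sketched below. It is a small stand-in parser written with the standard library only; the helper names parse_env_text and load_env_file are hypothetical. A dedicated package such as python-dotenv is a common alternative.

```python
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple `KEY = value` lines and return them as a dict."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything without an "=".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove surrounding whitespace and optional quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Read the file at `path` and parse its key/value pairs."""
    return parse_env_text(Path(path).read_text(encoding="utf-8"))


# Example usage (assumes a .env file next to the script):
# keys = load_env_file(".env")
# api_key = keys["google_api_key"]
# cse_id = keys["google_cse_id"]
```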

Let's start working in the field of Data Science SEO.

(1) SERP Analysis

SERP stands for Search Engine Results Page. By reviewing the top-ranked websites for a specific keyword or topic, you can assess your chances of outperforming the competition. SERP analysis can also reveal how a competitor achieved a particular position on the results page.

A typical workflow starts with researching keywords and identifying search intent. You then analyze competitors to find ranking opportunities and, finally, optimize your content.

The market is also full of various tools that make it easier to automate the manual process.

Save $100 with this script!

Create a Python script and place it in the same folder as your .env file. I am searching for the top 100 positions on Google for the keyword "dentist". You will get about 99 columns of data in 100 rows, but you only need a few of them. Save these in an Excel file.
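A minimal sketch of such a script, using the serp_goog function from advertools, could look like this. The helper names, the output file name, and the selected column names are my assumptions; the columns follow the field names of the Custom Search API response, so check them against your own result.

```python
# Columns kept for the report (assumed names; verify against your DataFrame).
REPORT_COLUMNS = ["rank", "title", "snippet", "displayLink", "formattedUrl"]


def start_offsets(n_results: int = 100, page_size: int = 10) -> list[int]:
    """Return the 1-based `start` value of each result page: 1, 11, 21, ..."""
    return list(range(1, n_results + 1, page_size))


def fetch_serp(query: str, api_key: str, cse_id: str, n_results: int = 100):
    """Query the Custom Search API via advertools and return a trimmed DataFrame."""
    import advertools as adv  # third-party: pip install advertools

    serp = adv.serp_goog(q=query, cx=cse_id, key=api_key,
                         start=start_offsets(n_results))
    # Keep only the report columns that are actually present.
    return serp[[col for col in REPORT_COLUMNS if col in serp.columns]]


# Example usage (needs valid keys and network access; to_excel needs openpyxl):
# df = fetch_serp("dentist", api_key, cse_id)
# df.to_excel("serp_dentist.xlsx", index=False)
```

The API returns at most 10 results per request, so the top 100 positions are collected by requesting ten pages with different start values.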

You will see the following:

  • Current position on Google, column D
  • The title of the page (meta title), column E
  • A snippet of the first words, in column F
  • The short displayed link
  • A formatted link

Great, now you know who is ranking first on Google. But not why. This will become clear in the following analysis.

(2) Google SERP Rank Tracking

You can also build a SERP rank tracker with Python. You can run the script in the background and visualize the results in a Flask app. I explain how to embed a Plotly visualization in Flask here.

You can also host it on an online platform so that your clients can access it.

Below you see the output of a SERP tracker as an animation. In this short period, the ranking for the keyword "dentist" did not change, but over longer periods, it would show differences.

To create this animation, you can define a function that tracks the SERP (Search Engine Results Page). This function pulls the results and stores them either in a pickle file or a database.

Here is a basic structure of how you could proceed:

  1. You create a function that regularly pulls data from the Custom Search API and stores it in a pickle file.
  2. You schedule the data extraction every 10 seconds.
  3. You visualize the data.
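The first two steps can be sketched as follows. This is an illustration under assumptions, not the original tracker: the function names are hypothetical, and fetch is any function of yours that returns the current results as a list of dictionaries (for example, rows taken from an advertools result).

```python
import pickle
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


def append_snapshot(path: str, results: list[dict]) -> list[dict]:
    """Append one timestamped SERP snapshot to a pickle file and return the history."""
    file = Path(path)
    history = pickle.loads(file.read_bytes()) if file.exists() else []
    history.append({"time": datetime.now(timezone.utc), "results": results})
    file.write_bytes(pickle.dumps(history))
    return history


def track(fetch: Callable[[], list[dict]], path: str,
          interval_s: float = 10, runs: int = 3) -> None:
    """Call `fetch` every `interval_s` seconds, `runs` times, storing each snapshot."""
    for i in range(runs):
        append_snapshot(path, fetch())
        if i < runs - 1:  # no need to wait after the last run
            time.sleep(interval_s)


# Example usage with a hypothetical fetch function:
# track(lambda: my_fetch("dentist"), "serp_history.pkl", interval_s=10, runs=6)
```

A 10-second interval is useful for a demo. As far as I know, the free tier of the Custom Search JSON API allows only 100 queries per day, so a longer interval (hourly or daily) is more practical for real tracking.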

For this task, you should first install Plotly.

pip install plotly
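As a rough sketch of the visualization step, the following code draws one line per domain from the stored snapshots. It produces a static line chart rather than the animation mentioned above, and it assumes a data layout of my own choosing: a list of snapshots, each with a "time" and a list of "results" containing "displayLink" and "rank".

```python
from collections import defaultdict


def rank_series(history: list[dict]) -> dict[str, list[tuple]]:
    """Group snapshots into one list of (time, rank) points per domain."""
    series = defaultdict(list)
    for snapshot in history:
        for row in snapshot["results"]:
            series[row["displayLink"]].append((snapshot["time"], row["rank"]))
    return dict(series)


def plot_ranks(history: list[dict]):
    """Build a Plotly line chart with one line per domain (rank 1 at the top)."""
    import plotly.graph_objects as go  # third-party: pip install plotly

    fig = go.Figure()
    for domain, points in rank_series(history).items():
        times, ranks = zip(*points)
        fig.add_trace(go.Scatter(x=times, y=ranks, mode="lines+markers", name=domain))
    # Reverse the y-axis so that position 1 appears at the top.
    fig.update_yaxes(autorange="reversed", title_text="Rank")
    fig.update_xaxes(title_text="Time")
    return fig


# Example usage:
# plot_ranks(history).show()
```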

(3) Keyword Coverage

Now we want to find out how well this ranking performs with similar keywords. Go to https://answerthepublic.com/ and search for "Zahnarzt" (German for "dentist"). You will receive a variety of similar keywords; in my case, there were 726 results.

To check the keyword coverage of the top 10 positions, copy the list of keywords, loop over it, and save the results in an Excel file. You can proceed as follows:

First, save the keywords in a list:
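A simple way to do this is sketched below: paste the copied keywords as text, one per line, and clean them up. The function name parse_keywords is hypothetical, and the three keywords are placeholders I made up, not the actual AnswerThePublic results.

```python
def parse_keywords(pasted: str) -> list[str]:
    """Turn pasted text (one keyword per line) into a de-duplicated list."""
    keywords = []
    for line in pasted.splitlines():
        keyword = line.strip().lower()
        # Skip empty lines and keywords we have already seen.
        if keyword and keyword not in keywords:
            keywords.append(keyword)
    return keywords


# Hypothetical placeholder keywords for illustration only.
PASTED = """
zahnarzt kosten
zahnarzt notdienst
Zahnarzt Kosten
"""

keywords = parse_keywords(PASTED)
print(keywords)
```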

Then put the script together:
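The core of such a script is the coverage calculation, sketched here under my own assumptions. Each input row is a dictionary with the search term, the rank, and the domain (using the field names searchTerms, rank, and displayLink). Coverage is taken to mean the share of all keywords for which a domain appears in the top 10.

```python
from collections import defaultdict


def keyword_coverage(rows: list[dict], top_n: int = 10) -> list[dict]:
    """Summarize, per domain, the share of keywords it ranks for and its average rank."""
    all_keywords = {row["searchTerms"] for row in rows}
    ranks_by_domain = defaultdict(dict)
    for row in rows:
        if row["rank"] <= top_n:
            best = ranks_by_domain[row["displayLink"]]
            keyword = row["searchTerms"]
            # Keep only the best (lowest) rank per domain and keyword.
            best[keyword] = min(row["rank"], best.get(keyword, row["rank"]))
    summary = [
        {
            "domain": domain,
            "coverage": len(best) / len(all_keywords),
            "avg_rank": sum(best.values()) / len(best),
        }
        for domain, best in ranks_by_domain.items()
    ]
    # Highest coverage first; ties are broken by the better average rank.
    return sorted(summary, key=lambda item: (-item["coverage"], item["avg_rank"]))


# Example usage with advertools and pandas (needs valid keys and network access):
# serp = adv.serp_goog(q=keywords, cx=cse_id, key=api_key)
# summary = keyword_coverage(serp.to_dict("records"))
# pd.DataFrame(summary).to_excel("keyword_coverage.xlsx", index=False)
```

serp_goog accepts a list of queries, so the loop over the keywords can be left to advertools.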

The result in the Excel file:

We see that the first domain in the results appears in the top 10 for 19% of all keywords, with an average rank of 2.6. If we perform this analysis on a larger scale, for example for all 726 keywords, we get a much clearer picture.

Great! That's all for now.

The following topics would also be possible:

  • Cluster analysis
  • Identification of content gaps
  • Technical SEO

If you are interested in such topics, please comment below.

Further articles on Data Science SEO

  • Shopify relaunch: what to consider and how Data Science SEO helps.
  • AI prompt analysis for keywords and brand mentions
  • Cluster and prioritize your keywords with automated search volume analyses
  • Status checker workflow with n8n and Apify: A step-by-step guide
