Alternative to the Keyword Planner: Retrieve 130k Keywords
By Antonio Blago · March 23, 2025
SEO, Tutorial
The Google Keyword Planner is handy – but with large keyword lists, automation, or API usage, you quickly hit its limits. If you want to analyze 10,000 or even 130,000 keywords, you need a scalable, affordable, and flexible solution.
This is exactly where DataForSEO comes into play – combined with Python, you can build your own keyword tool.
In this tutorial, I'll show you step by step how to do this – including batching, cost control, and Excel export.
Why an Alternative to the Keyword Planner?
The Google Keyword Planner is solid, but:
- heavily limited in terms of keyword volume
- not very flexible for automated workflows
- API access only with an active Google Ads account
So if you want to process large keyword lists (e.g., from BigQuery, SEO tools, or your own data sources), you need a scalable solution – like DataForSEO + Python.
Prerequisites
Install Python Packages
pip install pandas requests openpyxl python-dotenv
DataForSEO Access
Register at DataForSEO and get your credentials:
- Login
- Password
You need these for authentication via Basic Auth. The API is specifically designed for large queries – the perfect Keyword Planner alternative for developers and SEOs.
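The Basic Auth header itself is just the base64-encoded `login:password` pair. A minimal sketch of building it (the credential values are placeholders):

```python
import base64

# Placeholder credentials -- substitute your DataForSEO login and password.
login = "your_login"
password = "your_password"

# DataForSEO expects HTTP Basic Auth: "Basic " + base64("login:password")
token = base64.b64encode(f"{login}:{password}".encode()).decode()
headers = {
    "Authorization": f"Basic {token}",
    "Content-Type": "application/json",
}
```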
Setup Logic
- Load keywords from Excel file
- Send to the API in batches of 1,000
- Retrieve results including CPC, search volume, and monthly trends
- Save progress, API costs, and results (Pickle + Excel export)
- Seamlessly resume after interruption
Important Features in the Code
Batching:
Keywords are sent to the API in groups of 1,000 to comply with limits.
Pickle & Progress Tracking:
If the script crashes, you can seamlessly continue – including costs, progress, and results.
API Cost Overview:
After each batch, costs are logged – and at the end exported as a table to an Excel file.
Keyword Filtering:
- Max. 80 characters
- Max. 10 words
- Cleansing of characters like ™, ©, ®, etc.
Recommended Folder Structure
projekt/
│
├── keywords.xlsx          ← Your keyword list
├── get_sv_data4seo.py     ← Main script
├── backup_results.pkl     ← Automatic interim storage
├── final_results.xlsx     ← Exported result (3 sheets)
├── invalid_keywords.txt   ← Filtered / unusable keywords
└── progress.txt           ← Optional progress tracking
Advantages Over the Keyword Planner
| Google Keyword Planner | DataForSEO + Python (this tutorial) |
|---|---|
| Limit for bulk queries | Any number of batches possible |
| Inflexible, web interface | Fully automated via code |
| Hardly any clustering or export | Custom analyses & filters possible |
| API only with Google Ads | API directly & transparently usable |
👉 So you get a real alternative to the Keyword Planner if you need deeper, faster, or more extensive keyword data.
Script:
There are functions in the script (such as the advanced cleaning method for keywords) that I cannot publish; if you want access or need help with setup, book a support appointment here.
import os
import time
import json
import base64
import pandas as pd
import requests
import numpy as np
from dotenv import load_dotenv
load_dotenv()
# import word_frequency as wf  # advanced keyword cleaning (not published)
import re
import unicodedata
# --- Configuration ---
PICKLE_BACKUP = "backup_results.pkl"
PROGRESS_FILE = "progress.txt"
BATCH_SIZE = 1000
LOCATION_CODE = 2276
LANGUAGE_CODE = "de"
MAX_RETRIES = 3
RETRY_DELAY = 3  # seconds
USERNAME = "login"      # placeholders, only needed if you build the token inline
PASSWORD = "password"
# Base64 token ("login:password") is read from a .env file; alternatively build it inline:
# base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()
AUTH_HEADER = os.getenv("d4seo")
API_URL = "https://api.dataforseo.com/v3/keywords_data/google_ads/search_volume/live"
def return_path():
    EXCEL_FILE = "keywords.xlsx"  # must contain a "keyword" column
return EXCEL_FILE
def clean_keywords(keywords):
    cleaned = keywords  # wf.basic_cleaning(keywords) in the unpublished version
return cleaned
FORBIDDEN_CHARS = set("™©®✓•→←≠∞¿¡§¶…")
def is_valid_keyword(kw: str) -> bool:
try:
kw = kw.strip()
if not kw:
return False
        # Remove forbidden characters
for char in FORBIDDEN_CHARS:
kw = kw.replace(char, "")
kw_norm = unicodedata.normalize("NFKC", kw)
        # Remove control characters
kw_cleaned = ''.join(c for c in kw_norm if unicodedata.category(c)[0] != 'C')
        # Check suspicious characters via their Unicode name
for char in kw_cleaned:
try:
name = unicodedata.name(char)
if not any(part in name for part in ["LATIN", "DIGIT", "SPACE", "DASH", "HYPHEN"]):
return False
except ValueError:
return False
        # Filter broken encodings (mojibake)
if re.search(r"[ãåÓ€]", kw_cleaned):
return False
        # Ratio of valid characters
valid_chars = re.findall(r"[a-zA-Z0-9äöüßéèêáàâíìîóòôúùûčšžăîâăëç\- ]", kw_cleaned.lower())
ratio = len(valid_chars) / max(len(kw_cleaned), 1)
if ratio < 0.7:
return False
        # At least 3 letters
if len(re.findall(r"[a-zA-Z]", kw_cleaned)) < 3:
return False
return True
except Exception:
return False
# ----------------- Helper functions -----------------
def load_keywords_from_excel(filepath: str) -> list:
df = pd.read_excel(filepath)
return df["keyword"].dropna().astype(str).tolist()
def batch_keywords(keywords: list, size: int) -> list:
return [keywords[i:i + size] for i in range(0, len(keywords), size)]
def send_batch_request(keywords_batch: list) -> dict:
payload = [{
"keywords": keywords_batch,
"location_code": LOCATION_CODE,
"language_code": LANGUAGE_CODE,
"sort_by": "relevance"
}]
headers = {
'Authorization': f'Basic {AUTH_HEADER}',
'Content-Type': 'application/json'
}
response = requests.post(API_URL, headers=headers, data=json.dumps(payload))
if response.status_code == 200:
return response.json()
else:
raise Exception(f"HTTP {response.status_code}: {response.text}")
def parse_response(response_json: dict) -> tuple[pd.DataFrame, pd.DataFrame]:
results = []
monthly_results = []
error_keywords = []
tasks = response_json.get("tasks", [])
for task in tasks:
items = task.get("result", [])
for item in items:
try:
keyword = item.get("keyword")
results.append({
"keyword": keyword,
"search_volume": item.get("search_volume"),
"competition": item.get("competition"),
"cpc": item.get("cpc"),
"currency": item.get("currency")
})
monthly_searches = item.get("monthly_searches")
if isinstance(monthly_searches, list):
for ms in monthly_searches:
monthly_results.append({
"keyword": keyword,
"year": ms.get("year"),
"month": ms.get("month"),
"search_volume": ms.get("search_volume")
})
except Exception as e:
error_keywords.append((item.get("keyword"), str(e)))
                print(f"Error while processing keyword '{item.get('keyword')}': {e}")
return pd.DataFrame(results), pd.DataFrame(monthly_results)
def save_progress(index: int):
with open(PROGRESS_FILE, "w") as f:
f.write(str(index))
def load_progress() -> int:
if os.path.exists(PROGRESS_FILE):
with open(PROGRESS_FILE, "r") as f:
return int(f.read())
return 0
def load_existing_results() -> tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame, int, float]:
if os.path.exists(PICKLE_BACKUP):
data = pd.read_pickle(PICKLE_BACKUP)
return (
data.get("results", pd.DataFrame()),
data.get("monthly", pd.DataFrame()),
data.get("api_costs", pd.DataFrame()),
data.get("status", {}).get("progress_index", 0),
data.get("status", {}).get("total_cost", 0.0)
)
else:
return pd.DataFrame(), pd.DataFrame(), pd.DataFrame(), 0, 0.0
def save_all_to_pickle(results_df, monthly_df, api_costs_df, progress_index, total_api_cost):
data = {
"results": results_df,
"monthly": monthly_df,
"api_costs": api_costs_df,
"status": {
"progress_index": progress_index,
"total_cost": total_api_cost
}
}
pd.to_pickle(data, PICKLE_BACKUP)
# ----------------- Filter keywords -----------------
def filter_keywords(keywords: list[str], log_file: str = "invalid_keywords.txt") -> list[str]:
    """
    DataForSEO limits: max. 80 characters, max. 10 words.
    Keywords are converted to lowercase.

    :param keywords:
    :param log_file:
    :return:
    """
filtered = []
invalid = []
for kw in keywords:
kw_clean = kw.strip().lower()
if (
len(kw_clean) <= 80
and len(kw_clean.split()) <= 10
and is_valid_keyword(kw_clean)
):
filtered.append(kw_clean)
else:
invalid.append(kw.strip())
if invalid:
with open(log_file, "w", encoding="utf-8") as f:
f.write("Invalid keywords (too long, too many words or invalid characters):\n\n")
for kw in invalid:
f.write(f"{kw}\n")
print(f"{len(invalid)} invalid keywords were ignored and saved in '{log_file}'.")
return filtered
def clean_keyword_simple(kw: str) -> str:
"""
    Specifically removes trademark/copyright symbols such as ™, ©, ® from the keyword.
Returns the cleaned keyword.
"""
FORBIDDEN_CHARS = set("™©®✓•→←≠∞¿¡§¶…")
kw_cleaned = kw.strip()
for char in FORBIDDEN_CHARS:
kw_cleaned = kw_cleaned.replace(char, "")
return kw_cleaned
def main():
print("Loading keywords from Excel file...")
excel_file = return_path()
all_keywords = load_keywords_from_excel(excel_file)
all_keywords = [clean_keyword_simple(kw) for kw in all_keywords] ## clean keywords from trademark and copyright
all_keywords = filter_keywords(all_keywords) ## filter too long keywords
keyword_batches = batch_keywords(all_keywords, BATCH_SIZE)
    # Resume: results, monthly data, costs, and progress are restored from the pickle backup
    results_df, monthly_df, api_costs_df, start_index, total_api_cost = load_existing_results()
    api_costs_list = api_costs_df.to_dict("records") if not api_costs_df.empty else []
print(f"{len(all_keywords)} keywords loaded. Starting from batch {start_index + 1} of {len(keyword_batches)}.")
for idx in range(start_index, len(keyword_batches)):
batch = keyword_batches[idx]
print(f"\nProcessing batch {idx + 1}/{len(keyword_batches)} ({len(batch)} keywords)...")
success = False
for attempt in range(1, MAX_RETRIES + 1):
try:
                # Optional advanced cleaning step (no-op in the published version)
cleaned_batch = clean_keywords(batch)
response_json = send_batch_request(cleaned_batch)
time.sleep(1)
# Extract API costs
batch_cost = response_json.get("cost", 0.0)
total_api_cost += batch_cost
api_costs_list.append({
"batch_index": idx + 1,
"keyword_count": len(batch),
"cost_usd": batch_cost
})
print(f"API cost for this batch: {batch_cost:.3f} USD")
df_keywords, df_monthly = parse_response(response_json)
results_df = pd.concat([results_df, df_keywords], ignore_index=True)
monthly_df = pd.concat([monthly_df, df_monthly], ignore_index=True)
                save_all_to_pickle(results_df, monthly_df, pd.DataFrame(api_costs_list), idx + 1, total_api_cost)
print(f"Progress saved – batch {idx + 1}, total cost: {total_api_cost:.3f} USD")
save_progress(idx + 1)
success = True
break
except Exception as e:
print(f"Error in batch {idx + 1} (attempt {attempt}): {e}")
if attempt < MAX_RETRIES:
print(f"Waiting {RETRY_DELAY} seconds before retrying...")
time.sleep(RETRY_DELAY)
else:
print("Maximum number of attempts reached. Script will be aborted.")
return
if success:
time.sleep(1.5) # API rate limiting
print(f"\nTotal API cost: {total_api_cost:.3f} USD")
# API cost table
api_costs_df = pd.DataFrame(api_costs_list)
api_costs_df.loc[len(api_costs_df.index)] = {
"batch_index": "Total",
"keyword_count": api_costs_df["keyword_count"].sum(),
"cost_usd": total_api_cost
}
    with pd.ExcelWriter("final_results.xlsx") as writer:
        results_df.to_excel(writer, sheet_name="Keyword-Daten", index=False)
        monthly_df.to_excel(writer, sheet_name="Monatliche Suche", index=False)
        api_costs_df.to_excel(writer, sheet_name="API-Kosten", index=False)
print("\nExport completed: final_results.xlsx")
if __name__ == "__main__":
    main()
Exported Result
At the end of this tutorial, you will have an Excel file with:
- Keyword overview: search volume, competition, CPC
- Time series: monthly volume development
- API cost overview per batch
You can use this, for example, to create clusters, analyze seasonal trends, or build your own keyword tool.
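As a quick illustration of the seasonal-trend idea, here is a sketch that pivots monthly data shaped like the script's "Monatliche Suche" sheet into one row per keyword (the sample rows and numbers are made up):

```python
import pandas as pd

# Sample rows shaped like the monthly-searches sheet the script exports.
monthly_df = pd.DataFrame({
    "keyword": ["seo tool", "seo tool", "seo tool",
                "keyword recherche", "keyword recherche", "keyword recherche"],
    "year": [2024, 2024, 2024, 2024, 2024, 2024],
    "month": [1, 2, 3, 1, 2, 3],
    "search_volume": [1000, 1200, 900, 400, 500, 450],
})

# Pivot to one row per keyword and one column per (year, month) --
# a compact seasonality view you can chart, export, or cluster on.
trend = monthly_df.pivot_table(
    index="keyword", columns=["year", "month"], values="search_volume"
)
print(trend)
```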
Conclusion
This setup is ideal for:
- data-driven SEO analyses
- performance marketing with large keyword sets
- reporting, forecasting & clustering
If you are looking for an alternative to the Keyword Planner that you can fully control and scale – then Python + DataForSEO is exactly what you need.