Niche Topic Research with Reddit, N-Grams, and Named Entity Recognition (NER)

Written by Constantin Oesterling on 2024-06-17

Reddit is an excellent resource for finding the trending topics that people are discussing in any niche.

Learn how to scrape data from subreddits and extract topics to use in your SEO content strategy.

Why Use Reddit for Niche Topic Research?

By using Reddit for topic research, you can quickly identify trending topics in any niche that your competitors are missing and keyword research tools haven't yet picked up.

The key benefits are:

  • Find new and trending topics in any niche
  • Find out which topics your target audience discusses most frequently
  • Fill gaps in your topical coverage that your competition doesn't even know about

How to Use Reddit for Niche Topic Research?

You can use Reddit for niche topic research by scraping post titles and then extracting n-grams and named entities from them.

1. Get the Tools Ready

  1. Instant Data Scraper: install this Chrome extension to scrape data from Reddit
  2. Infinitnet N-Gram Analyzer: use this free tool to analyze n-grams
  3. TextRazor Demo: use this free tool to extract named entities

2. Scrape a Subreddit

  1. Find and open a subreddit about your niche
  2. Sort posts by "Hot", "New", "Top", or "Rising" depending on your research goals
  3. Open the Instant Data Scraper extension, make sure the post titles are included in the data to scrape, enable "Infinite scroll", set "Min delay" to 1 second and "Max delay" to 3 seconds, and then click "Start crawling"
  4. Wait until at least 1,000 rows of data have been scraped (more data usually gives better results)
  5. Export scraped data to CSV or XLSX
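
If you prefer to prepare the exported file with a script before pasting it into the tools, the following is a minimal sketch of how the post titles could be pulled out of the CSV. The column name "title" and the sample rows are assumptions for illustration; the scraper generates its own column headers, so check your export and adjust the name.

```python
import csv
import io


def load_titles(csv_file, title_column="title"):
    """Return non-empty, de-duplicated post titles from an exported CSV.

    title_column is an assumption: use the header your export actually has.
    """
    reader = csv.DictReader(csv_file)
    seen = set()
    titles = []
    for row in reader:
        title = (row.get(title_column) or "").strip()
        # Skip blank cells and titles that were scraped more than once
        if title and title not in seen:
            seen.add(title)
            titles.append(title)
    return titles


# In-memory sample standing in for a real export (hypothetical data).
# For a real file, pass open("your-export.csv", newline="", encoding="utf-8").
sample = io.StringIO(
    "title,url\n"
    "Best budget espresso machine?,https://example.com/1\n"
    "Best budget espresso machine?,https://example.com/1\n"
    "How do you descale a moka pot,https://example.com/2\n"
)
for title in load_titles(sample):
    print(title)
```

De-duplicating is worthwhile because infinite scrolling can capture the same post more than once, which would inflate the counts later.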

3. Extract Named Entities and N-Grams

  1. Paste all scraped post titles into the Infinitnet N-Gram Analyzer, check "Ignore stopwords" and "Use lemmatization" and click "Analyze"
  2. Paste all scraped post titles into the TextRazor demo and click "Analyze"
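
To illustrate what an n-gram analysis does with the titles, here is a rough sketch in plain Python. It is not how the Infinitnet tool works internally; the stopword list is a tiny hand-picked assumption and lemmatization is left out to keep the example short.

```python
import re
from collections import Counter

# Very small illustrative stopword list (an assumption, not a standard list)
STOPWORDS = {
    "a", "an", "and", "the", "is", "it", "this", "that", "for", "to",
    "of", "in", "on", "my", "i", "you", "do", "how", "what", "with",
}


def tokenize(title):
    """Lowercase a title, split it into words, and drop stopwords."""
    words = re.findall(r"[a-z0-9']+", title.lower())
    return [word for word in words if word not in STOPWORDS]


def count_ngrams(titles, n=2):
    """Count n-grams of length n across all titles.

    Stopwords are removed before the n-grams are built, so words that were
    separated only by a stopword end up next to each other.
    """
    counts = Counter()
    for title in titles:
        tokens = tokenize(title)
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


titles = [
    "Best espresso machine for beginners",
    "Is this espresso machine worth it?",
    "Espresso machine descaling tips",
]
for ngram, count in count_ngrams(titles, n=2).most_common(3):
    print(count, ngram)
```

Running the same function with n=1 and n=3 gives single words and three-word phrases, which together with the bigrams cover most topic names that appear in post titles.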

How to Use the Extracted N-Grams and Named Entities for SEO?

The extracted n-grams show frequently discussed topics that may or may not have a named entity associated with them, while the TextRazor analysis only shows frequently mentioned named entities that have a dedicated Wikipedia page.

You need both: analyzing only the named entities is easier, but you would miss the more unique niche topics that don't have a dedicated Wikipedia page.
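
One way to see this difference in your own data is to separate the n-grams that match an extracted entity from those that don't. The sketch below assumes you have copied the entity names into a list by hand and already have n-gram counts as a dictionary; the function name and the sample data are hypothetical, and the exact-match comparison is deliberately crude.

```python
def split_by_entity(ngram_counts, entities):
    """Split n-gram counts into those matching a named entity and the rest.

    Matching is a case-insensitive exact comparison, so spelling variants
    of the same entity are not caught.
    """
    entity_names = {entity.lower().strip() for entity in entities}
    with_entity = {}
    without_entity = {}
    for ngram, count in ngram_counts.items():
        if ngram.lower().strip() in entity_names:
            with_entity[ngram] = count
        else:
            without_entity[ngram] = count
    return with_entity, without_entity


# Hypothetical sample data
ngram_counts = {"moka pot": 8, "wdt tool": 6, "french press": 4}
entities = ["Moka pot", "French press"]

matched, unmatched = split_by_entity(ngram_counts, entities)
print("Has an entity:", matched)
print("N-gram only:", unmatched)
```

The n-gram-only group is where the niche topics without a Wikipedia page tend to show up, alongside ordinary phrases that you will need to filter out manually.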

If you run a website in the same niche as the subreddit you scraped, the extracted data will give you a good idea of the topics your target audience is talking about, ranked by how often a topic is mentioned in a post title.

You could extend this strategy and also scrape the content of all posts or even the comments to get more data.

But in most cases, the post titles already give you a pretty good dataset to work with.

Your task now is to identify the topics that are a) actually relevant to the niche and your site and b) not yet covered on your site.

These topics should be part of your topical map and broader content strategy.
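
The coverage check can be partly automated. The following sketch assumes you keep a simple list of topics your site already covers and have the extracted topics with their mention counts; both data sets, the function name, and the `min_count` threshold are hypothetical. Judging relevance still has to be done by hand.

```python
def uncovered_topics(topic_counts, covered, min_count=2):
    """Return (topic, count) pairs that are not yet covered on the site.

    Topics mentioned fewer than min_count times are dropped as noise.
    Results are sorted by count (highest first), then alphabetically.
    """
    covered_normalized = {topic.lower().strip() for topic in covered}
    gaps = [
        (topic, count)
        for topic, count in topic_counts.items()
        if count >= min_count and topic.lower().strip() not in covered_normalized
    ]
    return sorted(gaps, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample data
topic_counts = {
    "espresso machine": 12,
    "milk frother": 5,
    "burr grinder": 5,
    "moka pot": 1,
}
covered = ["Espresso Machine"]

for topic, count in uncovered_topics(topic_counts, covered):
    print(count, topic)
```

The resulting list is only a starting point: it compares topic names literally, so a topic you cover under a different name will still appear as a gap.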
