If you do SEO, you need to analyze competitors’ websites in batches; if you do foreign trade, you need to find URLs in batches for a given keyword. Let me share a method for quickly scraping all the URLs on a Google search results page.
The overall approach first:
- Adjust Google to 100 search results per page;
- Make a “filter”;
- Use a Chrome extension + the filter to grab URLs in batches;
- Deduplicate in Excel.
Once you have the URLs, whether you use Hunter to find email addresses in batches or Ahrefs to analyze site traffic in batches and pick out the right webmasters is up to you.
The important thing is that the four steps above save a lot of time. Now for the details:
Adjust Google to 100 search results per page
The first step is to set Google to show 100 search results per page. It’s very simple: as shown in the figure, on the Google search page, click “Settings → Search settings” in the lower right corner.
Then, on the settings page, set the number of results per page to “100”.
While writing this, it occurred to me: since Google can display 100 results directly, is there a way to make it display 1,000 or 10,000?
I searched for “can google display more than 100 results per page?” and found no mention of how to break the 100-result cap, so it seems a single Google query can return at most 100 results.
Setting that question aside for now: if you know how to get more than 100 results at once, please message me and I’ll send you a small gift. (I suspect it’s possible by stitching together paginated queries; that’s easy to do in Python, but this tutorial is written for beginners, so I won’t dive too deep.)
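For the curious, here is a minimal sketch of that stitching idea: request successive result pages via the start parameter and run each page through the same XPath filter we build below. It assumes Google’s current markup and a cooperative server; in practice Google often serves a CAPTCHA to scripts, so treat this as an illustration, not a production tool.

```python
import time

import requests
from lxml import html

# A rough sketch only: Google's markup changes often and automated
# requests are frequently blocked, so this illustrates the idea rather
# than guaranteeing results.
XPATH = '//*[@id="rso"]/div/div/div/a/div/cite/text()'  # the filter built below


def fetch_urls(keyword, pages=5):
    urls = []
    for page in range(pages):
        resp = requests.get(
            "https://www.google.com/search",
            params={"q": keyword, "num": 100, "start": page * 100},
            headers={"User-Agent": "Mozilla/5.0"},  # look like a browser
            timeout=10,
        )
        tree = html.fromstring(resp.text)
        urls.extend(tree.xpath(XPATH))  # apply the same filter to each page
        time.sleep(2)  # be polite between requests
    return urls


if __name__ == "__main__":
    print("\n".join(fetch_urls("WordPress theme")))
```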
Make a “filter”
Next, enter the keyword you want to research, such as WordPress Theme, right-click any search result in Google Chrome, and click “Inspect” (yes, we need to look at its source code):
A source-code panel will appear at the bottom. Notice that when you hover the mouse over a block of code, the element that block represents is highlighted in the browser:
What we want is the URL, so look up and down the tree, expanding and collapsing nodes with the small triangular arrows and going through them one by one, until you find the block of code that represents the URL (even if you’re not familiar with code, this brute-force method will get you there within five minutes):
Then right-click that block of code and select “Copy → Copy XPath”:
At this point you should end up with something like the following (don’t panic, you don’t need to understand it, just follow along):
//*[@id="rso"]/div[1]/div/div[1]/a/div/cite/text()
This string is in fact a “filter” (an XPath expression). Later we’ll use a Chrome extension to grab the source code of the search results page in bulk; running it through this filter leaves us with a URL, in our case wordpress.org.
But we want to filter out all 100 URLs at once, so we need to modify the filter: delete the numbers and their square brackets, which gives the following filter:
//*[@id="rso"]/div/div/div/a/div/cite/text()
Without the numeric indexes, this filter is generic: it matches every eligible result at once instead of just one.
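To make the difference concrete, here is a small sketch that runs both versions of the filter against a toy HTML snippet (the snippet is a simplified stand-in for Google’s markup, not the real page):

```python
from lxml import html

# A toy, simplified stand-in for Google's result markup; each outer
# <div> plays the role of one search result block.
doc = html.fromstring("""
<div id="rso">
  <div><div><div><a href="#"><div><cite>wordpress.org</cite></div></a></div></div></div>
  <div><div><div><a href="#"><div><cite>themeforest.net</cite></div></a></div></div></div>
</div>
""")

# The copied, indexed filter: the [1] indexes pin it to the first block.
print(doc.xpath('//*[@id="rso"]/div[1]/div/div[1]/a/div/cite/text()'))
# -> ['wordpress.org']

# The generic filter with indexes removed: matches every block at once.
print(doc.xpath('//*[@id="rso"]/div/div/div/a/div/cite/text()'))
# -> ['wordpress.org', 'themeforest.net']
```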
Use a Chrome extension + the filter to get URLs in batches
Click on this link to download the extension we need, “Scraper”. After installing it, right-click a blank area of the search results page and you’ll see “Scrape similar” in the context menu.
Clicking it opens the following interface, which means we’ve successfully captured the page’s source code:
Paste the filter we prepared in the previous step into the upper-left corner, click “Scrape” in the lower left, and finally click “Copy to clipboard” in the lower right to copy all the scraped URLs to the clipboard.
Note that these URLs very likely contain duplicates. For example, when searching for B2B terms, alibaba.com and amazon.com may each appear dozens of times. That’s where Excel’s deduplication feature comes in.
Deduplication in Excel
Open Excel and paste the results we got in the previous step:
To get all the search results, repeat the scraping step above on every results page and collect all the URLs:
Google hides pages it considers highly repetitive. To make sure nothing slips through, go to the last page, where there is an option to “repeat the search with the omitted results included”.
After clicking it, more pages appear. In my example, “WordPress theme” yields nearly 500 results, which is plenty for our analysis:
After scraping all the URLs, we have a very long list in Excel. Select all the data and click Excel’s “Data → Remove Duplicates”:
Then you get a clean list of URLs:
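If you’d rather skip Excel, the same deduplication takes a few lines of Python. A sketch, assuming the scraped URLs were saved to a file named urls.txt, one per line (both file names are just placeholders):

```python
# Read the scraped URLs, one per line, skipping blank lines.
with open("urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

# dict keys keep only the first occurrence of each URL, in order.
unique = list(dict.fromkeys(urls))

with open("urls_deduped.txt", "w") as f:
    f.write("\n".join(unique))

print(f"{len(urls)} URLs -> {len(unique)} unique")
```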
What to do with the data once you have it?
My day-to-day work is marketing, which means staying in touch with many webmasters. But not every website is worth the time to research, or its webmaster worth socializing with, so I usually paste the list into the paid SEO tool Ahrefs and analyze these sites’ keywords and traffic in batches.
Ahrefs’ “Batch Analysis” is very easy to use, which is one of the reasons I keep recommending it.
After the analysis you’ll get a result like the one below. Click “DR” to sort by domain authority. A high domain rating tells you a lot: the site may have existed for a long time, or it may be very influential. In short, the higher a site ranks, the more its webmaster is worth befriending.
You can then work through the webmasters behind all these websites from high to low according to this ranking.
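If you export the batch-analysis results to a CSV, the sorting can be scripted too. A sketch, assuming the export is named batch_analysis.csv and the domain-rating column is labeled “DR” (both names are assumptions about your own export, not a documented Ahrefs format):

```python
import pandas as pd

# Assumed file and column names; adjust to match your actual export.
df = pd.read_csv("batch_analysis.csv")
top = df.sort_values("DR", ascending=False)  # highest authority first
print(top.head(20))  # the sites whose webmasters are most worth your time
```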
Bonus: how to find an email address from a URL?
Although it feels a bit presumptuous to discuss this in front of foreign traders, I usually use this tool to find email addresses, and it works very well. For other methods, refer to the course by “the god of materialism”, who has taken this topic to the extreme.
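For those comfortable with a little scripting, Hunter (the tool mentioned at the start) also exposes a domain-search API. A minimal sketch, assuming you have an API key (the key value below is a placeholder, and the response fields should be double-checked against Hunter’s docs):

```python
import requests

HUNTER_API_KEY = "your-api-key"  # placeholder: replace with your own key


def find_emails(domain):
    # Query Hunter's v2 domain-search endpoint for addresses on a domain.
    resp = requests.get(
        "https://api.hunter.io/v2/domain-search",
        params={"domain": domain, "api_key": HUNTER_API_KEY},
        timeout=10,
    )
    data = resp.json().get("data", {})
    return [e["value"] for e in data.get("emails", [])]


print(find_emails("wordpress.org"))
```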
If I can’t find an email address for a website, I don’t waste too much time hunting for it. There are plenty of good websites out there; one more makes little difference, and one fewer is no great loss.
The exception is when a website is truly excellent and attracts me in every respect, in which case I’ll do my best to befriend the webmaster behind it. Otherwise, I really don’t want to pour too much energy into a single resource (sites that excite me that much have been rare in my entire marketing career).
That’s it! Marketing work doesn’t always require deep thinking: within the limited time you have, do as much work as possible and earn as much as possible.