As a reader of my blog, you can save 15% on the retail price of any GSA tool! To get this discount, use any of the links below or in my posts, then enter the coupon code “shaunmarrs” on the checkout page.
Welcome to my tutorial on how to build your own auto accept list using GSA Search Engine Ranker! Although in theory this whole process can be completed using only GSA Search Engine Ranker, your time and a desktop or laptop computer, the process can be made much easier by supplementing your tool kit with GSA Captcha Breaker, catch-all emails, and a VPS or server.
This post is part of a series and should be read in the following order.
In my opinion, GSA Search Engine Ranker is one of the best, if not the best, automated one-time-fee tools on the market. It can complete so many different tasks and functions that it has spawned dedicated spin-off tools such as GSA Platform Identifier and GSA Proxy Scraper. It also wouldn’t surprise me if there were plans for a dedicated GSA scraping tool on the horizon.
As touched on in my introduction post to creating your own auto accept list, this process can be broken down into three main parts: link acquisition, link identification, and link verification. SER is able to complete all of these processes with no external support, but using SER in this way slows down its overall performance.
Preparing To Scrape
As with most things, a little preparation can go a long way. Ideally, you will want a clean list of footprints; I explain how to build one in my post here. Essentially, you go over all of the default footprints and remove any that have a low potential link yield or are too ambiguous to be worth scraping.
Once that has been completed, we need to prep GSA Search Engine Ranker’s proxy settings to maximize your scraping time. There are three options you can choose for this next section: you can scrape with private proxies, public proxies, or both. Personally, I prefer to scrape with either private proxies or public proxies and never mix them.
Scraping With Private Proxies
When scraping with private proxies, we set SER’s proxy settings up a little differently than we would when using public proxies. Ideally, your settings will look something like the screenshot below.
- Enable the use of proxies in the tool.
- Enable the use of proxies for search engine usage.
- Enable private proxies for use with search engines.
- Enable a custom time to wait between search queries to reduce the risk of the private proxies being soft banned from the search engines.
- At the time of writing, my timeout for Google queries is 70 seconds per proxy. The screenshot above shows a 70-second timeout, which is ideal if you are using a single proxy. If you are using 10 proxies, you may want a 7-second timeout so that each proxy only queries the search engines once every 70 seconds. This should reduce the risk of your private proxies being soft banned, but be aware that footprints using advanced search modifiers such as intitle: or inanchor: may require a higher timeout. If you want to scrape any other search engine, your timeout can be lower, as they are not as strict as Google.
- Next, we click the configure button to open the window shown in the screenshot below.
Click the “Add Proxy” button and select either of the options at the bottom to import your private proxies into SER.
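The timeout arithmetic from the list above can be sketched in a few lines of Python. This is only an illustration of the sum, not anything SER runs itself: the function name is made up, and it assumes queries are rotated evenly across your proxies, which I have not confirmed is exactly how SER schedules them.

```python
def wait_between_queries(per_proxy_cooldown_s: float, proxy_count: int) -> float:
    """Return the wait between search queries, in seconds.

    Assumption: queries are spread evenly across the proxies, so each
    proxy is used once every `per_proxy_cooldown_s` seconds.
    """
    if proxy_count < 1:
        raise ValueError("you need at least one proxy")
    return per_proxy_cooldown_s / proxy_count


# One proxy needs the full 70 seconds between queries.
print(wait_between_queries(70, 1))   # 70.0
# Ten proxies can share the load, so the tool can query every 7 seconds.
print(wait_between_queries(70, 10))  # 7.0
```

If you scrape with advanced search modifiers, raise the per-proxy figure first and then divide by your proxy count again.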
Scraping With Public Proxies
When using public proxies you will want your settings to look similar to the screenshot below.
- Enable the use of proxies.
- Enable the use of proxies for search engines.
- Enable the use of public proxies when scraping the search engines.
- Click the configure button.
With this option, we are leveraging freely available public proxies. As they are free, a large number of people will be using them for all sorts of reasons, so they will be soft banned from the search engines quickly. Due to this, we do not enable any timeouts: you want to run as many search queries as possible through each proxy before it becomes useless to you.
- Select proxy options rather than the proxy list window.
- Enable the automatic proxy search option.
- Select how often SER should scrape for public proxies. As its resource requirements are low, you can go as low as every minute if you wish.
Saving All Of Your Hard Work
One of the main strengths of GSA Search Engine Ranker is its ability to develop massive auto accept lists to use as and when required. I cover the uses of the folders in the screenshot below in my ultimate guide to GSA Search Engine Ranker, but the main point to take away is that if you do not tick the verified box then SER will not save verified domains to your list.
Without ticking it, your verified URLs essentially become one-hit wonders for the specific project that managed to verify them. If you tick the box, SER will automatically save the domain to your verified list for future reference, allowing you to set up other projects to post to it instantly rather than having to begin the whole process from scratch.
Prepping The SER Project For Scraping
Moving on to the project-specific options, at the very top we have the submission settings shown in the screenshot below. You can set this up as you like, but the point I want to get across is that if this is your only project and you put a limit on it, SER will reach that limit and stop. In theory, this could leave your toolset sitting idle for hours per day, wasting resources, time and money.
A better way to run the system and build your auto accept list quicker is to run a background project building links on the exact same platforms as the live project for your money site. Point your background project at a made-up URL and let it run 24 hours a day with no limits. This way your tool will be scraping and processing targets non-stop and pushing the verifieds into your verified list.
When your live project is released from the pause limit you set, it can either pick up the new targets from the verified folder and post to them or scrape on to help build your list.
A little further down we have the “How To Get Target URLs” section shown in the screenshot below.
- Select the search engines you wish to scrape. Be aware that, to my knowledge, the local sites of the big search engines, such as google.co.uk, share their soft ban proxy data with all of that engine’s other domains. This means there is no reason to select five versions of Google in the search engine window, as it will just get your proxies soft banned five times quicker. Personally, I would just select google.com and bing.com (called MSN in the list). If you want SER to scrape for targets then simply select search engines here; if you don’t want SER to scrape then select none.
- The analyze and post to competitor backlinks option is basically SER’s version of link extraction. Depending on the other things you have running on your VPS, it may be wise to leave it unticked as it can become extremely resource heavy.
- Ticking either of the folder boxes will attempt to pull targets from that particular folder as well as scrape and link extract if you enable those options.
Scrolling further down the options tab we come to the “Filter URLs” options as shown in the screenshot below.
- The “Type of Backlinks to create” pane is the main thing I want to touch on here as it can massively affect your link yield. Many of the platforms support multiple backlink types meaning turning some of these off before you understand what they do can reduce your total link yield.
Enabling any of the other filtering options in this section will also massively reduce your link yield but they are much more self-explanatory. When using contextual link types I see no reason to enable a single filter for the project.
That being said, if I were running a project looking for high-quality blogs to comment on, I might consider enabling the bad word filter. Just be aware that it is a double-edged sword, as a bad word on a blog post does not always mean it is a negative post that may affect your money site.
A Note On Emails
I always recommend catch-all emails when using the generic GSA Search Engine Ranker engines, as they can take a hammering and just keep on going. If you choose to use traditional emails, your email provider may ban the email account after it receives a large number of confirmation emails within a short time.
Depending on the platform and engine selection and posting settings for your project this can actually be pretty quick. This leads to problems as your traditional email accounts may be banned meaning any verification emails sent from the domains you are trying to process are being sent to email accounts you no longer have access to. As you are unable to verify their emails the links will not go live meaning you lower your link yield and end up wasting your time and resources until you refill the project with live accounts.
To Keyword Or Not To Keyword
The Data tab of the project lets you add keywords, as shown in the screenshot below. Any keywords you add to this field will be merged with SER’s internal footprints in an attempt to find niche-related pages to leave a link on.
I personally feel that this setting depends on your goals with the tool. If you want to use GSA Search Engine Ranker to build large amounts of contextual links then I see no reason to use keywords. Simply put a space in this field so SER scrapes with the raw footprints, as this will provide many more potential targets for the tool to process.
When building contextual pages on domains with the tool, you are able to theme them to your niche by using a niche-relevant article in your submission. Due to this, my personal thinking is that it is better to have as many contextual domains as possible, especially if you are using them as your tier two or three.
If you are using SER to scrape for high-metric blog posts to leave an automated comment on, then it is probably a good idea to load this field with as many niche-relevant keywords as possible. This way you have a much higher chance of the tool creating a relevant link. On the flip side, your potential targets are massively reduced, so it’s all about trade-offs.
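To make the keyword trade-off a little more concrete, here is a rough Python sketch of what merging keywords with footprints could look like. The function name and the two example footprints are my own placeholders rather than SER’s internal ones, and I am assuming a simple “footprint plus keyword” query format.

```python
from itertools import product


def build_queries(footprints: list[str], keywords: list[str]) -> list[str]:
    """Pair every footprint with every keyword to form search queries.

    Blank keywords (such as a single space) are dropped. If nothing is
    left, the footprints are used on their own: a raw footprint scrape.
    """
    cleaned = [kw.strip() for kw in keywords if kw.strip()]
    if not cleaned:
        return list(footprints)
    return [f"{fp} {kw}" for fp, kw in product(footprints, cleaned)]


# Placeholder footprints for illustration only.
footprints = ['"powered by wordpress"', '"leave a reply"']

# A single space in the keyword field: two broad, raw footprint queries.
print(build_queries(footprints, [" "]))

# Two niche keywords: four narrower queries, each tied to the niche.
print(build_queries(footprints, ["dog training", "puppy food"]))
```

Each keyword narrows what a query can return, which is why keywords help relevance for blog comments but cut the pool of targets for contextual links.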
Thankfully, SER has been built as an all-in-one tool, provided you are willing to sacrifice the resources to allow it to focus on the other tasks at hand. This makes it extremely easy for people new to the tool to get started building their list. I know a bunch of things in this guide are down to personal choice, so I won’t be able to provide an exact overview of what the tool is going to do when you press start, but here are the basic stages.
- SER is going to check your proxy settings and if you have told it to use public proxies it will scrape and check public proxies until it discovers some valid ones to use.
- It will then check the platforms and engines you have enabled for the projects you have set to active. It automatically pulls the footprints for the platforms you have enabled from its footprint factory in an attempt to maximize efficiency.
- It will then check to see if you have added any keywords to the project for it to scrape with; if you have, it will merge them with the footprints ready to scrape. The only way I am aware of to get SER to scrape without keywords is to put a space in the keywords field shown above. I’m not sure about the other search engines, but I know Google discounts the space, resulting in a raw footprint scrape.
- It will then check the search engines you have enabled for the project and go off to do its initial scrape of them. Once its initial scrape is complete it will check your search engine timeout setting and repeat its scrapes at that timeout interval.
- When scraping the search engines a massive amount of random useless sites will be returned. SER will harvest all of these pages and check their URLs and on-page for footprints relevant to the platforms and engines you have enabled. This is how SER identifies the targets as potentially viable.
- SER will then check to see if you have enabled the identified folder tick box, if so it will save identified targets to a list in that folder and move onto submission.
- The submission stage drastically changes depending on the engines enabled but essentially, it’s going to attempt to post a link to that specific page or its domain and check any required emails in the process. If you have your submitted folder tickbox enabled the URL will be saved to a list there and SER will add a submission count to the projects window for the project.
- Once submitted, SER then attempts to verify that the link has gone live by checking the profile or page it thinks the link should be live on. If it is presented with an error message or something informing it of rejection or the account being banned, it will remove the target from the submitted column. If you have enabled the failed folder tickbox, the domain will be added to the list there. If the URL is detected on the page, the target is changed from a submission to a verified URL on the project pane and, provided you have enabled the tick box for your verified folder, the URL will be saved.
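The last few stages above, and the way the folder tick boxes decide what gets saved, can be sketched in Python like this. It is purely my own simplified model for illustration: the class and function names are made up, and SER’s real internals will be far more involved.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Stage(Enum):
    IDENTIFIED = "identified"
    SUBMITTED = "submitted"
    VERIFIED = "verified"
    FAILED = "failed"


@dataclass
class SiteLists:
    """Hypothetical stand-in for the four site list folders."""
    ticked: set[Stage]  # folders whose tick box is enabled
    saved: dict[Stage, list[str]] = field(default_factory=dict)

    def record(self, stage: Stage, url: str) -> None:
        # A URL is only written to a folder if that folder is ticked.
        if stage in self.ticked:
            self.saved.setdefault(stage, []).append(url)


def process(url: str, lists: SiteLists, link_is_live: Callable[[str], bool]) -> Stage:
    """Walk one target through identify, submit and verify."""
    lists.record(Stage.IDENTIFIED, url)
    lists.record(Stage.SUBMITTED, url)
    result = Stage.VERIFIED if link_is_live(url) else Stage.FAILED
    lists.record(result, url)
    return result


# With the verified box ticked, the live link is kept for future projects.
ticked = SiteLists(ticked={Stage.VERIFIED})
print(process("http://example.com/profile", ticked, lambda u: True), ticked.saved)

# With nothing ticked, the link still verifies but nothing is saved.
unticked = SiteLists(ticked=set())
print(process("http://example.com/profile", unticked, lambda u: True), unticked.saved)
```

The second call is the one-hit wonder situation: the project gets its link, but your auto accept list does not grow.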
As I mentioned earlier, there are a lot of moving parts when it comes to SER but that covers the basics of the process with various stages being repeated in the background during other stages of the process.
I hope this post has covered the basics of building your own entry-level auto accept list. To my knowledge, outside of using a premium list, this is the cheapest method of growing your list.
I always see people on the various forums asking questions along the lines of “I am self-scraping a list and getting xx links per minute, is this good?”. In addition to the choices available to the user in this post, there are other factors such as the quality of the VPS or server being used, the quality of the proxies and a whole host of other things. The best way to look at it is to treat that initial links per minute count as a baseline and then work to improve on it as you move forward.