• How to reduce crawling and indexing of invalid URLs


    One method is to keep the URLs you don't want indexed as dynamic as possible, or even to add parameters deliberately so that they will not be crawled and indexed. However, search engines can now crawl and index dynamic URLs, and this is less and less of a technical obstacle. Although a large number of parameters does hurt indexing to some extent, URLs with four or five parameters can usually still be indexed. We cannot be sure how many parameters are needed to block indexing, so this cannot be treated as a reliable method. In addition, the internal links pointing to these URLs give them no ranking ability, so a certain amount of link weight is still wasted.
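
    For example, a faceted filter page might produce a URL like the following (a purely hypothetical example with made-up parameter names); with four or five parameters it can usually still be indexed, and there is no reliable threshold at which indexing stops:

        https://www.example.com/list?category=5&color=red&price=100-200&sort=newest&page=2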

    The second method is to block these URLs in robots.txt. However, the URLs still receive internal links, and therefore still receive weight. Because robots.txt prevents them from being crawled, the weight they receive cannot be passed on (without crawling, the search engine cannot see the outgoing links on the page), and the page becomes a black hole that weight can only flow into.
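
    As a rough sketch, assuming the filter pages live under a hypothetical /list path and use made-up parameter names such as color and price, the robots.txt rules might look like this; note that, as described above, blocking crawling does not stop these URLs from accumulating weight through internal links:

        User-agent: *
        # Hypothetical patterns: block any /list URL whose query string carries filter parameters
        Disallow: /list?*color=
        Disallow: /list?*price=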

    Adding nofollow to the links pointing to these URLs is not perfect either. Similar to blocking them in robots.txt, the effect of nofollow on Google is that these URLs receive no weight, but the weight is not redistributed to the other links on the page, so it is still wasted. Baidu is said to support nofollow, but how it handles the weight is unknown.
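
    In HTML, this simply means adding rel="nofollow" to the filter links (the URL and anchor text below are hypothetical):

        <a href="/list?color=red" rel="nofollow">Red</a>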

    Putting these links in Flash or JS is also useless. Search engines can already crawl links in Flash and JS, and they are only getting better at it. One thing many SEOs overlook is that JS links can not only be crawled, they can also pass weight, just like normal links.
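
    A "JS link" here means navigation done in script rather than in an ordinary anchor tag, for example something like the following hypothetical markup, which search engines can also discover and follow:

        <span onclick="location.href='/list?color=red'">Red</span>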

    You can also turn the filter links into AJAX calls, so that clicking them does not take the user to a new URL, or only appends a # fragment to the current URL, which is not treated as a different URL. Like the JS issue, though, search engines are actively trying to crawl and parse AJAX content, so this approach is not safe either.
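
    A minimal browser-side sketch of this idea, assuming hypothetical a.filter links, a #results container and an /api/list endpoint (none of which come from the original post):

        // Hypothetical markup: <a class="filter" data-facet="color=red" href="/list?color=red">Red</a>
        document.querySelectorAll<HTMLAnchorElement>("a.filter").forEach((link) => {
          link.addEventListener("click", async (event) => {
            event.preventDefault();                          // stay on the current URL
            const facet = link.dataset.facet ?? "";
            location.hash = facet;                           // becomes "#color=red" -- a fragment, not a new URL
            const resp = await fetch(`/api/list?${facet}`);  // hypothetical endpoint returning the filtered list
            const results = document.querySelector("#results");
            if (results) results.innerHTML = await resp.text();
          });
        });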

    Another approach is to add a NOINDEX + follow tag to the head of the page, meaning the page is not indexed, but the links on it are still followed. This solves the duplicate-content problem and also solves the weight black hole (weight can still flow out to other pages through the outgoing links). What it cannot solve is the waste of spider crawling time: these pages still have to be crawled before the spider can see the NOINDEX + follow tag in the HTML, and for some websites the number of filter pages is so large that crawling them leaves the spider without enough time to crawl the useful pages.
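
    The tag itself goes in the page head; "follow" is the default, so noindex alone is equivalent, but writing it explicitly makes the intent clear:

        <head>
          <meta name="robots" content="noindex,follow">
        </head>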

    Another method to consider is cloaking, that is, using a program to detect the visitor: if it is a search engine spider, the filter links are removed from the returned page; if it is an ordinary user, the normal page with the filter conditions is returned. This is an ideal solution. The only problem is that it may be regarded as cheating. Search engines often tell SEOs that the highest principle for judging whether something is cheating is: would you do it if search engines did not exist? Or is the method done purely for search engines? Obviously, using cloaking to hide URLs you don't want crawled is done for search engines, not for users. Although the intent of cloaking in this case is good rather than malicious, there is still a risk, and only the bold should try it.
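
    Purely as an illustration of the user-agent check described above (the bot names, port and HTML are placeholders, and as noted this carries a real risk of being treated as cheating), a Node/TypeScript sketch might look like this:

        import { createServer } from "http";

        // Hypothetical spider detection by User-Agent string
        const SPIDER_PATTERN = /googlebot|baiduspider|bingbot|yandexbot/i;

        createServer((req, res) => {
          const userAgent = req.headers["user-agent"] ?? "";
          res.writeHead(200, { "Content-Type": "text/html; charset=utf-8" });
          if (SPIDER_PATTERN.test(userAgent)) {
            // Spiders get the listing without the filter links
            res.end("<html><body><h1>Product list</h1></body></html>");
          } else {
            // Ordinary users get the full page, filter links included
            res.end('<html><body><h1>Product list</h1><a href="/list?color=red">Red</a></body></html>');
          }
        }).listen(8080);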

    Another method is the canonical tag. The biggest question is whether Baidu supports it, which is unknown. In addition, the canonical tag is a suggestion to search engines, not a directive, which means a search engine may choose not to obey it, making it useless. Also, the original purpose of the canonical tag is to specify the canonical version of a URL, so applying it to filter pages is somewhat questionable, because the content on these pages is often not the same.
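
    The tag is placed in the head of the filter page and points at the version you consider canonical (the URLs below are hypothetical); as noted above, it is only a hint that search engines may ignore:

        <!-- On a filter page such as /list?color=red&sort=newest -->
        <link rel="canonical" href="https://www.example.com/list">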

    A fairly good method is iframe + robots.txt. Putting the filter section in an iframe is equivalent to calling in the content of another file; to search engines, that file does not belong to the current page, which amounts to hiding the content. But not being on the current page does not mean it does not exist: search engines can still find the content and links inside the iframe and may still crawl those URLs, so robots.txt is also needed to block crawling of the file the iframe calls. Some weight is still lost to the content inside the iframe, but because the links in the iframe do not draw weight away from the current page, only from the called file, the loss is small. Aside from headaches such as formatting and cross-browser issues, a potential problem with the iframe approach is the risk of being seen as cheating. Search engines generally do not treat iframes as cheating nowadays, and many ads are served inside iframes, but there is a subtle difference between hiding a pile of links and hiding an ad. Going back to the general principle of search engine cheating, it is hard to argue that this is not being done specifically for search engines. Remember that Matt Cutts has said Google may change how it handles iframes in the future; they still want to see everything an ordinary user sees on the same page.
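
    A sketch of the combination, assuming the filter block is moved into a hypothetical file called /filters.html:

        <!-- Main page: the filter section is called in from a separate file -->
        <iframe src="/filters.html" width="300" height="600"></iframe>

        # robots.txt: also block the called file so the links inside it are not crawled
        User-agent: *
        Disallow: /filters.html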

    In short, I still don't have a perfect answer to this realistic and serious question. Of course, not being able to solve it perfectly is not fatal. Different websites have different SEO priorities; analyze your specific situation, and one or a combination of the methods above should solve the main problem.
