Well, constant thinking gave me an idea, and I think I could also build one using C#.
Their algorithm probably runs like this (I could be wrong; this is just an assumption based on what I perceive as possible).
First, the program walks through IP addresses starting from 1.0.0.0 until it reaches 255.255.255.255.
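Just to picture that first step, here is a minimal C# sketch that treats the IPv4 space as a 32-bit counter. It is purely illustrative (a real crawler would not brute-force every address), and it only prints the first few values so it terminates quickly:

```csharp
using System;
using System.Net;

class IpWalker
{
    // Pack four octets into a 32-bit value, most significant octet first.
    static uint Pack(byte a, byte b, byte c, byte d) =>
        ((uint)a << 24) | ((uint)b << 16) | ((uint)c << 8) | d;

    static void Main()
    {
        uint start = Pack(1, 0, 0, 0);
        uint end   = Pack(255, 255, 255, 255);

        // Only show the first five addresses; a full sweep would take forever.
        for (uint ip = start; ip <= end && ip < start + 5; ip++)
        {
            var addr = new IPAddress(new[] {
                (byte)(ip >> 24), (byte)(ip >> 16), (byte)(ip >> 8), (byte)ip });
            Console.WriteLine(addr); // 1.0.0.0, 1.0.0.1, ...
        }
    }
}
```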
Then, for every page it lands on at each tick of the IP, the bot inspects the result, trying to read all the HTML elements, or just the raw contents. If those elements are links to other documents, it follows them, starting from the root IP. If it hits an error (404 - Not Found), it moves on to the next part and also tries to probe the files in every directory. Finally, it saves the contents to the DB and checks the links it found, so each one can be queued for a new scan.
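A very rough C# sketch of that fetch, parse, and follow loop could look like the one below. The names LinkPattern and SaveToDb are my own placeholders, the regex-based link extraction is a stand-in for a proper HTML parser, and the 404 handling is reduced to skipping non-success responses:

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

class MiniCrawler
{
    static readonly HttpClient Http = new HttpClient();

    // Naive link extraction; a real bot would walk the DOM instead of using a regex.
    static readonly Regex LinkPattern =
        new Regex("href=\"(?<url>https?://[^\"]+)\"", RegexOptions.IgnoreCase);

    static async Task Crawl(string startUrl, int maxPages)
    {
        var queue = new Queue<string>();
        var seen  = new HashSet<string>();
        queue.Enqueue(startUrl);

        while (queue.Count > 0 && seen.Count < maxPages)
        {
            var url = queue.Dequeue();
            if (!seen.Add(url)) continue;               // already visited

            HttpResponseMessage response;
            try { response = await Http.GetAsync(url); }
            catch (HttpRequestException) { continue; }  // unreachable host, skip

            if (!response.IsSuccessStatusCode) continue; // e.g. 404 Not Found

            string html = await response.Content.ReadAsStringAsync();
            SaveToDb(url, html);                         // placeholder persistence step

            // Queue every absolute link found on the page for a new scan.
            foreach (Match m in LinkPattern.Matches(html))
                queue.Enqueue(m.Groups["url"].Value);
        }
    }

    // Stand-in for a real database write.
    static void SaveToDb(string url, string html) =>
        Console.WriteLine($"stored {url} ({html.Length} bytes)");

    static async Task Main() => await Crawl("http://example.com/", 10);
}
```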
So maybe that is why we sometimes have to submit our site to these search engines: it could take a long time for the bots to crawl every page on their own. Another thing these web bots can detect is the plain "GET" request, whose responses get cached.
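On that GET point, one thing a crawler can do is look at the response's caching headers to decide whether a page may be reused or must be refetched. A small sketch, with a placeholder URL:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class CacheCheck
{
    static async Task Main()
    {
        using var http = new HttpClient();
        var response = await http.GetAsync("http://example.com/");

        // Cache-Control and Last-Modified hint whether (and for how long)
        // the fetched page can be reused instead of being crawled again.
        Console.WriteLine($"Status:        {(int)response.StatusCode}");
        Console.WriteLine($"Cache-Control: {response.Headers.CacheControl}");
        Console.WriteLine($"Last-Modified: {response.Content.Headers.LastModified}");
    }
}
```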
So far, I think, those are the fundamentals. The rest, like meta-tag analysis, content-based checks, and the black-hat SEO practices, is handled according to each search engine's own policies.