February 21, 2008

File this under things that obviously should work, but that I’m still surprised actually do. If you want to restrict your Google results to a single top-level domain, you can just use a wildcard with the “site:” modifier. The most relevant use (for us in the US) is with the .gov TLD. Immigration and visa information, for example, is one of those low signal-to-noise regions of the internet. For any given topic there are a few thousand commercial offerings (by all appearances fraudulent, or at least worthless) that you have to plow through to get to the real deal. Usually I would use “” but there are plenty of queries where you want information from CIS and also want to see what other departments are using our tax dollars to feed to the Googlebot. On a whim I tried:

site:*.gov citizenship test

There are lots of results from CIS, but also ones from the Library of Congress and the State of Idaho. (The “most US citizens couldn’t pass the citizenship test” trope seems to be almost as popular on .gov as it is in the private sector.)
Speaking of the states, Pennsylvania insists on using * for nearly everything – although they do tend to mix domains all over the place, making NoScript a pain, and – more significantly – screwing up the administration of their SSL certificates.
I can see other uses for this with *.mil and *.edu and whatever else, for filtering out all the nonsense and getting to what we’ve all already paid for.
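
If you run these searches often, the query is easy to build in a script. Here is a minimal Python sketch that turns a TLD and some search terms into a Google search URL using the same wildcard "site:" form as above. The function name tld_search_url is my own invention for illustration, and I'm assuming the usual https://www.google.com/search?q=... URL form; whether Google keeps honoring the wildcard is up to Google.

```python
from urllib.parse import urlencode


def tld_search_url(tld, terms):
    """Build a Google search URL restricted to one top-level domain.

    Hypothetical helper for illustration only, not part of any library.
    """
    # Accept "gov", ".gov" or "*.gov" and reduce it to the bare label.
    label = tld.lstrip("*.")
    # Same shape as the query typed by hand: site:*.gov citizenship test
    query = f"site:*.{label} {terms}"
    # urlencode percent-escapes the colon and asterisk and turns spaces into "+".
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    for tld in ("gov", "mil", "edu"):
        print(tld_search_url(tld, "citizenship test"))
```

Paste any of the printed URLs into a browser and you should land on the same results as typing the query yourself.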


