The Search Suggestion Popularity Index

This is an idea I've had in my head for ages now. Months, maybe years - I can't remember when I first thought of it, but I know it was last year or earlier. I don't know the date because I've never written it down until now.

Ever used a search engine? Notice how, as you type, you get a drop-down with a few suggestions for what you might be searching for? Notice how popular things get suggested with very few characters, and less popular things require more characters to be typed?

Well, that's the basis of the idea. Rank the popularity of some thing by how many characters need to be typed before it shows up in suggestions.

The lowest possible number is zero, which is when you type nothing. Not all search engines have zero-character suggestions, but it's relevant for the ones that do. The highest possible meaningful number is one less than the number of characters in the thing's name: if you have to type the entire name for the search engine to suggest it, then there's no guarantee the thing even exists in its index, because search engines will usually suggest the term you've typed verbatim if they can't come up with anything longer.

Alongside how many characters must be typed, there's another aspect to consider: how high in the suggestions list is it? Is it 1st, 2nd, 5th, 7th? I suggest incorporating this into our popularity value as a fraction, where the numerator is the position in the list, counting from zero, and the denominator is the (maximum) length of the list. For example, "berkeley software distribution" would get a score of 12 + 1/4, or 12.25, because I have to type "berkeley sof" (12 characters) to see it suggested on Google, and it's in second place (position 1, counting from zero) in the suggestions list, which is 4 items long. Lower scores mean more popular.
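Here's a rough sketch of that scoring rule in Python. It's only an illustration: `popularity_score`, `suggest` and `canned_suggest` are names I've made up, and `suggest` stands in for whatever returns a search engine's suggestions for a typed prefix (no real search engine API is used here). The canned data below just mimics the example above.

```python
from typing import Callable, Optional, Sequence


def popularity_score(
    name: str,
    suggest: Callable[[str], Sequence[str]],
    list_length: int,
) -> Optional[float]:
    """Return characters typed + (position / list length), or None if the
    name is never suggested before it has been typed in full."""
    target = name.lower()
    # Try every prefix from zero characters up to one less than the full name.
    for typed in range(len(target)):
        suggestions = [s.lower() for s in suggest(target[:typed])][:list_length]
        if target in suggestions:
            position = suggestions.index(target)  # counted from zero
            return typed + position / list_length
    return None


def canned_suggest(prefix: str) -> Sequence[str]:
    """Hypothetical stand-in for a real suggestion box (made-up data)."""
    if prefix.startswith("berkeley sof"):
        return [
            "berkeley sofa",
            "berkeley software distribution",
            "berkeley softball",
            "berkeley soft story",
        ]
    return ["something else", "another thing", "and another", "one more"]


print(popularity_score("berkeley software distribution", canned_suggest, 4))
# prints 12.25
```

Returning `None` rather than the full name length is a choice I've made for the "had to type the whole thing" case, since at that point the suggestion doesn't prove anything.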

Now we have a way to rank things, yay! But it's not very fair, is it? Things with very long names will usually need more characters typed before they're suggested. Things with short names need fewer characters in absolute terms, but often a larger share of their name, because a short prefix is likely to be shared with lots of other names. Things starting with common prefixes will be unfairly disadvantaged too. There's also another issue: not everything has a single name, or a single form of its name anyway. Depending on which form is used, a thing will get different popularity values.

I'm not sure how to correct for these two issues (name length and multiple names). You could divide the popularity value by the length of the name, but I'm not sure how fair that is. For the naming issue, you could average the scores of the various names, or decide upon an authority which states what the "official" form should be. I don't know which is better. There is one case that can be easily solved: for names where there's just a long form and a short form, and the long form begins with the short form (e.g. "Apple" and "Apple, Inc."), you can just use the lowest value you can find.
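To make those options concrete, here's a small Python sketch of the three of them. The function names are my own inventions, and I'm not claiming any of these is the right correction; they just show what each option would compute from scores you already have.

```python
from typing import Iterable, Optional


def normalised_score(score: float, name: str) -> float:
    """Divide the popularity value by the length of the name."""
    return score / len(name)


def average_score(scores: Iterable[Optional[float]]) -> Optional[float]:
    """Average the scores of several name forms, skipping forms that were
    never suggested (represented here as None)."""
    found = [s for s in scores if s is not None]
    return sum(found) / len(found) if found else None


def lowest_score(scores: Iterable[Optional[float]]) -> Optional[float]:
    """For a short form and a long form that begins with it, keep the
    lowest value found."""
    found = [s for s in scores if s is not None]
    return min(found) if found else None


# Made-up scores, purely for illustration.
print(average_score([2.0, 4.0, None]))  # prints 3.0
print(lowest_score([3.5, 1.25]))        # prints 1.25
```

With the earlier example, `normalised_score(12.25, "berkeley software distribution")` divides 12.25 by the 30 characters in the name, giving roughly 0.41.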

I might do more thinking on this one day, who knows. This is just a short musing about a thought I had. I hope you enjoyed it.