Statistics for .IN zone file


Ross

New Member
I am developing an algorithm for keyword detection in domain names. I obtained a copy of the .IN zone file and applied the algorithm to it.

Here are some statistics (done only on .IN):
Total number of domains: 315K
I have split 175K domains. Domains that are only numeric are not included.
The dictionary used is 100K English-only words.

Total number of keywords detected: 366K, of which 29K are unique.
Some domains like "name-keyword.in", where the name is an Indian proper name, for example, are split and only the keyword is included in the final statistics.
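
For anyone who wants to experiment with the same idea, here is a minimal sketch of dictionary-based splitting. It is not the algorithm used for the statistics in this post (that one is not described here, and it evidently also reports partial matches). The function names, the tiny WORDS set, and the "fewest words wins" rule are all assumptions made for illustration.

```python
def split_domain(label, dictionary):
    """Split a label into dictionary words, preferring the fewest words.

    Returns a list of words, or None if the label cannot be fully split.
    """
    n = len(label)
    # best[i] is the shortest word list covering label[:i], or None.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            piece = label[start:end]
            if best[start] is not None and piece in dictionary:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]


def keywords(domain, dictionary):
    """Return the keywords found in a domain such as 'india-hotels.in'."""
    label = domain.lower().split(".")[0]      # keep only the leftmost label
    if label.replace("-", "").isdigit():      # numeric-only domains are skipped
        return []
    found = []
    for part in label.split("-"):
        words = split_domain(part, dictionary)
        if words:                             # unsplittable parts (e.g. names) are dropped
            found.extend(words)
    return found


# Hypothetical mini-dictionary; a real run would load a full word list.
WORDS = {"india", "hotel", "hotels", "travel", "web", "tech", "on", "line", "online"}

print(keywords("indiahotels.in", WORDS))
print(keywords("ramesh-travel.in", WORDS))
print(keywords("12345.in", WORDS))
```

In this sketch a hyphenated part that cannot be split, such as a proper name missing from the dictionary, is simply dropped, which mirrors the "name-keyword.in" case described above.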

Following is a list of keyword frequencies for the first 300:


Code:
i             7200  em            404   up            259
india         6495  test          402   ads           258
on            3008  star          401   play          254
line          2504  or            401   micro         253
the           2447  your          401   finance       252
my            2348  mail          400   film          251
in            2149  love          399   wedding       251
tech          1786  digital       395   college       248
web           1495  ate           393   center        246
indian        1470  education     393   site          246
group         1447  us            391   way           246
world         1379  market        390   creative      246
net           1264  guide         386   people        244
it            1163  shop          382   inc           243
an            1102  technologies  382   security      242
travel        1077  loan          380   tour          241
info          1056  service       377   click         239
and           1038  blue          373   pay           239
free          1015  times         366   team          238
go            969   card          365   casino        237
jobs          958   chennai       361   today         236
solutions     956   network       358   lab           236
of            944   poker         357   good          236
en            925   hosting       353   directory     235
home          906   sun           352   dream         234
media         903   game          350   vision        233
health        892   food          345   future        233
global        862   ur            344   company       233
to            858   first         344   sky           233
city          853   biz           343   reliance      232
am            851   plus          339   royal         232
me            823   phone         337   san           231
tv            798   cheap         336   call          230
life          792   bio           335   baby          229
sex           774   data          331   products      228
ad            758   books         330   photo         227
design        755   red           330   planet        227
business      734   get           328   cars          227
news          730   os            328   simply        226
hotel         720   realest       327   movies        225
hotels        711   pc            325   corp          225
care          693   zone          323   cash          223
car           692   holidays      320   print         222
at            687   win           320   mall          222
mobile        685   tar           319   deals         220
art           670   travels       319   law           220
as            663   tours         318   mind          219
club          656   ind           317   girls         216
hop           641   max           316   tourism       215
services      639   med           316   video         215
hi            632   eco           314   corporate     215
pro           630   gold          312   academy       213
one           627   soft          311   consultants   213
air           608   capital       310   foundation    213
green         604   sms           309   solar         211
new           602   consulting    305   source        210
best          595   just          304   bazaar        209
no            589   ms            303   fun           209
all           588   marketing     302   tex           209
insurance     577   cricket       302   fly           208
is            576   internet      301   pages         208
for           565   properties    301   now           207
property      563   sports        301   centre        207
power         536   point         300   cards         207
music         536   raj           300   loans         206
job           532   direct        300   kids          206
delhi         529   time          300   dr            205
live          529   porn          300   techno        205
search        524   asia          296   dating        205
ala           524   energy        295   talk          204
do            520   homes         292   he            203
you           514   im            292   open          203
school        514   we            291   log           203
bank          509   space         290   work          201
money         494   box           288   radio         201
man           493   land          287   euro          201
smart         486   bangalore     287   help          200
auto          485   career        287   sale          200
com           475   studio        282   store         198
international 473   real          282   pace          196
systems       473   tel           282   ticket        196
games         469   management    281   by            195
buy           466   host          277   shopping      195
big           453   forum         275   retail        194
credit        448   fashion       274   solution      192
de            445   stock         274   technology    191
mart          437   movie         273   golf          190
book          430   find          273   day           189
trade         426   park          271   family        189
pr            426   tore          270   pal           189
ker           426   tax           269   liberty       189
oft           425   be            268   holiday       189
top           424   computer      267   mob           189
domain        421   office        267   plan          188
hot           418   super         265   yoga          188
easy          412   medical       261   realty        188
guru          412   water         260   trip          187
software      411   con           259   eye           187
house         407   express       259   labs          187
link          405   goa           259   tickets       186
As expected, some of the words at the top of the list are "skip words" (stop words), like "on", "the", "in", etc.
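
If you want a ranking without the skip words, one way is to filter them out while counting. This is only a rough sketch: the SKIP_WORDS set and the sample data below are made up for illustration and are not taken from the zone file.

```python
from collections import Counter

# Hypothetical skip-word list; tune it for your own dictionary.
SKIP_WORDS = {"i", "on", "the", "my", "in", "it", "an", "and", "of", "to"}


def keyword_frequencies(keyword_lists, skip_words=SKIP_WORDS):
    """Count keywords across all domains, ignoring skip words."""
    counts = Counter()
    for words in keyword_lists:
        counts.update(w for w in words if w not in skip_words)
    return counts


# Made-up sample: one list of detected keywords per domain.
sample = [["india", "hotels"], ["the", "india", "travel"], ["on", "line", "travel"]]
freq = keyword_frequencies(sample)

# Print a tab-separated frequency list, most common first.
for word, count in freq.most_common():
    print(f"{word}\t{count}")
```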

There are some more zone file charts and stats on my blog.
 
This is really interesting stuff - thanks so much for putting it together!

Interesting that India and Indian are so common when that's already implied by the .in.

Bangalore does well for a city that's no longer officially called by that name :D
 
Yes, I have the same observations.
Does anybody have a list of Indian proper names and geo names (in ASCII)? I can try to add them to the dictionary. What is the new name of Bangalore, Bengaluru? I will check if it is in the dictionary. Note that some proper names do not appear at all because my dictionary is not specifically prepared for Indian names; it is just a standard aspell dictionary with some tuning :-D
 
Ross, thanks for sharing the information. I'm surprised quite a few people include the word "company" in their domain name. I wonder if "company" is part of their business name?
 
I am developing an algorithm for keyword detection in domain names. Thanks to Jeff, I obtained a copy of the .IN zone file and applied the algorithm to it.

Great work! Also great work providing the .in zone; congrats to Jeff.

"Bangalore does well for a city that's no longer officially called by that name"
It would be interesting to run the analysis from when the name change occurred [around Nov 06].
If bangalore still scores over bengaluru, it shows the mindset! .in is going to have a hard time overcoming the .co.in mindset [which most domainers are trying to overturn :)]

"I'm surprised quite a few people include the word "company" in their domain name"
Actually, it is there in some company registrations, used the same way as "limited" or "corporation". "Limited" and "corporation" are also quite popular: OIL (Oil India Limited), LIC (Life Insurance Corporation), ATC (assamteacompany).
 

kaustavk666, do you think it's better to include the 'company,' 'corporation' and 'limited' in the domain name when it's part of the business name? Personally, I'd prefer ATC.in, AssamTea.in or AssamTea.co.in rather than AssamTeaCompany.in as they are shorter and easier to remember. Which one do you think works best for Assam Tea Company?
 
atc.in :) atc.co.in would seem repetitive. assamtea.in is too generic, since there are various tea companies in Assam.
assamteacompany.in is OK, but then the .com will be available for such a long name.

so.
[btw, atc.in redirects to atc.co.in! Another one for .co.in]
 
:) Yes, I also prefer ATC.in of them all. Too bad it's not owned by Assam Tea Company - they opted for AssamTeaCompany.com instead.
 
