INForum.in - Home of the Indian Domain Industry
  1. #1
    Ross, Junior Member
    Join Date
    Feb 2009
    Posts
    13

    Statistics for .IN zone file

    I am developing an algorithm for keyword detection in domain names. I obtained a copy of the .IN zone file and applied the algorithm to it.
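    The thread does not say how Ross's algorithm works. As a rough sketch only, assuming the dictionary is a plain set of lowercase English words, a concatenated label can be split with dynamic programming over split points, keeping the segmentation that uses the fewest words. The `segment` helper, the `max_word_len` cap and the fewest-words rule are illustrative assumptions, not Ross's method.

```python
from __future__ import annotations


def segment(label: str, dictionary: set[str], max_word_len: int = 20) -> list[str] | None:
    """Split `label` into dictionary words, preferring the fewest words.

    Hypothetical helper for illustration. Returns None when the label cannot be
    fully covered by dictionary words.
    """
    n = len(label)
    # best[i] holds the shortest known segmentation of label[:i], or None.
    best: list[list[str] | None] = [None] * (n + 1)
    best[0] = []  # the empty prefix needs no words
    for end in range(1, n + 1):
        # Only consider candidate words of at most max_word_len characters.
        for start in range(max(0, end - max_word_len), end):
            prefix = best[start]
            word = label[start:end]
            if prefix is not None and word in dictionary:
                candidate = prefix + [word]
                current = best[end]
                if current is None or len(candidate) < len(current):
                    best[end] = candidate
    return best[n]


words = {"in", "india", "indian", "hotel", "hotels", "ho", "tels"}
print(segment("indianhotels", words))  # ['indian', 'hotels']
print(segment("xyz", words))  # None
```

    With the fewest-words rule, "indianhotels" becomes "indian" + "hotels" rather than a longer split such as "indian" + "ho" + "tels". A production splitter would probably also weigh word frequencies when several splits tie.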

    Here are some statistics (computed on .IN only):
    Total number of domains: 315K
    I split 175K of these domains into keywords. Purely numeric domains are not included.
    The dictionary used contains 100K English-only words.

    Total number of keywords detected: 366K (29K unique).
    Domains such as "name-keyword.in", where "name" is, for example, an Indian proper name, are split, and only the keyword is included in the final statistics.
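    A hypothetical sketch of how statistics like these could be gathered, assuming the zone file has already been reduced to a list of lowercase names ending in ".in": purely numeric names are skipped, each label is split on hyphens, every part is segmented against the dictionary, and parts that cannot be segmented (such as proper names missing from the dictionary) are dropped. The helper names and the drop-unknown-parts rule are assumptions for illustration, not a description of Ross's code.

```python
from __future__ import annotations

from collections import Counter


def segment(part: str, dictionary: set[str]) -> list[str] | None:
    """Fewest-words dictionary split of `part`; None if it cannot be fully covered."""
    best: list[list[str] | None] = [[]] + [None] * len(part)
    for end in range(1, len(part) + 1):
        for start in range(end):
            prefix = best[start]
            word = part[start:end]
            if prefix is not None and word in dictionary:
                current = best[end]
                if current is None or len(prefix) + 1 < len(current):
                    best[end] = prefix + [word]
    return best[len(part)]


def keyword_stats(domains: list[str], dictionary: set[str]) -> tuple[Counter[str], int]:
    """Count keywords across names like "name-keyword.in".

    Returns the keyword Counter and the number of domains that yielded keywords.
    """
    counts: Counter[str] = Counter()
    split_domains = 0
    for domain in domains:
        label = domain.lower().removesuffix(".in")
        if label.replace("-", "").isdigit():
            continue  # purely numeric names are excluded
        found: list[str] = []
        for part in label.split("-"):
            words = segment(part, dictionary)
            if words:  # unsegmentable parts (None) and empty parts are dropped
                found.extend(words)
        if found:
            split_domains += 1
            counts.update(found)
    return counts, split_domains


dictionary = {"indian", "hotels", "hotel", "travel", "mobile", "shop", "in"}
domains = ["indianhotels.in", "ravi-travel.in", "12345.in", "mobileshop.in"]
counts, split = keyword_stats(domains, dictionary)
print(split, sum(counts.values()), len(counts))  # 3 5 5
```

    Over a full zone file, `sum(counts.values())` and `len(counts)` would correspond to the "total" and "unique" keyword figures above, and `split` to the number of domains that were split.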

    Below are the frequencies of the top 300 keywords:


    Code:
    keyword       count keyword       count keyword       count
    i             7200  em            404   up            259
    india         6495  test          402   ads           258
    on            3008  star          401   play          254
    line          2504  or            401   micro         253
    the           2447  your          401   finance       252
    my            2348  mail          400   film          251
    in            2149  love          399   wedding       251
    tech          1786  digital       395   college       248
    web           1495  ate           393   center        246
    indian        1470  education     393   site          246
    group         1447  us            391   way           246
    world         1379  market        390   creative      246
    net           1264  guide         386   people        244
    it            1163  shop          382   inc           243
    an            1102  technologies  382   security      242
    travel        1077  loan          380   tour          241
    info          1056  service       377   click         239
    and           1038  blue          373   pay           239
    free          1015  times         366   team          238
    go            969   card          365   casino        237
    jobs          958   chennai       361   today         236
    solutions     956   network       358   lab           236
    of            944   poker         357   good          236
    en            925   hosting       353   directory     235
    home          906   sun           352   dream         234
    media         903   game          350   vision        233
    health        892   food          345   future        233
    global        862   ur            344   company       233
    to            858   first         344   sky           233
    city          853   biz           343   reliance      232
    am            851   plus          339   royal         232
    me            823   phone         337   san           231
    tv            798   cheap         336   call          230
    life          792   bio           335   baby          229
    sex           774   data          331   products      228
    ad            758   books         330   photo         227
    design        755   red           330   planet        227
    business      734   get           328   cars          227
    news          730   os            328   simply        226
    hotel         720   realest       327   movies        225
    hotels        711   pc            325   corp          225
    care          693   zone          323   cash          223
    car           692   holidays      320   print         222
    at            687   win           320   mall          222
    mobile        685   tar           319   deals         220
    art           670   travels       319   law           220
    as            663   tours         318   mind          219
    club          656   ind           317   girls         216
    hop           641   max           316   tourism       215
    services      639   med           316   video         215
    hi            632   eco           314   corporate     215
    pro           630   gold          312   academy       213
    one           627   soft          311   consultants   213
    air           608   capital       310   foundation    213
    green         604   sms           309   solar         211
    new           602   consulting    305   source        210
    best          595   just          304   bazaar        209
    no            589   ms            303   fun           209
    all           588   marketing     302   tex           209
    insurance     577   cricket       302   fly           208
    is            576   internet      301   pages         208
    for           565   properties    301   now           207
    property      563   sports        301   centre        207
    power         536   point         300   cards         207
    music         536   raj           300   loans         206
    job           532   direct        300   kids          206
    delhi         529   time          300   dr            205
    live          529   porn          300   techno        205
    search        524   asia          296   dating        205
    ala           524   energy        295   talk          204
    do            520   homes         292   he            203
    you           514   im            292   open          203
    school        514   we            291   log           203
    bank          509   space         290   work          201
    money         494   box           288   radio         201
    man           493   land          287   euro          201
    smart         486   bangalore     287   help          200
    auto          485   career        287   sale          200
    com           475   studio        282   store         198
    international 473   real          282   pace          196
    systems       473   tel           282   ticket        196
    games         469   management    281   by            195
    buy           466   host          277   shopping      195
    big           453   forum         275   retail        194
    credit        448   fashion       274   solution      192
    de            445   stock         274   technology    191
    mart          437   movie         273   golf          190
    book          430   find          273   day           189
    trade         426   park          271   family        189
    pr            426   tore          270   pal           189
    ker           426   tax           269   liberty       189
    oft           425   be            268   holiday       189
    top           424   computer      267   mob           189
    domain        421   office        267   plan          188
    hot           418   super         265   yoga          188
    easy          412   medical       261   realty        188
    guru          412   water         260   trip          187
    software      411   con           259   eye           187
    house         407   express       259   labs          187
    link          405   goa           259   tickets       186
    As expected, the top of the list contains some "skip words" (stop words) such as "on", "the" and "in".
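    One simple way to set such skip words aside before ranking, sketched under the assumption that the counts are held in a `collections.Counter`; the `SKIP_WORDS` list and the `top_keywords` helper below are hypothetical examples, not the list or code used for these statistics.

```python
from collections import Counter

# Hypothetical skip-word list; a real one would be longer and tuned by hand.
SKIP_WORDS = frozenset({"i", "on", "the", "in", "my", "an", "and", "of", "to", "at", "as", "is", "for"})


def top_keywords(counts: Counter, n: int, skip_words: frozenset = SKIP_WORDS) -> list[tuple[str, int]]:
    """Return the n most frequent (keyword, count) pairs, ignoring skip words."""
    filtered = Counter({word: c for word, c in counts.items() if word not in skip_words})
    return filtered.most_common(n)


# Counts taken from the first rows of the table above.
counts = Counter({"i": 7200, "india": 6495, "on": 3008, "line": 2504,
                  "the": 2447, "my": 2348, "in": 2149, "tech": 1786})
print(top_keywords(counts, 3))  # [('india', 6495), ('line', 2504), ('tech', 1786)]
```

    Filtering at ranking time rather than during segmentation keeps the raw counts intact, so the skip list can be changed without re-splitting the zone file.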

    There are more zone file charts and stats on my blog.

  2. #2
    Jeff, Administrator
    Join Date
    Mar 2008
    Posts
    2,980

    Re: Statistics for .IN zone file

    This is really interesting stuff - thanks so much for putting it together!

    It's interesting that "india" and "indian" are so common when that's already implied by the .in.

    Bangalore does well for a city that's no longer officially called by that name.

  3. #3
    Ross, Junior Member
    Join Date
    Feb 2009
    Posts
    13

    Re: Statistics for .IN zone file

    Yes, I have noticed the same thing.
    Does anybody have a list of Indian proper names and geographic names (in ASCII)? I can try adding them to the dictionary. Is the new name of Bangalore Bengaluru? I will check whether it is in the dictionary. Note that some proper names do not appear at all because my dictionary is not specifically prepared for Indian names; it is just a standard aspell dictionary with some tuning :-D

  4. #4
    Ceres, Senior Member
    Join Date
    Mar 2008
    Location
    Canada
    Posts
    2,206

    Re: Statistics for .IN zone file

    Ross, thanks for sharing the information. I'm surprised quite a few people include the word "company" in their domain name. I wonder if "company" is part of their business name?

  5. #5
    skyshipper, Senior Member
    Join Date
    Jan 2009
    Location
    under the sun
    Posts
    434

    Re: Statistics for .IN zone file

    Quote, originally posted by Ross:
    I am developing an algorithm for keyword detection in domain names. Thanks to Jeff, I obtained a copy of the .IN zone file and applied the algorithm to it.

    Here are some statistics (computed on .IN only):
    Total number of domains: 315K
    I split 175K of these domains into keywords. Purely numeric domains are not included.
    The dictionary used contains 100K English-only words.

    Total number of keywords detected: 366K (29K unique).
    Domains such as "name-keyword.in", where "name" is, for example, an Indian proper name, are split, and only the keyword is included in the final statistics.

    Below are the frequencies of the top 300 keywords:

    As expected, the top of the list contains some "skip words" (stop words) such as "on", "the" and "in".

    There are more zone file charts and stats on my blog.
    Great work! And great work providing the .in zone file; congrats to Jeff.

    Quote, originally posted by Jeff:
    Bangalore does well for a city that's no longer officially called by that name
    It would be interesting to carry out an analysis from when the name change occurred (around Nov 2006).
    If Bangalore still scores over Bengaluru, it shows the mindset! .IN is going to have a hard time overcoming the .co.in mindset [which most domainers are trying to overturn].

    Quote, originally posted by Ceres:
    I'm surprised quite a few people include the word "company" in their domain name
    Actually, it is part of some company registrations, used like "limited" or "corporation". "Limited" and "corporation" are also quite popular: OIL for Oil India Limited, LIC for Life Insurance Corporation, ATC for AssamTeaCompany.

  6. #6
    Ceres, Senior Member
    Join Date
    Mar 2008
    Location
    Canada
    Posts
    2,206

    Re: Statistics for .IN zone file

    Quote, originally posted by kaustavk666:
    Actually, it is part of some company registrations, used like "limited" or "corporation". "Limited" and "corporation" are also quite popular: OIL for Oil India Limited, LIC for Life Insurance Corporation, ATC for AssamTeaCompany.
    kaustavk666, do you think it's better to include the 'company,' 'corporation' and 'limited' in the domain name when it's part of the business name? Personally, I'd prefer ATC.in, AssamTea.in or AssamTea.co.in rather than AssamTeaCompany.in as they are shorter and easier to remember. Which one do you think works best for Assam Tea Company?

  7. #7
    skyshipper, Senior Member
    Join Date
    Jan 2009
    Location
    under the sun
    Posts
    434

    Re: Statistics for .IN zone file

    atc.in and atc.co.in would seem repetitive. assamtea.in is too generic, since there are various tea companies in Assam.
    assamteacompany.in is OK, but then the .com will be available for such a long name.

    So.
    [BTW, atc.in redirects to atc.co.in! Another one for .co.in.]

  8. #8
    Ceres, Senior Member
    Join Date
    Mar 2008
    Location
    Canada
    Posts
    2,206

    Re: Statistics for .IN zone file

    Yes, of them all I also prefer ATC.in. Too bad it's not owned by Assam Tea Company; they opted for AssamTeaCompany.com instead.

  9. #9
    RaghavK, Senior Member
    Join Date
    Sep 2008
    Posts
    413

    Re: Statistics for .IN zone file

    Yes, ATC was so much better. I think no one wants to spend a fortune on acquiring the right name.

 

 
