Thread: Statistics for .IN zone file
-
04-01-2009, 08:02 AM #1
Junior Member
- Join Date
- Feb 2009
- Posts
- 13
- Thanks
- 0
- Thanked 1 Time in 1 Post
Statistics for .IN zone file
I am developing an algorithm for keyword detection in domain names. I obtained a copy of the .IN zone file and applied the algorithm to it.
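The post does not say how the splitting works. Below is a minimal Python sketch of one common approach for segmenting domain labels, which is not necessarily the author's. It assumes a plain dictionary lookup with dynamic programming that prefers the split with the fewest words. `segment` and `max_word_len` are illustrative names.

```python
from __future__ import annotations


def segment(label: str, dictionary: set[str], max_word_len: int = 20) -> list[str] | None:
    """Split `label` into dictionary words, preferring the fewest pieces.

    Returns None when the whole label cannot be covered by dictionary words.
    Ties between equally short splits are broken arbitrarily here; a real
    splitter would probably score candidates by word frequency instead.
    """
    n = len(label)
    # best[i] is the shortest split found so far for label[:i], or None.
    best: list[list[str] | None] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            prefix = best[start]
            if prefix is None:
                continue  # label[:start] cannot be split, so skip it
            word = label[start:end]
            if word in dictionary:
                candidate = prefix + [word]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]


words = {"in", "india", "dia", "travel"}
print(segment("indiatravel", words))  # ['india', 'travel']
print(segment("xyzzy", words))        # None
```

`max_word_len` only bounds the inner loop. The value 20 is an arbitrary choice that should exceed the longest dictionary entry you care about.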
Here are some statistics (for .IN only):
Total number of domains: 315K
I have split 175K of these domains into keywords. Purely numeric domains are not included.
The dictionary used contains 100K words, English only.
Total number of keywords detected: 366K, of which 29K are unique.
Some domains, such as "name-keyword.in" where "name" is an Indian proper name, are split, and only the keyword is included in the final statistics.
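The exclusion rules above can be written as a thin wrapper around any splitter. This is a hypothetical Python sketch. `domain_keywords` and the stand-in `whole_word` splitter are illustrative names. Dropping any hyphen-separated part the splitter cannot cover is an assumption about how the proper name in "name-keyword.in" ends up excluded.

```python
from typing import Callable, List, Optional

# Any function mapping a label to a list of dictionary words, or None if it cannot be split.
Segmenter = Callable[[str], Optional[List[str]]]


def domain_keywords(domain: str, segment: Segmenter) -> Optional[List[str]]:
    """Return the keywords for one .in domain, or None if the domain is excluded."""
    label = domain.lower()
    if label.endswith(".in"):
        label = label[: -len(".in")]
    # Purely numeric names (ignoring hyphens) are left out of the statistics.
    if label.replace("-", "").isdigit():
        return None
    keywords: List[str] = []
    for part in label.split("-"):
        words = segment(part) if part else None
        # Parts the splitter cannot cover, e.g. a proper name missing from
        # the dictionary, are dropped; the remaining keywords are kept.
        if words:
            keywords.extend(words)
    return keywords


# Stand-in splitter for the example: accepts only whole dictionary words.
dictionary = {"tech", "web", "india"}


def whole_word(part: str) -> Optional[List[str]]:
    return [part] if part in dictionary else None


print(domain_keywords("ravi-tech.in", whole_word))  # ['tech']
print(domain_keywords("12345.in", whole_word))      # None
```

An empty list means the domain was kept but no keyword could be detected. That distinction matters when counting how many domains were actually split.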
As expected, some of the most frequent words are "skip words", like "on", "the" and "in".
Here are the keyword frequencies for the top 300 (three columns, each in descending order of count):
i             7200    em            404    up            259
india         6495    test          402    ads           258
on            3008    star          401    play          254
line          2504    or            401    micro         253
the           2447    your          401    finance       252
my            2348    mail          400    film          251
in            2149    love          399    wedding       251
tech          1786    digital       395    college       248
web           1495    ate           393    center        246
indian        1470    education     393    site          246
group         1447    us            391    way           246
world         1379    market        390    creative      246
net           1264    guide         386    people        244
it            1163    shop          382    inc           243
an            1102    technologies  382    security      242
travel        1077    loan          380    tour          241
info          1056    service       377    click         239
and           1038    blue          373    pay           239
free          1015    times         366    team          238
go            969     card          365    casino        237
jobs          958     chennai       361    today         236
solutions     956     network       358    lab           236
of            944     poker         357    good          236
en            925     hosting       353    directory     235
home          906     sun           352    dream         234
media         903     game          350    vision        233
health        892     food          345    future        233
global        862     ur            344    company       233
to            858     first         344    sky           233
city          853     biz           343    reliance      232
am            851     plus          339    royal         232
me            823     phone         337    san           231
tv            798     cheap         336    call          230
life          792     bio           335    baby          229
sex           774     data          331    products      228
ad            758     books         330    photo         227
design        755     red           330    planet        227
business      734     get           328    cars          227
news          730     os            328    simply        226
hotel         720     realest       327    movies        225
hotels        711     pc            325    corp          225
care          693     zone          323    cash          223
car           692     holidays      320    print         222
at            687     win           320    mall          222
mobile        685     tar           319    deals         220
art           670     travels       319    law           220
as            663     tours         318    mind          219
club          656     ind           317    girls         216
hop           641     max           316    tourism       215
services      639     med           316    video         215
hi            632     eco           314    corporate     215
pro           630     gold          312    academy       213
one           627     soft          311    consultants   213
air           608     capital       310    foundation    213
green         604     sms           309    solar         211
new           602     consulting    305    source        210
best          595     just          304    bazaar        209
no            589     ms            303    fun           209
all           588     marketing     302    tex           209
insurance     577     cricket       302    fly           208
is            576     internet      301    pages         208
for           565     properties    301    now           207
property      563     sports        301    centre        207
power         536     point         300    cards         207
music         536     raj           300    loans         206
job           532     direct        300    kids          206
delhi         529     time          300    dr            205
live          529     porn          300    techno        205
search        524     asia          296    dating        205
ala           524     energy        295    talk          204
do            520     homes         292    he            203
you           514     im            292    open          203
school        514     we            291    log           203
bank          509     space         290    work          201
money         494     box           288    radio         201
man           493     land          287    euro          201
smart         486     bangalore     287    help          200
auto          485     career        287    sale          200
com           475     studio        282    store         198
international 473     real          282    pace          196
systems       473     tel           282    ticket        196
games         469     management    281    by            195
buy           466     host          277    shopping      195
big           453     forum         275    retail        194
credit        448     fashion       274    solution      192
de            445     stock         274    technology    191
mart          437     movie         273    golf          190
book          430     find          273    day           189
trade         426     park          271    family        189
pr            426     tore          270    pal           189
ker           426     tax           269    liberty       189
oft           425     be            268    holiday       189
top           424     computer      267    mob           189
domain        421     office        267    plan          188
hot           418     super         265    yoga          188
easy          412     medical       261    realty        188
guru          412     water         260    trip          187
software      411     con           259    eye           187
house         407     express       259    labs          187
link          405     goa           259    tickets       186
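Turning per-domain keyword lists into totals like the ones above (all keyword occurrences vs. unique keywords) and a top-N frequency list is a straightforward count. Below is a minimal sketch using Python's `collections.Counter`. `SKIP_WORDS` is a hypothetical stop list seeded with the examples from the post. Filtering it out is optional, since the list above keeps those words.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical stop list, seeded with the "skip words" named in the post.
SKIP_WORDS = frozenset({"on", "the", "in"})


def keyword_stats(
    per_domain: Iterable[List[str]], top_n: int = 300, drop_skip_words: bool = False
) -> Tuple[int, int, List[Tuple[str, int]]]:
    """Return (total keyword occurrences, unique keywords, top_n most frequent)."""
    counts: Counter = Counter()
    for keywords in per_domain:
        counts.update(
            k for k in keywords if not (drop_skip_words and k in SKIP_WORDS)
        )
    return sum(counts.values()), len(counts), counts.most_common(top_n)


sample = [["india", "travel"], ["the", "india"], ["tech"]]
print(keyword_stats(sample, top_n=1))                        # (5, 4, [('india', 2)])
print(keyword_stats(sample, top_n=1, drop_skip_words=True))  # (4, 3, [('india', 2)])
```

`most_common` breaks ties by first insertion, so keywords with equal counts appear in the order they were first seen.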
More zone file charts and stats are on my blog.
-
04-01-2009, 01:35 PM #2
Re: Statistics for .IN zone file
This is really interesting stuff - thanks so much for putting it together!
Interesting that India and Indian are so common when that's already implied by the .in.
Bangalore does well for a city that's no longer officially called by that name.
-
04-01-2009, 01:46 PM #3
Junior Member
- Join Date
- Feb 2009
- Posts
- 13
- Thanks
- 0
- Thanked 1 Time in 1 Post
Re: Statistics for .IN zone file
Yes, I have made the same observations.
Does anybody have a list of Indian proper names and geographic names (in ASCII)? I can try to add them to the dictionary. Is the new name of Bangalore "Bengaluru"? I will check whether it is in the dictionary. Note that some proper names do not appear at all because my dictionary is not specifically prepared for Indian names; it is just a standard aspell dictionary with some tuning :-D
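Adding a list of Indian proper and place names, as asked here, amounts to merging a second word list into the dictionary. Below is a small sketch. It assumes the extra list is plain text with one name per line. It keeps only ASCII letters, matching the "(in ASCII)" request. `extend_dictionary` is a hypothetical helper, not part of aspell.

```python
from typing import Iterable, Set


def extend_dictionary(base: Set[str], extra_lines: Iterable[str]) -> Set[str]:
    """Return a copy of `base` plus every ASCII alphabetic word in `extra_lines`."""
    merged = set(base)
    for line in extra_lines:
        word = line.strip().lower()
        # Keep plain ASCII letters only, so entries can match ASCII domain labels.
        if word and word.isascii() and word.isalpha():
            merged.add(word)
    return merged


base = {"bangalore", "city"}
extra = ["Bengaluru\n", "Chennai\n", "Bengal\u016bru\n", "\n"]
merged = extend_dictionary(base, extra)
print(sorted(merged))  # ['bangalore', 'bengaluru', 'chennai', 'city']
```

The accented spelling is skipped rather than transliterated. Proper transliteration would need a separate step.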
-
04-01-2009, 02:57 PM #4
Re: Statistics for .IN zone file
Ross, thanks for sharing the information. I'm surprised quite a few people include the word "company" in their domain name. I wonder if "company" is part of their business name?
-
04-01-2009, 06:02 PM #5
Re: Statistics for .IN zone file
Great work! Also, great work on providing the .in zone. Congrats to Jeff.
It would be interesting if an analysis were carried out from when the name change occurred [around November 2006].
Quote: "Bangalore does well for a city that's no longer officially called by that name"
If Bangalore still scores over Bengaluru, it shows the mindset! .in is going to have a hard time overcoming the .co.in mindset [which most domainers are trying to overturn].
Actually, it is there in some company registrations, used like "limited" and "corporation". "Limited" and "corporation" are also quite popular, e.g. OIL (Oil India Limited) and LIC (Life Insurance Corporation).
Quote: "I'm surprised quite a few people include the word 'company' in their domain name"
ATC (Assam Tea Company).
-
04-01-2009, 06:30 PM #6
Re: Statistics for .IN zone file
kaustavk666, do you think it's better to include the 'company,' 'corporation' and 'limited' in the domain name when it's part of the business name? Personally, I'd prefer ATC.in, AssamTea.in or AssamTea.co.in rather than AssamTeaCompany.in as they are shorter and easier to remember. Which one do you think works best for Assam Tea Company?
-
04-01-2009, 06:39 PM #7
Re: Statistics for .IN zone file
atc.in.
atc.co.in would seem repetitive. assamtea.in is too generic, since there are various tea companies in Assam.
assamteacompany.in is OK, but then the .com will be available for such a long name.
So...
[BTW, atc.in redirects to atc.co.in! Another one for .co.in.]
-
04-01-2009, 06:57 PM #8
Re: Statistics for .IN zone file
Yes, I also prefer ATC.in of them all. Too bad it's not owned by Assam Tea Company - they opted for AssamTeaCompany.com instead.
-
04-06-2009, 02:13 PM #9
Senior Member
- Join Date
- Sep 2008
- Posts
- 413
- Thanks
- 0
- Thanked 3 Times in 3 Posts
Re: Statistics for .IN zone file
Yes, ATC was so much better. I think no one wants to spend a fortune on acquiring the right name.