September 30, 2008

Python - StackOverflow Fight

I've been having fun using the new StackOverflow site for answering technical questions.

Here is a Python script I call 'StackOverflow Fight' (like Google Fight). It takes a list of tags and gets the count of questions that are tagged with each one.

As an example, here is a StackOverflow Fight between some popular programming languages:

#!/usr/bin/env python

import urllib
import re

langs = ('python', 'ruby', 'perl', 'java')

url_stem = 'http://stackoverflow.com/questions/tagged/'

counts = {}
for lang in langs:
    resp = urllib.urlopen(url_stem + lang).read()
    m = re.search('summarycount.*>(.*)<', resp)
    count = int(m.group(1).replace(',', ''))
    counts[lang] = count
    print lang, ':', count
sorted_counts = sorted(counts.items(), key=lambda(k,v):(v,k))
sorted_counts.reverse()
print sorted_counts[0][0], 'wins with', sorted_counts[0][1]

Output:

python : 733
ruby : 391
perl : 167
java : 1440
java wins with 1440
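
The ranking step at the end of the script relies on a tuple-unpacking lambda, which only works in Python 2. As a minimal sketch, here is one way the same ordering could be written in Python 3, using the counts from the output above. The names `rank` and `ranked` are mine and not part of the original script.

```python
# Counts copied from the output above.
counts = {'python': 733, 'ruby': 391, 'perl': 167, 'java': 1440}


def rank(counts):
    """Return (tag, count) pairs sorted by (count, tag), highest first.

    This gives the same order as sorting ascending and then reversing.
    """
    return sorted(counts.items(), key=lambda item: (item[1], item[0]), reverse=True)


ranked = rank(counts)
for tag, count in ranked:
    print(tag, ':', count)
winner, top = ranked[0]
print(winner, 'wins with', top)
```

With these four counts, the last line printed is `java wins with 1440`, matching the output above.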

8 comments:

Unknown said...

Add C# to the mix. Guaranteed it crushes Java.

Unknown said...

I was correct -- C# hauls in 2209 questions.

Steve said...

Does this officially mean that C# is the most difficult language on StackOverflow? ;-)

Anonymous said...

Yes, it is absolutely true. C# wins. Change the fourth line to

langs = ('python', 'ruby', 'perl', 'java','c%23')
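
The `%23` is the percent-encoded form of `#`. Left unencoded, `#` marks the start of a URL fragment, so the tag name may not reach the server intact. A small sketch of doing the encoding automatically, assuming Python 3's `urllib.parse.quote` (in Python 2, which the script uses, the same function is `urllib.quote`). `tag_url` and `URL_STEM` are hypothetical names for illustration.

```python
from urllib.parse import quote

URL_STEM = 'http://stackoverflow.com/questions/tagged/'


def tag_url(tag):
    """Build the tag page URL, percent-encoding characters such as '#' and '+'."""
    # safe='' also encodes '/', so a tag can never add an extra path segment.
    return URL_STEM + quote(tag, safe='')


print(tag_url('c#'))   # http://stackoverflow.com/questions/tagged/c%23
print(tag_url('c++'))  # http://stackoverflow.com/questions/tagged/c%2B%2B
```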

S H O N said...

And PHP ;) It's close to Java.

shon@ubuntu:~/python$ date
Tue Dec 8 18:28:56 IST 2009

shon@ubuntu:~/python$ python stack_fight.py
python : 14226
ruby : 5932
perl : 3147
java : 27568
c%23 : 51962
php : 21305
c%23 wins with 51962

Unknown said...

Cool post. I thought it would be fun to see how things have changed over the last 3 1/2 years since the last post on this. I also added other languages for fun:

Java has 255997 hits
PHP has 235259 hits
Javascript has 220234 hits
C# has 220234 hits
C++ has 130371 hits
.NET has 119156 hits
Python has 112801 hits
Objective-C has 87613 hits
SQL has 83174 hits
C has 61612 hits
Ruby has 47578 hits
Perl has 18683 hits
delphi has 15376 hits
Groovy has 4237 hits
Lisp has 1937 hits
Go has 891 hits
Pascal has 480 hits
Ada has 324 hits
Basic has 73 hits
Logo has 23 hits
NXT-G has 1 hits

Java wins with 255997

So Java is still on top, and StackOverflow is Overflowing with questions and answers!

I also modified your code a bit: a) it prints the languages after sorting the list, and b) it checks for a None that I got a couple of times when playing around with your fun script.

Here is the updated code:

import urllib
import re

langs = ('Python', 'Ruby', 'Perl', 'Java', 'C++', 'PHP', 'C', 'Go',
         'Javascript', 'C#', 'Groovy', 'Objective-C', 'Basic', 'SQL',
         'delphi', '.NET', 'Lisp', 'Pascal', 'Ada',
         'Logo', 'NXT-G', 'Visual Basic')

url_stem = 'http://stackoverflow.com/questions/tagged/'

counts = {}
for lang in langs:
    # quote() percent-encodes characters such as '#', '+' and ' ' in tag names
    resp = urllib.urlopen(url_stem + urllib.quote(lang)).read()
    m = re.search('summarycount.*>(.*)<', resp)
    if m is None:
        # no count found on the page; record 0 instead of reusing the previous count
        counts[lang] = 0
    else:
        count = int(m.group(1).replace(',', ''))
        counts[lang] = count
    #print lang, ':', count
sorted_counts = sorted(counts.items(), key=lambda(k,v):(v,k))
sorted_counts.reverse()

for name,hcount in sorted_counts:
    print name ,"has",hcount,"hits"
print ''
print sorted_counts[0][0], 'wins with', sorted_counts[0][1]
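
The `None` comes from `re.search`, which returns `None` whenever the pattern is not found in the page text. A minimal Python 3 sketch of that check in isolation follows. `parse_count` is a hypothetical helper, and the HTML snippets are made-up stand-ins for a tag page, not real StackOverflow markup.

```python
import re

# Same pattern as the script above, compiled once.
COUNT_RE = re.compile(r'summarycount.*>(.*)<')


def parse_count(html):
    """Return the question count found in html, or None if there is none."""
    m = COUNT_RE.search(html)
    if m is None:
        # Pattern not present at all, e.g. an error page.
        return None
    try:
        return int(m.group(1).replace(',', ''))
    except ValueError:
        # Pattern matched, but the captured text was not a number.
        return None


sample = '<div class="summarycount al">1,440</div>'  # made-up markup
print(parse_count(sample))                  # 1440
print(parse_count('<p>no count here</p>'))  # None
```

Returning `None` lets the caller decide whether to skip the tag or record a zero, rather than silently carrying over the count from the previous tag.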

Unknown said...

Great idea, Corey. Let's see where things stand in June 2012:

Java has 255997 hits
PHP has 235259 hits
Javascript has 220234 hits
C# has 220234 hits
C++ has 130371 hits
.NET has 119156 hits
Python has 112801 hits
Objective-C has 87613 hits
SQL has 83174 hits
C has 61612 hits
Ruby has 47578 hits
Perl has 18683 hits
delphi has 15376 hits
Groovy has 4237 hits
Lisp has 1937 hits
Go has 891 hits
Pascal has 480 hits
Ada has 324 hits
Basic has 73 hits
Logo has 23 hits
NXT-G has 1 hits

Java wins with 255997

So Java is still 'on top', and wow, what a lot of questions and answers in 2 1/2 years! StackOverflow is overflowing for sure. ;)

I also made a few small changes to your script: a) added more languages, b) the languages are printed in order of hit count, and c) added a check for a None I got twice when playing around with the language list; not sure why, though. Here is the update; hopefully it displays OK:

import urllib
import re

langs = ('Python', 'Ruby', 'Perl', 'Java', 'C++', 'PHP', 'C', 'Go',
         'Javascript', 'C#', 'Groovy', 'Objective-C', 'Basic', 'SQL',
         'delphi', '.NET', 'Lisp', 'Pascal', 'Ada',
         'Logo', 'NXT-G', 'Visual Basic')

url_stem = 'http://stackoverflow.com/questions/tagged/'

counts = {}
for lang in langs:
    # quote() percent-encodes characters such as '#', '+' and ' ' in tag names
    resp = urllib.urlopen(url_stem + urllib.quote(lang)).read()
    m = re.search('summarycount.*>(.*)<', resp)
    if m is None:
        # no count found on the page; record 0 instead of reusing the previous count
        counts[lang] = 0
    else:
        count = int(m.group(1).replace(',', ''))
        counts[lang] = count

sorted_counts = sorted(counts.items(), key=lambda(k,v):(v,k))
sorted_counts.reverse()

for name,hcount in sorted_counts:
    print name ,"has",hcount,"hits"
print ''
print sorted_counts[0][0], 'wins with', sorted_counts[0][1]

Corey Goldberg said...

@Sol. cool!