May 16

I think Wigix.com could be a real eBay competitor. On Wigix, you build a “portfolio” of items you own. Then they tell you the latest value of each item in your portfolio. So you can add a ton of items you own, like a car, computer, cell phone, mp3 player, watch, etc., and then you can check the value of each item. If the price is right, you can list it for sale. What’s great about Wigix is that you can create your portfolio and then come back 2 months later (when you need cash) and see the prices you could get. The fees are much lower than eBay’s as well.

I wouldn’t be surprised if Wigix takes off or gets acquired within eighteen months.

May 14

Someone asked me today what’s on my blog. I told them that when I started it, I intended to write quality posts. That didn’t last long. Now I mostly write short little blurbs on a whim, and I don’t generally edit them much. For instance, my first post, on how much gas Americans consume in a day, had a typo in the title (consumer instead of consume). I didn’t notice that until just now, so if you subscribe to this blog, I apologize. My traffic has been growing more than I expected, and if it continues to build I’ll put more thought into what I write. Really, the primary reason I’ve even kept this blog going is that I love seeing the kind of traffic the search engines bring in.

May 14

The other day I wrote a post on How much Gas Americans use per day. The answer is 400 million gallons. A reader wanted to know how much gas the whole world consumes in a day. The answer is about 83 million bbl (barrels). One bbl = 42 gallons, so the world consumes about 3.5 billion gallons of gas per day. That means the United States consumes about 11% of the total gas consumed per day.
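The barrel-to-gallon conversion and the US share can be double-checked with a few lines of Python. A minimal sketch using only the post's own figures (83 million barrels per day worldwide, 42 gallons per barrel, 400 million gallons per day in the US):

```python
# Figures from the post.
WORLD_BBL_PER_DAY = 83_000_000   # world consumption, barrels per day
GALLONS_PER_BBL = 42             # gallons in one barrel
US_GALLONS_PER_DAY = 400_000_000 # US consumption, gallons per day

world_gallons_per_day = WORLD_BBL_PER_DAY * GALLONS_PER_BBL
us_share = US_GALLONS_PER_DAY / world_gallons_per_day

print(f"World: {world_gallons_per_day / 1e9:.2f} billion gallons/day")
print(f"US share: {us_share:.1%}")
```

Run as-is, it reports roughly 3.49 billion gallons per day and a US share of about 11.5%, in line with the rounded figures above.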

400 million gallons is equal to:

1.5 million cubic meters
2.56 billion 20 oz bottles of water
400 million gallons of milk
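The equivalences above follow from two standard conversion factors: one US gallon is exactly 3.785411784 liters, and it holds 128 US fluid ounces. A short sketch of the arithmetic:

```python
# Standard US customary conversion factors.
LITERS_PER_GALLON = 3.785411784  # exact by definition (231 cubic inches)
FLUID_OZ_PER_GALLON = 128
BOTTLE_OZ = 20                   # one 20 oz bottle

gallons = 400_000_000
cubic_meters = gallons * LITERS_PER_GALLON / 1000  # 1,000 liters per cubic meter
bottles = gallons * FLUID_OZ_PER_GALLON / BOTTLE_OZ

print(f"{cubic_meters / 1e6:.2f} million cubic meters")
print(f"{bottles / 1e9:.2f} billion 20 oz bottles")
```

This gives about 1.51 million cubic meters and 2.56 billion bottles.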

There are 1.3 quadrillion gallons of water in Lake Michigan. One quadrillion equals 1,000 trillion, so 1.3 quadrillion is 1,300 trillion, or 1,300,000,000,000,000. If America had an oil supply of 1.3 quadrillion gallons and consumed 400 million gallons per day, it would take 8,904 years for us to run out. If the world had an oil supply as big as Lake Michigan, it would last for roughly 1,000 years.
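The "how long would it last" figures are just supply divided by daily consumption, divided by 365. A minimal sketch, assuming constant consumption and using the post's numbers (1.3 quadrillion gallons, 400 million gallons per day for the US, 83 million barrels of 42 gallons per day for the world):

```python
LAKE_MICHIGAN_GALLONS = 1.3e15      # 1.3 quadrillion, per the post
US_GALLONS_PER_DAY = 400e6
WORLD_GALLONS_PER_DAY = 83e6 * 42   # 83 million barrels at 42 gallons each

def years_to_exhaust(supply_gallons, gallons_per_day):
    """Years a fixed supply lasts at a constant daily consumption rate."""
    return supply_gallons / gallons_per_day / 365

print(round(years_to_exhaust(LAKE_MICHIGAN_GALLONS, US_GALLONS_PER_DAY)))
print(round(years_to_exhaust(LAKE_MICHIGAN_GALLONS, WORLD_GALLONS_PER_DAY)))
```

The US figure comes out to 8,904 years; the world figure comes out near 1,020 years.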

Americans use a lot of gas. The world uses a lot of gas. But the world is a big place and there’s a lot to go around.

May 14

Blog spam is out of control. Instead of trying to moderate comments myself or disabling them altogether, I just activated Akismet. I expect that it will solve the problem.

May 10

I wish Gmail made it easier to filter Junkmail.

I would say there are three main types of email:

1. Email. Messages you always want to read: emails from coworkers, family members, your favorite mailing lists, etc.
Gmail does a great job with these. You can easily read your mail, star it, archive it, etc.

2. Spam. Messages you don’t want to ever read, and never, ever solicited: Viagra ads, Nigerian business deals, phishing scams, etc.
Gmail does a great job of filtering spam. I get between 80 and 100 spam messages a day, and I would say less than 1% make it through to my inbox.

3. Junkmail. Messages you probably never want to read, but may have solicited months or years ago: updates from eBay, Yelp, [any website that you signed up for at one time or another]…
My inbox is absolutely filled with junkmail. Technically it’s my fault, because I’ve signed up for at least a hundred web 2.0 apps in the past year, but why must it be so hard to filter this? Right now I have 7,641 unread emails in my inbox. I would guess that 80% of those are junk mail. They are from sources like The Funded, ABC’s The Note, Slashdot, Yelp, Deval Patrick, Hillary Clinton, Barack Obama, Barnes & Noble, Buy.com, etc. Sometimes I’ll sign up for some list without even realizing it. Other times I’ll actually want a daily recap of some blog emailed to me… for about a week. The end result is an inbox so inundated by unread mail that I pretty much just leave everything in my inbox–and never know how many genuinely unread messages I have. This is frustrating.

Gmail offers a quick fix for filtering email lists like these. For instance, I can set all messages from “buy.com_offers@enews.buy.com” to be either deleted or archived automatically. The problem with this approach, however, is that I would have to create hundreds of these filters and keep creating new ones each month. This wastes a lot of time.

Another solution is to mark these messages as Spam. I am wary of doing this. I believe Gmail uses a Bayesian Spam Filter, which calculates the probability of a message being Spam by comparing its attributes to previous messages marked as Spam. I think that Gmail aggregates the Spam reporting of its users to filter spam for each individual. If this is the case, then by marking a message from SocialPicks as Spam, I would increase the likelihood that emails from SocialPicks to other users–some of whom may actually want to read messages from SocialPicks–would also be marked as Spam. I don’t want to do this, because I generally signed up for the email list in the first place, and it’s not SocialPicks’ fault that I don’t want to be signed up for it anymore. (If anyone knows the details of how Gmail’s spam filter actually works, and whether or not marking these types of emails as Spam would actually cause these negative effects, please let me know.)
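For readers unfamiliar with the term, a Bayesian filter of the kind described above can be sketched as a toy naive Bayes classifier: it scores a new message by how often its words appeared in messages previously marked spam versus not spam ("ham"). This is a generic textbook technique with made-up training messages, not a description of how Gmail's filter actually works. If the word counts were pooled across all users, one person's spam marks would shift everyone's scores, which is exactly the concern raised above.

```python
import math
from collections import Counter

def tokenize(text):
    """Very rough tokenizer: lowercase words split on whitespace."""
    return text.lower().split()

class NaiveBayesSpamFilter:
    """Toy multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Prior: fraction of training messages carrying this label.
            score = math.log(self.message_counts[label] / total_messages)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Add-one smoothing so an unseen word doesn't zero the score.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
            log_scores[label] = score
        # Turn the two log scores into a normalized probability of spam.
        top = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - top)
        ham = math.exp(log_scores["ham"] - top)
        return spam / (spam + ham)

f = NaiveBayesSpamFilter()
f.train("cheap viagra offer now", "spam")
f.train("urgent business deal nigeria transfer", "spam")
f.train("lunch with the team tomorrow", "ham")
f.train("family photos from the weekend", "ham")
print(round(f.spam_probability("cheap offer on viagra"), 3))
print(round(f.spam_probability("team lunch photos"), 3))
```

The first message scores well above 0.5 and the second well below it. Working in log space avoids underflow when messages contain many words.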

So if it takes too long to create filters for every junk mail source, and if I don’t want to mark messages as Spam, what are other possible solutions?

Mark messages as “Junk Mail”. Junkmail is neither email I want to read nor Spam. Not only could commercial messages from companies be marked as Junkmail, but so could any type of email list: say, a fraternity email list that you don’t care to read after being out of college for a year. It would automatically send these messages to your archive using the same techniques as the spam filter. This feature could also be called “Autofilter”; I don’t really care.
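One way the proposed "Junk Mail" button could avoid the side effect described above is to keep every mark private to the user who made it. The sketch below is hypothetical (the `Message` and `JunkMailFilter` names are made up, not any Gmail API), and it is much simpler than the learned, spam-filter-style approach suggested here: it just remembers sender addresses the user has marked and auto-archives future mail from them.

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    sender: str
    subject: str
    labels: set = field(default_factory=lambda: {"inbox"})

class JunkMailFilter:
    """Hypothetical per-user 'Junk Mail' button.

    Marks are stored only for this user, so marking a sender as junk
    never changes how anyone else's mail is filtered.
    """

    def __init__(self):
        self.junk_senders = set()

    def archive(self, message):
        message.labels.discard("inbox")
        message.labels.add("archive")

    def mark_junk(self, message):
        # Remember the sender and archive the message that was marked.
        self.junk_senders.add(message.sender.lower())
        self.archive(message)

    def on_arrival(self, message):
        # Future mail from a junk-marked sender skips the inbox.
        if message.sender.lower() in self.junk_senders:
            self.archive(message)
        return message

jf = JunkMailFilter()
jf.mark_junk(Message("buy.com_offers@enews.buy.com", "Weekly deals"))
new = jf.on_arrival(Message("buy.com_offers@enews.buy.com", "More deals"))
print(new.labels)  # {'archive'}
```

Keying on the exact sender address is the crudest possible rule; a real autofilter would more likely learn from message features, as the spam filter does.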

An “Unjoin List” button. Most of these Junk Mail messages have a link to stop receiving these emails. But these links are usually a pain in the a** and require visiting a web page and filling out a form. If I’m not going to take the time to read your email, I certainly don’t want to take the time to visit your website and fill out a form telling you why I don’t want to read your email.
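Mailing list software can advertise an unsubscribe address in a `List-Unsubscribe` header (defined in RFC 2369), which is one way a one-click "Unjoin List" button could work without a web form. The sketch below only extracts the advertised targets from a sample message using Python's standard `email` package; the message, addresses, and the `unsubscribe_targets` name are made up for illustration, and actually sending the unsubscribe request is left out.

```python
import re
from email import message_from_string

RAW = """\
From: Example List <news@example.com>
To: reader@example.org
Subject: This week's update
List-Unsubscribe: <mailto:unsubscribe@example.com>, <https://example.com/unsub?id=123>

Hello!
"""

def unsubscribe_targets(raw_message):
    """Return the URLs in a message's List-Unsubscribe header (RFC 2369)."""
    msg = message_from_string(raw_message)
    header = msg.get("List-Unsubscribe", "")
    # RFC 2369 wraps each URL in angle brackets, separated by commas.
    return re.findall(r"<([^>]+)>", header)

print(unsubscribe_targets(RAW))
```

This prints the `mailto:` address and the `https:` link. A button would pick one of them, for example sending an empty email to the `mailto:` target.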

Personally, I would like both of these features to be implemented. Sometimes I want to unjoin a list; other times I’d rather not leave a list but don’t want to create a filter to auto-archive it. I would want the latter in situations where I don’t want to offend the creator of the list, or where I want to keep a copy of the list messages in the unlikely event that I ever need to reference them.

Filtering Spam has been conquered. Can we conquer Junkmail next?

May 09

By Karma and Date Joined (HTML, 2.5 MB. Ha!)

May 08

xirium posted a tarball of all the individual profile pages for Hacker News readers (minus lurkers and those who joined after 05/07/2008). I was curious what insights, if any, could be gleaned from analyzing the data. My findings are below. I could have figured out more interesting things if I had also included posts in my data, but I was looking for something simple to work on. BTW, to get the data into a table I wrote a simple Python script to parse the HTML files. The source code is at the bottom. Or you can download the resulting dataset as an Excel file.

General Composition of the Dataset

There are 7,159 users in this dataset. The newest user joined 1 day ago and the oldest user (pg) joined 563 days ago (about 18.5 months ago, in late 2006). It was 4 days before the second user joined. Between days 563 and 441, there were only 34 users. I am guessing this was the beta period. You can easily see the beta period on the right in the picture below.
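The profile pages give join dates as "N days ago", so turning them into calendar dates needs a reference date. A minimal sketch, assuming the 05/07/2008 cutoff is that reference (the profiles were collected on different days, so treat the results as approximate); `SNAPSHOT` and `joined_on` are illustrative names:

```python
from datetime import date, timedelta

# Assumption: "days ago" is measured from the 05/07/2008 cutoff.
SNAPSHOT = date(2008, 5, 7)

def joined_on(days_ago, snapshot=SNAPSHOT):
    """Convert a profile's 'created: N days ago' into a calendar date."""
    return snapshot - timedelta(days=days_ago)

print(joined_on(563))  # oldest user (pg)
print(joined_on(441))  # approximate end of the apparent beta period
```

Under that assumption, day 563 lands on 2006-10-22 and day 441 on 2007-02-21.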

Karma by Join Date

You can also see some large outliers who have a ton of karma. The largest outliers are:

  1. pg (17544)
  2. nickb (11672)
  3. edw519 (5316)
  4. rms (5017)

Registration Trends

193 days (about 6 months) is the average elapsed time since joining. The median is 191 days. While signups aren’t particularly skewed to one side, there are definitely periods of higher growth. There are 3 major periods of growth, centered around approximately 390 days ago, 220 days ago, and in the past 30 days. The YC deadline for Summer 2007 was 4/2/2007 (~400 days ago). The deadline for Winter 2008 was 10/11/2007 (~210 days ago). The deadline for Summer 2008 was 4/2/2008 (~35 days ago). I think it’s fair to conclude that when YC is gearing up to fund a new batch of startups, HN signups go up significantly. BTW, the deadline for Winter 2007 was 10/18/2006 (over 500 days ago and thus before HN was public).

Hacker News Date Joined Distribution
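Lining the YC deadlines up against the "days ago" axis is a date subtraction. A minimal sketch, assuming the 05/07/2008 cutoff as the reference date (approximate, since the pages were collected on different days); `SNAPSHOT` and `YC_DEADLINES` are illustrative names holding the deadlines listed above:

```python
from datetime import date

# Assumption: measure from the 05/07/2008 dataset cutoff.
SNAPSHOT = date(2008, 5, 7)

# Application deadlines as given in the post.
YC_DEADLINES = {
    "Winter 2007": date(2006, 10, 18),
    "Summer 2007": date(2007, 4, 2),
    "Winter 2008": date(2007, 10, 11),
    "Summer 2008": date(2008, 4, 2),
}

for batch, deadline in YC_DEADLINES.items():
    print(f"{batch}: {(SNAPSHOT - deadline).days} days before the snapshot")
```

This gives 567, 401, 209, and 35 days respectively, close to the three signup peaks seen in the data.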

OOPS! Missing Data

I wanted to see the effects of the first mention of Hacker News in TechCrunch. The results were surprising: I realized this dataset is far from complete. The TechCrunch article was about 60 days ago, and you can see a slight jump in registrations below. But only 32 new members from a TechCrunch article? I couldn’t believe it. So I used Search Y Combinator to see if there was any mention of the traffic bump from TC. There was: pg reported that there were 258 new accounts within 24 hours of the TC article. So this dataset is missing about 88% of new members from that period. Obviously xirium’s dataset is not the complete user list (unless there are thousands of lurkers). He may have mentioned that; I’m not sure.
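The 88% figure is one minus the ratio of accounts visible in the dataset to the number pg reported. A short check using the post's two counts:

```python
# Counts from the post.
in_dataset = 32   # new members visible around the TechCrunch article
reported = 258    # new accounts pg reported within 24 hours

missing_fraction = 1 - in_dataset / reported
print(f"Missing: {missing_fraction:.0%}")
```

This prints "Missing: 88%".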

Busted Data

Karma/Joined Correlation

There is a slight positive correlation between karma and date joined. In other words, the longer you’ve been a member, the higher your karma will be on average. However, a low R² value indicates that the date joined isn’t a strong predictor of karma. See the image below (extreme outliers removed).
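For a straight-line fit with a single predictor, R² is simply the square of the Pearson correlation coefficient r, which is why a correlation can be positive while R² stays small. A minimal sketch of that calculation; the sample values are made up and are not the HN dataset:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up example values, NOT the HN dataset.
days_since_joined = [10, 50, 120, 200, 300, 400, 500]
karma = [40, 5, 300, 20, 90, 600, 150]

r = pearson_r(days_since_joined, karma)
print(f"r = {r:.2f}, R^2 = {r * r:.2f}")  # one predictor: R^2 == r^2
```

With noisy data like this, r is positive but R² is well under 0.5, so the line explains only a small share of the variation in karma.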

Karma by Joined

Average Karma Per Day

The rookies have a pretty good batting average in terms of average karma per day. But just like in the major leagues, it’s hard to maintain that and there is a negative correlation between membership duration and karma per day average. But even here, length of time isn’t a good predictor of karma per day average.
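Karma per day is just total karma divided by days of membership, which is why a rookie with a little early karma can post a high average. A minimal sketch with hypothetical users; the one-day floor is an assumption added to avoid dividing by zero:

```python
def karma_per_day(karma, days_since_joined):
    """Average karma earned per day of membership.

    Assumption: count at least one day so a brand-new account
    doesn't cause a division by zero.
    """
    return karma / max(days_since_joined, 1)

# Hypothetical users: (name, karma, days since joining)
users = [("rookie", 30, 5), ("regular", 400, 200), ("veteran", 900, 500)]
for name, k, d in users:
    print(f"{name}: {karma_per_day(k, d):.2f} karma/day")
```

Here the rookie averages 6.00 karma per day against 2.00 and 1.80 for the longer-standing accounts.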

Average Karma Per Day

About field filled out?

1,303 have an about page; 5,856 don’t. This works out to 18.2% versus 81.8%. The longer someone has been a member, the more likely they are to have filled out the about section. Although the correlation is positive, how long someone has been a member is not a great predictor of whether or not they have an about page.

About Page by Join Date

A better predictor of whether someone has an about page is their karma level. Once again, however, it’s not a great predictor.
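A yes/no outcome like "has an about page" is usually modeled with logistic regression, and the "Fit about by karma" plot presumably comes from a fit of that kind in JMP. The sketch below is a plain-Python stand-in, not what JMP actually runs: it fits P(about = 1) = 1 / (1 + e^-(a + b·x)) by batch gradient descent. The data are made up, and putting karma on a log scale before standardizing is an assumption chosen because karma is heavily skewed.

```python
import math

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = 1 / (1 + exp(-(a + b*x))) by batch gradient descent.

    xs should be roughly standardized so one learning rate works;
    ys are 0/1 flags.
    """
    a = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(a + b * x)))
            grad_a += p - y
            grad_b += (p - y) * x
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

# Made-up example values, NOT the HN dataset.
karma = [1, 3, 10, 25, 60, 150, 400, 1200]
has_about = [0, 0, 0, 1, 0, 1, 1, 1]

# Standardize log(karma + 1), since karma is heavily skewed.
logs = [math.log1p(k) for k in karma]
mean = sum(logs) / len(logs)
sd = math.sqrt(sum((v - mean) ** 2 for v in logs) / len(logs))
xs = [(v - mean) / sd for v in logs]

a, b = fit_logistic(xs, has_about)
print(f"intercept={a:.2f}, slope={b:.2f}")
```

A positive slope means higher karma goes with a higher chance of having an about page; how well the fitted curve separates the two groups is what says whether it is a good predictor.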

Fit about by karma

Conclusions

What did I learn? Honestly I did this mainly to refamiliarize myself with JMP. I didn’t really expect to find a whole lot of interesting things, and found what I expected. Seeing the beta period was cool, as well as the outliers. I also liked discovering that this was an incomplete dataset. The most interesting thing was definitely how memberships shot up during YC app periods.

Python script to parse xirium’s original tarball into a CSV file:

import os

SEP = ";"    # field separator in the output file
TERM = "\n"  # record terminator

def field(contents, label, offset):
    """Return (start, text) for the table cell that follows `label`.

    `offset` skips the label plus the markup after it (presumably
    "</td><td>": len("user:") + 9 == 14, len("created:") + 9 == 17, ...).
    """
    start = contents.find(label) + offset
    end = contents.find("</td>", start)
    return start, contents[start:end]

with open("csv2.csv", "w") as out:
    for name in os.listdir(os.getcwd()):
        if not name.endswith("html"):
            continue
        with open(name, "r") as f:
            contents = f.read()
        start, user = field(contents, "user:", 14)
        if start < 100:  # a couple of files are messed up; skip them
            print(name)
            print(start)
            continue
        _, created = field(contents, "created:", 17)
        _, karma = field(contents, "karma:", 15)
        _, about = field(contents, "about:", 15)
        # Keep each record on one line and drop the separator character.
        about = about.replace("\n", "<br>").replace("\r", "<br>").replace(SEP, "")
        out.write(user + SEP + created + SEP + karma + SEP + about + TERM)

Updates:

I didn’t realize xirium collected the profile pages on different days. That skews things a bit.

log(karma) by joined

May 08

Americans consume about 400 million gallons of gas per day. That works out to about 146 billion gallons of gas per year.

Exxon Mobil’s 2007 net income was $40.6 billion. Not that this makes any sense, but if you imagine that their entire profit went to lowering gas prices, prices would drop by about 28 cents per gallon.
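The per-gallon figure is the annual profit spread over a year's worth of US gas. A quick check using the post's numbers (400 million gallons per day, $40.6 billion net income):

```python
US_GALLONS_PER_DAY = 400e6
EXXON_2007_NET_INCOME = 40.6e9  # dollars, per the post

gallons_per_year = US_GALLONS_PER_DAY * 365
cents_per_gallon = EXXON_2007_NET_INCOME / gallons_per_year * 100

print(f"{gallons_per_year / 1e9:.0f} billion gallons per year")
print(f"{cents_per_gallon:.1f} cents per gallon")
```

This prints 146 billion gallons per year and 27.8 cents per gallon.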

May 07

Generally, intro Flash videos on websites are kind of annoying, but I thought this one was different and kind of entertaining.

May 03

The UpDown is a pretty cool site. If you enjoy gambling in the stock market and haven’t heard of it yet, it’s worth checking out. I signed up a while ago and pretty much forgot about my account. Then each month I’m reminded of the site because they send me an email recapping the performance of my portfolio for the month. I’ve actually made a few bucks on the site because my virtual portfolio has outperformed industry benchmarks. I might start putting my real trades on the site and see how much more I can earn. I talked to the founder/CEO, Michael Reich, a few months back, and he’s a really bright guy. Compete.com shows very positive growth, and I think the UpDown is heading up.