Archive for the ‘Apache’ Category

Wayback machine, privacy and old plip.com

Tuesday, October 5th, 2010

This post is a short parable told in three lessons:

Lesson 1: The web is not as temporal as you might think!

Recently a co-worker was travelling and was unable to access her work email. Instead, she directed folks to email her at her personal address. Being a curious fellow, I clicked over to her personal site to see what she had to say. All I found was "Site in progress, check back later" and a link to a very outdated resume. Well, that's just no fun! Enter the wayback machine! Using this fine site, I was able to see all the text, photos and links she had long since redacted. The wayback machine never forgets, so don't you forget that.

Lesson 2: Robots.txt can pull Jedi mind tricks.

A natural response to seeing the archive of other sites is to see what dirt folks might find out about me via the same method. Sure enough, there's some good stuff! However, the more interesting fact I learned is that my robots.txt of today redacted the archive.org copy of yesterday! This is cool! A while ago I took down my resume and some older, more personal content, and also took a sec to paint some broad strokes about what search engines shouldn't index. It was these actions that archive.org took note of. With a wave of my robots.txt hand, indeed these are not the pages you're looking for.

Lesson 3: The wayback machine is way cool.

Ok, this parable kinda peters out right about here, but still, the wayback machine is way cool. Check out the rad looks plip.com has had over the years! Hrm, maybe that should be "rad". You decide.

The very, very poor man's Google Analytics: tail, cut, sort, uniq & wc

Thursday, July 30th, 2009

I still have what most would call an unfounded fear for my privacy when it comes to Google. They may receive a copy of every email I send to my friends who use gmail, they may place every call to me via Google Voice, they may serve every ad from DoubleClick (which I then block) and I sure as heck never stray from their bad-ass search on google.com, but I don't host anything with them directly.

I've run my share of web analyzer tools, but sometimes I wanna know, right now, "how many people subscribe to my blog feed?". Now, I probably should be using FeedBurner (No shit – I did not know, 'til just this second, that they too are now owned by Google. Oh, the irony!), but my site, despite its claims, is still a bit of the cobbler's child when it comes to analytics. Heck, I still don't have mod_usertrack on!

Enter tail, cut, sort, uniq and wc!

tail -n 10000 access_log | grep /blog | cut -d" " -f 1 | sort | uniq | wc -l

In layman's terms, that's "get the last 10000 lines of my access log, keep only the ones that mention /blog, cut each line into fields separated by the space character, grab the first field (the IP address in this case), sort the resulting lines of now just an IP address per line, remove the duplicates and count the number of resulting lines (or IP addresses)". Presto! 388 of you out there, including all the bots, spiders, crawlers, trolls and goblins. Thanks for the interest!
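
If you find yourself typing that pipeline a lot, it can be wrapped in a small shell function. This is just a sketch, not anything I actually run: the name count_unique_ips and its three arguments are made up for illustration, and it assumes the client IP is the first space-separated field of each log line, as it is in my access log.

```shell
#!/bin/sh
# count_unique_ips LOGFILE [PATTERN] [LINES]
# Hypothetical helper (not part of any standard tool): prints how many
# distinct client IPs appear among the last LINES entries of LOGFILE
# that contain PATTERN. Assumes the IP is the first field on each line.
count_unique_ips() {
    log_file=$1
    pattern=${2:-/blog}   # default: anything mentioning /blog
    lines=${3:-10000}     # default: the last 10000 log lines

    tail -n "$lines" "$log_file" |
        grep -F -- "$pattern" |  # keep only lines containing the pattern
        cut -d ' ' -f 1 |        # first space-separated field: the IP
        sort -u |                # sort and drop duplicates in one step
        wc -l |                  # count what is left
        tr -d ' '                # some wc versions pad the number with spaces
}
```

Calling it as count_unique_ips access_log /blog 10000 does the same job as the one-liner. The only real change is sort -u, which stands in for sort|uniq.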