Blogs

PHP Performance of removing an element off the beginning of an array

Being the performance nut that I am, I decided to do some micro-benchmarks on the various options for removing an element off the beginning of an array. There are several ways to do this; here are the code and the results:


function array_shift_test() {

  // Create arrays with 100000 elements
  $array_1 = array();
  for ($i = 0; $i < 100000; $i++) {
    $array_1[] = rand();
  }
  $array_2 = $array_3 = $array_4 = $array_5 = $array_1;

  $output = '';

  // Remove the last 1000 elements using array_pop.
  $start = microtime(true);
  for ($i = 0; $i < 1000; $i++) {
    array_pop($array_1);
  }
  $stop = microtime(true);
  $output .= sprintf("array_pop takes %.5f seconds\n", $stop - $start);

  // Remove the first 1000 elements using array_shift.
  $start = microtime(true);
  for ($i = 0; $i < 1000; $i++) {
    array_shift($array_2);
  }
  $stop = microtime(true);
  $output .= sprintf("array_shift takes %.5f seconds\n", $stop - $start);

  // Remove the first 1000 elements by reversing the array and popping 1000 elements.
  $start = microtime(true);
  $array_rev = array_reverse($array_3);
  for ($i = 0; $i < 1000; $i++) {
    array_pop($array_rev);
  }
  $array_3 = array_reverse($array_rev);
  $stop = microtime(true);
  $output .= sprintf("array_reverse + array_pop takes %.5f seconds\n", $stop - $start);

  // Unset the first 1000 elements.
  $start = microtime(true);
  for ($i = 0; $i < 1000; $i++) {
    unset($array_4[$i]);
  }
  $stop = microtime(true);
  $output .= sprintf("unset takes %.5f seconds\n", $stop - $start);

  // Use each() to just iterate; nothing is removed from the array.
  // (each() was deprecated in PHP 7.2 and removed in PHP 8.0.)
  $start = microtime(true);
  for ($i = 0; $i < 1000; $i++) {
    list($key, $value) = each($array_5);
  }
  $stop = microtime(true);
  $output .= sprintf("each takes %.5f seconds\n", $stop - $start);

  return $output;
}

Results (PHP 5.2.13):
array_pop takes 0.01588 seconds
array_shift takes 5.15715 seconds
array_reverse + array_pop takes 0.04179 seconds
unset takes 0.01475 seconds
each takes 0.00118 seconds

Results (PHP 5.3.2):
array_pop takes 0.01448 seconds
array_shift takes 5.01599 seconds
array_reverse + array_pop takes 0.03513 seconds
unset takes 0.01494 seconds
each takes 0.00107 seconds

each() is the clear winner here, but it comes with some caveats. each() only advances the array's internal pointer; it doesn't actually remove an element. This is fine if you are doing complicated iterations over the array, but not much else. It's also less clear to the reader what is going on in the code. Therefore I'd recommend using unset() unless this is an extremely performance-critical path. Stay away from array_shift() at all costs (it needs to re-index the array on every call).

There's also the question of memory usage. unset() and each() don't actually remove the element from memory: each() just advances a pointer, and unset() just removes access to the element. So keep this in mind if your array is very memory intensive.
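For code that really does need to consume items from the front of a list, here is a minimal sketch of two alternatives to calling array_shift() in a loop. It was not part of the benchmark above, so no timings are claimed for it. The function names drop_first() and drain_queue() are made up for illustration; array_slice() and SplQueue (available from PHP 5.3) are standard PHP.

```php
<?php
// Sketch only: two ways to take items off the front of a list
// without calling array_shift() once per element.

// 1. Drop the first $count elements in a single call. array_slice()
//    re-indexes integer keys once, not once per removed element.
function drop_first(array $items, $count) {
  return array_slice($items, $count);
}

// 2. Use SplQueue (PHP 5.3+), a linked list designed for
//    adding at the tail and removing at the head.
function drain_queue(array $items, $count) {
  $queue = new SplQueue();
  foreach ($items as $item) {
    $queue->enqueue($item);
  }
  $removed = array();
  for ($i = 0; $i < $count && !$queue->isEmpty(); $i++) {
    $removed[] = $queue->dequeue();
  }
  return $removed;
}

$rest = drop_first(array('a', 'b', 'c', 'd'), 2);      // array('c', 'd')
$removed = drain_queue(array('a', 'b', 'c', 'd'), 2);  // array('a', 'b')
```

Which one fits depends on whether you want the leftover array (drop_first) or the removed items one at a time (drain_queue).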

Other reading:
http://drupal.org/node/172764
http://kb.ucla.edu/articles/performance-of-array_shift-and-array_pop-in-...

Colliding cookies

For the Alberta Greens we're setting up a swath of subdomains to use in organizing voter phone banking and canvassing. The setup looks like this:
domain.com
sub1.domain.com
sub2.domain.com

However, we had some strange problems where after logging in to one site you couldn't log in to any of the other sites. A bit of research found that the problem was that PHP session cookies were colliding. I found a post in the Drupal issue tracker that someone else had already filed for the problem. Different people did research in different areas on what was causing the problem and how to fix it.

The problem happens because by default Drupal sets the cookie domains to be:
.domain.com
.sub1.domain.com
.sub2.domain.com
And all session cookies are given the same name. And in theory all should work as expected because PHP should return the most specific cookie for our current session. But it doesn't.

It turns out that web browsers return all the cookies that could possibly apply, not just the most specific one. And PHP gives Drupal the last cookie it gets, which unfortunately is probably not the one we want. So this is a PHP bug. It's also a bug in the original cookie spec as there are conflicting rules about what order cookies should be sent by the browser. And thus different browsers can return the cookies in whatever order they choose. Yikes, this runs deep.

So a bug has been filed with PHP. But meanwhile a patch has been filed to make Drupal work around the issue by giving a unique name (derived from the $base_url) for the session cookie. Hopefully this patch makes it into Drupal 5.2.
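The idea behind the workaround can be sketched in a few lines of PHP. This is only an illustration and not the actual Drupal patch: the function name unique_session_name() and the 'SESS' prefix are assumptions, while md5() and session_name() are standard PHP functions.

```php
<?php
// Hypothetical sketch, not the actual Drupal patch: build a session
// cookie name that differs per site by hashing the site's base URL.
function unique_session_name($base_url) {
  // md5() yields a fixed-length hex string that is safe in a cookie name.
  return 'SESS' . md5($base_url);
}

$main = unique_session_name('http://domain.com');
$sub1 = unique_session_name('http://sub1.domain.com');
// $main and $sub1 differ, so each site reads only its own session cookie
// even when the browser sends the cookies for both domains.

// On a real site this would run before the session is started:
// session_name(unique_session_name($base_url));
```

Because each site now looks for a differently named cookie, the order in which the browser sends overlapping cookies no longer matters.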

So this rather complicated (and hard-to-follow, I'm sure) story shows why Open Source is so great. If it was just me trying to figure this out on my own, I would have gotten as far as "My cookies are colliding and I'm not sure why". And I would have created an ugly workaround that would have been difficult to maintain.

But because there were about 10 people working on it, we all pitched in a bit of effort and created a solution that actually fixed the problem for the long term.

Yet another reason why proprietary software isn't a good solution for running a website.

CiviMail

Yes, it's true: CiviMail can be run on shared hosting! This enables small-to-medium-sized organizations to run full-scale mass mailings and effectively target constituents and donors.

IE6

A few days ago a client let me know that their site was having some minor issues on their home computer running Internet Explorer 6. This kind of problem is normally really easy to deal with: I just load up the site on the browser in question, and figure out a solution that doesn't negatively impact other browsers.

However Microsoft has decided that IE6 is no longer secure (No really? The rest of us in the tech industry figured that out 5 years ago), and has automatically pushed IE7 to users' computers via Windows Update.

So anyone with Windows Update enabled has been upgraded (and naturally all of our Windows machines have Windows Update enabled). But according to various statistics across the web, about half of all internet users are still on IE6. So why is this? I have a few theories.

  1. Computers began shipping with Windows Update about 4(ish?) years ago. Older machines owned by non-savvy users will still be running IE6.
  2. Browser stats lie. One of the big factors for IE is that many crawlers and bots will identify themselves as IE so that they don't get turned away. How much of all internet traffic spambots account for nobody knows, but it's got to be a lot.
  3. Corporate IT departments are waiting to see how IE7 performs in the wild before upgrading their fleets of machines. And with good reason. IE7 is still showing a few bugs that Microsoft will hopefully deal with in a maintenance release (cross your fingers).

So there are still a lot of people running IE6. I've got an old laptop in a cupboard that I'll need to put IE6 on. And I'd better get my clients to upgrade to...

...something more secure than IE, like Firefox.

Lots Happening

Wow it's been a busy last couple of weeks!

Today we launched a new site for The C.O.R.E Foundation (Canadians Organized for Relief Effort). There are some touch-ups still to come, but it's 98% there. There's the standard assortment of modules, plus I've set up the SimpleNews module to handle small-scale mailings. This is my first experience with this module, but I immediately felt that it was more solid than some of the other small-scale solutions that I've tried. We'll see how it goes.

Also, there are a few by-elections coming up that the Green Party of Alberta is gearing up for. We rolled out a lot of new features for them, including a new donation/membership system, a new cleaner front page, and a store for selling Green Party hoodies and coffee mugs. And we're working on more, including tools for phone banking and canvassing, and we're moving them up to the heavyweight mass-mailing tool CiviMail.

The fun just doesn't stop. But I am tired. It's a good thing that Easter is here and we'll be taking the next 4 days off.

OSCMS Summit

So today is the beginning of OSCMS Summit 2007 (Open-Source Content Management Systems), hosted by Yahoo in Sunnyvale, California. Unfortunately I'm not there. The conference sold out mere days after registration opened, faster than I could contemplate everything I'd need to arrange in my life to go.

And so I'm trying to get as much info from afar as possible: following blogs, PowerPoint presentations, and podcasts, and hopefully some video when that becomes available. But it's just not quite the same. I can't talk to people face to face, and I can't feel the energy in the room. Kathy Sierra comments on this phenomenon in a recent blog post: Face-to-Face Trumps Twitter, Blogs, Podcasts, Video...

fgrep search

Recently Drupal uber-guru John VanDyk wrote a blog post about searching code with grep.  He has a great tip about creating a small bash script to use as a shortcut. 

As an alternative to grep you can use fgrep. fgrep is the same as grep except that it matches fixed strings instead of regular expressions (it behaves like grep -F), so it can be much faster (the f is usually read as "fixed" or "fast"). 98% of the time you don't need regex.

fgrep is noticeably faster, especially when you are searching the entire codebase of a site that has many modules with many files (like CiviCRM or TinyMCE).

api.drupal.org

So I was doing some development today for a new client's site. I was trying some new things and working with the Drupal API. Drupal has a good reference site for the API at api.drupal.org. I was working with a theme function. With the theme functions you simply copy the original function into your theme and then make whatever changes you want. It's part of what makes Drupal so easy to customize.
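As a rough illustration of that copy-and-modify pattern, here is a small sketch. The "original" function below is a simplified stand-in written for this example, not code copied from Drupal, and "mytheme" is a hypothetical theme name.

```php
<?php
// Simplified stand-in for a core theme function (not copied from Drupal).
function theme_breadcrumb($breadcrumb) {
  return '<div class="breadcrumb">' . implode(' » ', $breadcrumb) . '</div>';
}

// Override for a hypothetical theme called "mytheme": the same signature,
// with the theme name replacing the "theme_" prefix and a changed separator.
function mytheme_breadcrumb($breadcrumb) {
  return '<div class="breadcrumb">' . implode(' / ', $breadcrumb) . '</div>';
}

echo mytheme_breadcrumb(array('Home', 'Blogs'));
// prints <div class="breadcrumb">Home / Blogs</div>
```

The copy only works if it starts from the version of the function that matches the Drupal release the site is running.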

However when I pasted the function into my theme I started getting a bunch of errors. After a bit of hunting around I discovered that the code on api.drupal.org was not quite current. After finding the current version of the function, everything was ok.

I'm not sure how often the code on api.drupal.org gets updated, but apparently it's not enough. It's obvious that the main module that runs the site needs some updating as 5.0 is not even listed (You need to search under the HEAD branch). Does that mean that the code content hasn't been updated since before the last branch? That can't be right.

I wonder who to talk to about this?

Why we use Drupal

There's an interesting article on Collaboration Loop that talks about Drupal (the application we use to run our websites) and why it works so well for online collaboration.

There is probably nothing in Drupal that products from the big vendors can’t do and may have implemented somewhere. The difference is companies using Drupal are meeting customer needs faster and cheaper because they are sharing innovations within the community. This is resulting in a growing community that is increasing the pace in which new innovations are brought to market.
-Larry Cannell

What is "Web 2.0"?

The term "Web 2.0" gets thrown around a lot these days. From my vantage point it was originally used to describe how the internet is entering a new paradigm, one that moves away from closed silos that hoard information into a world where information is shared and communities are built through interaction. Then the term expanded to include the technologies that aid online community: things like RSS and AJAX. And now "Web 2.0" is used to describe whatever is new, whatever is cool. The term has almost completely lost its meaning.

But that original definition, the one about creating communities and online interaction, is what CommunIT.ca is all about. We help nonprofits and progressive businesses develop communities and engage people.

For a detailed history of "Web 2.0" take a look at this short movie courtesy of Jeff Utecht.