For those of you who don’t know me, I’m pretty much a PHP guy. Have been for quite some time now, about 10 years. I’d like to think that I know most of what PHP has to offer, and am able to pretty much make it do anything I like. However, there is an itch. The things that attracted me to the language most when I started out, like loose typing and no need to compile the code, are now the things that irritate me. To be fair, they don’t irritate me on a daily basis, but I do find myself going “Oh, here we go again…” a few times a week. And of course there is speed. Even though PHP has come a long way, it’s still really slow compared to other languages, especially the ones that are compiled (yes, we’ve got HipHop now, I know, but from the benchmarks I’ve read, even with HipHop we’re still a long way away from other languages).
So anyway, for my job I was working with a set of large arrays (about 250,000 entries each) that I had to slice a few million times over (roughly O(n^2)) for some calculations on those slices (what those calculations are is not the point of this post, and they’re kinda proprietary, so I won’t go into it). I got this working in PHP easily, but it was incredibly slow. I just waited it out, got some useful results, worked with them, and didn’t really give it any more thought.
Then about a week ago I had to run the script again (on different datasets), and got annoyed again with how slow it was. Between the last run and this one I had been looking at Google Go: followed the tour (which I can highly recommend!), read some articles, wrote a distributed merge sort to see how goroutines work, that kind of stuff. So I got to wondering, what if I rewrite this slow PHP script in Go? I had figured that, since Go is a compiled language, naturally it would be faster than PHP, but I didn’t know by how much. So I got to work and basically converted my PHP script to Go. I’m sure there are things that could be done better or more elegantly, but this was just a hands-on test for me to see how fast it would go, and it suffices for that purpose. The conversion from PHP to Go was pretty painless by the way. It turns out their syntaxes aren’t all that different from each other, which isn’t really surprising given that they both have their roots in C, I suppose.
So I finished my script in Go, compiled it, ran it, and… my mouth literally fell open from surprise. This can’t be happening, I thought. Not only was it faster, it was WAY faster. By the time PHP had run through the first 1,000 data points, Go was done. And I hadn’t even used goroutines yet (Go’s lightweight concurrency mechanism); this was all in a single thread. The PHP script had been running from Friday evening (about 7PM) and was still running on Monday morning (about 10AM), and still had a long way to go. The Go script takes about 20 minutes to finish the whole thing. I had to double check the data to make sure both scripts actually did the same thing, but the results were exactly the same, so they did in fact do the same thing. I estimate the PHP script would have run until at least Tuesday evening if I hadn’t killed it because I no longer had any use for it.
I did find out a few days later that the main problem in my PHP was using array_slice, which actually makes copies of the array instead of using pointers (which Go does), and is slow. When I rewrote that to just telling PHP to work on the main array from a certain offset up to another offset, the PHP script performed a lot better than it did before, but still got beaten hard by the Go script. Go internally uses slices instead of arrays most of the time. A slice is basically a view onto part of an array, implemented as a pointer to the first element of the slice plus the number of elements in the slice (and a capacity, which says how far the slice can grow within that array). So it acts as a proxy object to the real array. The obvious advantage of this is that slices are super fast to create since there is no copying involved; it’s just a pointer and a couple of ints. More info on slices can be found in the official Go blog at http://blog.golang.org/slices, which is an interesting read, and a brilliant approach to arrays.
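To make the "no copying" point concrete, here is a minimal sketch. It is not the script from work; the names `data` and `sumWindow` are made up for illustration. It takes a small window out of a 250,000-element slice and shows that the window shares memory with the original.

```go
package main

import "fmt"

// sumWindow (hypothetical helper) adds up the elements of a window.
// The slice argument is only a small header; the elements are not copied.
func sumWindow(window []int) int {
	total := 0
	for _, v := range window {
		total += v
	}
	return total
}

func main() {
	// A large data set, similar in size to the arrays described above.
	data := make([]int, 250000)
	for i := range data {
		data[i] = i
	}

	// Slicing creates a new header pointing into data's backing array.
	window := data[1000:1010]
	fmt.Println(len(window), sumWindow(window)) // 10 10045

	// A write through the window is visible in data: the memory is shared.
	window[0] = -1
	fmt.Println(data[1000]) // -1
}
```

The flip side of sharing is that a write through a slice changes the underlying data, so if you need an independent copy you have to ask for one explicitly, for example with the built-in `copy`.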
The thing I enjoyed most while writing in Go was that the compiler will actually stop if you’re doing things that aren’t allowed! We’ve had PHP code in production that suddenly started giving errors in pieces of code that had not run in a few months, and it turned out it was trying to instantiate a class that hadn’t been used. You really don’t want to find that kind of stuff out in production; you want to know beforehand. I realise we could have covered this in unit tests, but really, unit tests are for business logic, not to test if the language is doing its job properly.
Having fully working type hinting is also a blessing. You can just look at a function and see what it will return, as opposed to PHP, where a function can return anything it likes (like an array if something went well, null if there was nothing to be done, and false if there was an error…). This also enables the Go compiler to stop you if you want to assign the return value of a function that returns a string to an int. Very handy.
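As a small illustration of what that looks like, here is a sketch with a made-up function `average` and a made-up error value `errEmpty` (neither comes from my actual code). The signature tells you exactly what comes back: a float64 and an error, nothing else.

```go
package main

import (
	"errors"
	"fmt"
)

// errEmpty is a hypothetical error value used only in this example.
var errEmpty = errors.New("no values given")

// average always returns a float64 and an error. It cannot return an
// array in one case and false in another, as a PHP function could.
func average(values []float64) (float64, error) {
	if len(values) == 0 {
		return 0, errEmpty
	}
	sum := 0.0
	for _, v := range values {
		sum += v
	}
	return sum / float64(len(values)), nil
}

func main() {
	avg, err := average([]float64{1, 2, 3, 6})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(avg) // 3

	// The next line would not compile, because avg is a float64, not an int:
	// var n int = avg
}
```

Returning the error as a second value is the usual Go way to cover the "false if there was an error" case without giving up a fixed return type.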
I also like how the language won’t let you declare things that you don’t use. If you create a new local variable but never use it, the code won’t compile until you remove that variable. The same goes for imported packages. This keeps your code very clean, since everything that’s in there actually has a purpose, and isn’t some stray line that was overlooked while refactoring, for example.
Over the next few weeks I will be looking at Go more and implementing some parts of our system at work that could use a speed boost. I’m looking forward to it!