Quick-to-market Web 2.0 Development Consulting
Launch21 is a Seattle-based technology consulting group specializing in quick-to-market development of next-generation Web sites. Our partners have pioneered the development of the web, AJAX, Web 2.0 and WPF technologies and bring a unique depth of experience in these areas. While typical web-development projects take months to get off the ground, our combination of attitude, experience and Web 2.0 code libraries enables us to get your business off the ground in as little as 21 days.
March 24th, 2008
I’ve been working on a project lately where we are trying to design a site to handle some really high traffic volumes. Since this is an interactive, data-driven site, the database is going to be the scalability limit- the site is designed so it’s easy to add more front-end web machines behind load balancers, but scaling out your database (via replication and partitioning techniques) can be much more complicated. So the question at hand is how much load MySQL can be expected to handle on commodity hardware. I was a bit surprised not to find good resources for this information on the Internet, so we did some analysis ourselves.
First of all, I should define commodity hardware. Today you can buy an incredible amount of computing power for really reasonable prices. You can get a 1U machine with two quad-core Xeon 5430 CPUs (8 cores total at 2.66 GHz) and 4 SAS drives on a nice RAID controller for under $3,000. It is possible to get a machine with more CPU (four-socket Xeons, etc.), but as you move up from here the prices go up dramatically- jump to 4x quad-core Xeons and you will quickly be spending at least $10,000.
The first round of tests is deliberately simple and aimed at answering two questions-
What is the maximum number of queries per second I can expect from MySQL on this hardware under ideal conditions?
Does MySQL 5.1 scale better than MySQL 5.0 on this 8-core system? I’d read various rumors that MySQL 5.0 has a hard time fully using 8 cores, but MySQL 5.1 hasn’t had a final “general availability” release yet, so I’d be reluctant to use it unless it is a lot faster.
The test is just a PHP script running a tight loop of 100,000 queries for random user records. Between 1 and 64 copies of this script run at the same time to simulate more clients hitting the server. This test is unrealistic in many ways- first of all there are no writes, so the MySQL query cache should never get cleared. And because the data set is fairly small, this test should involve NO disk I/O- the entire database table will be cached in RAM, so this is a pure test of maximum query throughput.
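The original harness was a PHP script. As a rough illustration of the same shape (N concurrent clients, each issuing a fixed number of random primary-key lookups, with aggregate queries per second reported at the end), here is a sketch for Node.js. `runQuery` is a hypothetical stand-in for whatever database client call you use, and the ID range in `sweep` is an assumption; this is not the actual test code.

```javascript
// Sketch of a concurrent read benchmark (assumption: runQuery(id) is an async
// function wrapping your real database client; it is not a real library API).
async function benchmark(runQuery, { clients, queriesPerClient, maxUserId }) {
  let completed = 0;

  // One simulated client: a tight loop of random primary-key lookups.
  async function worker() {
    for (let i = 0; i < queriesPerClient; i++) {
      const id = 1 + Math.floor(Math.random() * maxUserId);
      await runQuery(id);
      completed++;
    }
  }

  const start = process.hrtime.bigint();
  // Launch every client at once and wait until all of them finish.
  await Promise.all(Array.from({ length: clients }, () => worker()));
  const seconds = Number(process.hrtime.bigint() - start) / 1e9;
  return { completed, seconds, qps: completed / seconds };
}

// Example: sweep the client count the way the post does (1 to 64).
// maxUserId is a hypothetical table size.
async function sweep(runQuery) {
  for (const clients of [1, 2, 4, 8, 16, 32, 64]) {
    const r = await benchmark(runQuery, { clients, queriesPerClient: 100000, maxUserId: 1000000 });
    console.log(`${clients} clients: ${Math.round(r.qps)} qps`);
  }
}
```

Because throughput is measured on the client side, the client machine and its network link must not be the bottleneck, which is exactly the trap with the 100 Mbps port described below.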
There were a few challenges running this test. At first we saw some really strangely poor performance from the MySQL we had installed via an RPM. Re-compiling MySQL to make sure it matched our exact kernel and processor solved that problem.
Another test was being limited by network bandwidth- I had thought the client machine was on gigabit Ethernet, but it turned out its port was only 100 Mbps and we were filling that entire pipe. I found the issue by comparing the performance from the remote machine with running the test client directly on the MySQL box.
Results
With MySQL 5.0 and one thread we were seeing about 7,000 qps (queries per second). Adding a second thread doubled that, but throughput maxed out at around 10 threads and 40,000 qps. Total CPU usage was about 540% (about 68% of the 800% available from 8 cores)- it was using more than 4 cores’ worth of CPU but wasn’t able to really use the full 8.
With MySQL 5.1, one thread gave us the same 7,000 qps, and 16 threads gave us about the same 40,000 qps as MySQL 5.0, but as the client threads increased to around 30 it maxed out at 55,000 qps. At this point it was using about 750% CPU (about 94% of the total) on the machine. You really aren’t going to see much more than this outside of a fully compute-bound task (like numerical calculation, encoding, etc.). So the conclusion is that MySQL 5.1 DOES scale better on an 8-core machine, giving roughly 37% more peak throughput (55,000 vs. 40,000 qps).
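The percentages above are easy to re-derive. A small sketch using only the figures reported in this post (the 800% ceiling assumes top-style accounting of 100% per core):

```javascript
// Figures reported in this post for the dual quad-core (8-core) test box.
const CORES = 8;
const results = {
  "5.0": { peakQps: 40000, cpuPercent: 540 },
  "5.1": { peakQps: 55000, cpuPercent: 750 },
};

// top-style CPU% counts 100% per core, so the ceiling is CORES * 100 = 800%.
function utilization(cpuPercent, cores = CORES) {
  return cpuPercent / (cores * 100);
}

// Peak throughput divided by the number of cores actually kept busy.
function qpsPerBusyCore({ peakQps, cpuPercent }) {
  return peakQps / (cpuPercent / 100);
}

for (const [version, r] of Object.entries(results)) {
  console.log(
    `MySQL ${version}: ${(utilization(r.cpuPercent) * 100).toFixed(1)}% of the machine, ` +
      `~${Math.round(qpsPerBusyCore(r))} qps per busy core`
  );
}

// Relative gain of 5.1 over 5.0 at peak.
const gain = results["5.1"].peakQps / results["5.0"].peakQps - 1;
console.log(`MySQL 5.1 peak gain over 5.0: ${(gain * 100).toFixed(1)}%`);
```

Per busy core, both versions land around 7,300 to 7,400 qps, close to the single-thread 7,000 qps figure. That suggests 5.1’s advantage comes from keeping more cores busy rather than from handling each query faster.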
Overall this shows really good things for MySQL performance, and especially for future expectations. I wouldn’t rush to deploy MySQL 5.1 while it isn’t “final” yet, but it should be a good choice once it’s finished. Also, over the next year Intel is expected to upgrade the Xeon to 6 cores per chip and re-introduce hyper-threading, giving each core 2 “virtual” threads’ worth of execution. Together these could give us roughly another 100% performance boost, given software that scales to that many threads.
There are a couple of next steps that I’m hoping to explore in future posts. It would be interesting to get a better picture of MySQL performance with a mixed load that contains some inserts and updates in addition to the queries. I’m also planning on doing a similar performance analysis of memcache. Finally, I’m planning on writing a bit about how to use these numbers to create a capacity plan for your web site.
Posted in Scalability
March 17th, 2008
As an offshoot of another thread, someone asked for some feedback on development with Rails. They pointed out that there are plenty of pro-Rails resources on the net, but were curious to hear about the other side of the Rails experience.
This post is an edited version of the private reply I sent. To be honest, I initially wasn’t going to put this up in public at all for fear of the wrath of the fan-boy legion, but I thought it was worth sharing a few of these thoughts.
First of all, let me say that Rails isn’t bad, and many Rails developers are fine developers. One of the smartest guys I know loves Rails for all the right reasons and uses it very effectively.
I’d classify my caution about starting a project in Rails into three buckets-
Immature
Rails has progressed a bunch since I worked with it, but it’s still very immature compared to the other environments. Any Rails deployment with more than a minor load (anything that needs more than a shared hosting environment, or on the order of a few thousand visits a day) will typically require a bunch of time dealing with infrastructure issues. When you install just about any modern copy of Linux, you are pretty well set up to run PHP or other more mature environments out of the box. Sophisticated people can tweak some stuff, but you get Apache and mod_php and you are ready to go.
With Rails you don’t hit this much during development, since everything is running on your machine with its nice little single-threaded local server. But the minute you have any real load (let’s call that 5 people hitting your site at once) it can be a problem. I’m sure these things have improved a lot since my experience, but I still hear all the time from people about the pain of dealing with Mongrel and other deployment issues. What other major web app runtime doesn’t have an Apache module? There are some core Apache folks I would listen to if they told me that Mongrel was really the right way to build this, but I haven’t heard that from them yet.
Furthermore, lots of the infrastructure is still coming along. There are some debuggers now, but they are still very new, and given how much Rails does for you under the covers (more on this later) it can be really hard to figure out what went wrong. What it comes down to is: do you really want to invest your resources in running Rails, or in running your business? There are some very successful bigger sites built on Rails, but they tend to have a dedicated staff of Rails experts, many of whom contribute to the Rails codebase. If you don’t have that guy, you need to hire that guy (not the one who thinks Rails is cool and picked up a book on it 6 months ago, but the one who really gets the details of the architecture and deployment) or you should be investing somewhere else.
Abstraction
For me, Rails typically does not operate at my favorite level of abstraction. All computer programming is about working at different levels of abstraction (unless you are at Intel working on the innards of the CPU, and even there most of those folks work in abstractions rather than manipulating individual logic gates). Picking the right level is the key. Rails really pushes you to a certain level of abstraction that makes rapid development easy, but often you don’t realize what is really going on under the covers, since it handles it all for you. It is really easy to accidentally write a loop that does a database query on every iteration without realizing it, since the code just looks like safe object references. It’s easy to have two nested loops each doing more queries and just trash your performance.
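To make the “query in a loop” trap concrete, here is a framework-neutral sketch in JavaScript (not Rails/ActiveRecord code). The in-memory `makeDb` data layer and its method names are hypothetical; it just counts queries so the two access patterns can be compared.

```javascript
// Hypothetical in-memory data layer that counts every "query" it serves.
function makeDb() {
  const authors = new Map([[1, "Ann"], [2, "Bo"], [3, "Cy"]]);
  const posts = [
    { id: 10, authorId: 1 },
    { id: 11, authorId: 2 },
    { id: 12, authorId: 1 },
    { id: 13, authorId: 3 },
  ];
  const db = {
    queryCount: 0,
    allPosts() { db.queryCount++; return posts.slice(); },
    authorById(id) { db.queryCount++; return authors.get(id); },
    // One round trip for many ids, like "WHERE id IN (...)".
    authorsByIds(ids) { db.queryCount++; return ids.map((id) => [id, authors.get(id)]); },
  };
  return db;
}

// N+1: one query for the list, then one more per row. In an ORM the inner
// query often hides behind an innocent-looking `post.author` property.
function listPostsNPlusOne(db) {
  return db.allPosts().map((p) => ({ id: p.id, author: db.authorById(p.authorId) }));
}

// Batched: one query for the list plus one for all referenced authors.
function listPostsBatched(db) {
  const posts = db.allPosts();
  const ids = [...new Set(posts.map((p) => p.authorId))];
  const authors = new Map(db.authorsByIds(ids));
  return posts.map((p) => ({ id: p.id, author: authors.get(p.authorId) }));
}
```

Both functions return the same data, but with 4 posts the first issues 5 queries and the second issues 2; with 1,000 posts it would be 1,001 vs. 2.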
All these things can be fixed by careful tuning, and after all you have all the sources to this stuff. But Rails sets people up to work at that other layer, and I’ve rarely met Rails developers who actually logged every SQL query their app issued in order to tune its performance. Again, this can be invisible if you only have a few users hitting your site every minute, or for demos, but it falls over quickly under real load (and can be a lot of work to fix later).
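Logging every SQL statement doesn’t need special tooling. A minimal sketch, assuming your data layer funnels queries through a single async function (`query(sql, params)` here is a hypothetical signature, not any particular library’s API), wraps that function to record each statement and how long it took:

```javascript
// Hypothetical wrapper that records every SQL statement and its duration,
// so you can see exactly what a page does to the database.
function withQueryLog(query) {
  const log = [];
  async function logged(sql, params = []) {
    const start = Date.now();
    try {
      return await query(sql, params);
    } finally {
      // Recorded even if the query throws.
      log.push({ sql, params, ms: Date.now() - start });
    }
  }
  return { query: logged, log };
}

// Summarize a log: total statements and the most-repeated SQL text, which is
// usually the first hint of a query running inside a loop.
function summarize(log) {
  const counts = new Map();
  for (const { sql } of log) counts.set(sql, (counts.get(sql) || 0) + 1);
  const [topSql, topCount] = [...counts].sort((a, b) => b[1] - a[1])[0] || [null, 0];
  return { total: log.length, topSql, topCount };
}
```

Because the SQL text is logged with placeholders rather than interpolated values, the same statement issued once per row shows up as one heavily repeated entry.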
This issue isn’t unique to Rails. .NET users can fall into the same traps if they just lean on all the existing libraries, although Microsoft’s standard libraries tend to be a bit more careful about making this stuff explicit. Any environment can have a framework that causes this problem, although Rails’ ability to do late binding on objects can make it even harder to see what is happening. For example, I’ve done some work lately with Drupal, a framework written in PHP, and it typically suffers from similar issues.
What’s The Big Deal
My personal reaction to the RoR hype is “what’s the big deal?” A talented developer with the right architectural expertise can do almost equally rapid development of web apps in PHP, .NET, Python, Java or Rails. They all have great tools, and if you take the right approach they can all be pretty equivalent. The advantage of Rails is that it “puts you on rails”. Web developers working with PHP, .NET or Java often build themselves a mess of either hacky spaghetti code or over-architected, unnecessarily complex stuff. I’ve seen both. Rails is very prescriptive about how you build an app, so if your app fits into its model (typical data-driven web site stuff), most developers will be more likely to get it almost right.
Beyond that there are other stylistic things. Personally, I prefer C-style syntaxes over Smalltalk-style, so I’m more comfortable in C#, PHP, or Java. Other people really love the Ruby language.
There are some actual differences in the runtimes that you can poke at, putting some pros and cons in various columns, but they are mostly a wash for general web apps. There are some specialized apps with performance characteristics that would steer me towards .NET or Java (where you get compiled performance and the ability to keep objects in local RAM between requests). Certain other apps have different characteristics that would steer me towards PHP, but for most it doesn’t really matter.
Again, Rails isn’t bad and the last thing in the world I would want to do is imply that I don’t approve of someone’s skills because they prefer Rails (or even that I would avoid a Rails project, despite it not being my first choice). I’m not a fan of some of the hype/fan-boy attitude surrounding Rails and I would advise folks that work in Rails to invest the time into understanding the details of what goes on in their environment, especially with respect to how it impacts database performance.
Posted in Launch21, Tools
March 11th, 2008
There has been quite a kerfuffle on blogs and mailing lists about different approaches to work. It appears to have mostly started as an exchange between 37signals and Jason Calacanis, but it spread to some nasty exchanges on mailing lists as well as other posts like this one from my friends at Jackson Fish.
At some point I made the following post on the Seattle Tech Startups mailing list (edited slightly)-
Not every startup is the same.
Not every person is the same.
Some people are not at all suited for any startup. They thrive with a certain stability only a big company will give. Not that you can necessarily be complacent at a big company, but there is a stability there that no startup will ever have.
Some people just love to work. They will thrive in a startup that has that culture.
Some people take a more balanced approach towards life and work. They probably bring useful things to the table beyond lots of hours.
What is important to realize is that every startup will have a culture. And people who don’t fit with it will probably be miserable. If your startup has a “go to coffee twice a day and every chat is over a long lunch” culture, the guy who just wants to sit and crank out code all day will be frustrated with their colleagues. And they will be frustrated with that guy (who doesn’t spend as much time getting in sync with everyone else).
At the same time, if you have a culture where people just crank on stuff all the time, the person who takes the different approach will not be appreciated very much either.
Go figure.
Some businesses will be a better fit for one or the other. It’s probably hard to get a new release out every week or two on the more relaxed/thoughtful plan. If you are in a market that requires that (because the competition is going to do it and will beat you if you don’t), you had better figure out how to make that your company culture or else go for a different market.
Other markets have very different expectations. They won’t go for a quick to market but unpolished product and they don’t want new releases (and thus change and more training) all the time anyway.
Fit your culture to your business (or vice versa). Fit your people to your culture. Don’t try to force a fit where it’s not going to happen, because it’s not going to work.
And understand that one size doesn’t fit all. Stop calling people slave-drivers because they enjoy a driven culture and build a business that fits it, and stop calling other people lazy because they aren’t putting in 80-hour work weeks. Each has its place.
Posted in Launch21, Business
December 18th, 2007
I’m doing some Facebook development using PHP at the moment. Overall it seems smoother than the .NET stuff did, at least as of 4 months ago when I was digging into it most. Part of the issue is that .NET provides a ton of different ways to build things- code-behind, server controls, HTTP handlers, etc. So adding an infrastructure piece like Facebook-based authentication can be taken in a bunch of different directions.
Of course this whole platform is still very young. They are just starting to come out with debugging tools and test environments, and it can take a lot of time to keep up with it, not to mention that the actual APIs change quite a bit. I’m still fixing one of the first applications I wrote (it’s not public yet) to adapt to the change that removed the API for sending invites. Now you have to do it with one of their controls. Given the spam problem with apps just broadcasting invites, it’s a reasonable solution, but it can be a real pain keeping up with their platform changes.
Subscribing to their developer news feed (from http://developer.facebook.com/news.php?blog=1&format=xml) is pretty crucial for anyone working in this space.
Meanwhile we are starting to investigate the OpenSocial stuff, although it still seems a bit early and unformed. I still haven’t heard from anyone who has used it in a really meaningful way, but I’m looking for a good chance to dig in.
Posted in Launch21, Facebook
August 16th, 2007
Two key tools for debugging things in Internet Explorer-
The IE Developer Toolbar
Fiddler, an HTTP web proxy that lets you see the real traffic. It even supports SSL.
Posted in Tools
August 13th, 2007
The Facebook developer documentation strongly encourages the use of FBML over IFrame for the outer application UI. They suggest that the FBML approach will be easier to build and have better performance.
So far all the Facebook development I have done has been FBML, and while I’ll agree that it’s fairly nice, it has some key drawbacks. First of all, if you have an existing site, you need to restructure it all in FBML. FBML is pretty close to HTML, but it’s hard to make much use of code-sharing between your “normal” site and the Facebook version (side note- sometimes this might be a good idea, since what works for Facebook will often be different from what works on normal sites, but it’s still more work).
Also because you can’t use any script in the page, your ability to add javascript-based interactions to your page is very limited. There are some simple built-in Ajax functions, but they are simple enough that you often hit limitations.
On the other hand, I don’t get the supposed advantages of FBML, especially the claim that it will result in better performance. How does routing all content from my server through the Facebook servers, to be re-parsed, actually improve performance? Sure, it’s one request vs. two, but my general sense is that Facebook is making tons of requests for the simplest things. Try hitting the refresh button sometime.
The first thing I did with an IFrame (using the approach of an IFrame embedded inside FBML) was to create a self-refreshing page. I created an ASPX page with this code-
Refresh.aspx-
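<script type="text/javascript">
// Reload the top-level canvas page after Timeout seconds (skipped if no referrer was sent).
var target = '<%= Location %>';
if (target) window.setTimeout(function () { top.location = target; }, <%= Timeout %> * 1000);
</script>
Refresh.aspx.cs- (partial)
protected string Location;
protected int Timeout;
protected void Page_Load(object sender, EventArgs e)
{
    // The referrer is the FBML page hosting this iframe; JavaScript-encode it for the string literal.
    Location = Request.UrlReferrer == null ? "" : HttpUtility.JavaScriptStringEncode(Request.UrlReferrer.ToString());
    // ?r= is the refresh interval in seconds; the default of 60 and floor of 10 are illustrative.
    int seconds;
    Timeout = int.TryParse(Request.QueryString["r"], out seconds) ? Math.Max(seconds, 10) : 60;
}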
The parent FBML page can then include this somewhere at the bottom to cause itself to refresh in RefreshRate seconds.
<fb:iframe frameborder="0" style="width:0px;height:0px;" src="http://www.fastcarrot.com/sinkmyships/refresh/<%=RefreshRate %>" />
Be careful when you do this to manage your refresh rate so you don’t destroy your servers. It’s really easy for people to accidentally leave your page up unseen in a tab, and if enough people do that your servers can be brought to their knees. My current approach is to have the server control the refresh rate so that it refreshes a couple of times quickly if you are interacting with the page, and if you ignore it for a bit it slows down considerably. I suppose if I were smart I’d also keep a metric on the overall load on my server and automatically back off if the server ever gets pounded. For now I’m assuming that any growth curve (fingers crossed) won’t be so fast that I can’t make a quick manual configuration update to back off the refresh before it gets out of control.
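A minimal sketch of that policy, combining the idle back-off with the load-based back-off mentioned above. All thresholds are illustrative guesses, and `serverLoad` is assumed to be some 0 to 1 load metric you already collect; the returned value would be what the page renders as `RefreshRate`.

```javascript
// Hypothetical server-side policy for the refresh interval, in seconds.
// Thresholds are illustrative; serverLoad is assumed to be a 0..1 metric.
function refreshInterval({ idleSeconds, serverLoad }, opts = {}) {
  const {
    fast = 15,       // interval while the user is actively interacting
    slow = 300,      // ceiling once the user has wandered off
    idleStep = 60,   // every idleStep seconds of inactivity doubles the interval
    maxLoad = 0.8,   // above this load, back off further
    loadBackoff = 4, // multiplier applied when the server is overloaded
  } = opts;

  // Exponential back-off with idle time, capped at `slow`.
  const steps = Math.floor(Math.max(0, idleSeconds) / idleStep);
  let interval = Math.min(fast * 2 ** steps, slow);

  // Global brake: if the servers are getting pounded, everyone slows down.
  if (serverLoad > maxLoad) interval *= loadBackoff;
  return interval;
}
```

Applying the load multiplier after the idle cap means an overloaded server can push even idle tabs past the normal ceiling, which is the point of an emergency brake.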
Posted in Facebook
August 9th, 2007
I’ve been developing some applications for Facebook lately. I got past the first couple of blocks fairly quickly and now have an application mostly working.
Of course there are always more issues. I expect many of these issues are typical for any developer platform that is delivered as a service. The Amazon developer platforms are more back-end technology, so it’s a bit easier for them to tackle some of these things.
The biggest issue so far is general development process and staging. Since my application integrates with the Facebook UI and social features, it’s not entirely clear how best to manage those aspects. I usually maintain two copies of my web sites: the active public one and a private staging copy that I use for development and testing. Actually there are usually 3- active development usually happens on a local server on my workstation. The staging server typically doesn’t have anything especially secret, but I don’t worry if it crashes or returns exceptions when something isn’t fully debugged.
With Facebook, any site you are working on needs to be accessible from the public Internet by their servers, so the local developer-workstation test environment does not work at all.
So far I’ve managed without a separate set of staging servers. Since my app is not yet open to the public, I don’t need to worry about breaking real users. But once I open the application up, I’ll need to deal with this. I assume I just need to register a second copy of the application with a different host header. The Facebook platform makes you use absolute URLs for everything, which makes this a bit of a pain, but I’m sure it’s something I can manage.
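One way to keep the absolute-URL requirement manageable is to build every URL from a single per-environment base. A sketch, assuming production and staging are registered as separate Facebook applications; the host names and paths are placeholders, not real endpoints:

```javascript
// Hypothetical per-environment settings: each environment is registered as a
// separate Facebook application with its own callback host.
const ENVIRONMENTS = {
  production: { callbackBase: "http://apps.example.com/myapp/" },
  staging: { callbackBase: "http://staging.example.com/myapp/" },
};

function absoluteUrl(env, path) {
  const config = ENVIRONMENTS[env];
  if (!config) throw new Error(`Unknown environment: ${env}`);
  // Strip leading slashes so the path resolves under the app directory,
  // then let the standard URL class join it with the environment's base.
  return new URL(path.replace(/^\/+/, ""), config.callbackBase).toString();
}
```

Every template then calls `absoluteUrl(currentEnv, ...)` instead of hard-coding a host, so switching a copy between staging and production is a one-line configuration change.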
Since I can’t directly debug on my workstation as easily as normal (I could use remote debugging to the server but there are assorted reasons I don’t want to do that) I’ve had to rely on logging more for debugging. I’m using a “tail” utility for Windows that lets me display the log output in a window in real-time.
I’ve also signed up a second account to use as a test account to test the collaboration aspects. Of course this means I had to turn off the “developer mode” so the other account can access my application, and this in theory means that others could access the application. For now I just put in a server-side block so only those two accounts can get into it. Still, it wouldn’t be ideal if I really cared about stealth.
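The server-side block can be as simple as an allowlist check at the top of each request. A sketch with placeholder user ids (how you obtain the viewer’s id depends on your Facebook client library, so that part is left out):

```javascript
// Hypothetical server-side gate: while the app is unreleased, only a fixed set
// of Facebook user ids may use it. The ids here are placeholders.
const ALLOWED_USER_IDS = new Set(["100001", "100002"]);

function isAllowed(userId) {
  // Missing ids are rejected; ids are compared as strings so numeric and
  // string forms of the same id both match.
  return userId != null && ALLOWED_USER_IDS.has(String(userId));
}
```

Failing closed (rejecting when no id is present) keeps the app hidden even if the platform stops sending the viewer’s id for some reason.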
More on some techniques I’m using in the application itself soon. This platform is pretty new so there is only limited and scattered information about it around the web. I’ll try to collect some links here and post any tips I run across. I’m sure some of my suggestions will be wrong, but as usual I’m open to feedback and suggestions.
Posted in Launch21, Facebook, Services
July 10th, 2007
Messing around with Silverlight 1.1 last night, I made a MathTutor demo. No big deal, but it let me play with the new builds, C# code-behind, events, animation, etc.
The scoreboard updates on a 1-second interval. All the updates happen from the C# code, but at the moment I’m using the JavaScript setTimeout mechanism to trigger them, since it wasn’t clear whether the limited subset of .NET supported by Silverlight had a good thread-safe timer mechanism.
Posted in Silverlight
July 6th, 2007
I’ve been looking at what is involved in developing for Silverlight 1.1. My impression of the documentation available for the 1.1 version so far is that it’s pretty bad- but again, let me point out that I think it’s great that they shipped it even at this early stage. I’d much rather have the bits available with confusing docs than not at all.
One thing that took me a bit to figure out is that there are better docs on silverlight.net than in the Silverlight developer center on MSDN. The MSDN pages just have the downloads and some really high-level overviews. The silverlight.net site at least has the quickstarts in online form with easy access to the source code. Playing with this a bit got me past the basics of compiling and running one of the demos myself.
The second issue is the version of Visual Studio needed. You need to run Orcas beta 1 to get the SDK templates to install and for debugging support. Visual Studio 2005 can work fine to build projects as long as you can do a few things yourself and don’t need the debugger. I think Orcas is supposed to be able to install side-by-side with Visual Studio 2005, but I don’t totally trust that. Luckily Microsoft is offering Orcas as a Virtual PC image, which should be a great way to go.
If you want to build stuff yourself with Visual Studio 2005, just create a new DLL project. Delete the default control it makes for you and remove all the references. Select Add Reference, browse to \Program Files\Microsoft Silverlight, and add all the managed DLLs you see there to your project. You probably don’t need them all, but I didn’t want to mess around.
You can then create your project as a set of .CS, .XAML, .HTM and .JS files. Don’t forget to set the “Copy to output directory” property on your .XAML, .HTM and .JS files so they appear in the target directory. In your XAML you can refer to your DLL like this-
x:Class="Samples.Silverlight.CS.ScriptingCanvas;assembly=TestSilver1.dll"
Finally, set the project to debug by opening the browser at the URL of your .HTM file. IE will give you a security warning every time, but once you bypass that you should be running.
Posted in Silverlight
June 14th, 2007
Yet another CouponLooker update went live today. The most noticeable additions are recommended sub-queries for % off coupons and specific merchants. For the specific merchants it was necessary to add a mechanism to normalize the merchant names, since the coupon feeds we get sometimes list Dell as Dell.Com, Dell Home, Dell USA, or about 40 other variations.
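A sketch of one way to do that normalization: canonicalize the string with a few rules, then fall back to an explicit alias table for variants the rules can’t catch. The noise-word list and alias table are illustrative assumptions, not CouponLooker’s actual data:

```javascript
// Illustrative rules; a real list would be tuned against the actual feeds.
const NOISE_WORDS = new Set(["home", "usa", "store", "online", "inc"]);
// Explicit aliases for normalized keys, mapped to a display name.
const ALIASES = new Map([["dell", "Dell"]]);

function normalizeMerchant(raw) {
  const key = raw
    .toLowerCase()
    .replace(/\.(com|net|org)\b/g, "") // "dell.com" -> "dell"
    .replace(/[^a-z0-9]+/g, " ")       // punctuation -> spaces
    .split(" ")
    .filter((w) => w && !NOISE_WORDS.has(w))
    .join("");
  // Fall back to the normalized key when no alias is known.
  return ALIASES.get(key) || key;
}
```

Noise words need care: a short word like “us” would mangle a name such as “Toys R Us”, which is why the list above leaves it out and why an explicit alias table is still needed for the long tail.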
The other bigger project was mostly invisible. We are experimenting with sponsored coupons, which would allow the provider of a coupon feed to pay us for a higher listing for their coupon. This is tricky since our top priority is to maintain the quality of the results we give users- just because someone is paying doesn’t mean it’s ok to show some unrelated crap coupon. So the system is built around some fairly sophisticated scoring based on the initial score, click-through rates (if a coupon is relevant, people will click it; if it’s less interesting it gets ignored) and more.
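As a hedged illustration (not the CouponLooker formula, which isn’t described here), a ranking that treats relevance as a gate and adds a bid-times-expected-CTR boost could look like this. The prior CTR, prior weight, relevance threshold and boost scale are all made-up parameters:

```javascript
// Illustrative parameters only; none of these values come from CouponLooker.
function smoothedCtr(clicks, impressions, priorCtr = 0.02, priorWeight = 100) {
  // Blend observed CTR with a prior so a new coupon isn't judged on 3 views.
  return (clicks + priorCtr * priorWeight) / (impressions + priorWeight);
}

function rankScore(coupon, { minRelevance = 0.5, boostScale = 1 } = {}) {
  const { relevance, bid = 0, clicks = 0, impressions = 0 } = coupon;
  // Relevance gate: below it, paying buys nothing.
  if (relevance < minRelevance) return relevance;
  // Expected value per impression = bid per click * probability of a click.
  return relevance + boostScale * bid * smoothedCtr(clicks, impressions);
}

function rank(coupons, opts) {
  return coupons
    .map((c) => ({ ...c, score: rankScore(c, opts) }))
    .sort((a, b) => b.score - a.score);
}
```

With this shape a big bid on an irrelevant coupon changes nothing, while a relevant sponsored coupon that people actually click can edge past an organic one.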
The system we use bears some resemblance to similar systems from Google. It’s interesting observing the dynamic in discussions about emulating Google’s approach to something. Google has been such an extreme success story that it’s a bit tempting to start from the position that you can just emulate them as the path to sure riches.
Back in reality, it’s important to break it down into two pieces. First of all, the amazing revenue that Google enjoys is not just a product of their technology; rather, it’s supported by their technology. Emulating the technology doesn’t directly give you any revenue. You need a good business plan and model, and I’ll leave that to others (from my seat it’s in good hands at Judy’s Book).
On the technical side, just copying doesn’t work either. Instead it’s important to really understand the “why” behind the mechanisms that work and how you can apply that same learning to your different problem. For example, we use a system that supports bidding for ad placement, and we measure click-through rates to adjust the value of the ads. But other aspects are very different- Google ads are separate from the content, whereas in CouponLooker someone is sponsoring a specific coupon that is part of our existing content.
Adding the sponsored coupons also introduced some new technical issues to deal with. Until now the system needed little back and forth with the database. In effect the engine could load all the coupons and then search them in memory, with few changes until new coupons were loaded. With the new system, changes happen much more frequently: an advertiser can place new ads, run out of their daily budget, or trigger other factors that change the results over shorter time frames. I feel like the in-memory approach to the search engine really paid off here, since it gave me lots of flexibility to change the details of computing the results. If I had relied much more on pre-computed indices and results, it would have been much harder to make this stuff work.
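A sketch of the in-memory idea: the coupon list is loaded once, while fast-changing sponsorship state (here just a remaining daily budget per sponsor) is kept separately and consulted on every search, so a budget hitting zero takes effect on the next query without rebuilding anything. The class and field names are hypothetical:

```javascript
// Hypothetical in-memory coupon index with query-time sponsorship state.
class CouponIndex {
  constructor(coupons) {
    this.coupons = coupons;   // [{ id, text, sponsorId? }], loaded once
    this.budgets = new Map(); // sponsorId -> remaining daily budget
  }

  setBudget(sponsorId, remaining) {
    this.budgets.set(sponsorId, remaining);
  }

  search(term) {
    const t = term.toLowerCase();
    return this.coupons
      .filter((c) => c.text.toLowerCase().includes(t))
      .map((c) => ({
        id: c.id,
        // Sponsored only while the sponsor still has budget left today.
        sponsored: c.sponsorId != null && (this.budgets.get(c.sponsorId) || 0) > 0,
      }))
      // Sponsored matches first; ties keep their original order.
      .sort((a, b) => Number(b.sponsored) - Number(a.sponsored));
  }
}
```

Nothing here is pre-computed per query, which is what makes a budget change visible immediately; the trade-off is doing the filtering work on every search.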
Which isn’t to say that it won’t be possible to pre-compute stuff later, just that if you do that before you really have the details of the system worked out, you will have set in stone mechanisms that are harder to adjust.
Posted in Launch21, Search