San Francisco was great! I got to take part in a great conference, network with other people facing similar challenges and catch up with some of my old friends in the area.

It’s amazing how many people are doing “startups” in that area. I put startups in quotes because a startup isn’t anything less than a great idea that hasn’t been fully executed yet. I should clarify “amazing” too. Obviously Silicon Valley is the mecca for high-tech startups in the world, but you don’t actually realize what that means until you start to talk to people there. Here in San Diego I can count the number of people I know working on startups on one hand. OK, maybe two hands if you count people who want to get involved but haven’t taken that first step yet.

While I was catching up with Sameer and Shirin over dinner they were telling me how the majority of their friends (aside from working at Google) work in startups.

That said, here’s a summary of what I got from the Velocity conference. I had a great time and Steve Souders and the O’Reilly team really did a great job putting this together. My only suggestion for next time would be to show some more real examples. For example, I know that in order to scale you need to shard, partition and replicate the databases. You need to profile and test, then profile and test again. Maybe I was looking for more of a workshop, but it was very informative and I hope you return next year!

Here’s what I got…

Performance Overview and Tools

  • Plan for perf from day 1
  • Set expectations ahead of time
    • why do users of World of Warcraft accept downtime each week (and pay for the service), but users of Friendster/Twitter don’t? Expectations.
  • Even after you solve YOUR performance problems, ads become the root of the problem!
    • Artur has a cool hack to overwrite document.write to get around this problem
  • Measure the right things
    • No cache scenario is really important
  • When optimizing perf you can’t just look at the mean, you have to look at the median, percentiles, and outliers too.
  • Automated testing and profiling needs to be part of the process!
  • keep historical records of how features perform
  • consider keeping some (small amount of) profiling code in production
  • Tools
    • Kite
      • Real-time testing from multiple locations
      • Interactively test perf from desktop, last mile, and cloud
      • available in Aug
    • Jiffy
      • measure individual pieces of page rendering (script load, AJAX execution, page load, etc.) on every client
      • report those measurements and other metadata to a web server
      • aggregate web server logs into a database
      • generate reports
      • has Firebug plugin
    • Doloto
      • analyzes application workloads and automatically performs code splitting
      • code gets processed and only the necessary initialization code gets transferred right away
      • the rest of the code is replaced by short stubs and transferred lazily in the background
    • CloudStatus
      • Provide service to monitor uptime and availability for AWS, Google appengine, etc
    • Fiddler (profiling tool)
      • Written by my old mentor from Microsoft!
      • measure request size, page weight
      • analyze caching, compression, page composition
      • simulate low-speed/high-latency connections
      • breakpoint debugging
      • traffic modification
      • perf test walkthrough
    • AOL PageTest
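
To illustrate the point above about looking past the mean, here is a minimal Python sketch. The timings are made-up numbers, not measurements from the conference, and `summarize` is a hypothetical helper that assumes a non-empty list.

```python
import statistics


def summarize(samples_ms):
    """Return mean, median, 95th percentile and max for a non-empty list of timings."""
    ordered = sorted(samples_ms)
    # Nearest-rank percentile: ceil(0.95 * n), done in integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }


# Hypothetical page load times in milliseconds: mostly fast, two slow outliers.
timings = [120, 130, 125, 140, 135, 128, 132, 138, 2400, 3100]
summary = summarize(timings)
print(summary)
```

Here the two slow requests pull the mean far above the median, so the mean alone describes neither the typical user nor the unlucky ones.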

Optimizing Page Load Times

  • Apache changes
    • use gzip, far-future Expires headers and ETags
    • Don’t use advanced compression (run time cost)
  • Javascript changes
    • reduce DOM elements
    • lazy load components
    • use GET unless you need POST
    • use JSON instead of XML (don’t want to walk the XML tree!)
    • minify with YUI compressor + HTTP compression
    • use jQuery
  • Content changes
    • minify JS+CSS as part of build process
    • render most of content on server side (instead of using JS+DOM for example)
    • fewer serialized actions
    • eliminate DNS lookups
    • move your contents closer to your customers (use a CDN)
  • Trim cookies
    • moving them away from your root domain and root path “/” bc those will be sent with EVERY page request
    • compress cookies (headers), not just bodies
    • grouping multiple smaller files into fewer bigger ones (image clustering)
    • eliminating them by moving your static content to a different domain
    • trim down the # of requests and redirects (round trips)
  • Work on improving perceived performance
    • cheat when you can by updating UI then do the work
  • Unique tricks
    • After page load, download external CSS and JS for upcoming page views
  • See my notes on Julien Lecomte’s High Performance Ajax for more great tips
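
As a rough illustration of the gzip and far-future expires tips above, here is a small Python sketch using only the standard library. The function names, the sample script and the one-year lifetime are my own assumptions, not something shown at the conference. In practice you would set these headers in your web server config.

```python
import gzip
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def static_asset_headers(now, years=10):
    """Build far-future caching headers for a static file (hypothetical helper)."""
    max_age = years * 365 * 24 * 60 * 60  # lifetime in seconds
    expires = now + timedelta(seconds=max_age)
    return {
        "Cache-Control": f"public, max-age={max_age}",
        # usegmt=True needs a timezone-aware UTC datetime.
        "Expires": format_datetime(expires, usegmt=True),
    }


def gzip_savings(body):
    """Return (original_size, compressed_size) in bytes for a response body."""
    return len(body), len(gzip.compress(body))


# Made-up, repetitive JavaScript payload; real savings depend on the content.
script = b"function add(a, b) { return a + b; }\n" * 200
original, compressed = gzip_savings(script)
headers = static_asset_headers(datetime(2008, 6, 25, tzinfo=timezone.utc), years=1)
print(original, compressed, headers)
```

Far-future expiry only works if the file name changes whenever the content does (for example by putting a version number in the URL), which is part of why it is harder than it looks.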

Optimizing Images

  • Choose PNG over GIF
    • avg savings of 20%
  • Crushing PNGs
    • most image programs DO NOT optimize
    • cmdline tools:
    • avg savings of 16%
  • strip needless JPEG metadata
  • reducing their color palettes
    • convert truecolor PNG to palette PNG (PNG8)
    • PNG8 has 256 colors, truecolor has millions
  • far future expires (harder than it seems)
  • combine images
    • use sprites

Scaling

  • Use Memcached
    • high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load. Handles the majority of your reads. It’s practically a requirement, but not a substitute for bad design!
  • Use a CDN
    • Panther - cheap and good (recommended by someone from stardoll.com)
    • Akamai - expensive, but top of the line
  • Increase cache hits (with edge caching)
    • doesn’t like squid
    • loves varnish
      • a bit unstable
      • segfaults under load (running trunk)
  • Database scaling options
    • Vertical partitioning - splits table columns into separate tables or even splitting of tables into separate databases. You can use a view to read the data again, but can’t insert/delete from a view. Speaker was against this approach.
    • Replication
      • most apps are read intensive, which is handled by memcached
      • does not help to scale writes (bc all slaves in a master-slave setup must get written)
    • Sharding
      • “only” solution for large scale apps
      • can be hard to implement if not designed for…needs planning
      • want most queries to be run on same shard
        • good: sharding blogs by user_id
        • bad: sharding by country_id (large portion can come from same country)
      • techniques
        • fixed hash sharding
          • even IDs on A, odd on B
        • data dictionary
          • user 25 has data on server D
          • dict can become bottleneck
        • mixed hashing
      • HiveDB
      • HSCALE
  • DB scaling tips
    • if foreign key lookups are causing problems, check whether your DB offers clustered indexes. They will cut down on disk seeks.
    • app should be unaware of partition strategy. app should ask where a user lives and go get it
    • you start with a normalized db, but you need to denormalize when you start scaling
    • have triggers that hit processing nodes to figure out how to build a new row to add to db as denormalized data
    • if you are a brand new startup don’t worry about partitioning, sharding
    • better to launch to all than have a private beta and hope those users are still interested in 6 months
    • good to think about how you could use sharding down the road
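
The fixed hash and data dictionary techniques listed under Sharding can be sketched in a few lines of Python. This is my own illustration, not code from the talk: the class names are hypothetical, shards are just labels, and falling back to the hash when the dictionary has no entry is an assumption on my part.

```python
class FixedHashRouter:
    """Pick a shard from the user ID alone (even IDs on A, odd on B with two shards)."""

    def __init__(self, shards):
        self.shards = list(shards)

    def shard_for(self, user_id):
        return self.shards[user_id % len(self.shards)]


class DirectoryRouter:
    """Look the shard up in a data dictionary, falling back to the fixed hash."""

    def __init__(self, fallback):
        self.assignments = {}  # user_id -> shard label
        self.fallback = fallback

    def assign(self, user_id, shard):
        self.assignments[user_id] = shard

    def shard_for(self, user_id):
        if user_id in self.assignments:
            return self.assignments[user_id]
        return self.fallback.shard_for(user_id)


fixed = FixedHashRouter(["A", "B"])
directory = DirectoryRouter(fixed)
directory.assign(25, "D")  # user 25 has data on server D

print(fixed.shard_for(24), fixed.shard_for(25))          # A B
print(directory.shard_for(25), directory.shard_for(26))  # D A
```

The app only ever calls `shard_for`, so it stays unaware of the partition strategy. The dictionary gives you the freedom to move individual users, but every request has to consult it, which is how it can become the bottleneck.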

Operations

  • Think about how your architecture will scale when you design it, but don’t shard or partition until you have to. Your bottlenecks might not be what you think.
    • LinkedIn underestimated the number of events/messages that would occur in the system. To scale they had to denormalize and duplicate that event for each user it was sent to.
    • SmugMug didn’t realize they would get 1000s of comments on a single photo, so they needed to re-architect how they handle comments.
    • SmugMug was logging all people that were viewing the photos, but after a huge traffic burst the site crumbled. They had to adjust how they were logging (I didn’t pick up how).
  • The Cloud
    • There needs to be an MVC framework for the backend, much like the MVC frameworks that exist today for the frontend (RoR, CakePHP, Pylons)
    • Data centers are unsustainable in their current form. Energy costs will double in next 5 yrs. We need Active power management. Basically, when you leave the room, turn off the light. Which means power down servers that you don’t need. Virtualize.
    • You beat complexity by automating it
  • Cloud computing services
    • Eucalyptus (open-source)
      • open-source software infrastructure for implementing cloud computing on clusters
      • compatible with EC2
      • wants to be the common denominator
      • doesn’t focus on the scalability

Tuning Recommendations

  • Memcached
    • network
      • ensure network processing is distributed across CPUs
      • bind memcached to CPUs not processing interrupts
    • run memcached 1.2.5 with 4 threads (default)
    • use in 64-bit mode for a large cache size
    • run in multi-threaded mode
  • MySQL
    • 5.1 has several perf improvements over 5.0
    • joins over sub-queries
    • use limits
    • innodb
    • avoid too frequent FS flushes
      • innodb_flush_log_at_trx_commit = 2
    • separate read/write databases
    • avoids trashing query cache
  • Apache
    • network stack: tune TCP time-wait if handling lots of conn
    • do not load modules that you do not need (in httpd.conf)
    • tune ListenBacklog(8192), ServerLimit(2048), MaxClients(2048)
  • PHP
    • turn off safe_mode if you don’t need it
      • safe_mode = off
    • increase realpath_cache_size if you have lots of files
      • realpath_cache_size = 128K
    • use xcache (stability problems? … participant in audience said he uses it in large scale and works great) or APC

Miscellaneous

I was at the IE8 session and they spoke about a few new object enhancements.

  • XDomainRequest - cross domain communication w/o server-side proxy
  • Improved XMLHttpRequest object, now with a timeout attribute