Enterprise headlines and summaries, 2009-10-06

  • Pentagon: Our cloud is better than Google’s
    DISA has been operating RACE since Oct. 1, 2008; since then, hundreds of military applications, including command and control systems, convoy control systems, and satellite programs, have been developed and tested on its user-provisioned virtual servers. DISA says it has cut the acquisition time for a new server from six months to 24 hours with RACE.
  • IBM, Google, and colleges talk cloud projects
    In presentations at the Computer History Museum in Mountain View, academics are hailing projects like “Scaling the Sky with MapReduce/Hadoop” at the University of Washington, “Commodity Computing in Genomics Research” at the University of Maryland, and “Dynamic Provisioning of Data Intensive Applications” at the University of California, San Diego. The meeting is for CluE Principal Investigators. IBM and Google presentations also are featured at Monday’s event.
  • Slides & Thoughts from Hadoop World NYC
    Here are a few resources mentioned in the talk:
      * Trendingtopics code on GitHub
      * Wikipedia Page Traffic Statistics Dataset
      * EMR Forum discussion about using R with Hadoop (scroll down for R code that runs on Twitter data)
      * David Rosenberg’s R Streaming package on CRAN
      * How FlightCaster Squeezes Predictions from Flight Data
  • Hadoop World 2009 – some notes from application session
    Research Lab Setup
      – VM System: Custom Analytic Stacks Encryption Processing Relational Database
      – Hadoop Systems Management Stack
        Hadoop #1 ~40TB / 42 nodes (2 years of raw transaction data)
        Hadoop #2 ~300TB / 28 nodes
  • Hadoop World, NYC 2009
    Hadoop usage is just exploding. I mean, to some degree, sure, no duh, but, still it was pretty impressive to see how many people, in how many different ways, are building cool stuff on top of Hadoop. It had the feel of being early on in the curve of exponential growth — something like I would imagine Linux users felt in 1995. There were, I think, 500 registered participants at the conference — I’m guessing that next year it will be 1000.
  • Hadoop World impressions
    Most of the talks dealt with abstractions above Hadoop; of these, the most prominent — I believe everyone mentioned them at one time or another — were Hive and Pig. Both are SQLish languages that optimize their queries into map and reduce jobs. eBay had its own variant on this, whose name I didn’t write down; it was a special language meant to speed experiments on their recommendation engine. Someone loses an auction for, say, a 1998 Volkswagen; what’s the best thing to suggest that they buy instead? They conduct thousands of these experiments per day, and they need a language to efficiently encode consumer-behavior patterns. Hadoop appears in the backend, but eBay and most of the other speakers quickly leave it behind. It’s a testament to the technology’s maturity that it has become something like electrical wiring: largely unnoticed, and there to serve the real action a couple layers up.
  • Hadoop World NYC
    Hadoop is Changing Things: I heard the phrase “an order of magnitude improvement in speed” so many times that I lost count. Speaking from personal experience, the difference you see in productivity between waiting minutes and hours for results and waiting days is immense. When you can see the answer to a question shortly after you ask it, you can preserve the context you need to act on that answer immediately, without having to spend the time to figure out why you were asking that question in the first place.
  • Post Hadoop World thoughts
      * Dynamic, data-rich social networks exceed memory limits and require considerable storage
      * MapReduce convenient for parallelizing individual node/edge-level calculations
      * Higher-order calculations more difficult when network exceeds memory constraints, but can be adapted to MapReduce framework
    These ideas are relevant for analyzing biological networks and relationships, and hopefully once we are done taking care of some of the core needs (alignments, assembly, etc.) we can start applying Hadoop and other distributed computing frameworks to solve more downstream problems.
  • IBM readies Exadata killer
    The machine, which is apparently going to be called DB2 Pure Scale, is obviously meant to blunt the attack of the Exadata 2 box cluster that Oracle and soon-to-be acquisition Sun Microsystems launched in mid-September.
  • Symantec to launch object-based file storage service in next year
    Symantec Corp. unveiled a new application called FileStore that’s designed to help companies build internal highly scalable, high-performance file-based cloud storage systems using commodity server hardware and the arrays of most storage vendors.
  • Eolas files patent lawsuit against 22 companies
    for allowing embedded applications in Web browsers, and the second a continuation of the first patent, allowing Web sites to add embedded applications through the use of plug-ins and AJAX (asynchronous JavaScript and XML).
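
For readers new to the idea raised in the "Hadoop World impressions" item, that SQL-like languages such as Hive and Pig turn queries into map and reduce jobs, here is a toy, in-memory sketch of how a grouped count could be expressed as a map step and a reduce step. It is not how Hive or Pig actually plan queries; the function names and the "page" field are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Emit a (key, 1) pair per record, like the map side of a grouped count."""
    for record in records:
        # "page" is a hypothetical field name used only for this sketch.
        yield record["page"], 1


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_page(records):
    """Roughly what SELECT page, COUNT(*) ... GROUP BY page asks for."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    views = [{"page": "Hadoop"}, {"page": "Pig"}, {"page": "Hadoop"}]
    print(count_by_page(views))
```

In a real cluster the map and reduce functions would run on many machines and the shuffle would move data across the network; the sketch only shows the shape of the computation.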
