Create Apps; Win Prizes!

Elsevier/TWC Health & Life Sciences Hackathon

The Tetherless World Constellation at RPI is pleased to announce that TWC and the SciVerse team at Elsevier are planning a Health and Life Sciences-themed, 24-hour hackathon to be held 27-28 June 2011. The event is sponsored by Elsevier and will be held at Pat’s Barn, on the campus of the Rensselaer Technology Park.

After a short tutorial period by TWC RPI staff and distinguished guests, participants will compete with each other to develop Semantic Web mashups using linked data from TWC and other sources, web APIs from Elsevier SciVerse, and visualization and other resources from around the Web.

The contest will encompass building apps utilizing the SciVerse API and other resources in multiple categories, including Health and Life Sciences and Open classes. Overall, there will be three winners:

  • First place: $1500
  • Second place: $1000
  • Third place: $500

A distinguished panel of judges has been assembled, including domain experts, faculty and senior representatives from Elsevier:

  • Paolo Ciccarese (Scientist and Senior Software Engineer, Mass General Hospital; Faculty, Harvard Medical School)
  • Chris Baker (Research Chair, Innovatia)
  • Bob Powers (Semantics Engineer, Consultant at Predictive Medicine)
  • M. Scott Marshall (Department of Medical Statistics and Bioinformatics, Leiden University Medical Center)
  • Ora Lassila (Principal Technologist, Nokia; co-author of the W3C RDF specification)
  • Elizabeth Brooks (Head of Computing & IT, UHI, Scotland)
  • Hajo Oltmanns (Elsevier: SVP Health Sciences Strategy)
  • Scott Virkler (Elsevier: SVP e-Products Global Medical Research)
  • Helen Moran (Elsevier: VP Smart Content Strategy)

All attendees will be provided lunch, dinner, and a midnight snack on 27 June, and breakfast and lunch on 28 June.

Travel Assistance
A small amount of travel assistance will be made available for students and non-profits on a competitive basis. Please see our Travel Assistance page or contact us for further details.

Travel and Lodging Information
See the Elsevier/Tetherless World Health and Life Sciences Hackathon web site for specific information about transportation and lodging near the venue. Please note that the Hackathon runs for 24 hours, so it is unlikely that participants will want lodging on the night of 27 June…

Please browse to the Contacts area of the Elsevier/Tetherless World Health and Life Sciences Hackathon web site or follow the Eventbrite event organizer link if you have questions.

Follow us on Twitter!
The hashtag for this event is #TWCHack11

Posted by: John Erickson | May 31, 2011

Energizing Innovation Research through Linked Open Patent Data

Please note this is a DRAFT and may change throughout the day (1 June 2011)

On June 17 I will be joining other researchers at a Patent Data Workshop jointly hosted by the USPTO and NSF at the U.S. Patent & Trademark Office in Alexandria, VA. This workshop, supported by the USPTO Office of Chief Economist and the Science of Science and Innovation Policy Program (SciSIP) at the NSF, will bring researchers together to share their ideas on how to facilitate the more efficient use of patent and trademark data, and ultimately to improve both the quantity and caliber of innovation policy scholarship.

The stated goals of this workshop include:

  1. Creating an information exchange infrastructure for both the production and informed evaluation of transparent, high-quality research into innovation;
  2. Promoting an intellectual environment particularly hospitable to high-impact quantitative studies;
  3. Creating a distinct community with well-developed research norms and cumulative influence; and
  4. Championing the development of a platform to support a robust body of empirical research into the economic and social consequences of innovation.

Each participant planning to attend this workshop has been asked to prepare a blog post that outlines (a) our understanding of the most significant theoretical or empirical challenges in this space, and/or (b) where the frontier of knowledge is, what innovative things are being done at the frontier — or within reach of being done to solve the set of problems — and where targeted funding could yield the highest payoffs in getting to solutions. The purpose of this post is to offer some of my thoughts based on progress made by linked open government data initiatives in the US and around the world.

Background: The Tetherless World and Linked Open Government Data
Since early 2010 the Tetherless World Constellation (TWC) at Rensselaer Polytechnic Institute has collaborated with the White House team to make thousands of open government datasets more accessible for consumption by web-based applications and services, including mashups leveraging Semantic Web technologies. TWC has created an infrastructure, embodied by the TWC LOGD Portal, for automatically converting government data published in tabular (e.g. CSV) format to RDF and enhancing it; publishing these converted datasets as downloadable “dump files” and through SPARQL endpoints; and demonstrating highly effective methodologies for using such linked open government data assets as the basis for the agile creation of lightweight, powerful visualizations and other mashups. In addition to providing a searchable interface to thousands of converted datasets, the TWC LOGD Portal publishes a growing set of demos and tutorials for use by the LOGD community.
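To make the tabular-to-RDF step concrete, here is a minimal sketch of the idea, not the converter TWC actually uses. It assumes a hypothetical base URI (http://example.org/dataset/), mints one subject URI per row and one predicate URI per column header, and emits every cell as a plain literal in N-Triples form. The sample data and all names are invented for illustration.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base URI; a real converter would use the publisher's own namespace.
BASE = "http://example.org/dataset/"


def escape_literal(value):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def csv_to_ntriples(csv_text, dataset_id):
    """Turn each non-empty CSV cell into one triple: row URI, column URI, literal."""
    reader = csv.DictReader(io.StringIO(csv_text))
    triples = []
    for row_number, row in enumerate(reader, start=1):
        subject = f"<{BASE}{dataset_id}/row/{row_number}>"
        for column, value in row.items():
            # Skip overflow fields (column is None) and empty cells.
            if column is None or not value:
                continue
            local_name = quote(column.strip().replace(" ", "_"))
            predicate = f"<{BASE}{dataset_id}/vocab/{local_name}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Invented sample data, for illustration only.
SAMPLE_CSV = "state,magnitude\nCA,4.2\nAK,5.1\n"
ntriples = csv_to_ntriples(SAMPLE_CSV, "quakes")
for line in ntriples:
    print(line)
```

A conversion like this yields "raw" RDF only. The enhancement step described above (typing values, linking named entities to shared URIs) is where most of the added value comes from, and it is not shown here.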

The LOGD partnership and similar international LOGD efforts, especially the UK’s initiative, have demonstrated the value and potential for innovation achieved by exposing government data using linked data principles. Indeed, the effective application of the linked data approach to a multitude of data sharing and integration challenges in commerce, industry and eScience has shown its promise as a basis for a more efficient, agile research information exchange infrastructure.

Recommendation: Create a “DBPedia” for Patent Data
The Linked Open Data Cloud diagram famously illustrates the growing number of providers of linked open data around the world. Careful examination of the LOD Cloud shows that most sources are sparsely linked, and a very few, most notably DBPedia, are extremely heavily linked. The reason is that the Web of Data has increasingly adopted DBPedia as a reliable source or hub for canonical entity URIs. This means that as providers put their datasets online, they enhance their datasets by providing sameAs links to DBPedia URIs for named entities within these datasets. This enables their datasets to be easily linked to other datasets and increases their utility and value as the basis for visualizations and linked data mashups.
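As a rough illustration of what such a sameAs link looks like, the sketch below builds an owl:sameAs triple from a local entity URI to a DBPedia resource URI. It assumes the entity's label matches the title of its Wikipedia article exactly, which is how DBPedia resource URIs are derived. Real entity linking needs disambiguation, which is not attempted here. The local URI is hypothetical.

```python
from urllib.parse import quote

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def dbpedia_uri(label):
    """Guess a DBPedia URI from a label (assumes it equals the Wikipedia title)."""
    return DBPEDIA_RESOURCE + quote(label.strip().replace(" ", "_"), safe="_(),")


def same_as_triple(local_uri, label):
    """Return an N-Triples line asserting the local entity is the DBPedia entity."""
    return f"<{local_uri}> <{OWL_SAME_AS}> <{dbpedia_uri(label)}> ."


# Hypothetical local entity URI, for illustration only.
triple = same_as_triple("http://example.org/entity/42", "New York")
print(triple)
```

Once two datasets both point at the same DBPedia URI, a consumer can join them on that URI without either publisher knowing about the other.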

Providers embrace DBPedia’s URI conventions as “canonical” in order to make their datasets more easily adopted. Our objective with patent and trademark reference data and research information in general must be to break down barriers to its widespread use, recognizing that we may have no idea how it may be used. Linked data principles and the Web of Data emerging from them have re-written what it means to make data integration easy. Whereas even a few short years ago it was useful to simply provide a searchable patent database through a proprietary UI, next-generation innovation infrastructures will be based on globally interlinked graphs driven by concept and descriptive metadata extracted from patent records, research publications, business publications and indeed data from social networks. Scholars of innovation will traverse these graphs and mash them with other graphs in ways we cannot anticipate, and thus make serendipitous discoveries about the process of innovation we cannot predict today.

My DBPedia reference comes from the idea of identifying concepts and specific manifestations of innovation in the patent corpus. Consider an arbitrary patent disclosure; it can be represented as a graph of concepts and related manifestations. The infrastructure I’m proposing will enable the interlinking of URI-named concepts, not only with other patent records but also scientific literature, the financial and news media, social networks, etc. From a research standpoint, this will enable the study of the emergence, spread and influence on innovation in many dimensions.

The USPTO has already made great strides in improving access to and understanding of patent and trademark data; an excellent example is the Data Visualization Center and specific data visualization tools such as the Patent Dashboard, which provides graphic summaries of USPTO activities. These are “canned apps,” however; the next generation of open government will require finer-grained access to this data, presented as enhanced linked data and using open licensing principles. As USPTO datasets are presented in this way, researchers will be able to interlink this data with datasets from other sources, resulting in a more effective study of the causes of innovation and indeed the outcomes of government programs intended to stimulate innovation.


  1. NSF Patent Data Workshop. NSF Award Abstract #1102468 (31 Jan 2011).
  2. Julia Lane, The Science of Science and Innovation Policy (SciSIP) Program at the US National Science Foundation. OST Bridges vol. 22 (July 2009)
Posted by: John Erickson | February 11, 2011

TWC LOGD Million Dataset Challenge

Posted by: John Erickson | January 19, 2011

“Falling down is part of LIFE…Getting back up is LIVING”

This inspirational quote is being re-posted around my networks today:

There comes a time in life, when you walk away from all the drama and people who create it. You surround yourself with people who make you laugh, forget the bad, and focus on the good. So, love the people who treat you right. Pray for the ones who don’t. Life is too short to be anything but happy. Falling down is part of LIFE…Getting back up is LIVING………Re-post if you agree; I just did

I’ve been trying to trace the origins of this meme using Google and focusing on the quote, Falling down is part of LIFE…Getting back up is LIVING; it seems to have been active on the Web for about a year, especially in the so-called “mommy blogs” and on Facebook.


Posted by: John Erickson | December 20, 2010

Fall 2010 TWC-RPI Undergraduate Research Summaries

The Fall 2010 semester marked the beginning of the Tetherless World Constellation’s undergraduate research program at Rensselaer Polytechnic Institute (RPI). Although TWC has enjoyed significant contributions from RPI undergrads since its inception, this term we stepped up our game by more “formally” incorporating a group of undergrads into TWC’s research programs, establishing regular meetings for the group, and, with input from the students, outfitting their own space in RPI’s Winslow Building.

Patrick West, my fellow TWC undergrad research coordinator, and I asked the students to blog about their work throughout the semester; with the end of term, we asked them to post summary descriptions of their work and their thoughts about the fledgling TWC undergrad research program itself. We’ve provided short summaries and links to those blogs below…

  • Cameron Helm began the term coming up to speed on SPARQL and RDF, experimented with several of the public TWC endpoints, and then worked with Philip on basic visualizations. He then slashed his way through the tutorials on TWC’s LOGD Portal, eventually creating impressive visualizations such as this earthquake map. Cameron is very interested in the subject of data visualization and looks to do more work in this area in the future.
  • After a short TWC learning period, Dan Souza began helping doctoral candidate Evan Patton create an Android version of the Mobile Wine Agent application, with all the amazing visualization and data integration required, including Twitter and Facebook integration. Mid-semester Dan also responded to the call to help with the “crash” development of the Android/iPhone TalkTracker app, in time for ISWC 2010 in early November. Dan continues to work with Evan and others for early 2011 releases of Android, iPhone/iPod Touch and iPad versions of the Mobile Wine Agent.
  • David Molik reports that he learned web coding skills, ontology creation, and server installation and administration. David contributed to the development and operation of a test site for the new, Semantic Web-savvy website for the Biological and Chemical Oceanography Data Management Office (BCO-DMO) of the Woods Hole Oceanographic Institution.
  • Jay Chamberlin spent much of his time working on the OPeNDAP Project, an open source server to distribute scientific data that is stored in various formats. His involvement included everything from learning his way around the OPeNDAP server, to working with infrastructure such as TWC’s LDAP services, to helping migrate documentation from the previous Wiki to the new Drupal site, to actually implementing required changes to the OPeNDAP code base.
  • Philip Ng worked on a wide variety of projects this fall, starting with basic visualizations, helping with ISWC applications, and including iPad development for the Mobile Wine Agent. Philip’s blog is fascinating to read as he works his way through the challenges of creating applications, including his multi-part series on implementing the social media features.
  • Alexei Bulazel began working with Dominic DiFranzo on a health-related mashup using datasets and is now working on a research paper with David on “human flesh search engine” techniques, a topic that top thinkers including Tetherless World Senior Constellation Professor Jim Hendler have explored in recent talks. Note: For more background on this phenomenon, see e.g. China’s Cyberposse, NY Times (03 Mar 2010)

Many of these students will be continuing on with these or other projects at TWC in 2011; we also expect several new students to be joining the group. The entire team at the Tetherless World Constellation thanks them for their efforts and many important contributions this fall, and looks forward to being amazed by their continued great work in the coming year!

John S. Erickson, Ph.D.

Posted by: John Erickson | December 19, 2010

The TWC/Elsevier Dataset Search App

Since Summer 2010 I’ve had the privilege of working as a research engineer at the Tetherless World Constellation (TWC) at RPI, primarily helping the team in the execution of various projects related to their association with the Obama Administration’s initiative. One of those projects is an applet for the Elsevier SciVerse Hub portal. The following is from the description page for our application.

Dataset Search (Profile View)

The US Government Dataset Search application is an easy way for SciVerse users and developers to search over 300,000 available US government datasets and automatically find matches to their queries. Based on the user’s SciVerse Hub query, searches are made simultaneously against all published datasets as well as the RDF-converted data and related demos at the Linking Open Government Data (LOGD) portal, created by the Tetherless World Constellation (TWC) at Rensselaer Polytechnic Institute (RPI).

Any user with the ability to search SciVerse Hub can use the US Government Dataset Search application. The application and the government data it exposes are made available free of charge. The US Government Dataset Search application is targeted at both SciVerse end users (researchers) and application developers interested in applying government datasets to their applications. Researchers utilizing SciVerse Hub are able to discover and access contextually relevant data from the US Government. Developers may utilize SciVerse Hub to identify RDF-converted data sets based on the US Government data and access this data in their applications through SPARQL endpoints or retrieve the datasets themselves.

How the US Government Dataset Search application works: For each SciVerse query the user makes, a keyword search across all current datasets is made via a SPARQL endpoint at the TWC LOGD portal. A summary of these results is presented on the Hub search results page. Detailed results are presented in tabular form in the ‘Canvas’ (larger) view by clicking on any link. On the canvas view, links are provided directly to the dataset description pages as well as RDF-converted versions of these datasets at the TWC LOGD portal. Note that faceted search is not available with the application and only the original query in Hub will be submitted.
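The keyword-search step can be sketched roughly as follows. This is an illustration under stated assumptions, not the application's actual query. The endpoint URL is a placeholder, dataset titles are assumed to be stored with dcterms:title, and the match is a simple case-insensitive substring test using the SPARQL 1.1 CONTAINS function. The code only builds the query and the request URL (the standard SPARQL protocol "query" parameter) and does not contact any server.

```python
from urllib.parse import urlencode

# Placeholder endpoint; the real LOGD endpoint address is not given here.
ENDPOINT = "http://example.org/sparql"


def escape_sparql_string(text):
    """Escape backslashes and double quotes for use inside a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_keyword_query(keywords, limit=10):
    """Build a SELECT query matching dataset titles that contain the keywords."""
    safe = escape_sparql_string(keywords)
    return (
        "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
        "SELECT ?dataset ?title WHERE {\n"
        "  ?dataset dcterms:title ?title .\n"
        f'  FILTER CONTAINS(LCASE(STR(?title)), LCASE("{safe}"))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(keywords, limit=10):
    """Encode the query as a GET request URL using the 'query' parameter."""
    return ENDPOINT + "?" + urlencode({"query": build_keyword_query(keywords, limit)})


print(build_keyword_query("heart disease", limit=5))
print(build_request_url("heart disease", limit=5))
```

Passing the Hub query through unchanged, as the description says, keeps the gadget simple: one user query in, one SPARQL request out, with the summary and tabular views both rendered from the same result set.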

All queries are made against the LOGD SPARQL endpoint. The application also makes use of the Google Visualization toolkit.

This application is optimized for Firefox, Chrome and Internet Explorer 8.

For more information about creating mashups using datasets, please check out RPI’s Linking Open Government Data (LOGD) Portal.

About the TWC Linking Open Government Data project: The TWC LOGD team investigates opening and linking government data using Semantic Web technologies. TWC LOGD actively develops tools for the large-scale translation of government-related datasets into RDF, linking them into the ‘Web of Data’ and providing demos and tutorials on various means for consuming linked government data, including creating mashups, applications and data visualizations. The TWC LOGD Portal was awarded second place (open division) at the 2010 Semantic Web Challenge, held during the 2010 International Semantic Web Conference (ISWC 2010).

About the Tetherless World Constellation at RPI: The Tetherless World Constellation addresses the emerging area of Web Science, focusing on the World Wide Web and its future use. Faculty in the constellation lead explorations into the principles that underlie the Web; enhance the Web’s reach beyond the desktop and laptop computer; and develop new technologies and languages that expand the capabilities of the Web. TWC researchers use powerful scientific and mathematical techniques from many disciplines to explore the modeling of the Web from network- and information-centric views. TWC’s objectives include making the next generation web natural to use while being responsive to the growing variety of policy and social needs, whether in the area of privacy, intellectual property, general compliance, or provenance. The Tetherless World Constellation is designing new techniques to explore social, scientific, and legal impacts of the evolving technologies deployed on the Web.

News about the TWC/Elsevier US Government Dataset Search Application

  1. Featured in Looking Back at 2010 at Rensselaer RPI News & Events (20 Dec 2010)
  2. SciVerse Hub Application Connects Researchers with U.S. Government Datasets Information Today (20 Dec 2010)
  3. U.S. Government Dataset Search Opens to Scientists website (14 Dec 2010)
  4. New Application Allows Scientists Easy Access to Important Government Data RPI News & Events (10 Dec 2010)
  5. New Application Allows Scientists Easy Access to Important Government Data Lab Manager Magazine (13 Dec 2010)
  6. New Application Allows Scientists Easy Access to Important Government Data EurekAlert (10 Dec 2010)
  7. New Application Allows Scientists Easy Access to Important Government Data (10 Dec 2010)

UPDATE: I’m currently developing an iGoogle Gadget version of the SciVerse app, based on the same core queries. A screen shot of the “profile” view of that app appears below. In addition to enabling me to monitor the health of our systems from my desktop, it also enables me to test out possible features for the SciVerse app itself.

iGoogle Gadget version of the US Government Dataset Search app

Posted by: John Erickson | October 28, 2010

What I Want in a Software Developer(tm)

Professors and students in a nearby research group have been brainstorming a syllabus for a new, low-level computer science course. Normally I only “lurk” in such discussions, but this time I couldn’t hold my tongue. The following is my contribution, from my perspective as one who has interacted with “computer scientists” as a fellow team member, project leader, hiring manager, business partner and even corporate recruiter (interviewing mostly for other hiring managers).

This version has been edited slightly to make it better suited for a blog…

As an “old guy” who has interviewed his share of CS, CE and EE’s over the years (and hired and/or managed more than a few of them), here are my thoughts from an “outcomes” perspective…

  • It’s really exciting to work with a developer who groks the concepts to such a degree that specific languages and language boundaries simply don’t matter. Seeing a prototype done in Erlang because it was perfectly suited is SO much better than listening to whining over how it is hard to do it in Java or C# or Visual Basic N. Such developers are usually curious about everything; the dude that coded a prototype NoSQL-style data store for our team in Erlang had been playing with it for a few months, “just because…”

  • Methodical problem solving matters. Which some would equate to Engineering(tm). But really it’s about gaining a ton of experience attacking problems. The number one thing I’ve looked for over the years is actual experience — through project work, interesting course projects, and esp. internships — in completing cool projects. And please, don’t wait to be assigned; always look for problems, and just do them.

  • Join the software ecosystem. The most impressive developers I’ve met over the years — some are currently undergrads at the Tetherless World Constellation at RPI — understand how to contribute to software ecosystem(s); usually this is through the open source community. They understand the tools, they understand how to engage with other developers, they understand how to analyze and improve other people’s code.

    Here’s one way to think about it: if you aspire to be a professional musician (or artist), chances are you’ve participated in the “music ecosystem” in a wide variety of ways for many years, even before entering college. The best developers I’ve met — and those “computer scientists” who are developers at heart — have done the same (one guy I know built his first Linux kernel when he was in middle school).

  • Understand systems end-to-end. Now we’re back to the topic at hand 😉 The best contributors over the years have been those who had hands-on experience with absolutely every aspect of the “system.” This doesn’t mean going From Relays to Twitter in 10 Weeks, but it does mean understanding the relationships between all system elements.

I doubt very much that this is a problem for anyone on this list, because the very nature of PKI work requires one to have just this sort of broad and deep knowledge; plus, your professor and I have had a few conversations about this over the years…BTW, my daughter’s now at Southampton working on her Ph.D in numerical relativity and writing code on a supercomputer cluster 😉

UPDATE (29 Oct 2010): Nature recently published this interesting article, Computational science: …Error …why scientific programming does not compute, (13 Oct 2010) on the increasing need for scientists to have hard-core software engineering skills to do their science.

Posted by: John Erickson | July 12, 2010

Data Quality is in its Fitness to the Beholder

Posted by: John Erickson | June 16, 2010

Regarding the Singularity

A recent set of articles in the New York Times and elsewhere, including the Kurzweil book, prompted a friend to ask me for my thoughts on the Singularity Movement. Here is an excerpt of the email I wrote:

Regarding the Singularity Movement, I think economic arguments such as that presented by Robin Hanson in IEEE Spectrum (2008) carry more weight than the gushing futurist predictions from the likes of Ray Kurzweil. In the Spectrum article Hanson cites two previous singularities — the agricultural and industrial revolutions — and suggests that a revolution in machine intelligence is leading to a third that will take shape over the next half-century.

I tend to take most of what futurists say with a grain of salt, because they rely on a belief/assumption/confidence that the introduction of disruptive technologies into a society yields predictable results — for good or bad — which never happens. The combination of factors including technologies being human constructions, the fact that we as humans never make completely rational decisions, and the fact that all of this takes place within a fundamentally chaotic, only approximately predictable context, means that we simply cannot know what will happen in the future!

Here’s what I know: We humans are wired to build and use tools and, to the extent possible, adapt to the environments we build, or die trying. Google, while amazing, is still a tool; an engineered system that (given enough time) I can explain to you. Ironically enough, the reason Google works so well is that it’s actually based on simpler, but more fundamental principles than the systems which preceded it, closer to how naturally-occurring networks emerge and function. But the way Google has been adopted and applied in the “ecosystem,” while making sense in hindsight, could not have been predicted.

I’m currently reading Jonah Lehrer’s How We Decide, a wonderful exploration of the biochemistry of how we make decisions. Any such discussion naturally must touch on how various imbalances (e.g. dopamine, etc) affect that process, and how well-intentioned efforts by doctors to counteract certain imbalances lead to very unexpected and usually undesired results.

Lehrer’s book makes it profoundly clear that we never know for certain what will happen when we diddle with the decision-making processes in our brain, whether it involves extending the lower levels of the nervous system (the sensory level) or the higher level processes. Researchers do know that we seem to adapt well to lower-level, e.g. neural prosthetics, but each higher-level process involves a synaptic algorithm that we don’t completely understand — mostly because our brain is a distributed system, not a single “algorithm,” whose “result” is emergent.

That ultimately is my point: our brains are distributed systems that exhibit adaptive and unpredictable behaviors, and we can’t begin to understand what will happen when we explore higher-level prosthetics based on “intelligent machines.” Something will happen, but there is no reason to believe it will lead to either a Utopian or Dystopian existence any more than the agricultural or industrial revolutions resulted in one or the other. Indeed, the introduction of those practices to certain natural and economic ecosystems led to both regional successes and catastrophes.

Posted by: John Erickson | May 28, 2010

Concerning the King Arthur Flour Expansion

Recently the King Arthur Flour Company, a global provider of quality baking supplies based in my home town of Norwich, Vermont, proposed an expansion that would include a sewer extension. This issue is being debated locally, and I thought it would provide good fodder for my blog… John Erickson

Since Jill and I moved to Norwich some 18 years ago, I’ve been troubled by what seems like a lack of support for sustainable economic development within our town. I’m proud that Norwich has a high-quality global company “like” King Arthur based here, a company that is employee-owned, successful and growing. At the same time I’m embarrassed that Norwich isn’t doing more to sustain the economic well being of the Upper Valley.

Fifteen years ago this month partners and I began the process of launching a company called NetRights. Loving Norwich and Vermont, I had a vision of starting a sustainable high-tech company that would be based here and would create local jobs. The inevitable question of where to base our company arose; being the Vermonter in the mix and drinking from the KoolAid of iconic successes like Green Mountain Gringo, I argued for us to set up offices in Norwich, Wilder or WRJ. My co-founders thought this was ludicrous; not only did they envision the (obvious to them) negative tax implications, but they also perceived no end of difficulty with infrastructure, etc. Since they had been successful with a previous Lebanon-based software startup, I went along for the ride and we set up shop in downtown Lebanon.

But I wouldn’t give up that easily. At one point Vermont eTV (remember them?) had a call-in with Gov Dean’s youthful, energetic director of economic development. Vermont had recently provided incentives for ETI’s expansion, and my direct question to “Slick” was: what can Vermont do to keep companies like ours in Vermont? Or were my co-founders right that there weren’t any incentives to lure us to Vermont? His answer: regrettably, yes, my co-founders were right. If we needed money for bricks-n-mortar expansion to grow a widget-building business, yes, but since we were “knowledge-based,” nothing. Frankly, I was shocked, since this was during the same period that Gov Dean (who I’m a fan of!) was roaming the state advocating green high-tech businesses in cabins on mountaintops…

I’ve bored you with this ancient history in order to provide some context as to why I believe the citizens of Norwich should greet initiatives such as King Arthur’s with the question, what can we as neighbors do to help? Their opening proposal may or may not be ideal — I’m not saying “Roll over, little Norwich!” — but I do believe it is our responsibility to do what we can to foster economic development in this town, and this includes hearing their plans with an open mind.

I’m tired of Norwich not merely depending on, but assuming that other towns in the region will feed our hungry, host our homeless, pay our salaries, sell us our auto parts. Instead, we should be asking how we can help those among us with the initiative to bring it on home to Norwich…

Disclaimer: I am not affiliated with King Arthur Flour, but I do confess to loving their products and have been known to roam their jobs portal from time to time…
