The #linkeddata hashtag is once again active on the topic of linked data business models and the linked data value proposition. Scott Brinker [2] seems to have kicked things off this time around with his post 7 Business Models for Linked Data, to which Talis’ Leigh Dodds has responded with Thoughts on Linked Data Business Models. Although I’m tempted to dive right in with comments on facets of these two great posts, I’d first like to focus on InfoChimps, a company with origins in the big data (especially scientific dataset) community that is trying to make money by “incentivizing” the trafficking of datasets without overtly identifying itself as a “linked data provider.”
InfoChimps is interesting to me because its infrastructure was born from the essential question of how to persist and publish datasets that users have added value to. In his podcast interview with Paul Miller of Cloud of Data [3], InfoChimps co-founder Flip Kromer says that the original goal was to create a SourceForge-like service where users who modified datasets — corrected, extended, attributed, whatever — could easily share those datasets with others. InfoChimps soon went beyond this sharing model, enabling companies with valuable datasets, such as the well-known polling and market research firm Zogby International, to easily upload and license their data.
My understanding of InfoChimps is that their focus is on making the sharing, augmenting and monetization of datasets easy. In fact, when Paul asked Flip in the interview to address topics such as linked data-based publishing, it came across as slightly off-message; Flip focused instead on the simplicity and value they bring by giving users the ability to post and share their large “rectangular” datasets, i.e. datasets in native Excel or CSV format. A key takeaway from this exchange was that InfoChimps is not “leading” with technology, which I think is the right strategy (at least for now).
A few months ago, in the previous life of this blog [1], I pondered the value of linked data and its providers in light of the economics of scale-free networks. My hypothesis was, and is, that as with everything else that is networked, in the world of linked data the rich will get richer: value will be demonstrated by the extent to which a dataset (and its provider) links to, and is linked to by, other datasets. The more heavily linked a dataset is, the more valuable it is, by definition. This means that the starting point for realizing the inherent value of a dataset is making it linkable, not merely shareable: applications and other datasets must be able to link to it, and it must be able to leverage the linkability of other datasets.
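To make that rich-get-richer intuition concrete, here is a toy sketch of scoring datasets by how heavily they are linked. Everything in it is invented for illustration — the dataset names, the link structure, and the PageRank-style scoring are mine, not InfoChimps’:

```python
# Hypothetical link graph: which datasets link out to which.
links = {
    "geonames": ["dbpedia"],
    "dbpedia": ["geonames"],
    "zogby-polls": ["dbpedia", "geonames"],
    "my-csv-dump": [],  # shareable but not linkable: no outbound links
}

def link_value(links, damping=0.85, iterations=50):
    """Score datasets by incoming links, PageRank-style (toy version:
    mass from datasets with no outbound links simply evaporates)."""
    n = len(links)
    score = {d: 1.0 / n for d in links}
    for _ in range(iterations):
        new = {d: (1 - damping) / n for d in links}
        for source, targets in links.items():
            for target in targets:
                new[target] += damping * score[source] / len(targets)
        score = new
    return score

for name, value in sorted(link_value(links).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

On this toy graph, dbpedia and geonames come out well ahead of the isolated CSV dump, which is exactly the point: it is being linked to, not merely being downloadable, that accrues value.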
Datasets that are difficult to use have limited value. InfoChimps has addressed the question of ease of use in a very practical way by encouraging its depositors to upload their datasets in standard “rectangular” formats such as Excel or CSV. Readers versed in linked data might see this as an ancient approach, but at a time when the “Web of Data” gospel is still just starting to spread it is actually quite smart: most data management systems (RDBMS, triple stores, graph databases) can both import and export CSV and Excel; even enormous datasets can be easily disseminated; and indeed leading projects such as data.gov and data.gov.uk have applied linked data principles to expose data originally obtained in “ancient” formats, including CSV. Furthermore, InfoChimps provides interfaces and mechanisms for the community to augment datasets hosted on its site, thus fostering community-driven development of value.
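To show just how short the path from a “rectangular” upload to linked data can be, here is a minimal sketch that turns CSV rows into N-Triples using only the Python standard library. The file name, the “id” column, and the example.org namespace are all hypothetical, and a real pipeline would need proper URI design, datatypes, and outbound links:

```python
import csv

BASE = "http://example.org/dataset/"  # hypothetical namespace

def csv_to_ntriples(path, id_column):
    """Yield one N-Triples statement per non-empty cell,
    using the row's ID column to mint a subject URI."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            subject = f"<{BASE}{row[id_column]}>"
            for column, value in row.items():
                if column == id_column or not value:
                    continue
                predicate = f"<{BASE}property/{column}>"
                # Escape backslashes and quotes to keep the literal valid.
                literal = value.replace("\\", "\\\\").replace('"', '\\"')
                yield f'{subject} {predicate} "{literal}" .'

# Usage, assuming a polls.csv file with an "id" column:
# for triple in csv_to_ntriples("polls.csv", "id"):
#     print(triple)
```

The hard part, as I discuss below, is not emitting triples but doing this reliably and automatically across wildly varying real-world spreadsheets.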
The problem, I think, comes as we look forward to new modes of data consumption and application. The upload/license/download commercial data model, which dates back at least to the 1980s and probably much earlier, depends upon customers hosting the datasets themselves and does not seem to cater to the many agile, dynamic approaches the linked data community has been exploring. But I imagine this isn’t far off; it seems more a question of how to make automated RDF mapping of widely varying CSV datasets reliable, and how to provide individualized, secure interfaces for customers that properly reflect their license agreements. In fact, at the very end of his post Leigh Dodds says the following:
…From a technical perspective I’m interested to see how well protocols like OAuth and FOAF+SSL can be deployed to mediate access to licensed Linked Data…
Me too! I’ll address that properly in a follow-up post; for now, the strawman sketch below hints at the basic shape of the problem…
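In the meantime, here is the merest strawman of what license-mediated access might look like: a plain bearer-token check standing in for real OAuth or FOAF+SSL, with a made-up token, dataset name, and license table (all of it assumed for illustration; none of it describes InfoChimps’ actual interfaces):

```python
from flask import Flask, abort, request

app = Flask(__name__)

# Hypothetical license table: token -> datasets this customer has licensed.
LICENSES = {"token-abc123": {"zogby-polls-2009"}}

# Hypothetical licensed data, served as N-Triples.
DATASETS = {
    "zogby-polls-2009":
        '<http://example.org/poll/1> '
        '<http://example.org/property/sample_size> "1003" .',
}

@app.route("/data/<name>")
def serve_dataset(name):
    # Stand-in for OAuth/FOAF+SSL: check a bearer token against the table.
    auth = request.headers.get("Authorization", "")
    token = auth.removeprefix("Bearer ").strip()
    if name not in DATASETS:
        abort(404)
    if name not in LICENSES.get(token, set()):
        abort(403)  # well-formed request, but not covered by any license
    return DATASETS[name], 200, {"Content-Type": "application/n-triples"}

if __name__ == "__main__":
    app.run()
```

The interesting work, of course, lies in replacing that hard-coded table with protocol-level identity and machine-readable license terms, which is exactly where OAuth and FOAF+SSL would come in.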
Notes:
- [1] Thanks to Blogger having blocked me, I’m now a happy WordPress convert!
- [2] See also Scott Brinker’s latest presentation, Marketing with Linked Data, presented 12 Jan 2010 at MIT’s Linked Data Product Development Lab.
- [3] In the first version of this post I mistakenly affiliated Cloud of Data’s Paul Miller with Talis. My humblest apologies!
