Scale-free Networks and the Value of Linked Data

Posted by: John Erickson | January 26, 2010

I believe vibrant, thriving networks are expressions of inherent value and provide ecosystems of opportunity for individuals and organizations to foster their growth through unique contributions in the form of content and data, tools, or infrastructure. For this reason, as I read Eric Hellman’s recent post 8 One-Way Business Models for Linked Data — a well-considered response to Scott Brinker’s seven business models (plus an eighth…) that can make Linked Data viable — I felt something was missing; something more needed to be said about the inherent value of growing networks embodying linked data principles. I therefore post this re-considered piece which first appeared in Dec 2009 on my decommissioned Blogspot blog…

Over the past few months Kingsley Idehen of OpenLink Software and others on the Business of Linked Data (BOLD) list have been debating a value proposition for linked data via Twitter (search for #linkeddata) and email. The discussion has included useful iterations on various “elevator pitches” and citations of recent successes, especially the application of GoodRelations e-commerce vocabularies at Best Buy. After some deep thought I decided to take the question of value in a different direction and to consider it from the perspective of the science of networks, especially with reference to the works of Albert-László Barabási, director of the Center for Complex Network Research and author of Linked: The New Science of Networks.

I’d like to test the idea here that data sharing between organisations based on linked open data principles is the approach most consistent with the core principles of a networked economy. I believe that the linked data model best exploits “networking thinking” and maximizes an organisation’s ability to respond to changes in relationships within the “global graph” of business. Using Barabási as a framework, linked data is the approach that most embodies a networked view of the economy from the macro- to the micro-economic level, and therefore best empowers the enterprise to understand and leverage the consequences of interconnectedness.

As has been noted numerous times elsewhere, the so-called Web of Data is perhaps the web in its purest form. Following Tim Berners-Lee’s principles or “rules” as stated in his Linked Data Design Issues memo from 2006, we have a very elegant framework for people and especially machines to describe the relationships between entities in a network. If we are smart about how we define those links and the entities we create to aggregate those links (the linked datasets we create), we can build dynamic, efficiently adaptive networks embodying the two laws that govern real networks: growth and preferential attachment. Barabási illustrates these two laws with an example “algorithm” for scale-free networks in Chapter 7 of Linked. The critical lessons are (a) networks must have a means to grow: there must not only be links, but the ability to add links, and (b) networks must provide some mechanism for entities to register their preference for other nodes by creating links to the more heavily-linked nodes. Preferential attachment ensures that the converse is also true: entities will “vote with their feet” and register their displeasure with nodes by eliminating links.
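To make the two laws concrete, here is a minimal Python sketch of a growth-plus-preferential-attachment process in the spirit of the scale-free model. It is an illustration under my own assumptions, not a transcription of the algorithm in Linked: the function names, the fully connected starting core, and the parameter values are all hypothetical, and the sketch only adds links (it does not model entities removing them).

```python
import random


def grow_network(total_nodes, links_per_new_node, seed=0):
    """Grow an undirected graph by growth plus preferential attachment.

    Hypothetical helper for illustration; returns a list of (node, node) edges.
    """
    rng = random.Random(seed)
    m = links_per_new_node

    # Assumption: start from a small fully connected core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]

    # Each node appears in this list once per link it holds, so a uniform
    # draw from the list picks a node with probability proportional to
    # its current degree (preferential attachment).
    endpoints = [node for edge in edges for node in edge]

    # Growth: nodes arrive one at a time, each bringing m new links.
    for new_node in range(m + 1, total_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in targets:
            edges.append((new_node, target))
            endpoints.extend((new_node, target))
    return edges


def degrees(edges):
    """Count how many links each node holds."""
    counts = {}
    for a, b in edges:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts


if __name__ == "__main__":
    degree = degrees(grow_network(total_nodes=1000, links_per_new_node=2))
    mean = sum(degree.values()) / len(degree)
    print("mean degree:", round(mean, 2))
    print("largest hub degree:", max(degree.values()))
```

Running it should show a handful of early, well-linked nodes holding far more links than the average node, which is the “rich get richer” pattern in miniature.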

In real networks, the rich get richer. In the Web, the value is inherent in the links. Google’s PageRank merely reinforced the “physical” reality that the most valuable properties in the Web of Documents are those resources that are most heavily linked-to. Those properties provide added value if they in turn provide useful links to other resources. The properties that are sensitive to demand and can adapt to the preferences of their consumers, especially to aggregate links to more resources that compound their value and distinguish them from other properties, are especially valuable and are considered hubs.
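As a rough illustration of how link structure alone can be turned into a ranking, the following sketch implements the simplified textbook form of PageRank by repeated redistribution of rank along links. It is not Google’s production algorithm; the function name, the damping value of 0.85, the iteration count, and the tiny example graph are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over a dict mapping each node to the nodes it links to.

    Hypothetical helper for illustration only.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}

    for _ in range(iterations):
        # Every node receives a small baseline share regardless of links.
        new_rank = {node: (1.0 - damping) / count for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # A node passes its rank on, split evenly across its links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its rank evenly.
                share = damping * rank[node] / count
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical graph: three resources link to "hub", which links to "a".
    example = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
    for node, score in sorted(pagerank(example).items(), key=lambda kv: -kv[1]):
        print(node, round(score, 3))
```

In this toy graph the most heavily linked-to node ranks first, and the node it links to ranks second, which mirrors the point that a hub also confers value on the resources it points at.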

Openness is important. At this point it is tempting to jump to the conclusion that Tim Berners-Lee’s four principles are all we need to create a thriving Web of Data, but this would be premature; Sir Tim’s rules are necessary but not sufficient conditions. Within any “space” where Webs of Data are to be created, whether global or constrained within an organisation, the network must embody the open world assumption as it pertains to the web: when datasets or other information models are published, their providers must expect them to be reused and extended in ways they cannot control. In particular this means that entities within the network, whether powered by humans or machines, must be free to arbitrarily link to — to make assertions about — other entities within the network. The “friction” of obtaining permission in this linking process must approximate zero.

Don’t reinvent and don’t covet! The extent of graphs that are built within organisations should not stop at their boundaries; as the BBC has shown so beautifully with their use of linked data on the revamped BBC web site, the inherent value of their property was increased radically by not only linking to datasets provided elsewhere, openly on the “global graph,” but also by enabling reuse of their properties. The BBC’s top-level principles for the revamped site are all about openness and long-term value:

The site has been developed against the principles of linked open data and RESTful architecture where the creation of persistent URLs is a primary objective. The initial sources of data are somewhat limited but this will be extended over time. Here’s our mini-manifesto: Persistence…Linked open data…RESTful…One web

The BBC has created a valuable “ecosystem”; their use of other resources, especially MusicBrainz and DBPedia, has not only made the BBC site richer but in turn has increased the value of those properties. And those properties will continue to increase in value; by the principle of preferential attachment, every relationship “into” a dataset by valuable entities such as the BBC in turn increases the likelihood that other relationships will be established.

Links are not enough. It should be obvious that simply exposing datasets and providing value-added links to others isn’t enough; as Eric Hellman notes, dataset publishers must see themselves as service providers who add value beyond simply exposing data. Some will add value to the global graph by gathering, maintaining, and publishing useful datasets and fostering a community of users and developers; others will add value by combining datasets from other services in novel ways, possibly decorated by their own. Eric has argued that the only winners in the linked open data space have indeed been those who have provided such merged datasets as a service.

Provide value-adding services and foster community. For those dataset providers asking how you might realise the full value potential of publishing your datasets on the Web, I suggest that you examine whether, based on the principles I’ve outlined above, you have done everything you can to make your datasets part of the Web, rather than merely “on” it, and thereby are truly adding value to the global graph:

Do you view yourselves as a service?
Have you made your datasets as useful and easy-to-use as possible?
Have you provided the best possible community support, including wikis and other mechanisms?
Have you fully documented your vocabularies?
Have you clearly defined any claimed rights, and in particular have you considered adopting open data principles?

Updates:

For more on why “…linked data…is the best approach available for publishing data in a hugely diverse and distributed environment, in a gradual and sustainable way…” see Jeni Tennison’s recent post, Why Linked Data for data.gov.uk?.
One of the more important contributions to the study of networks in the past few years is Experience vs. Talent Shapes the Structure of the Web (2008; Joseph Kong, Nima Sarshar, and Vwani Roychowdhury) which used large-scale crawl data to “investigate and validate the dynamics that underlie the evolution of the structure of the web.” The authors’ study shows that neither age (“experience”) nor status as a promising upstart (“talent” or “fitness”) is an immediate indicator of success. They suggest that a more experience-based fitness ranking could be included in the overall ranking of a search result; one simple way to think of this is to imagine filtering Google results based on how fast certain resources are rising in the rankings.

Posted in linked data, metadata, web science | Tags: bbc, business, linked data, metadata, semantic web, web science

Responses

The density and quality of the LINK mesh (network) is the critical success factor re. the “Data as a Service” (DaaS) business model. Thus, the most navigable highway will attract the most traffic (as is the case in real life re. highways and commerce).

I see each LINK as the canonical unit of Web Presence. It’s the seed of Linked Data Spaces (or meshes). Thus, value will be exposed by Linked Data Spaces, but you cannot reach this end product without an initial seed endowed with high SDQ.

Bottom line, people need points of presence on the Web that enable them to describe themselves, what they offer, and what they need. Once in place, via a Profile oriented Linked Data Space, the Web will absolutely take care of the REST — Serendipitous Binding of Relevant Things.

Links:

1. http://www.seangolliher.com/2009/linked-data/serendipitous-discovery-quotient-sdq-the-future-of-seo-or-an-abstract-concept/

2. http://bit.ly/3jZTWP — my original post re. Serendipitous Discovery Quotient (SDQ).

Kingsley
By: Kingsley Idehen on January 26, 2010
at 5:13 pm

Bitwacker Associates