Via: https://www.rollingstone.com/politics/politics-news/washington-national-cathedral-bishop-trump-1235242803/

“Let me make one final plea, Mr. President. Millions have put their trust in you, and as you told the nation yesterday, you have felt the providential hand of a loving God. In the name of our God, I ask you to have mercy upon the people in our country who are scared now. There are gay, lesbian, and transgender children in Democratic, Republican, and independent families — some who fear for their lives. The people who pick our crops and clean our office buildings, who labor in poultry farms and meatpacking plants, who wash the dishes after we eat in restaurants and work the night shifts in hospitals — they may not be citizens or have the proper documentation, but the vast majority of immigrants are not criminals. They pay taxes and are good neighbors. They are faithful members of our churches and mosques, synagogues, gurdwara, and temples.

“I ask you to have mercy, Mr. President, on those in our communities whose children fear their parents will be taken away, and that you help those who are fleeing war zones and persecution in their own lands to find compassion and welcome here. Our God teaches us that we are to be merciful to the stranger, for we were all once strangers in this land. May God grant us the strength and courage to honor the dignity of every human being, to speak the truth to one another in love, and walk humbly with each other and our God, for the good of all people — the good of all people in this nation and the world. Amen.”

Posted by: John Erickson | October 12, 2024

ChatBS, the Context-aware LLM Exploratory Sandbox

The ChatBS logo is by The Sketch Effect, Atlanta, GA, USA.

ChatBS, the Context-aware LLM Exploratory Sandbox uses the OpenAI Completion API service (GPT and Llama family models) to answer questions. Each sentence in a ChatBS result is automatically linked to a Google query to facilitate fact-checking. If requested, ChatBS can then use the OpenAI API to construct an entity/relation graph of these results in the form [‘entity1’, ‘relationship’, ‘entity2’]. ChatBS then uses entity linking to look up both the entities and relationships against Wikidata entities and properties, constructing a JSON-LD graph as it proceeds. For each re-submission of the user’s prompt we display the LLM response and the resulting RDF (if requested).
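
To make the entity-linking step concrete, here is a minimal TypeScript sketch, not ChatBS’s actual code, of how a single extracted triple might be resolved against Wikidata’s public wbsearchentities API and serialized as a small JSON-LD graph. The helper names, the example triple, and the output shape are illustrative assumptions.

const WD_API = "https://www.wikidata.org/w/api.php";

// Look up a label in Wikidata and return the best-matching ID (e.g. "Q937" or "P19").
async function searchWikidata(term: string, type: "item" | "property"): Promise<string | null> {
  const params = new URLSearchParams({
    action: "wbsearchentities",
    search: term,
    language: "en",
    type,
    format: "json",
    origin: "*", // needed for browser CORS; harmless in Node
  });
  const res = await fetch(`${WD_API}?${params}`);
  const data = await res.json();
  return data.search?.[0]?.id ?? null;
}

// Convert one ['entity1', 'relationship', 'entity2'] triple into a tiny JSON-LD graph.
async function tripleToJsonLd([subj, rel, obj]: [string, string, string]) {
  const [s, p, o] = await Promise.all([
    searchWikidata(subj, "item"),
    searchWikidata(rel, "property"),
    searchWikidata(obj, "item"),
  ]);
  if (!s || !p || !o) return null; // skip triples that fail to link
  return {
    "@context": {
      wd: "http://www.wikidata.org/entity/",
      wdt: "http://www.wikidata.org/prop/direct/",
    },
    "@id": `wd:${s}`,
    [`wdt:${p}`]: { "@id": `wd:${o}` },
  };
}

// Example: a triple an LLM might extract from a response sentence.
tripleToJsonLd(["Albert Einstein", "place of birth", "Ulm"])
  .then((g) => console.log(JSON.stringify(g, null, 2)));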

ISWC 2024 demo abstract: The recent widespread public availability of generative large language models (LLMs) has drawn much attention from the academic community to run experiments in order to learn more about their strengths and drawbacks. From prompt engineering and fine-tuning to fact-checking and task-solving, researchers have pursued several approaches to try to take advantage of these tools. As some of the most powerful LLMs are “closed” and only accessible through web APIs with prior authorization, combining LLMs with the open web is still a challenge. In this evolving landscape, tools that can facilitate the exploration of the capabilities and limitations of LLMs are desirable, especially when connecting with traditional web features such as search and structured data. This article presents ChatBS, a web-based exploratory sandbox for LLMs, working as a front-end for prompting LLMs with user inputs. It provides features such as entity resolution from open knowledge graphs, web search using LLM outputs, as well as popular prompting techniques (e.g. multiple submissions, “step-by-step”). ChatBS has been extensively used in Rensselaer Polytechnic Institute’s Data INCITE courses and research, serving as a key tool for utilizing LLM outputs at scale in these contexts.
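
As a rough illustration of the “multiple submissions” and “step-by-step” prompting techniques mentioned above, the sketch below re-submits one prompt several times through the official OpenAI Node SDK. The model name, temperature, and helper function are placeholders, not ChatBS’s actual configuration.

import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Submit the same prompt n times, optionally appending a "step-by-step" nudge,
// so the caller can compare the answers for consistency.
async function multiSubmit(prompt: string, n = 3, stepByStep = false): Promise<string[]> {
  const content = stepByStep ? `${prompt}\n\nLet's think step by step.` : prompt;
  const answers: string[] = [];
  for (let i = 0; i < n; i++) {
    const resp = await client.chat.completions.create({
      model: "gpt-4o-mini", // placeholder model name
      messages: [{ role: "user", content }],
      temperature: 0.7, // nonzero so repeated submissions can differ
    });
    answers.push(resp.choices[0].message.content ?? "");
  }
  return answers;
}

multiSubmit("Who founded Rensselaer Polytechnic Institute?", 3, true)
  .then((answers) => answers.forEach((a, i) => console.log(`--- run ${i + 1} ---\n${a}`)));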

ChatBS Live App: https://inciteprojects.idea.rpi.edu/chatbs/app/chatbs/

ChatBS Project Page: https://tw.rpi.edu/project/chatbs-context-aware-llm-exploratory-sandbox

Posted by: John Erickson | August 17, 2023

Technical and Legal Aspects of Regulating Generative AI

Large generative AI models (LGAIMs), such as ChatGPT, GPT-4 or Stable Diffusion, are rapidly transforming the way we communicate, illustrate, and create. With this (currently…) informal project at Rensselaer we consider these new generative models in the context of AI regulation and ponder not only how laws might be tailored to their capabilities, but also to what extent those capabilities can be constrained to comply with laws.       

Our group is actively curating a reading list on some of the practical aspects of implementing regulations on large language models. The papers, presentations and posts we’re finding have been prepared by some of the leading thinkers in AI, law and government, several of whom Tetherless World researchers know and have collaborated with in the past. 

Some of the topics under consideration: 

Please contact me if you have suggestions for this list!

Posted by: John Erickson | January 11, 2021

Don’t Cry for Parler; or, Edge Computing for Revolutionaries

Created 11 Jan 2021; Updated 15 Jan 2021

A few thoughts on why the de-platforming of Parler and the removal of thousands of accounts associated with QAnon from Twitter should be a reminder to online communities about the importance of platform independence — and also a reminder of some things forward-thinking people began warning us about more than two decades ago, warnings that most of the “cool kids” building and adopting apps and platforms have refused to heed…

Reminder #1: The risks inherent in relying on device-specific apps and the distribution channels (aka app stores) they rely on: When developers create “native” apps they are subject to the whims of the channels that control the distribution of those apps. Developers who instead create Web apps can instantly publish and do not require the approval (technical, content or otherwise) of platform owners.

Reminder #2: The risks inherent in using cloud-based services for data management, service provision, yada-yada: Live by the Cloud, die by the Cloud. Cloud services enable companies and organizations to spin up quickly and to scale efficiently; you get as much infrastructure as you can pay for, as quickly as you can pay for it! The Cloud also makes you entirely dependent upon the whims of your provider, within the limits of your service agreement (which will naturally favor your provider). The alternatives to The Cloud are to purchase and maintain your own infrastructure, subject only to the whims of your landlord, Mother Nature and the scalability of that infrastructure; or to find Alt-Cloud-equivalents of AWS (which probably exist). Again, you’d be dependent upon their whims, but perhaps you’ll find one with appropriately Left- or Right-wing sensibilities. An even better approach is to adopt a decentralized social network architecture; see next….

Reminder #3: The risks inherent in keeping all of your eggs — your users’ data assets — in one basket: Parler reportedly scrambled to get its millions of users’ data out of AWS before the January 10th shutdown. This is a dual “boo-hoo on you”: first on Parler, for trusting that the data in AWS would always be accessible to them; then on Parler users, for relying on a centralized service to manage their personal data. It has been clear forever (at least in Internet time) that decentralized, or edge-based, data persistence is the most robust approach, enabling personalized data (self) management.

For more on e.g. decentralized data management see this New York Times story, coincidentally also from January 10th, about Tim Berners-Lee’s open-source software project, Solid, and his Solid-based start-up, Inrupt: https://www.nytimes.com/…/tim-berners-lee-privacy… Others are leveraging Bitcoin technology to implement decentralized cloud data storage services.

DISCLAIMER: I’m in no way defending Parler and their users; indeed my message to them is, “Hmmm, sucks being you!” But online communities of any orientation may be subject to the platform interventions that we saw take place in the aftermath of January 6th; for more thoughts on the potential consequences, please see the related links that follow.

RELATED LINKS:

Posted by: John Erickson | January 24, 2017

A Tribute to Norman Paskin

The following is an invited tribute to Dr. Norman Paskin that will appear in an upcoming Data Science Journal special issue on persistent identifiers (PIDs)…

We were shocked and saddened to learn of the death of our longtime friend and colleague Dr. Norman Paskin in March 2016.

[Photo: Norman Paskin]

Norman will be remembered as the thoughtful and tireless founding director of the International DOI Foundation (IDF). Some of us were fortunate to have known him in the earliest days, during those formative pre-DOI Foundation meetings when technical and political views came together to form the foundation of what the DOI is today. The early days of DOI involved many lengthy, sometimes heated, email and face-to-face discussions, and we fondly remember Norman as a sensible voice calling out in the wilderness.

Establishing sound technical foundations for the DOI was surely only a first step; the DOI’s long-term success and sustainability would depend on its widespread adoption, which in turn would require a clear message, sensible policies that would benefit a wide range of stakeholders, and constant evangelism. To the surprise of no one — except, perhaps, the man himself! — Norman Paskin was chosen in 1998 as the founding director of the IDF, and set out to spread the gospel of persistent identifiers while defining the mission of the IDF. Norman conveyed the message so well that twenty years later it is hard to imagine arguments against the DOI; indeed, its example is so compelling that in domains that can’t directly adopt the DOI, we see parallel object identifier systems emerging, modeled directly after the DOI.

A critical component of the DOI’s success is the robustness of its underlying infrastructure, the Handle System(tm), created and administered by Bob Kahn’s team at Corporation for National Research Initiatives (CNRI). Not long into the life of the DOI and IDF, it became clear that the long-term success of the DOI and other emerging object naming systems based on the Handle System would in turn depend on a well-considered set of Handle System governance policies. In order to consider the needs of a range of current and future stakeholders, a Handle System Advisory Committee (HSAC) was formed in early 2001; on the HSAC Norman naturally represented the interests of the IDF and its members, but also understood the perspectives of CNRI, then the operator of the Handle System, as well as other Handle System adopters.

It was our pleasure to work directly with Norman on DOI matters, including early technology demonstrators that we demoed at the Frankfurt Book Fair and other conferences in the late 1990s, and later mutually participating in HSAC meetings and various DOI strategy sessions. Whenever we saw each other, in New York, Oxford, Washington, London or Frankfurt, we would pick up our conversations, whether last held the day before or the year before, via email or in person. To all who knew him, Norman Paskin set the standard both literally and figuratively; his friends and colleagues miss him tremendously, but he will persist in our professional memories and in our hearts.

John S. Erickson, PhD
Director of Research Operations
The Rensselaer Institute for Data Exploration and Applications
Rensselaer Polytechnic Institute, Troy, NY  (USA)

Laurence W. Lannom
Vice President and Director of Information Services
The Corporation for National Research Initiatives (CNRI), Reston, VA (USA)

Other tributes to Norman Paskin:

  1. Mark Seeley and Chris Shillum, Remembering Norman Paskin, a pioneer of the DOI system for scholarly publishing
  2. Ed Pentz, Dr Norman Paskin
  3. The Jensen Owners Club, Norman Paskin
Posted by: John Erickson | January 10, 2014

What’s all this about a W3C DRM standard?

Over the past few days there has been renewed discussion of the controversial W3C Encrypted Media Extension proposal with the publication of a revised draft (07 Jan 2014). Today I’d like to provide a bit of background, based on my long experience in the digital rights management “game” and my familiarity with the W3C process.

Who are the players? The primary editors of the W3C EME draft are employed by Google, Microsoft and Netflix, but corporate affiliation really only speaks to one’s initial interest; W3C working groups try to work toward consensus, so we need to go deeper and see who is actually active in the formulation of the draft. Since W3C EME is a work product of the HTML Working Group, one of the W3C’s largest, the stakeholders for EME are somewhat hidden; one needs to trace the actual W3C “community” involved in the discussion. One forum appears to be the W3C Restricted Media Community Group; see also the W3C restricted media wiki and mailing list. A review of email logs and task force minutes indicates regular contributions from representatives of Google, Microsoft, Netflix, Apple, Adobe, Yandex, a few independent DRM vendors such as Verimatrix, and of course W3C. Typically these contributions are highly technical.

A bit of history: The “world” first began actively debating the W3C’s interest in DRM as embodied by the Encrypted Media Extension in October 2013, when online tech news outlets like InfoWorld ran stories about W3C director Tim Berners-Lee’s decision to move forward and the controversy around that choice. In his usual role as anti-DRM advocate, Cory Doctorow first erupted that October, but the world seems to be reacting with renewed vigor now. EFF has also been quite vocal in their opposition to W3C entering into this arena. Stakeholders blogged that EME was a way to “keep the Web relevant and useful.”

The W3C first considered action in the digital rights management arena in 2001, hosting the Workshop on Digital Rights Management (22-23 January 2001, INRIA, Sophia Antipolis, France), which was very well attended by academics and industrial types including the likes of HP Labs (incl. me), Microsoft, Intel, Adobe, RealNetworks, several leading publishers, etc.; see the agenda. The decision at that time was Do Not Go There, largely because it was impossible to get the stakeholders at that time to agree on anything “open,” but also because in-browser capability was limited. Since that time there have been considerable advancements in support for user-side rendering technologies, not to mention the evolution of JavaScript and the creation of HTML5; it is clear that W3C EME is a logical, if controversial, continuation in that direction.

What is this Encrypted Media Extension? The most concise way to explain EME is that it is an extension to HTML5’s HTMLMediaElement that enables proprietary, controlled content-handling schemes, including encrypted content. EME does not specify a specific content protection scheme, but instead allows for vendor-specific schemes to be “hooked” in via API extensions. Or, as the editors describe it,

“This proposal allows JavaScript to select content protection mechanisms, control license/key exchange, and implement custom license management algorithms. It supports a wide range of use cases without requiring client-side modifications in each user agent for each use case. This also enables content providers to develop a single application solution for all devices. A generic stack implemented using the proposed APIs is shown below. This diagram shows an example flow: other combinations of API calls and events are possible.”

[Figure: A generic stack implemented using the proposed W3C Encrypted Media Extension APIs]
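
To make the “hooking” concrete, here is a minimal TypeScript sketch of the license/key-exchange flow the editors describe, written against the EME API as it eventually standardized (requestMediaKeySystemAccess, MediaKeys, MediaKeySession); the 2014 draft’s method and event names differ in detail, and the key system, codec string, and license-server URL below are placeholders.

// Attach a (placeholder) key system to a video element and drive license exchange from JavaScript.
async function setupEme(video: HTMLVideoElement): Promise<void> {
  const access = await navigator.requestMediaKeySystemAccess("org.w3.clearkey", [
    {
      initDataTypes: ["cenc"],
      videoCapabilities: [{ contentType: 'video/mp4; codecs="avc1.42E01E"' }],
    },
  ]);
  const mediaKeys = await access.createMediaKeys();
  await video.setMediaKeys(mediaKeys);

  // Fired when the browser encounters encrypted media data.
  video.addEventListener("encrypted", async (event: MediaEncryptedEvent) => {
    if (!event.initData) return;
    const session = mediaKeys.createSession();

    // The CDM emits a license request; page script forwards it to a license server.
    session.addEventListener("message", async (msg: MediaKeyMessageEvent) => {
      const response = await fetch("https://license.example.com/acquire", {
        method: "POST",
        body: msg.message,
      });
      await session.update(await response.arrayBuffer()); // hand the keys back to the CDM
    });

    await session.generateRequest(event.initDataType, event.initData);
  });
}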

Why is EME needed? One argument is that EME allows content providers to adopt content protection schemes in ways that are more browser- and platform-independent than before. DRM has a long history of user-unfriendliness, brittle platform dependence and platform lock-in; widespread implementation could improve user experiences while giving content providers and creators more choices. The dark side, of course, is that EME could make content protection an easier choice for providers, thereby locking down more content.

The large technology stakeholders (Google, Microsoft, Netflix and others) will likely reach a consensus that accommodates their interests, and those of stakeholders such as the content industries. It remains unclear how the interests of the greater Internet are being represented. As an early participant in the OASIS XML Rights Language Technical Committee (ca 2002) I can say these discussions are very “engineer-driven” and tend to be weighted to the task at hand — creating a technical standard — and rarely are influenced by those seeking to balance technology and public policy. With the recent addition of the MPAA to the W3C, one worries even more about how the voice of the individual user will be heard.

For further reading:


John Erickson is the Director of Research Operations of The Rensselaer IDEA and the Director of Web Science Operations with the Tetherless World Constellation at Rensselaer Polytechnic Institute, managing the delivery of large-scale open government data projects that advance Semantic Web best practices. Previously, as a principal scientist at HP Labs, John focused on the creation of novel information security, identification, management and collaboration technologies. As a co-founder of NetRights, LLC, John was the architect of LicensIt(tm) and @ttribute(tm), the first digital rights management (DRM) technologies to facilitate dialog between content creators and users through the dynamic exchange of metadata. As a co-founder of Yankee Rights Management (YRM), John was the architect of Copyright Direct(tm), the first real-time, Internet-based service to fully automate the complex copyright permissions process for a variety of media types.

Posted by: John Erickson | July 29, 2013

Imagination, Policymaking and Web Science

On 26 July the Pew Research Center for the People & the Press released Few See Adequate Limits on NSA Surveillance Program…But More Approve than Disapprove, which they’ve summarized in this post. Here’s a snippet…

…(D)espite the insistence by the president and other senior officials that only “metadata,” such as phone numbers and email addresses, is being collected, 63% think the government is also gathering information about the content of communications – with 27% believing the government has listened to or read their phone calls and emails…Nonetheless, the public’s bottom line on government anti-terrorism surveillance is narrowly positive. The national survey by the Pew Research Center, conducted July 17-21 among 1,480 adults, finds that 50% approve of the government’s collection of telephone and internet data as part of anti-terrorism efforts, while 44% disapprove. These views are little changed from a month ago, when 48% approved and 47% disapproved.

A famous conclusion of the 9/11 Commission was that a chronic and widespread “failure of imagination” led to the United States leaving its defenses down and enabling Bin Laden’s plot to succeed. This is a bit of an easy defense, and history has shown it to not be completely true, but I think in general we do apply a kind of double-think when contemplating extreme scenarios. I think we inherently moderate our assumptions about how far our opponents might go to win and the range of methods they will consider. How we limit our creativity is complex, but it is in part fueled by how well informed we are.

The Pew results would be more interesting if the same questions had been asked before the Edward Snowden thing, because it would have created a “baseline” of sorts for how expansive our thinking was and is. What the NSA eruption has shown us is that our government is willing to collect data at a much greater scale than most people imagined. The problem lies with that word, imagined. What if we asked instead, “What is POSSIBLE?” Not “what is possible within accepted legal boundaries,” but rather “what is possible, period, given today’s technology?” For example, what if the NSA were to enlist Google’s data center architects to help them design a state-of-the-art platform?

Key lawmakers no doubt were briefed on the scale of the NSA’s programs years ago, but it is unlikely most of the legislators or their staffers were or are capable of fully appreciating what is possible with the data collected, esp. at scale. One wonders who is asking serious, informed questions about what is possible with the kind and scale of data collected? Who is evaluating the models, etc? Who is on the outside, using science to make educated guesses about what’s “inside?”

Many versions of the web science definition declare our motivation ultimately to be “…to protect the Web.” We see the urgency and the wisdom in this call as we watch corporations and governments construct massive platforms that enable them to monitor, analyze and control large swaths and facets of The Global Graph. It is incumbent upon web scientists to not simply study the Web, but to use the knowledge we gain to ensure that society understands what influences the evolution of that Web. This includes the daunting task of educating lawmakers.

Why study web science? Frankly, because most people don’t know what they’re talking about. On the issues of privacy, tracking and security, most people have no idea what is possible in terms of large-scale data collection, what can be learned by applying modern analytics to collected network traffic, and what the interplay is between technological capabilities and laws. Fewer still have a clue how to shape the policy debate based on real science, especially a science rooted in the study of the Web.

Web science as a discipline gives us hope that there will be a supply of knowledgeable — indeed, imaginative — workers able to contribute to that discussion.

Posted by: John Erickson | July 23, 2013

Senator Leahy’s position on Aaron’s Law and CFAA Reform

Recently I wrote each member of Vermont’s congressional delegation, Senators Patrick Leahy and Bernie Sanders and Congressman Peter Welch, regarding Aaron’s Law, a proposed bill named in memory of the late Aaron Swartz that would implement critical changes to the notorious Computer Fraud and Abuse Act (CFAA) (18 U.S.C. 1030). As usual, Senator Leahy responded quickly and with meat:

Dear Dr. Erickson:

Thank you for contacting me about the need to reform the Computer Fraud and Abuse Act (CFAA). I appreciate your writing to me about this pressing issue.

In my position as Chairman of the Senate Judiciary Committee, I have worked hard to update the Computer Fraud and Abuse Act in a manner that protects our personal privacy and our notions of fairness. In 2011, I included updates to this law in my Personal Data Privacy and Security Act that would make certain that purely innocuous conduct, such as violating a terms of use agreement, would not be prosecuted under the CFAA. This bill passed the Judiciary Committee on November 22, 2011, but no further action was taken in the 112th Congress. I am pleased that others in Congress have joined the effort to clarify the scope of the CFAA through proposals such as Aaron’s law. Given the many threats that Americans face in cyberspace today, I believe that updates to this law are important. I am committed to working to update this law in a way that does not criminalize innocuous computer activity.

As technologies evolve, we in Congress must keep working to ensure that laws keep pace with the technologies of today. I have made this issue a priority in the past, and will continue to push for such balanced reforms as we begin our work in the 113th Congress.

Again, thank you for contacting me, and please keep in touch.

Sincerely,

PATRICK LEAHY
United States Senator

Thanks again for your great service to Vermont and the United States, Sen. Leahy!

References

Posted by: John Erickson | July 19, 2013

Whistleblowing, extreme transparency and civil disobedience

In her recent post, Whistleblowing Is the New Civil Disobedience: Why Edward Snowden Matters, the great danah boyd wrote:

Like many other civil liberties advocates, I’ve been annoyed by how the media has spilled more ink talking about Edward Snowden than the issues that he’s trying to raise. I’ve grumbled at the “Where in the World is Carmen Sandiego?” reality show and the way in which TV news glosses over the complexities that investigative journalists have tried to publish as the story unfolded. But then a friend of mine – computer scientist Nadia Heninger – flipped my thinking upside down with a simple argument: Snowden is offering the public a template for how to whistleblow; leaking information is going to be the civil disobedience of our age.

For several weeks I’ve debated with friends and colleagues over whether Mr. Snowden’s acts indeed represent civil disobedience and not some other form of protest. I’ve argued, for example, that they might not because he didn’t hang around to “face the consequences.” danah’s post provoked me to examine my views more deeply, and I sought out a more formal definition (from the Stanford Encyclopedia of Philosophy) to better frame my reflection. Based on how Mr. Snowden’s acts exhibit characteristics including conscientiousness, communication, publicity and non-violence, I do now see his whistleblowing as an example of civil disobedience.

Conscientiousness: All the evidence suggests that Mr. Snowden is serious, sincere and has acted with moral conviction. To paraphrase the Stanford Encyclopedia, he appears to have been motivated not only out of self-respect and moral consistency but also by his perception of the interests of his society.

Communication: Certainly Mr. Snowden has sought to disavow and condemn US policy as implemented by the NSA and has successfully drawn public attention to this issue; he has also clearly motivated others to question whether changes in laws and/or policies are required. The fact that he has legislators from both sides of the aisle arguing among themselves and with the Obama Administration is testimony to this. It is not clear to me what specific changes (if any) Mr. Snowden is actually seeking, and he certainly has not been actively engaged in instigating changes e.g. behind the scenes, but I don’t think this is required; his acts are clearly about effecting change by committing extreme acts of transparency.

Publicity: This is an interesting part of the argument; while e.g. Rawls and Bedau argue that civil disobedience must occur in public, openly, and with fair notice to legal authorities, Smart states what seems obvious: to provide notice in some cases gives political opponents and legal authorities the opportunity to suppress the subject’s efforts to communicate. We can safely assume that Mr. Snowden did not notify his superiors at the NSA, but his acts might still be regarded as “open,” as they were closely followed by an acknowledgment and a statement of his reasons for acting. He has not fully disclosed what other secret documents he has in his possession, but it does not appear he has anonymously released any documents, either.

Non-violence: To me this is an important feature of Mr. Snowden’s acts; as far as we know, Mr. Snowden has focused on exposing the truth and not on violence or destruction. This is not to say that forms of protest that do result in damage to property (e.g. web sites) are not civil disobedience; rather, the fact that he did not deface web sites or (to our knowledge) violate access control regimes does qualify his acts as non-violent.

I have no idea whether Mr. Snowden read Thoreau’s Civil Disobedience or even the Wikipedia article, but his acts certainly exhibit the characteristics of civil disobedience and may serve as a “template” for whistleblowers moving forward. As a technologist, my fear is that his acts also provide a “use case” for security architects, raising the bar for whistleblowers who aim to help us (in danah’s words) “critically interrogate how power is operationalized…”

Note: This post originally appeared as a comment to danah boyd, Whistleblowing Is the New Civil Disobedience: Why Edward Snowden Matters.

Posted by: John Erickson | June 10, 2013

Enabling Linked (Open) Data Commerce through Metadata
