Peter K. Jackson, a counsel in Greenberg Glusker’s intellectual property group, tells a hypothetical story about creating (or buying?) a responsible, useful and risk-aware AI tool.
C-suites everywhere are awakening to the tantalizing efficiencies AI solutions can unlock. Most leaders realize that outpacing competitors means going beyond chatbots and other off-the-shelf tools. Many understand that achieving true gains with AI requires processing their business’ proprietary data in novel and potentially risky ways.
Outside the machine-learning space and large enterprise companies, uncertainty and fear of risk often stymie efforts to advance AI ideas. Even approaching the procurement conversation can seem perilous. To build useful and trustworthy AI solutions, many businesses will have to confront cross-department questions and choices they’ve never faced before.
Whether you’re worried about internal roadblocks or ready to build (or buy) your own AI solutions, here’s how to start the conversation, choose the right team, sketch out your compliance roadmap and steer away from future legal hazards.
These steps should apply to most custom AI journeys, but on their own they’re abstract. So, to illustrate a hypothetical journey, we’ll follow a global consulting firm as it chooses among desirable AI use-cases, identifies a framework for legal compliance and orchestrates its chosen solution: a tool that generates written responses to requests for proposal (RFPs), leveraging just the RFP itself plus the knowledge of successful language and cost forecasting gleaned from the firm’s prior outcomes.
Understanding the terrain
At a conceptual level, it’s helpful to think about an AI solution as having two core components:
- Data: The raw material used to train an intelligent model, the mathematical weights and parameters underpinning its intelligence and the input and output flowing from its use.
- Technology: Equipment and software needed to develop and operationalize the solution, including the machine-learning techniques used to interact with a model.
In any context, data will form the core of what your business brings to the table. It’s also the fundamental difference between traditional software development and developing AI solutions.
Here, we’re focusing on solutions where the data source is primarily internal and the technology is primarily external, involving a variety of third-party providers. For example, our consulting firm’s RFP generator needs to learn what RFPs look like, how to forecast costs and what language and content correlate with successful proposals, gleaned from data in past RFPs and correspondence and documentation about client-engagement outcomes, costs and revenues.
Setting the data stakes
The emerging regulatory environment around AI solutions changes with head-spinning frequency. However, following a risk-management framework — always a good idea — figures to remain a bedrock principle of AI legal compliance irrespective of any context-dependent specifics. Both the EU AI Act and President Joe Biden’s October 2023 executive order endorse that approach. U.S. businesses should look to NIST’s context-agnostic AI risk management framework, a standard the U.S. is working to get adopted globally.
(Biden’s executive order on AI expressly directed officials charged with establishing global standards to “ensure that such efforts are guided by principles set out in” the National Institute of Standards and Technology’s AI framework (Sec. 11). NIST has issued a “crosswalk” designed to map its AI framework to Japan’s, and more crosswalks figure to follow.)
Get collaborative
Organizations should consider tasking an interdepartmental group of stakeholders to centralize oversight of AI planning and identify and implement an overall risk and governance framework for all AI solutions.
Soon after ChatGPT publicly debuted in 2022, our consulting firm created a cross-disciplinary team to identify internal use-cases for generative AI. After a board meeting, the firm established an AI committee chaired by its COO and comprising its CTO, general counsel, head of IT and VP of business development.
Establish authority
This group should operationalize the framework within the business. Initial, context-agnostic work will include papering the business’ risk tolerance (or appetite), establishing any business-wide AI policies or principles and formalizing decision guidance around delegation and escalation for the development, procurement and deployment of AI tools. Empower the committee to identify and fill gaps in human and system resources and to monitor competitors and regulators for changes in practices and policy.
The committee knew any AI solution could require buy-in from a variety of internal and external stakeholders. It chose NIST’s AI framework as an overall governance model soon after the framework debuted in early 2023. It hired a VP of data science to build out its data-governance efforts, like inventorying data stores, creating and curating useful metadata and documenting datasheets for key datasets.
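To make the data-governance work concrete, here’s a minimal sketch of what one entry in such an inventory might look like, loosely modeled on the “datasheets for datasets” practice. Every field, name and path below is an illustrative assumption, not the firm’s actual schema or a NIST requirement.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    """One entry in an internal data inventory, loosely modeled on the
    'datasheets for datasets' practice. All fields are illustrative."""
    name: str                     # e.g., "rfp-archive"
    owner: str                    # accountable business unit or person
    location: str                 # system of record (cloud bucket, CRM, etc.)
    contains_personal_data: bool  # triggers privacy review before training use
    contains_client_data: bool    # triggers confidentiality/contract review
    collection_purpose: str       # original purpose, for purpose-limitation checks
    approved_uses: list[str] = field(default_factory=list)

# Hypothetical entry for the consultancy's archive of past proposals
rfp_archive = DatasetRecord(
    name="rfp-archive",
    owner="Business Development",
    location="s3://firm-data/rfp/",  # placeholder path
    contains_personal_data=True,
    contains_client_data=True,
    collection_purpose="proposal drafting and engagement records",
    approved_uses=["RFP-generator fine-tuning (pending privacy review)"],
)
```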
Plan ahead
Whatever the AI solution, documenting the use-case under a context-neutral risk framework is prudent, even for ideal use-cases that may feel operationally impossible in the near-term. Then work backward to identify impediments. Often, no records or labels address a key variable or outcome that any AI solution would depend on. Siloed or fragmented data stores may be a cross-cutting problem. Existing compliance measures may limit the use of certain data by outside providers or systems. But aggregate benefits may justify expensive investments in data management or cloud services.
The committee gathered ideas for AI solutions from stakeholders across business units, along with estimated timelines, savings and organizational effects. The data-science team inventoried and categorized data stores relevant to the proposals to improve cost forecasts. After the committee presented several options to the firm’s leadership, the RFP use-case emerged as an ideal pilot for cost, data-sensitivity, testing and timeline reasons.
Chart each AI solution
Once use-cases and goals are identified, get specific. At the plan-and-design stage, documenting the context of the use-case is key, as it informs which legal or regulatory requirements may be in scope. In brief, the context of the use-case encompasses the data and technology involved and the audience of intended users and affected groups and individuals. For many organizations, implementing the early lifecycle stages of a governance framework may feel no more burdensome than existing procurement processes.
For the RFP use-case, existing large-language models were suitable for the natural-language and mathematical fine-tuning required. The firm had options: both commercially available and open-source LLMs could be localized on equipment controlled by the consultancy. Cloud infrastructure could be used to centralize and classify RFP data and afford full control and oversight of third-party providers’ development of the code, pipelines and fine-tuned model required. The personnel who prepare RFPs were excited by the prospect of systematizing routine RFP work. Importantly, they could also rigorously test outputs for trustworthiness and effectiveness.
Identify legal goals and requirements
As of today, most AI solutions operate in a regulatory gray area. IP, privacy and security laws that govern data and systems still apply. That has not stopped the current frenzy to regulate a new and blurry concept that often escapes existing boundaries.
Emerging AI rules
Beyond risk-framework principles, many emerging AI regulations will likely be context-specific. In the coming months, U.S. federal agencies and departments will unveil rulemakings applicable to industries they regulate, like critical infrastructure (Department of Homeland Security), medical research (Health & Human Services, among others) and dual-use technology (Department of Defense), as required by the 2023 executive order. NIST is finalizing a profile to its existing AI framework for the development of dual-use and generative AI (both the framework and the profile will remain voluntary wherever sectoral rulemakings don’t require their use).
Federal laws may follow for industries outside the purview of those regulations. At the state level, California’s privacy regulator will formalize automated decisionmaking regulations on certain consumer-facing AI solutions and delineate new AI-related consumer privacy rights.
Since the RFP solution would generate synthetic content, the data-science VP and the general counsel realized adhering to the NIST profile would become necessary in spring 2024. Reviewing an inventory of the data required and the contemplated use-case, they concluded no other specific AI rules or requirements applied to the proposed solution or the consultancy in general. They agreed to monitor updates and reassess that conclusion on a bimonthly basis.
Notably, even the EU AI Act is context-specific, despite fanfare to the contrary. Businesses developing and operating AI systems must document their internal process to determine the systems’ risk level and disclose those records for “high risk” systems and upon request. What’s high risk will shift with time, but some contexts, like AI that detects emotion or uses biometrics to identify humans, carry the label by definition.
For present purposes, properly implementing the NIST AI framework will position a business to determine compliance requirements under future rules and to formalize an EU AI Act assessment with ease.
The firm’s committee agreed to monitor the EU AI Act and to seek guidance from outside counsel on its obligations. Counsel advised that the firm’s role (if any) under the EU AI Act with respect to the contemplated RFP product should be assessed at a later stage of development but that any required disclosures would be relatively easy to prepare if and when necessary, given the prep work and reporting the firm continues to document.
Well-established law
Defining options and setting goals under existing legal regimes is generally easier but no less context-specific. Here’s an overview:
- IP laws — from registrable copyrights to trade-secret protections — may offer avenues to own and protect aspects of the data used and technology developed.
The firm’s committee determined that, as a general matter, no other business could claim proprietary rights in the training data proposed to be used by the RFP generator. None of the financial or descriptive data involved would reveal the firm’s proprietary methods or recommendations.
- Privacy laws may restrict the availability or particular uses of data for training or require limits around the use of the AI solution itself. For example, privacy laws often embed the principle that a business’ use of personal information must comport with the purposes of its collection (this principle fueled a June 6 GDPR complaint about Facebook training AI models on European user data and may have contributed to its decision eight days later to pull AI products from Europe).
The data-science team implemented several software-based methods to remove personal and client-specific identifiers from training data. The RFP solution would show personal data only to the user whose input and output contain those details. Where permitted users see the client’s business address, others would see a dummy address like “100 S. Main Street.”
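Below is a minimal sketch of the kind of software-based de-identification the team might apply. The regex rules and dummy values are illustrative assumptions; production pipelines typically pair rules like these with named-entity recognition to catch personal names, client names and free-text identifiers.

```python
import re

# Illustrative rules only; note that bare personal names (e.g., "Jane") need
# NER-based detection, which these regexes alone won't provide.
PII_RULES = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "user@example.com"),  # email addresses
    (re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"), "555-555-5555"),  # phone numbers
    (re.compile(r"\b\d+\s+[A-Z][\w.]*\s+(?:St|Ave|Blvd|Dr)\.?"), "100 S. Main Street"),  # addresses
]

def deidentify(text: str) -> str:
    """Replace personal identifiers in training text with dummy values."""
    for pattern, dummy in PII_RULES:
        text = pattern.sub(dummy, text)
    return text

print(deidentify("Reach Jane at jane.doe@client.com or 310-555-0199, 42 Ocean Ave."))
# -> "Reach Jane at user@example.com or 555-555-5555, 100 S. Main Street"
```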
- Cybersecurity laws, standards and certifications may dictate aspects of the architecture of an AI solution or compel restrictions on the way it’s developed, made available or used. And across all of those regimes, contractual obligations will shape the available options.
Choose datasets and developers wisely
Armed with a thorough understanding of its technological resources, risk appetite, desired goals and the legal governance of its data, the business can target third-party contributors prepared to align with its development and deployment criteria. (These are general and practical considerations about arrangements with AI technology providers — not legal advice or a discussion of the legal issues and provisions that contracts with providers should address.)
Final goals for the user experience of the RFP solution include a familiar chat interface, available as an add-in to Office applications, with options to augment prompts by dropping in attachments like relevant email chains and past proposals. Hovering on generated content will reveal information about the source materials the output relied on, allowing users to provide meaningful feedback as the development process continues.
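To illustrate one way those hover-over citations could work, here’s a hedged sketch of prompt augmentation that labels each attached or retrieved source so the model’s draft can cite it by number. The retrieval and model calls are omitted, and every name here is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SourceChunk:
    doc_id: str   # e.g., "proposal-2022-041.docx"
    excerpt: str  # passage pulled from an attachment or the firm's document store

def build_prompt(rfp_text: str, sources: list[SourceChunk]) -> str:
    """Number each source so the model can cite [1], [2], ... in its draft;
    the UI then maps bracketed citations back to doc_id for hover-over
    attribution and user feedback."""
    numbered = "\n".join(
        f"[{i}] ({c.doc_id}) {c.excerpt}" for i, c in enumerate(sources, 1)
    )
    return (
        "Draft a response to the RFP below, citing supporting material "
        "by bracketed number.\n\n"
        f"RFP:\n{rfp_text}\n\nSources:\n{numbered}"
    )
```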
Scrutinizing and redlining third-party developers’ proposals for dependencies and assumptions will be more important than in the past. In addition to contractual requirements, practical strictures around the manner in which development work is performed will be key.
Our consulting firm’s preference was to keep as much development within its controlled environment as possible. The firm’s cloud environment afforded scalable computational resources (rented GPUs, etc.), role-based credentialing and robust logging. Data prep work meant developer personnel would never need to access meaningful proprietary information.
If an existing generative model is in scope, the business must vet the license terms and the provenance of the underlying training dataset. Commercial licenses may be required to meet criteria like housing the base model on company infrastructure (for insight or documentary purposes) or asserting ownership in the fine-tuned model created through training.
After robust internal debate, the firm rejected a proposal that would have fine-tuned the RFP model on OpenAI’s GPT-4. Housing the fine-tuned model on OpenAI’s servers meant relatively limited insight into, and ability to determine, why training might have gone awry. Moreover, the resulting model would remain outside the consultancy’s control, let alone its ownership.
The chosen proposal implements the RFP generator using an open-source LLM with clearer data provenance, trained and operating from the consulting firm’s cloud environment. Subject to a commercial license to the base model, the firm could own the fine-tuned model weights. Developer access will be limited to virtual machines to ensure no local storage of any firm data on developer devices.
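As a hedged sketch of what that arrangement might look like in practice, the snippet below wraps an open-source base model in LoRA adapters using the Hugging Face transformers and peft libraries, so only small adapter weights are trained and stored on firm infrastructure. The base-model choice, hyperparameters and paths are illustrative assumptions, not the firm’s actual stack.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE = "mistralai/Mistral-7B-v0.1"  # placeholder for a permissively licensed base model

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

# LoRA trains small adapter matrices alongside the frozen base weights, so the
# firm can claim ownership of the fine-tuned weights it produces while staying
# within the base model's license terms.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))
model.print_trainable_parameters()  # adapters are a tiny fraction of total weights

# Training (e.g., with transformers.Trainer on the de-identified RFP corpus) and
# inference both run on developer VMs inside the firm's cloud; only the adapter
# weights saved below ever need to be versioned or deployed.
model.save_pretrained("rfp-generator-adapters")
```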