
July 11, 2023

Protecting Data in the Age of Artificial Intelligence

by Pete Dulin

AI + You, UMKC TalentLink's artificial intelligence blog and newsletter

Generative artificial intelligence offers both transformative possibilities and troubling pitfalls, depending on how it is used. Protecting data in the age of artificial intelligence is a complicated and evolving issue for businesses and workers.

Prepare For Data Scraping

Generative AI tools like ChatGPT, Google Bard, and Bing Chat rely on large language models (LLMs), a type of neural network trained on data available on the web. A growing number of companies and platforms are taking steps to protect their data from scraping, the practice of using a computer program to extract data from the web and repurpose it on other websites and platforms. One point of contention is scraping data without paying the source for it.
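
To make the mechanics concrete, here is a minimal sketch of a scraper in Python. It is for illustration only; the URL is a placeholder, and the third-party requests and beautifulsoup4 packages are assumed to be installed.

```python
# A minimal illustration of data scraping, not production code.
# The URL is a placeholder; requests and beautifulsoup4 are assumed.
import requests
from bs4 import BeautifulSoup

# Fetch a page and parse its HTML.
response = requests.get("https://example.com/articles", timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

# Extract every paragraph of text so it can be repurposed elsewhere,
# the kind of wholesale reuse platforms are trying to restrict.
paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
print(paragraphs)
```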

Major web-based platforms are taking steps to corral their user-generated data. Reddit announced in April that it would charge for use of its API, seeking to monetize and limit access to its user-generated content. Reddit's content has been scraped and used extensively by third parties to train AI tools. Stack Overflow and Twitter have also evaluated measures to restrict access to and charge for their data.

Businesses and individuals should consider the questions and ramifications that arise if their data and content can be scraped from the web, whether authorized or not.

  • Is the data or content protected or publicly available?
  • Is it protected by a paywall or other measure?
  • Are there specific terms of use if protected data is leaked or shared by a user?
  • Is the protection sufficient now in an AI environment?
  • Is there an opportunity to monetize valuable data or content in new ways as part of your business model? For example, Reddit generates revenue from advertising and premium membership plans. Charging for access to its content opens another pathway to generate revenue.

Visionary or Exploitive?

Generative AI image platforms like Midjourney and Stable Diffusion have been trained on large datasets of images scraped from online sources. Visual artists and writers have protested and filed lawsuits challenging the scraping of their copyrighted work from the web for reuse by generative visual AI platforms. Getty Images has also filed a lawsuit to protect its visual content. In turn, the platforms have filed a response to the class-action lawsuit against them.

To combat the practice of image scraping, the Glaze tool was developed to help artists disrupt "style mimicry" of their work. Here are a few other tactics that artists and image-based businesses can use to protect images from AI image generators.
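
One common first step is a robots.txt rule asking crawlers to stay away, for instance blocking CCBot, the Common Crawl crawler whose archives feed many training datasets. Compliance is voluntary on the crawler's side, which is part of why tools like Glaze exist. The minimal sketch below, using Python's standard library, shows how a compliant crawler would interpret such a rule; the rule and URLs are illustrative.

```python
# A sketch of how a compliant crawler interprets a robots.txt rule.
# The rule below asks CCBot (Common Crawl's crawler) to skip the whole
# site; other crawlers are unaffected. Compliance is voluntary.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

art_url = "https://example.com/gallery/art.png"   # placeholder URL
print(parser.can_fetch("CCBot", art_url))         # False: CCBot is blocked
print(parser.can_fetch("SomeOtherBot", art_url))  # True: no rule applies
```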

Does your business rely on imagery or visual data that can be scraped and used by generative AI?

Data Security, Privacy, and Compliance

Many firms have measures in place to ensure data privacy, security, and compliance in order to meet legal, regulatory, and operational requirements. These measures not only protect customers but also shield the company by keeping it within defined regulations.

The growing use of AI by the public and by competitors might prompt an internal audit of a company's data security. Internal adoption of AI tools necessitates re-evaluating company policies and practices on how employees use and safeguard data.

How can proprietary company data, customer information, and intellectual property be protected when an employee uses ChatGPT, for example?
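
There is no single answer, but one common safeguard is to scrub obviously sensitive patterns from text before it leaves the company. The sketch below is a hypothetical illustration, not a complete data loss prevention system; the patterns and the scrub_prompt helper are assumptions made for the example.

```python
# A hypothetical sketch of pre-prompt redaction: scrub obvious sensitive
# patterns before text is sent to an external AI service. The patterns
# and helper below are illustrative, not a complete DLP solution.
import re

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}

def scrub_prompt(text: str) -> str:
    """Replace each match of a sensitive pattern with a labeled placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label}]", text)
    return text

prompt = "Summarize: contact jane.doe@acme.com, API key sk-abc123def456ghi789."
print(scrub_prompt(prompt))
# Summarize: contact [REDACTED EMAIL], API key [REDACTED API_KEY].
```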

Amazon became aware of instances where ChatGPT responses appeared similar to internal Amazon data. Accordingly, Amazon’s corporate lawyer “warned employees not to provide ChatGPT with any Amazon confidential information,” such as code.

Salesforce announced in March a push into generative AI, including making ChatGPT available in Slack.

AI + Employees

Concern about data security in the age of generative AI isn't alarmist. Fifteen percent of employees paste company data into generative AI tools on a weekly or even daily basis, according to a June research report that analyzed the behavior of over 10,000 employees. Cutting and pasting sensitive data into generative AI bypasses data protection measures.

Nearly 70% of professionals who use AI tools at work do so without their boss's knowledge, according to a Fishbowl survey of nearly 12,000 professionals from Bank of America, Amazon, JP Morgan, and numerous other firms.

Whether intentionally or unknowingly, employees in sales, marketing, finance, engineering, IT, or customer service might disclose sensitive data when incorporating ChatGPT or other generative AI into their workflows.

Data Protection in the Age of AI

Where should companies start? Or, more likely, how can they strengthen their existing data security policies and practices with regard to generative AI?

The report Beyond Hypotheticals: Understanding the Real Possibilities of Generative AI addresses top safety concerns around implementing generative AI as well as key tenets for AI-related security. Fundamental elements include defining roles and responsibilities, data governance, compliance and legal considerations, and employee training protocols.

The promising news is that 81% of leaders say their company has already established, or is currently developing, internal generative AI policies, according to survey results.

Where does your company stand?

Let UMKC TalentLink know how your company is using artificial intelligence. What data security measures have you put into place as a result? Email us with your story and tips.

If you haven’t already, please join our newsletter AI + You and share it with others. You’ll receive updates on our latest posts on artificial intelligence and how it impacts work and innovation.