A trio of scientists from the University of North Carolina, Chapel Hill recently published preprint artificial intelligence (AI) research showcasing how difficult it is to remove sensitive data from large language models (LLMs) such as OpenAI’s ChatGPT and Google’s Bard.
According to the researchers' paper, "deleting" information from LLMs is possible, but verifying that the information has actually been removed is just as difficult as removing it in the first place.
The reason for this has to do with how LLMs are engineered and trained. The models are pretrained on massive datasets and then fine-tuned to generate coherent outputs (GPT stands for "generative pretrained transformer").
Once a model is trained, its creators cannot, for example, go back into the training data and delete specific files in order to prevent the model from outputting related results. Essentially, all the information a model is trained on is distributed across its weights and parameters, where it cannot be isolated without actually generating outputs. This is the "black box" of AI.
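A toy numeric analogy (not from the paper) illustrates why deleting source files after training changes nothing: once data has been absorbed into a learned parameter, the parameter retains the information on its own.

```python
# Toy illustration: a "model" with a single learned parameter (the mean).
# Once training folds the records into the parameter, deleting the
# records does not alter what the model has learned.
data = [3.0, 5.0, 10.0]          # stand-in for training records
weight = sum(data) / len(data)   # "training": the parameter absorbs the data
data.clear()                     # "deleting the files" after training
print(weight)                    # the learned value persists: 6.0
```

Real LLMs have billions of such parameters, and no single one corresponds to a specific training document, which is what makes targeted deletion so hard.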
A problem arises when LLMs trained on massive datasets output sensitive information such as personally identifiable information, financial records, or other potentially harmful and unwanted outputs.
In a hypothetical situation where an LLM was trained on sensitive banking information, for example, there's typically no way for the AI's creator to find those files and delete them. Instead, AI developers use guardrails, such as hard-coded prompts that inhibit specific behaviors, or reinforcement learning from human feedback (RLHF).
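A minimal sketch of the guardrail approach the article describes: rather than removing data from the model's weights, developers filter what the model is allowed to emit. The patterns and function names below are hypothetical, purely for illustration.

```python
import re

# Hypothetical output-side guardrail: scan generated text for patterns that
# look like sensitive data and substitute a refusal. This does not delete
# anything from the model; it only blocks the output from reaching the user.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN-like pattern
    re.compile(r"\b\d{16}\b"),             # 16-digit card-like number
]

REFUSAL = "I can't share that information."

def guarded_output(model_output: str) -> str:
    """Return the model output, or a refusal if it matches a sensitive pattern."""
    for pattern in SENSITIVE_PATTERNS:
        if pattern.search(model_output):
            return REFUSAL
    return model_output

print(guarded_output("The SSN on file is 123-45-6789"))  # blocked
print(guarded_output("The weather is sunny today"))      # passes through
```

The limitation is the same one the researchers point to: the sensitive information still lives inside the model, and any prompt that slips past the filter can surface it.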
In an RLHF paradigm, human assessors engage with models for the purpose of eliciting both wanted and unwanted behaviors; their feedback is then used to fine-tune the model toward the desired outputs.
Read more on cointelegraph.com