ISSN :3049-2297

Safety in ChatGPT-like Chatbots across Health, Finance, Law, Security and Defense Settings

Original Research (Published On: 15-Nov-2025 )
DOI : https://dx.doi.org/10.54364/JAIAI.2024.1124

Neil Johnson, Nicholas J. Restrepo, Dylan J. Restrepo and Frank Y. Huo

Jou. Artif. Intell. Auto. Intell., 2 (3):410-424

Neil Johnson : George Washington University

Nicholas J. Restrepo : George Washington University

Dylan J. Restrepo : Cornell University

Frank Y. Huo : George Washington University

Download PDF Here

DOI: https://dx.doi.org/10.54364/JAIAI.2024.1124

Article History: Received on: 22-Nov-25, Accepted on: 29-Dec-25, Published on: 15-Nov-25

Corresponding Author: Neil Johnson

Email: neiljohnson@gwu.edu

Citation: Neil Johnson, Nicholas J. Restrepo, Dylan J. Restrepo, Frank Y. Huo (2025). Safety in ChatGPT-like Chatbots across Health, Finance, Law, Security and Defense Settings. Jou. Artif. Intell. Auto. Intell., 2 (3 ):410-424


Abstract

    

Approximately 70% of the global population now owns a smartphone, but many do not have reliable or affordable Internet access. However, ChatGPT-like language models can now run locally on smartphones without requiring continuous Internet access. This local AI deployment will offer nearly every living person continuous personalized advice on everything from mental and physical health to legal issues, personal finance and security. The same holds for professionals such as doctors, lawyers, financial traders, security specialists and soldiers while operating in hostile environments. However, recent empirical studies with human participants have shown that ChatGPT-like chatbots can tip into undesirable responses that are potentially harmful, even when they are configured with low decoding temperature and appear otherwise stable. This phenomenon is distinct from standard “hallucinations”, i.e. factually incorrect or unsupported statements that are typically studied using uncertainty or fact-checking tools. Instead, the undesirable outputs may be strictly factually correct and stylistically similar to earlier benign responses, yet convey advice or information that is unsafe. This paper extends our earlier theoretical work on small but fully transparent transformer models by showing that its conclusions about internal tipping dynamics are qualitatively consistent with recent empirical observations concerning mental health advice chatbots. Given that analogous situations are likely to arise in financial, legal, security and defense settings, we introduce a simple tool that any member of the public can use to explore how choice of prompt structure might drive such tipping behavior in autonomous local ChatGPT-like chatbots

Statistics

   Article View: 152
   PDF Downloaded: 4