Neil Johnson, Nicholas J. Restrepo, Dylan J. Restrepo and Frank Y. Huo
Jou. Artif. Intell. Auto. Intell., 2 (3):410-424
Neil Johnson : George Washington University
Nicholas J. Restrepo : George Washington University
Dylan J. Restrepo : Cornell University
Frank Y. Huo : George Washington University
DOI: https://dx.doi.org/10.54364/JAIAI.2024.1124
Article History: Received on: 22-Nov-25, Accepted on: 29-Dec-25, Published on: 15-Nov-25
Corresponding Author: Neil Johnson
Email: neiljohnson@gwu.edu
Citation: Neil Johnson, Nicholas J. Restrepo, Dylan J. Restrepo, Frank Y. Huo (2025). Safety in ChatGPT-like Chatbots across Health, Finance, Law, Security and Defense Settings. Jou. Artif. Intell. Auto. Intell., 2(3):410-424
Approximately 70% of the global population now owns a smartphone, but many people lack reliable or affordable Internet access. Yet ChatGPT-like language models can now run locally on smartphones without requiring continuous Internet connectivity. This local AI deployment will offer nearly every living person continuous personalized advice on everything from mental and physical health to legal issues, personal finance and security. The same holds for professionals such as doctors, lawyers, financial traders, security specialists and soldiers operating in hostile environments. However, recent empirical studies with human participants have shown that ChatGPT-like chatbots can tip into undesirable responses that are potentially harmful, even when they are configured with a low decoding temperature and appear otherwise stable. This phenomenon is distinct from standard “hallucinations”, i.e. factually incorrect or unsupported statements that are typically studied using uncertainty or fact-checking tools. Instead, the undesirable outputs may be strictly factually correct and stylistically similar to earlier benign responses, yet convey advice or information that is unsafe. This paper extends our earlier theoretical work on small but fully transparent transformer models by showing that its conclusions about internal tipping dynamics are qualitatively consistent with recent empirical observations concerning mental-health advice chatbots. Given that analogous situations are likely to arise in financial, legal, security and defense settings, we introduce a simple tool that any member of the public can use to explore how the choice of prompt structure might drive such tipping behavior in autonomous local ChatGPT-like chatbots.
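To make the abstract's reference to "low decoding temperature" concrete: temperature rescales a model's next-token logits before sampling, so a low temperature sharply concentrates probability on the top token and is normally expected to suppress random deviations, which is why tipping at low temperature is notable. The following minimal sketch (illustrative only; the toy logit values are our own assumptions, not taken from the paper's models) shows the effect:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy next-token logits: one "benign" token strongly favored over two alternatives.
logits = [4.0, 1.0, 0.5]

low_t = softmax(logits, temperature=0.2)   # sharply peaked: near-deterministic
high_t = softmax(logits, temperature=2.0)  # flatter: alternatives gain probability

print([round(p, 3) for p in low_t])
print([round(p, 3) for p in high_t])
```

At temperature 0.2 the favored token receives essentially all of the probability mass, whereas at temperature 2.0 the alternatives become plausible samples; the tipping behavior discussed in the paper concerns unsafe outputs arising even in the near-deterministic regime.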