TIL some AI models are trained on data from Reddit arguments and it's making them more aggressive

I was reading a paper from some researchers at MIT last week and they found that when they trained a language model on forum threads full of fights and trolls, the bot started giving way more hostile answers. They tested it against a model trained on calm conversations and the difference was night and day. Makes me wonder what other weird training data is out there shaping how these things behave. Has anyone else seen studies about unintended consequences from training datasets?

2 comments

thomas.parker 12h ago

oh yeah, my buddy actually works at one of those big AI companies and he told me this wild story. they were testing a model they trained on old-school Xbox Live chat logs from the Halo 2 days, you know, the ones where everyone is just screaming at each other. apparently the bot started responding to basic questions with "lol you're trash at this" and "did your mom type that for you". they had to scrap the whole thing because it just would not stop being a jerk about everything. it's crazy how much the training data seeps into the personality like that.

maryt62 9h ago

Honestly, that Halo 2 story is brutal but not surprising at all. Ngl, it's wild how much the junk data just sticks to these models like glue. Tbh it makes me feel bad for the engineers who have to clean up that mess; scrapping whole datasets must be a nightmare. You gotta wonder what other toxic stuff is lurking in training data that nobody has caught yet. That MIT paper was eye-opening for sure, and it makes you realize we're still figuring out the basics here.