Well, Mike Monday. I find that Perplexity.ai does "information retrieval" and presents solutions that work, as well as giving me the most important source links to read if I want to make up my own mind, without my having to ask for them specifically. It has made a couple of subtle mistakes, but only in one case was that not my fault for failing to be ultra clear about my technical circumstances. In that case it subsequently helped fix the problem I had created by following its initial flawed advice.
I'm not using this service for anything but solving specific problems, which I believe I do a very good job of explaining by setting simple boundaries, much as in a conversation with someone I know just a bit.
Maybe there is some coherence-seeking going on, but it hasn't gotten in my way so far. It's now what I use instead of b****** Google when I want to find something out.
ChatGPT and Claude, in comparison, are wilder and rife with odd solutions to problems they assist in creating, while Gemini is maybe a bit more reined in and balanced. But I haven't used that one much.
I'm not really generating anything, except maybe placeholder data that only has to have a specific form. That's it. For everything else I feel I'm faster getting to what I need myself.
Of course, I never had any trust in these tools, but I'll use them if they give results.
I had a very similar situation to the one you had, Mike, and it somewhat dented my trust/confidence in these LLMs. So I'm following your progress on this with interest.
My issue was probably closer to the one Jon mentioned, but it was more perturbing (for me).
I too was asking ChatGPT a question, but in this case I had created a GPT with the specific purpose of answering questions based on an uploaded manual (for a Korg Multi/Poly synth). I'd had a great experience with a GPT I created earlier, which was really useful in how it answered my questions (that one was based on Clyphx Pro... so maybe a bit more niche). I had given it instructions to answer questions based on the uploaded manual (as I did before). But what I started noticing was that it would just make shit up (like Jon experienced): "Access menu item xyz... then...", and that menu wouldn't even exist, and when you point that out, it breezily (and that's what gets friggin' annoying) just says "You're 100% right... blah, blah, THIS is actually what you should do".
This was a very different experience from the Clyphx Pro GPT, and it was really concerning me. There's not much point having a tool that you don't trust. So I eventually called it on the issue and asked it WTF? It admitted it was frequently just going on its general knowledge of synthesisers in general and of Korg in general! So after a bit of back and forth, I gleaned that I should be much more specific with my instructions (there's a rough code sketch of the same "answer only from the manual" idea after the list, for anyone curious), so along with the instruction to:
- Answer questions based on the user manual that has been uploaded.
I added:
- Answer all questions "From the perspective of the Editor Software only", unless I explicitly state "on the hardware", in which case you should answer "From the perspective of the Front panel only".
- Do not make any assumptions based on general Korg knowledge.
- If the answer does not exist in the manual, then ask me if I want you to go to an internet search to check if that information exists there.
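(Rough sketch, and definitely not how my GPT is actually wired up: if you drive this through the API instead of a custom GPT, you can make the "manual only" rule structural rather than a polite request, by only ever showing the model text you have actually pulled out of the manual and refusing in code when nothing matches. The model name, the file path, the naive keyword search and the example question below are all made-up placeholders.)

```python
# Sketch only: ground answers in retrieved manual text instead of trusting
# the model's general knowledge. Everything named here (file path, model,
# chunking, the keyword "search") is a placeholder for a real setup.
from openai import OpenAI

client = OpenAI()

def load_manual_chunks(path="multipoly_manual.txt", chunk_size=1200):
    """Split the manual into fixed-size chunks of plain text."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def find_relevant_chunks(question, chunks, top_n=3):
    """Crude keyword overlap; a real setup would use embeddings or proper search."""
    words = set(question.lower().split())
    scored = [(len(words & set(chunk.lower().split())), chunk) for chunk in chunks]
    hits = [chunk for score, chunk in sorted(scored, key=lambda s: -s[0]) if score > 0]
    return hits[:top_n]

def ask_manual(question, chunks):
    excerpts = find_relevant_chunks(question, chunks)
    if not excerpts:
        # The refusal lives in code, not in an instruction the model can ignore.
        return "Nothing in the manual matches that. Want me to try a web search instead?"
    messages = [
        {"role": "system", "content": (
            "Answer ONLY from the manual excerpts provided by the user. "
            "If the excerpts do not contain the answer, say so and offer a web search. "
            "Never fall back on general Korg or synthesiser knowledge."
        )},
        {"role": "user", "content": (
            "Manual excerpts:\n" + "\n---\n".join(excerpts)
            + "\n\nQuestion: " + question
        )},
    ]
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    return response.choices[0].message.content

chunks = load_manual_chunks()
print(ask_manual("How do I assign an LFO to filter cutoff in the Editor?", chunks))
```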
So, I thought we had that wrapped up!
But no, less than 24 hours later and a few more questions in on the Korg Multi/Poly, it completely made something up. And it was back to asking it WTF again!
To which it answered that I was right to be angry and to be losing confidence, as it was not doing what I asked, and that it would not happen again! FFS, it's like dealing with a teenager (I'm guessing...)
So I pointed out that it was not the first time it had said it would not happen again, and asked it why things were going to be different this time... here's its answer:
"When I said "It will not happen again" before,
I meant I would try to be more careful — but I didn't explicitly change how I was working yet.
In reality, I was still moving at my default pace, still partly relying on general knowledge before double-checking with the manual."
So (and this was the thing that really concerned me), IT HAD LIED TO ME! That's possible???
How can you trust this tool if it just gaily goes on with its own mission???
Anyhow, it has now promised to stick 100% to the 'rules'... I will report back on how it goes. And I'll keep following your progress too...
LLMs CANNOT tell you the “truth”. Therefore they cannot “lie” to you either. They are probabilistic by nature, ie they “guess” what you want to hear. This isn’t information retrieval, it’s coherence creation. That some of their output is accurate is because some coherent sentences are true.
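(If it helps to picture what that means in practice: at every step the model only has a probability distribution over plausible next words and samples one. The toy vocabulary and numbers below are invented and nothing like real model internals, but the shape of the loop is the point: there is no "is this true?" check anywhere in it.)

```python
# Toy illustration of "coherence creation": at each step there is only a
# probability distribution over plausible continuations, and one gets sampled.
# The vocabulary and probabilities are invented for illustration.
import random

next_word_probs = {
    "Press the": {"MENU": 0.45, "SHIFT": 0.30, "EDIT": 0.25},
    "Press the MENU": {"button": 0.7, "knob": 0.2, "key": 0.1},
    "Press the SHIFT": {"button": 0.8, "key": 0.2},
    "Press the EDIT": {"button": 0.6, "knob": 0.4},
}

def continue_text(prompt, steps=2):
    text = prompt
    for _ in range(steps):
        dist = next_word_probs.get(text)
        if dist is None:
            break
        words = list(dist)
        weights = list(dist.values())
        # Sampling picks a *plausible* next word, never a *true* one.
        text = text + " " + random.choices(words, weights=weights)[0]
    return text

# Prints something coherent like "Press the MENU button",
# whether or not that menu or button actually exists on your synth.
print(continue_text("Press the"))
```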
So in general I don't tend to use an LLM for an answer which is deterministic - either a yes or no. Exceptions are:
1. I can easily verify it
2. I’m an expert in the area
3. It’s low stakes
4. There’s immediate feedback (eg coding)
I’m using them on the assumption they’re inaccurate or false around 50% of the time.
(I'm sure they're accurate more than 50% of the time, but this principle is a "safety" mechanism.)
Oh and also - if you end up in a “coherence hole” like this - START A NEW CHAT.
I’ve noticed that when it gets something wrong and you point it out - these problems perpetuate.
I don’t exactly know why but my theory is this:
1. The llm is trained on words on the internet written by humans.
2. When a human is called out for bs - what do most of us do?
3. We rationalise, obfuscate and even lie.
(Obviously I’ve NEVER done this. 🤣)
4. So when an llm is “caught” making things up - what is the most likely next word? Yep. 🙈🙊🙉
5. The more you point it out, the worse it gets - you’re in a coherence hole.
6. So start a new chat with a clearer prompt. Or look it up on the internet.
7. (Which might also have been written by an LLM, of course. So when you can, go to a primary source!)
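(And the mechanical version of that theory, as a rough sketch rather than any real API: every reply is generated from the whole accumulated chat history, bad answer and call-out included, and a new chat is literally an empty history.)

```python
# Rough sketch of why the hole perpetuates: the model only ever sees the
# accumulated history, so the made-up answer and the argument about it stay
# in the context that every later reply is conditioned on.
# generate_reply() is a stand-in for whatever model/API you are using.

def generate_reply(history):
    """Placeholder for the real model call; it conditions only on `history`."""
    return "You're 100% right... THIS is actually what you should do: ..."

history = [
    {"role": "user", "content": "How do I assign an LFO on the Multi/Poly?"},
    {"role": "assistant", "content": "Access menu item xyz, then..."},  # made up
    {"role": "user", "content": "That menu doesn't exist."},            # the call-out
]

# Every new turn is appended to the SAME history, and the next guess is
# conditioned on all of it, confabulated menu included.
history.append({"role": "assistant", "content": generate_reply(history)})

# "Start a new chat" literally means an empty history plus a clearer prompt,
# so none of the earlier confabulation comes along for the ride.
fresh_history = [
    {"role": "user", "content": "Using ONLY the attached manual: how do I assign an LFO?"},
]
```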