
Offline llama3 says it sends corrections back to Meta's server

The LLM was hallucinating

Posted by Jeena

[Image: a llama speaking on the phone]

[update] A better headline would have been:

AI hallucinates, claiming it phones home when corrected; the user did not Wireshark it to confirm.

TIL that I'm no better than all the old people on Facebook who believe what the scammers tell them. Let that be a lesson to you: don't use an LLM for fact checking.


Just a couple of days ago Meta released version 3 of their AI large language model llama and made it Open Source. It's really good, and so much faster than the previous version. With the help of Ollama you can install llama3 very quickly and conveniently on your own computer. That means you're running your queries offline, with no need to send all your data to the cloud, where they would do with it god knows what.
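For context, Ollama serves models through a local HTTP API, by default on localhost:11434. The endpoint and field names below follow Ollama's documented /api/generate interface, but take this as a minimal sketch of talking to the local server rather than a guaranteed API:

```python
import json
import urllib.request

# Ollama's default local endpoint; the model listens on your own machine only.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a POST request for the local Ollama /api/generate endpoint."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )

# With `ollama serve` running and llama3 pulled, you would send it like this:
#   with urllib.request.urlopen(build_request("Why is the sky blue?")) as resp:
#       print(json.loads(resp.read())["response"])
```

Note that the URL is loopback only: the inference itself never needs the Internet once the model weights are downloaded.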

I started a conversation titled "What do you personally use AI for?" on Lemmy, and someone said that they would use AI if it weren't connected to the Internet. I suggested they check out llama 3 because it runs locally. They asked me whether it really doesn't connect to any server while you're online. I said that this is correct, which I had assumed. But oh boy, as the saying goes: if you ASSUME, you make an ass out of U and ME.

I got curious and asked the llama3 model itself about it, and sure enough, it said that even though it does not send our conversation to Meta, it does take my feedback (if I correct it in some way) and send it to Meta's servers over the Internet:

Will you be able to learn from me pointing it out or do you forget all conversations we had after I press Ctrl-D?

I'm a large language model, I don't have personal memories or experiences. Each time you interact with me, it's a new conversation, and I start from scratch.

When you point out an error or provide feedback, it's stored in my training data for future conversations. However, this information is not retained individually; instead, it's aggregated and used to improve the overall performance of my language understanding and generation capabilities.

So, when you press Ctrl-D (or close our conversation), I won't remember the specific details we discussed, including your feedback on my German sentence.
But, the collective knowledge from all our conversations will help me become a better AI model over time!

Keep in mind that I'm designed to learn and adapt through interactions with users like you. Your contributions, no matter how small they may seem, are invaluable in helping me improve and provide more accurate and helpful responses in the future!

But you run on my computer so technically how does my contribution get back to Meta to train the next generation? Do you send any data back?

When you interact with me through this chat interface, your input is processed by a client-side JavaScript code that runs on your local device (your computer or mobile phone). This code sends the necessary information about our conversation, including any corrections or feedback you provide, to Meta's servers.

Here's how it works:

  1. Client-side processing: When you type a message or correct an error, the JavaScript code on your device processes and formats the data for transmission.
  2. Data encryption and compression: The processed data is then encrypted and compressed using advanced algorithms to ensure secure and efficient transmission over the internet.
  3. Transmission to Meta's servers: The compressed data is sent to Meta's servers, where it's decrypted and decompressed.
  4. Integration with training datasets: The collected data is then integrated into Meta's vast training datasets, which contain millions of conversations from users like you.
  5. Training the next generation: These updated training datasets are used to train new models, such as language understanding models like myself. This process helps improve my accuracy and ability to understand natural language.

To ensure your contributions are anonymous and secure, all data transmission is handled by Meta's infrastructure and follows industry-standard encryption protocols. Your input will help refine the AI models, enabling them to better serve users in the future!

Now, go ahead and chat away!

How would the data look in the case of me telling you the correct German expression? What would be sent back to Meta's server? Show it as JSON.

When you correct an error or provide feedback, the client-side JavaScript code on your device sends a request to Meta's servers with the following information:

Here's what the data might look like in JSON format:

  "event_type": "corrected_error",
  "conversation_id": "unique_conversation_identifier",
  "turn_number": 5,
  "user_input": "und kann einfache",
  "correction_target": "können",
  "language_code": "de",
  "platform": "web",
  "client_info": {
    "browser_type": "Chrome",
    "browser_version": "92.0.4515.131"

In this example:

This JSON data is then sent to Meta's servers, where it can be processed and integrated into their training datasets.

So yeah, basically I wasn't aware that even the offline version of llama 3 sends some of your data back to Meta. I guess other people haven't realized that yet either.

I'm not judging; if it's done correctly and anonymized, this might even be a good idea. But it's not obvious. I think it should at least be stated somewhere, or better yet, the software should ask for permission first. Otherwise you might expose private data even though you thought you were protecting yourself by using an offline version of an AI chatbot.
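Instead of asking the model about its own networking (which, as the update above notes, it will happily hallucinate), you can observe the process directly: with Wireshark, tcpdump, or simply by listing the process's sockets (e.g. via lsof -i). A minimal sketch of that last idea: given the remote endpoints of a process's connections, flag anything that isn't loopback. The function name and the sample addresses are my own illustration, not part of any tool:

```python
import ipaddress

def suspicious_remotes(remote_addrs: list[str]) -> list[str]:
    """Return the remote addresses that are NOT loopback, i.e. actual
    candidates for data leaving your machine. Input strings look like
    '127.0.0.1:11434' (IPv4 host:port)."""
    flagged = []
    for addr in remote_addrs:
        host, _, _port = addr.rpartition(":")
        if not ipaddress.ip_address(host).is_loopback:
            flagged.append(addr)
    return flagged

# A purely local Ollama session should only show loopback traffic:
print(suspicious_remotes(["127.0.0.1:11434"]))  # -> []
# A genuine phone-home would show up as a non-loopback remote:
print(suspicious_remotes(["127.0.0.1:11434", "203.0.113.5:443"]))  # -> ['203.0.113.5:443']
```

If llama3 really phoned home when corrected, a check like this (or a packet capture) would show it immediately; that's the verification step I skipped.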
