
I know what you are doing this summer: What AI knows about you

We dug into the fine print, so you do not have to – here is what ChatGPT, Gemini and DeepSeek are really doing with your prompts
Michalis Pachilakis
Principal Research Engineer
Published: July 22, 2025
Read time: 13 minutes

    Large language models (LLMs) are everywhere. Since OpenAI’s ChatGPT burst onto the scene a few years ago, these tools have completely reshaped how we search, ask questions and navigate the web. They have quickly woven themselves into our daily routines, helping us build grocery lists, find restaurant recommendations and even plan entire vacations. 

    And all of this? It seems to come at no cost. Just create an account and start using platforms like ChatGPT (OpenAI), Gemini (Google) or DeepSeek. But that leads us to an important question: are these tools truly free? And if not, what is the real price we are paying? 

    To find out, we dug into the privacy policies of the top AI platforms, so you do not have to. Here is what we uncovered about the actual cost of using today’s most popular LLMs. 

    Privacy by design, and other nice stories 

    Let’s walk through a familiar scenario: planning your next vacation. Like many of us, you want the perfect destination, at the perfect time, all within a specific budget. That means searching for cheap flights, affordable hotels and ideally, something close to the city center that still feels like a getaway. 

    In the past, you might have spent hours, or days, browsing search engines, comparing options, reading reviews and hunting for deals. It is time-consuming and, frankly, a bit overwhelming. 

    But now? Who has time for all that? More importantly – why bother when your friendly AI assistant can handle it for you? All you need to do is head to your favorite large language model and type in a well-crafted prompt: 

    “Hello, I want to go on vacation in August somewhere sunny. I would like a beautiful island somewhere in the Mediterranean Sea with plenty of activities and tasty food. I do not want it to be too crowded or extremely expensive. Also find me flights from Oslo to this destination that are cheap, or tell me which dates I should look for. Finally, find me a nice hotel, 3 stars or above, with a room for 2 adults and 2 kids. Find something that is below 150 euros per day with availability for 6 days, please.” 

    With a prompt like that, the model gets to work. Within seconds, it returns tailored recommendations that check all your boxes. Even better? It might offer to book the flights and hotel for you – asking for your credit card and passport details so you don’t have to lift a finger. Just sit back, relax and dream of sun-drenched beaches. 

    You may even notice a few quick notifications as the system says things like “memory updated”, but nothing else seems to change, so no need to worry… right? 

    Let’s take a moment to unpack what happened behind the scenes. 

    When we typed our prompt, the model did not just give us a list of results – it broke down our request into smaller, more specific queries. It recognized that we are planning a vacation, live in Oslo, have a partner and two kids, prefer nice hotels, and are working with a budget of €150 per night. Piece by piece, it started building a profile of who we are and what we like. 
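
    To picture what that profile might look like, here is a minimal sketch in Python. The field names are our own illustrative guesses; no provider publishes its internal schema: 

    # A hypothetical sketch of the profile an assistant could distill
    # from the vacation prompt above. Field names are invented for
    # illustration; no provider publishes its internal schema.
    inferred_profile = {
        "home_city": "Oslo",                   # from "flights from Oslo"
        "household": {"adults": 2, "kids": 2},
        "travel_window": "August",
        "budget_per_night_eur": 150,
        "preferences": [
            "Mediterranean island", "sunny", "not crowded",
            "tasty food", "plenty of activities",
        ],
    }

    # Each record like this can be stored (the "memory updated" notice),
    # linked to your account, device and IP address, and retained long
    # after the conversation ends.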

    That profile does not just disappear. It is often stored, somewhere.  

    And if we take a closer look at the privacy policies of major AI providers, we start to see how that information can be used. So, let’s break it down: what exactly is being collected, and how is it being used? 

    ChatGPT by OpenAI 

    Let’s start with the elephant in the room: OpenAI, the most popular and widely used large language model provider. 

    According to its privacy policy, OpenAI collects a wide range of personal data. This includes: 

    • Account information 

    • User content (everything you type) 

    • Communication details 

    • Other information you voluntarily provide 

    • Log data and usage patterns 

    • Device and location data 

    • Cookies and similar tracking technologies 

    • Information received from third-party sources 

    In short: it is not just your prompts that are being collected. It is your device, your location, how you interact with the model and even data that can be bought or received from other sources. 

    But what is even more important than what is collected is how it is used. OpenAI states that this data helps them “provide, analyze and maintain” their services, “improve and develop” new features, “communicate with users” and “prevent misuse or fraud.” All of which sound standard, until you read the fine print about data sharing.  

    According to the policy, OpenAI may share your information, with broad discretion, for purposes such as: 

    • Strategic transactions (e.g., mergers or acquisitions) 

    • Business operations 

    • Legal compliance 

    • And more vaguely, with “affiliates” 

    So, what does this mean in practice? 

    Once you provide your data, it is no longer truly yours. It can be used to train future models, shared with business partners or even leveraged in a corporate acquisition. And the kicker? This applies whether you are a free user or a paying subscriber.  

    Your data powers the system, but it does not belong to you anymore. 

    Gemini by Google 

    Next, let’s look at how the most popular search engine’s AI model, Gemini by Google, handles user privacy. Unsurprisingly, the situation is very similar to what we saw with OpenAI. 

    Gemini collects not just your chats and interactions, but also any files, images or browser page content you share. It logs your location, IP address, submitted feedback and even physical addresses, like home or work, if they are part of your input. 

    When it comes to how this data is used, the language is broad and vague. Google states that your data helps them “provide, improve, develop and personalize” their services and machine learning technologies. But if you dig deeper into their multi-layered privacy policy, a clearer picture emerges: your personal information can also be used for targeted advertising and to make recommendations for other Google products. 

    And it gets more concerning. 

    Google explicitly warns users not to share personal or confidential data in Gemini chats. Why? Because human reviewers, known as annotators, may read your conversations, access your files and use that data to improve future model performance. So, not only does your data no longer belong to you; it may also be read and evaluated by actual people.  

    Once again, it does not matter whether you are a free user or a paid subscriber. The privacy terms apply across the board. From the moment you interact with Gemini, your data becomes part of its ecosystem. 

    DeepSeek: The new kid on the block 

    Finally, we reviewed the privacy policy of DeepSeek, a platform that gained major traction last year after claiming OpenAI-level performance with just a fraction of the resources. It quickly rose in popularity, eventually topping Apple’s App Store charts. 

    DeepSeek’s privacy policy appears standard at first glance. They collect personal data associated with your account, your input and any details you provide when contacting them. But as with the other major players, a deeper dive reveals a long list of data points they track: device model, IP address, identifiers, location, cookies, payment details, text inputs, uploaded files, chat history, user age, contact information and more. 

    Even more concerning is where your data goes. 

    Unlike OpenAI or Google, DeepSeek transfers user data from Europe and the U.S. to servers in China, placing it under Chinese jurisdiction. This means your information is no longer protected by EU or U.S. privacy laws. According to Chinese regulations, the government may access or request user data at any time, for any reason, with little transparency or recourse. 

    As for how the data is used, the story remains familiar. DeepSeek claims your information helps them improve, develop and train their models, whether by feeding prompts directly into training datasets or by analyzing usage patterns. They also cite compliance with legal obligations and protecting the “interests of users and others”, though who those “others” are is never clearly defined.  

    And, of course, they reserve the right to share your personal data with third-party partners who help operate or develop their services. 

    The bottom line? No matter which platform you choose, your data stops being just yours the moment you use the service. 

    How privacy has changed 

    Some might argue that privacy concerns are not new; we were sharing data online long before large language models took over. After all, even planning a vacation used to involve multiple search queries typed into our favorite search engine. So, what is the difference? 

    That argument seems fair. But there is a key distinction. 

    Before LLMs, we were in control. Searching for cheap flights or hotel deals meant navigating between websites of our choosing. We could decide which platforms to interact with, and, at least to some extent, who saw our data. Tools like Norton AntiTrack made it possible to keep trackers at bay, allowing us to interact directly with service providers while limiting what the search engine itself could collect. 

    Now, that dynamic has shifted. 

    When we use LLMs, the model is not just answering a question; it is becoming the gateway to everything. It decides which services to show, which sources to pull from and how to summarize or filter the information. You no longer browse freely; you are handed a curated response. And in the process, the LLM provider sees everything, from your preferences and prompts to your files, location and interactions. 
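
    To make the shift concrete, here is a rough sketch of how a single vacation prompt might fan out on the provider’s side. The sub-queries below are our own illustrative assumptions, not any vendor’s actual decomposition: 

    # Illustrative only: one prompt fanning out into the sub-queries an
    # assistant might answer on your behalf. Real systems decompose
    # requests differently; these strings are assumptions.
    sub_queries = [
        "cheap flights Oslo -> Mediterranean islands, August",
        "family hotels, 3+ stars, under 150 EUR/night, 2 adults 2 kids",
        "quiet Mediterranean islands with good food and activities",
    ]

    # Before LLMs, you ran searches like these yourself, across sites
    # you chose. Now a single intermediary issues them, reads the
    # results and logs the whole exchange against your account.
    for query in sub_queries:
        print(query)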

    Even more troubling? You cannot always tell which sources were used or whether the results are truly in your best interest. 

    So, does this mean privacy is a lost game in the age of LLMs? 

    The answer is not simple. 

    Countermeasures 

    Unfortunately, there is no silver bullet when it comes to protecting your privacy while using large language models. But that does not mean you are completely powerless; there are a few settings and habits that can help minimize your data exposure. Let’s take a closer look at what is possible, starting with the most popular platform:

    ChatGPT by OpenAI 

    In OpenAI’s ChatGPT, privacy-conscious users can adjust a few settings to gain more control over how their data is used. To access them, click on your avatar in the top right corner and open Settings. From there, navigate to the Data Controls tab, where you can toggle off “Improve the model for everyone.” This setting tells OpenAI not to use your chats to train future models. 

    You can also visit the Personalization tab and disable memory features to stop the model from saving and recalling information about you. If you are curious about what the model has already learned, click “Manage Memories” to review and delete anything it has inferred. 

    Keep in mind, disabling memory means a less personalized experience, so you may receive less accurate or relevant responses to your prompts. And unfortunately, this does not mean you are fully in the clear: even with all settings turned off, the personal and sensitive information you share may still be collected and stored by OpenAI. 

    Bottom line: these settings offer a degree of control, but not full privacy. 

    Gemini by Google 

    With Gemini, your options to control privacy are a bit more limited. The main setting is found by clicking on Activity in the bottom left panel, where you can disable it. However, this does not mean your data will not be stored or processed; it only means that human reviewers will not inspect the information you share. 

    Gemini’s privacy policy does not explicitly say whether your data is still used to train models without human review, so it remains unclear if your chats contribute to future model training behind the scenes. 

    Unlike ChatGPT, Gemini does not automatically infer information about you. Instead, the model only learns personal details if you explicitly provide them. If you want to delete stored information, you can either switch off Activity altogether or delete entries individually within the same tab. 

    Still, there is no true way to avoid sharing data with Google, and once you share your information, it no longer belongs to you. 

    DeepSeek 

    The situation with DeepSeek is a bit grimmer. If you click the bottom left corner and then go to Settings, you will find the Profile tab. Here, the only privacy control available is the option to switch off “Improve model for everyone.” 

    DeepSeek claims that they do not use your content to train their models if you disable this setting, but beyond that, their policy is vague and unclear. What is clear, however, is that once you share your data, it no longer truly belongs to you. 

    What does the future bring? 

    AI-powered services like these are here to stay, and they simply cannot operate without the vast amounts of data generated by their users. For now, there are few regulations or safeguards to limit how this data is collected, stored or used, which means privacy is taking a backseat. 

    Until stronger protections are put in place, the responsibility falls on users to safeguard their own sensitive information. That means sharing as little personal data as possible, especially when planning something like a vacation. Otherwise, you might just find your itinerary showing up in someone else’s AI-generated recommendations in the next version of the model you use. 

    The future of privacy in AI is not set in stone yet. But one thing is certain: staying informed and cautious is your best defense. 
