import styled from "@emotion/styled";

import ComparisonDisplay from "../components/ComparisonDisplay";

import { JSONFormatted1Shot } from "../data/LookingAtFormatting/JSONFormatted1Shot";
import { SimpleFormatted1Shot } from "../data/LookingAtFormatting/SimpleFormatted1Shot";
import { JSONFormatted3Shot } from "../data/LookingAtFormatting/JSONFormatted3Shot";
import { SimpleFormatted3Shot } from "../data/LookingAtFormatting/SimpleFormatted3Shot";
import "bootstrap/dist/css/bootstrap.min.css";

const LookingAtFormatting = () => {
  return (
    <OuterContainer>
      <Container>
        <Title>Prompt Structure for Guiding LLMs</Title>
        <IntroSection style={{ marginBottom: "0" }}>
          Over the last several years, access to large language models for
          generation tasks has become widespread. Anyone with a simple internet
          connection can now access giant models from companies such as{" "}
          <a href="https://www.goose.ai">Goose</a>,{" "}
          <a href="https://openai.com">OpenAI</a>,{" "}
          <a href="https://cohere.ai/">Cohere</a>, and{" "}
          <a href="https://www.ai21.com/studio">AI21</a> with the click of a
          button. As access to these models becomes ubiquitous, it's useful to
          know how to direct them to generate what you want. The method of
          getting a trained model to generate the text you want is called
          prompt engineering, which is what this post is about.
        </IntroSection>
        <IntroSection style={{ marginTop: "0" }}>
          There are a million papers and blog posts on how transformers work,
          but it's worth mentioning that the current trend in language models
          is based on the transformer architecture from{" "}
          <a href="https://arxiv.org/abs/1706.03762">
            Attention is All You Need
          </a>
          , which was shown to scale up nicely in{" "}
          <a href="https://arxiv.org/abs/2005.14165">
            Language Models are Few-Shot Learners
          </a>
          . Both are worth reading, but the technical details are pretty
          irrelevant for this blog post.
        </IntroSection>
        <SectionHeader>What are Prompts</SectionHeader>
        <TextSection>
          When interacting with large language models (LLMs), you give the model
          some text, called a <i>prompt</i> (or <i>context</i>), and the model
          generates some text in response, called a <i>completion</i>. For
          instance, if you give an LLM a prompt of{" "}
          <CodeBlock>1 + 1 = </CodeBlock>, the model will generally complete
          your prompt with the output <CodeBlock>2</CodeBlock>. Similarly, when
          you give the model a prompt of{" "}
          <CodeBlock>The capital of France is </CodeBlock>, the model will
          complete it with <CodeBlock>Paris</CodeBlock>.
        </TextSection>

        <TextSection>
          Under the hood (this is the part from those papers we skipped), the
          model converts a sequence of characters, e.g.{" "}
          <CodeBlock>The capital of France</CodeBlock>, into an array of
          integers called <i>tokens</i>, such as{" "}
          <CodeBlock>[42, 12, 54, 12, 11]</CodeBlock>. These are what the model
          actually computes over, and which integers stand for which pieces of
          text depends on how the model was trained. It then predicts a
          sequence of output integers such as{" "}
          <CodeBlock>[51, 55, 12]</CodeBlock>, which are converted back into
          characters, e.g. <CodeBlock>Paris</CodeBlock>. Depending on the
          model, you get somewhere between 1,000 and 8,000 of these tokens to
          split between the prompt and the completion.
        </TextSection>
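        <TextSection>
          To make that round trip concrete, here is a toy sketch in JavaScript.
          The six-entry <CodeBlock>vocab</CodeBlock> and the{" "}
          <CodeBlock>encode</CodeBlock> and <CodeBlock>decode</CodeBlock>{" "}
          helpers are made up for illustration. Real models use learned
          vocabularies that are far larger, so the actual integers will look
          nothing like these.
        </TextSection>
        <pre>
          <CodeBlock>{`// Toy vocabulary (an assumption for illustration, not a real tokenizer).
const vocab = ["The", " capital", " of", " France", " is", " Paris"];

// Turn text into token ids by matching vocabulary pieces from the left.
function encode(text) {
  const ids = [];
  let rest = text;
  while (rest.length > 0) {
    const id = vocab.findIndex((piece) => rest.startsWith(piece));
    if (id === -1) throw new Error("No token for: " + rest);
    ids.push(id);
    rest = rest.slice(vocab[id].length);
  }
  return ids;
}

// Turn token ids back into text.
function decode(ids) {
  return ids.map((id) => vocab[id]).join("");
}

encode("The capital of France is"); // [0, 1, 2, 3, 4]
decode([5]); // " Paris"`}</CodeBlock>
        </pre>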
        <TextSection>
          Now, this works fine if you're just having the model answer simple
          trivia questions. Depending on how large the model is and some other
          issues with training, it may or may not get simple trivia or math
          questions right based only on the implicit information it learned in
          training. However, if we're trying to do more complex tasks, that's
          where things get a little more complicated and where this post comes
          in.
        </TextSection>
        <SectionHeader>
          The Limitations of Prompts on Complex Tasks
        </SectionHeader>
        <TextSection>
          There are a couple of major limitations on these models when it comes
          to both implicit and explicit information. Implicit information is
          data that's either implied or not contained in the prompt at all. For
          example, consider <CodeBlock>Everyone is sitting down</CodeBlock>;
          that generally refers to a group of people in some area that isn't
          named in the context. We could explicitly state that it's everyone
          in, say, the room who is sitting down,{" "}
          <CodeBlock>Everyone in the room is sitting down</CodeBlock>, but it's
          still unclear which room the prompter is referring to.
        </TextSection>
        <TextSection>
          In the earlier example of{" "}
          <CodeBlock>The capital of France is</CodeBlock>, the model was able to
          pull the implicit information about <CodeBlock>France</CodeBlock> out
          of the data it was trained on (hopefully). However, let's say that
          we're writing about the mid-1700s, when the capital was Versailles.
          In this case, we could state that the year is 1750 and hope the model
          infers the rest. Otherwise, we can feed the model information such
          as{" "}
          <CodeBlock>
            In 1750 the capital of France was Versailles. It is now 1750.
            <br /> The capital of France is
            <br />
          </CodeBlock>
          However, this is kind of a pain. In general, if we're doing anything
          interesting, we're relying on the model to retrieve as much implicit
          information as it can on its own from limited explicit information,
          both because we would otherwise have to search for the relevant
          information ourselves and because of the limit on prompt size
          mentioned earlier.
        </TextSection>
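        <TextSection>
          As a minimal sketch of feeding the model explicit information, the
          hypothetical <CodeBlock>buildPrompt</CodeBlock> helper below (not
          part of any library) just puts the known facts ahead of the text we
          want completed. Every fact added this way uses up some of the token
          budget.
        </TextSection>
        <pre>
          <CodeBlock>{`// Hypothetical helper: place known facts before the text to complete.
function buildPrompt(facts, textToComplete) {
  return facts.concat([textToComplete]).join("\\n");
}

const prompt = buildPrompt(
  ["In 1750 the capital of France was Versailles.", "It is now 1750."],
  "The capital of France is"
);
// prompt is three lines: the two facts, then the text to complete.`}</CodeBlock>
        </pre>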
        <TextSection>
          Things get more complicated when you're trying to have the model
          simultaneously keep track of multiple pieces of information. Part of
          that is because the model needs to be able to <i>chunk</i> the data
          into pieces that it is able to compute over. People have far more
          sophisticated brains than these models and still have a hard time
          reading walls of text; so do models. Additionally, these models
          aren't too smart, so even if the information is explicitly there, the
          model won't necessarily pick up on the implicit connection to
          whatever it's trying to complete. This is a long way of saying we can
          structure the data to help the model work out which pieces connect
          where.
        </TextSection>

        <SectionHeader>Comparing One-Shot JSON vs. Simple</SectionHeader>
        <TextSection>
          I started off by seeing whether I could get an LLM to generate a
          snippet about a topic from a single example, formatting the prompt
          once as JSON and once as simple text.
        </TextSection>
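        <TextSection>
          Roughly, the two styles differ as in the sketch below. The{" "}
          <CodeBlock>topic</CodeBlock> and <CodeBlock>snippet</CodeBlock>{" "}
          field names and the example record are assumptions made up for
          illustration; the actual prompts used in the comparison may be laid
          out differently.
        </TextSection>
        <pre>
          <CodeBlock>{`// Hypothetical example record; the field names are assumptions.
const example = { topic: "Paris", snippet: "Paris is the capital of France." };
const nextTopic = "Versailles";

// JSON style: one finished object, then an open one for the model to complete.
const jsonPrompt =
  JSON.stringify(example) +
  "\\n" +
  '{"topic":' + JSON.stringify(nextTopic) + ',"snippet":"';

// Simple style: plain labeled lines, ending where the model should continue.
const simplePrompt =
  "Topic: " + example.topic + "\\nSnippet: " + example.snippet +
  "\\n\\nTopic: " + nextTopic + "\\nSnippet:";`}</CodeBlock>
        </pre>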

        <TextSection></TextSection>
        <div style={{ position: "relative" }}>
          <ComparisonDisplay
            datafile1={JSONFormatted1Shot}
            datafile2={SimpleFormatted1Shot}
          ></ComparisonDisplay>
        </div>
        <SectionHeader>Comparing Three-Shot JSON vs. Simple</SectionHeader>
        <TextSection>
          That was with one example. Next, I tried again, this time giving the
          model three examples.
        </TextSection>

        <div style={{ position: "relative" }}>
          <ComparisonDisplay
            datafile1={JSONFormatted3Shot}
            datafile2={SimpleFormatted3Shot}
          ></ComparisonDisplay>
        </div>

        <SectionHeader style={{ marginTop: "20px" }}>So what</SectionHeader>
        <TextSection>
          There's still a lot of work to be done on figuring out how to get
          these models to do what you want them to do. I think there are some
          pretty cool things to try around generating lots of candidate texts
          and using classifiers to pick the ones that match what you want.
          However, that's another question. For now, I'm going to stick with
          JSON until I figure out a better way to do this.
        </TextSection>
        <SpacerDiv />
        <SpacerDiv />
      </Container>
    </OuterContainer>
  );
};

export default LookingAtFormatting;

const OuterContainer = styled.div`
  display: flex;
  flex-direction: column;
  justify-content: center;
  align-items: center;
  width: 100%;
  max-width: 100%;
`;

const Container = styled.div`
  height: 100vh;
  width: 750px;
  max-width: 100%;
  display: block;
  padding: 10px 30px 30px 30px;
`;

const Title = styled.div`
  font-size: 30px;
  font-weight: bold;
  margin-bottom: 20px;
`;

const TextSection = styled.div`
  display: block;
  text-align: left;
  margin-bottom: 20px;
`;

const CodeBlock = styled.code`
  align-items: left;
  margin-bottom: 20px;
`;

const IntroSection = styled.div`
  display: block;
  text-align: left;
  margin-bottom: 20px;
  /* gray */
  background-color: #f0f0f0;
  padding: 10px;
`;

const SectionHeader = styled.div`
  font-size: 20px;
  margin-bottom: 10px;
`;

const SpacerDiv = styled.div`
  height: 20px;
`;
