Understanding Temperature in Language Models (LLMs)
Control the creativity and randomness of text generation.
What is Temperature in LLMs?
Temperature is a sampling parameter that reshapes the probability distribution over the next token during text generation. It acts as a “creativity knob,” controlling the randomness of the output:
- Low Temperature (e.g., 0.0–0.4): Outputs are focused, precise, and often repetitive; at 0.0 the model almost always picks the single most likely token.
- Moderate Temperature (e.g., 0.5–1.0): Outputs become more diverse while maintaining coherence.
- High Temperature (e.g., 1.1–2.0): Text becomes creative and experimental, often at the cost of relevance and focus.
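Mathematically, temperature T rescales the model’s raw scores (logits) before they pass through the softmax:

P(token_i) = exp(logit_i / T) / Σ_j exp(logit_j / T)

Dividing by a small T widens the gaps between scores and sharpens the distribution; a large T shrinks the gaps and flattens it.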
Here’s a Python example that demonstrates this:
```python
import numpy as np
import matplotlib.pyplot as plt

def softmax_with_temperature(logits, temperature):
    # Divide the logits by the temperature before applying softmax
    logits = np.array(logits)
    exp_logits = np.exp(logits / temperature)
    return exp_logits / np.sum(exp_logits)

# Raw scores for 5 tokens
logits = [2.0, 1.0, 0.1, -1.0, -2.0]
temperatures = [0.2, 0.5, 1.0, 2.0]

# Calculate probabilities for each temperature
probabilities = {temp: softmax_with_temperature(logits, temp) for temp in temperatures}

# Plot one curve per temperature
for temp, probs in probabilities.items():
    plt.plot(range(1, len(probs) + 1), probs, label=f'Temperature = {temp}')

plt.title("Effect of Temperature on Softmax Probability Distribution")
plt.xlabel("Token Index")
plt.ylabel("Probability")
plt.legend()
plt.grid()
plt.show()
```
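To see the exact figures discussed in the next section, you can also print each distribution. This small addition to the script above formats the probabilities as percentages:

```python
# Print each distribution as percentages (one decimal place)
for temp, probs in probabilities.items():
    formatted = ", ".join(f"{p:.1%}" for p in probs)
    print(f"T = {temp}: {formatted}")
```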
What The Numbers Mean
Let’s look at the probabilities for different temperatures:
Temperature = 0.2 (Very Cold)
- Token 1: 99.3%
- Token 2: 0.67%
- Other tokens: nearly 0%

This makes the model very focused on the highest-scoring token.
Temperature = 2.0 (Very Hot)
- Token 1: 42.5%
- Token 2: 25.8%
- Token 3: 16.4%
- Token 4: 9.5%
- Token 5: 5.8%

This spreads the probabilities out, making the model’s choices more random.
Visual Explanation
The plot shows how temperature affects token selection:
- Blue line (T=0.2): Very steep, almost all probability goes to the highest-scoring token
- Orange line (T=0.5): Still favors high-scoring tokens but less extremely
- Green line (T=1.0): Moderate distribution
- Red line (T=2.0): Flatter distribution, giving more chances to lower-scoring tokens
Practical Examples
Think of it like this:
- Low temperature (0.2): “The sky is…” → Almost always picks the most common completion, “blue”
- Medium temperature (1.0): “The sky is…” → Might pick “blue,” “cloudy,” or “clear”
- High temperature (2.0): “The sky is…” → Might pick more creative options like “dancing,” “infinite,” or “a canvas”
The higher the temperature, the more willing the model is to take “creative risks” with its responses.
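As a toy illustration of this (not real model output), we can reuse the softmax helper from earlier and sample from a made-up distribution over candidate completions. The candidate words and logits here are invented for demonstration:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    logits = np.array(logits)
    exp_logits = np.exp(logits / temperature)
    return exp_logits / np.sum(exp_logits)

# Hypothetical candidates and scores for the prompt "The sky is..."
candidates = ["blue", "cloudy", "clear", "dancing", "infinite"]
logits = [3.0, 2.2, 2.0, -0.5, -1.0]

rng = np.random.default_rng(seed=0)
for temp in [0.2, 1.0, 2.0]:
    probs = softmax_with_temperature(logits, temp)
    samples = rng.choice(candidates, size=8, p=probs)
    print(f"T = {temp}: {' '.join(samples)}")
```

At T = 0.2 nearly every sample is “blue”; at T = 2.0 the long-shot candidates start to appear occasionally.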
Example of how temperature works in text generation (using the Groq API)
Below is a simple Python script that uses the Groq API to access the Llama 3.3 70B model.
```python
import numpy as np
from groq import Groq

def generate_completion(client, model, prompt, temperature):
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        max_tokens=100,
        top_p=1,
        stream=False,
        stop=None,
    )
    return completion.choices[0].message.content

def main():
    client = Groq(api_key="your api key")
    model = "llama-3.3-70b-versatile"
    prompt = "Explain AI in exactly 50 words. Do not exceed this limit."
    # Generate temperatures from 0 to 2 with a step of 0.1,
    # rounded to avoid floating-point artifacts like 0.30000000000000004
    temperatures = np.round(np.arange(0, 2.1, 0.1), 1)
    for temp in temperatures:
        print(f"\nTemperature: {temp:.1f}")
        # Cast the NumPy float to a plain Python float for the API call
        result = generate_completion(client, model, prompt, float(temp))
        print(result)
        # Divider line between outputs
        print("-" * 50)

if __name__ == "__main__":
    main()
```
Comparing Outputs at Different Temperatures
Running the script prints one completion per temperature step, so you can scan the outputs side by side: the low-temperature runs return nearly identical, tightly focused answers, while the highest-temperature runs drift into looser, more unpredictable phrasing.
Practical Use Cases of Temperature
Based on the behavior described above, some common settings are:
- Low temperature (≈0.0–0.4): factual Q&A, code generation, and summarization, where precision matters most
- Moderate temperature (≈0.5–1.0): general writing and conversational assistants that should stay coherent without sounding robotic
- High temperature (≈1.1–2.0): brainstorming, storytelling, and other creative tasks where variety matters more than accuracy
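As a rough sketch, this mapping could be captured as a small preset table for the Groq script above. The values are illustrative starting points of my own, not official recommendations:

```python
# Illustrative temperature presets; tune these for your own use case
TEMPERATURE_PRESETS = {
    "factual_qa": 0.2,
    "code_generation": 0.2,
    "summarization": 0.3,
    "chat_assistant": 0.7,
    "brainstorming": 1.3,
    "creative_writing": 1.8,
}

# Example usage with the earlier helper:
# result = generate_completion(client, model, prompt, TEMPERATURE_PRESETS["brainstorming"])
```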
Conclusion
Temperature in LLMs plays a crucial role in tailoring the tone, creativity, and coherence of generated text. By adjusting this single parameter, you can steer outputs anywhere from precise technical content to highly creative narratives. Understanding and applying temperature effectively can optimize text generation for specific applications, enhancing the value of tools like the Groq API.
Thank you for reading to the end! I recently developed an application for resume analysis and generation using an advanced LLM (Llama 3.3 70B Versatile). This tool leverages prompt engineering and the Groq API (including the temperature parameter) to analyze resumes and job descriptions. It generates detailed analysis reports and enhances resumes, providing tailored suggestions to align them with job requirements.
You can check out the app at: Resume_easz
You can connect with me on LinkedIn.