Dangerous Python Functions, Part 2

Updated in 2026 with Python 3 examples and current links.

As mentioned in the first part of this series, some functions in Python can be dangerous if you’re not aware of their risks. In this installment, we’ll cover deserializing data with pickle and yaml, then briefly touch on SQL injection and information leakage.

Pickle and friends

Why it’s useful

pickle enables you to store state and Python objects to disk so that you can later restore them. Pickle can be useful for storing something that doesn’t quite need a database or for data that’s inherently temporary.

In the past, I’ve used pickle to support pause and resume functionality for large file transfers. I saved the progress to a pickle file and then, on resume, picked up where it left off and removed the pickle.

Why it’s dangerous

Pickle has the same weaknesses as exec and eval, which we covered in part 1. It enables users to craft input that executes arbitrary code on your machine. Sound familiar?

Other modules that rely on pickle inherit the same risks. shelve, for example, uses pickle under the hood for serialization.

Popular frameworks have learned this lesson over the years. Celery used pickle by default for worker communication before version 3.0.18, and Django used it for session storage before version 1.6. Both have since moved to safer defaults, but the underlying risk remains for any code that deserializes untrusted pickle data.

A dangerous example

I’m going to use an example from Lincoln Loop’s Playing with Pickle Security and expand upon it. In our example, we will serialize a command to call the command-line utility ls and deserialize it with pickle.loads().

import os
import pickle


# Exploit that we want the target to unpickle
class Exploit:
    def __reduce__(self):
        # Note: this will only list files in your directory.
        # It is a proof of concept.
        return (os.system, ('ls',))


def serialize_exploit():
    shellcode = pickle.dumps(Exploit())
    return shellcode


def insecure_deserialize(exploit_code):
    pickle.loads(exploit_code)


if __name__ == '__main__':
    shellcode = serialize_exploit()
    print('Yar, here be yer files.')
    insecure_deserialize(shellcode)

In this case, we only wanted to list the files in the directory using the ls command. We could have used almost any shell command.
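
If you want to see what a pickle would do without running it, the standard library’s pickletools module can walk its opcodes. This is only a rough sketch: the helper name summarize_pickle is made up for this post, and nothing here ever calls pickle.loads().

```python
import os
import pickle
import pickletools


class Exploit:
    def __reduce__(self):
        # Same proof of concept as above; dumps() only records this tuple.
        return (os.system, ('ls',))


def summarize_pickle(data):
    """Return (opcode names, string arguments) without unpickling anything."""
    opcodes = []
    strings = []
    for opcode, arg, _position in pickletools.genops(data):
        opcodes.append(opcode.name)
        if isinstance(arg, str):
            strings.append(arg)
    return opcodes, strings


if __name__ == '__main__':
    opcodes, strings = summarize_pickle(pickle.dumps(Exploit()))
    # A REDUCE opcode means "call a callable" when the pickle is loaded.
    print('REDUCE' in opcodes)
    print(strings)
```

A REDUCE opcode next to strings such as system and ls is a strong hint that loading the data would run a shell command.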

What to use instead

You could use json to serialize data or, if you must, yaml. If you use yaml, please read the section below on why it has its own set of risks.
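
As a rough sketch of the json route, here is how the pause-and-resume state I mentioned earlier might be stored instead. The file name and the keys in the state dictionary are hypothetical.

```python
import json
import os
import tempfile


def save_progress(path, state):
    # JSON can only hold plain data (dicts, lists, strings, numbers,
    # booleans, None), so loading it cannot run code.
    with open(path, 'w') as progress_file:
        json.dump(state, progress_file)


def load_progress(path):
    with open(path) as progress_file:
        return json.load(progress_file)


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'transfer_progress.json')
    save_progress(path, {'filename': 'backup.tar', 'bytes_sent': 1048576})
    print(load_progress(path))
```

The trade-off is that json will not round-trip arbitrary Python objects, so you have to convert your state to plain data first.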

If you’re using Celery or Django, make sure you’re on a modern version that does not use pickle for serialization by default.

If you must use it…

Be careful with your input! Never trust a pickle that has gone over the network or come from someone else. It’s too easy to exploit.
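
One mitigation that the Python docs suggest is signing the data with hmac, so you can at least tell whether it was tampered with. Below is a minimal sketch of that idea. SECRET_KEY, dumps_signed and loads_signed are names I made up for illustration, and the approach only helps as long as the key stays secret.

```python
import hashlib
import hmac
import pickle

# Assumption: a secret that only your application knows. Load it from
# configuration in real code instead of hard-coding it.
SECRET_KEY = b'replace-me'
DIGEST_SIZE = hashlib.sha256().digest_size


def dumps_signed(obj):
    payload = pickle.dumps(obj)
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return signature + payload


def loads_signed(data):
    signature, payload = data[:DIGEST_SIZE], data[DIGEST_SIZE:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking timing information.
    if not hmac.compare_digest(signature, expected):
        raise ValueError('signature mismatch, refusing to unpickle')
    return pickle.loads(payload)
```

The signature is checked before pickle.loads() ever sees the payload, so a modified file is rejected rather than executed.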

Loading YAMLs

Why it’s useful

YAML files offer another option for serializing and deserializing data. They are useful for storing configuration or other immutable values. I have used YAMLs to store configuration values for web applications, where the configuration differs depending upon the environment we’re deploying to (production vs staging, for example).

PyYAML does not live in the standard library but seems like the most popular way to parse YAMLs in Python.

Why it’s dangerous

The simplest way to load a YAML file used to be yaml.load(). Unfortunately, in older releases of PyYAML, calling yaml.load() without a Loader argument was an unsafe operation that, you guessed it, enabled maliciously crafted files to execute arbitrary code on the host machine. Passing an unsafe loader explicitly still does.

A dangerous example

As with pickle, we’ll set up an example where we read the files in a directory on the host machine.

In exploit.yml:

your_files: !!python/object/apply:subprocess.check_output ['ls']

In a Python script (after perhaps running pip install pyyaml):

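import yaml

with open('exploit.yml') as exploit_file:
    # PyYAML 6.0+ requires an explicit Loader. UnsafeLoader reproduces the
    # old, dangerous behavior of a bare yaml.load().
    contents = yaml.load(exploit_file, Loader=yaml.UnsafeLoader)
    # check_output returns bytes in Python 3, so decode before splitting.
    your_files = contents['your_files'].decode().splitlines()
    for your_file in your_files:
        print(your_file)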

Again, we can provide many different commands to subprocess, including those that we discussed in part 1.

What to use instead

The yaml module has a safe way to load YAML files: yaml.safe_load(). When I originally wrote this post, I wished the package had the safe method as the default. As of PyYAML 6.0, calling yaml.load() without an explicit Loader raises a TypeError because Loader is now a required argument, which is a good step.

As Ned Batchelder said at the time:

Why do serialization implementers do this? If you must extend the format with dangerous features, provide them in the non-obvious method. Provide a .load() method and a .dangerous_load() method instead. At least that way people would have to decide to do the dangerous thing.

PyYAML eventually took that advice to heart.

If you must use it…

Use yaml.safe_load(). If you must use yaml.load() directly, pass Loader=yaml.SafeLoader explicitly so your intent is clear.
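
To make that concrete, here is a small sketch that feeds both an ordinary config and the exploit from earlier to yaml.safe_load(). It assumes PyYAML is installed. The load_config helper and the sample config are hypothetical.

```python
import yaml

EXPLOIT = "your_files: !!python/object/apply:subprocess.check_output ['ls']"
CONFIG = "environment: staging\ndebug: false\n"


def load_config(text):
    """Parse YAML with safe_load; return None if it is rejected."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as error:
        print('Rejected:', type(error).__name__)
        return None


if __name__ == '__main__':
    print(load_config(CONFIG))
    print(load_config(EXPLOIT))
```

safe_load() only builds plain types such as dicts, lists, strings and numbers, so the python/object/apply tag is refused instead of being executed.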

A few more dangers

A few more things to keep in mind.

SQL Injection

SQL Injection is basically untrusted input meets your database. All the same risks that we talked about with untrusted input above also apply here.

As a quick example, here’s how someone could exploit this:

import sqlite3

def get_user_by_name(name, cursor):
    cursor.execute("SELECT * FROM users WHERE name = '%s'" % name)  # unsafe!


if __name__ == '__main__':
    conn = sqlite3.connect('example.db')
    cursor = conn.cursor()
    malicious_name = "Joe'; DROP TABLE users; --"
    get_user_by_name(malicious_name, cursor)

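With a database driver that accepts several statements in one call, the malicious name would drop the users table. (sqlite3’s execute() refuses to run more than one statement at a time, but the injected quote still lets an attacker rewrite the query, for example with ' OR '1'='1.) Not great.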

Python provides a database binding for sqlite3 in the standard library, and there’s a section in the Python docs on passing values through placeholders (parameterized queries) rather than building SQL with string formatting, which is what we failed to do in the example. Otherwise, I’d recommend using an ORM, such as the one in Django or SQLAlchemy.
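
Here is a minimal sketch of the same lookup written with a placeholder. It uses an in-memory database and a one-column users table that I invented for the demonstration.

```python
import sqlite3


def get_user_by_name(name, cursor):
    # The ? placeholder makes sqlite3 pass name as data, never as SQL.
    cursor.execute("SELECT * FROM users WHERE name = ?", (name,))
    return cursor.fetchall()


if __name__ == '__main__':
    conn = sqlite3.connect(':memory:')
    cursor = conn.cursor()
    cursor.execute("CREATE TABLE users (name TEXT)")
    cursor.execute("INSERT INTO users VALUES ('Joe')")
    print(get_user_by_name("Joe", cursor))
    print(get_user_by_name("Joe' OR '1'='1", cursor))
```

The second call simply finds no user with that odd name, because the quote characters are treated as part of the value rather than as SQL.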

OWASP SQL Injection

Information Leakage

The print function and logging module are useful but potentially risky. Ideally, any log files that we write have their permissions configured to allow only sufficiently privileged users to read them.

If a log file is readable by everyone, then anyone with an account on the machine can read whatever secrets end up in it. If you must log sensitive information (must you?), be sure to protect it through access controls.
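
As one possible way to do that on a POSIX system, the sketch below creates the log file with owner-only permissions before handing it to the logging module. The helper name make_private_file_handler and the logger name are hypothetical, and this will not work as written on Windows.

```python
import logging
import os
import stat
import tempfile


def make_private_file_handler(path):
    # Create the file as 0o600 from the start so it is never briefly
    # readable by other users (POSIX permissions; assumption: not Windows).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    # If the file already existed, tighten its mode as well.
    os.fchmod(fd, 0o600)
    return logging.StreamHandler(os.fdopen(fd, 'a'))


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'app.log')
    logger = logging.getLogger('private_example')
    logger.addHandler(make_private_file_handler(path))
    logger.warning('user %s logged in', 'joe')
    print(oct(stat.S_IMODE(os.stat(path).st_mode)))
```

File permissions are only one layer. Keeping secrets out of log messages in the first place is still the better habit.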

Thanks to @goodwillbits for recommending that I add this section.

In Conclusion

This series covered several ways Python functions can bite you. Python’s documentation flags the risks, but you have to know where to look.

If you take one thing from these posts: never trust untrusted input.

Discussion on Hacker News.