Installation

Using pip

Install from PyPI with pip:

pip install file-or-name

From source

To install from source, clone the GitHub repository and install with pip.

git clone https://github.com/blester125/file-or-name.git
cd file-or-name
pip install .

Local Development

If you want to install the package and run the tests, install the optional testing dependencies. You can use the -e option to install in “editable” mode so that changes you make to the source code are picked up without re-installing the package.

pip install -e .[test]

Run the tests with pytest.

pytest

Set up pre-commit hooks to autoformat your changes with black.

pip install pre-commit
pre-commit install

Building the Docs

To build the documentation locally, install the documentation requirements and run make.

pip install -r requirements-docs.txt
cd docs
make html
open build/html/index.html

API

file_or_name

file_or_name.file_or_name.file_or_name(function, **kwargs)[source]

Transparently allow arguments to be either strings or open files.

Note

If there are no kwargs, it is assumed the first argument is a file that will be opened in read mode "r".

Note

If you need a file to be opened with extra arguments, for example newline='' for files that will be used with the stdlib csv writer, you should manually open the file and pass the resulting file object in.
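For example, to use a decorated writer with the stdlib csv module, open the file yourself with newline='' and pass the file object in. In this sketch, write_rows is a hypothetical function; any function that accepts an open file object for writing works the same way:

```python
import csv

# Hypothetical function; imagine it is decorated with @file_or_name(wf='w').
# Because we pass an already-open file object, the decorator (or plain
# Python) uses it as-is instead of opening the path itself.
def write_rows(rows, wf):
    writer = csv.writer(wf)
    writer.writerows(rows)

# Open the file manually with newline='' as the csv module requires,
# then pass the resulting file object in.
with open("data.csv", "w", newline="") as wf:
    write_rows([["a", 1], ["b", 2]], wf)
```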

Parameters
  • function (Callable) – The user defined function that we will manage the file opening for.

  • kwargs (Dict[str, str]) – The parameters of function that are considered files to be opened. These are interpreted in the form name=mode and are used to create a mapping of parameters whose values should be opened as files with the provided mode. For example, if the value of the wf parameter to your function should be opened in write mode, you should decorate your function with @file_or_name(wf='w')

Returns

A decorated function where specified arguments are interpreted as file names and opened automatically.

Return type

Callable

file_or_name.utils

file_or_name.utils.parameterize(function)[source]

A decorator for decorators that allows them to be called without parentheses when no kwargs are given.

Parameters

function (Callable) – A decorator that we want to use with either @function or @function(kwarg=kwvalue)

Returns

A decorated decorator that can be used with or without parentheses

Return type

Callable
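A common pattern for implementing such a helper looks like the following sketch (illustrative code, not necessarily the library's internals): if the decorator is applied bare, the target callable arrives as the lone positional argument; otherwise, return a decorator that closes over the keyword arguments.

```python
import functools

def parameterize(function):
    # Sketch of a "parameterized decorator" helper. When used as
    # @function, the decorated target is the only positional argument;
    # when used as @function(kwarg=value), no positional arguments are
    # given, so we return a decorator that captures the kwargs.
    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return function(args[0])
        return lambda target: function(target, **kwargs)
    return wrapper

# Example decorator built with the helper (names are illustrative).
@parameterize
def shout(func, punctuation="!"):
    @functools.wraps(func)
    def inner(*args, **kwargs):
        return func(*args, **kwargs).upper() + punctuation
    return inner

@shout              # no parentheses: uses the default punctuation
def greet(name):
    return f"hello {name}"

@shout(punctuation="?!")  # parentheses with kwargs also works
def ask(name):
    return f"hello {name}"
```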

file_or_name.utils.get_first_parameter(function)[source]

Get the name of the first parameter of a function.

Parameters

function (Callable) – The function whose first parameter name we want.

Returns

The name of the first parameter.

Return type

str
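Such a helper can be written with the standard inspect module. This sketch is an assumption about the approach, not the library's exact implementation; it relies on Signature.parameters being an ordered mapping:

```python
import inspect

def get_first_parameter(function):
    # Signature.parameters preserves declaration order, so the first
    # key in the mapping is the name of the first parameter.
    return next(iter(inspect.signature(function).parameters))

# Illustrative function to inspect.
def example(path, mode="r"):
    ...
```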

class file_or_name.utils.ShadowPage(path, mode='wb', dir=None, encoding=None)[source]

Store updates to a copy of the output file and swing pointers for an atomic write.

Note

In some environments, like Kubernetes, this is not safe to use when the file we are shadowing is in a PersistentVolumeClaim. The temporary file lives in the container's file system, and in some configurations Kubernetes will block copying files from the container-local file system into the PVC file system.

Parameters
  • path – The file that we are shadowing.

  • mode – The mode we should open the shadow file in.

  • dir – The directory in which the shadow file should be created.

  • encoding – The encoding the shadow file should be opened with.

write(*args, **kwargs)[source]

Proxy writes to the temp file.
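The core idea behind an atomic swap like this can be sketched with the standard library. The following is a hypothetical, simplified illustration (the name atomic_write and its signature are assumptions, not the library's API): all writes go to a temporary file in the target directory, and only after the writes succeed is the pointer swung with os.replace.

```python
import os
import tempfile

def atomic_write(path, data):
    # Hypothetical sketch of a shadow-page write: updates go to a
    # temporary file next to the target, and only a fully successful
    # write swings the pointer, so a failure mid-write leaves the
    # original file untouched.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.replace(tmp, path)  # atomic rename on POSIX
    except BaseException:
        os.remove(tmp)  # clean up the shadow file on failure
        raise
```

Creating the temporary file in the same directory as the target matters: os.replace is only atomic when both paths live on the same file system.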


Getting Help

If you run into trouble, be sure to check the issues on GitHub. Check whether someone else has had the same problem, but if none of the fixes apply to you, feel free to open a new issue.

File Or Name


Transparently handle input parameters that are either strings or pre-opened file objects.

Why?

When writing a function that reads or writes data to a file, you often end up with something that looks like this:

def read_my_cool_file(file_name):
    with open(file_name) as f:
        # Process file object f
        ...

This has some problems.

  1. It couples your data processing code to the opening of the file. This makes it harder to test. The thing you are actually testing is the code that processes the data, but with code like this you need to coordinate the opening of the file during the test too. You need to either create fake data on disk or patch the open call.

  2. It can’t handle special files. If you have a file in your special format but it uses latin-1 encoding instead of ASCII, how can you use that file? You can’t. The opening of the file is sealed inside the function, meaning the user can’t easily change its behavior. Practices like this force file interaction to be done in only one way.

For maximum flexibility and easy testability, you probably want a function that looks like this:

def read_my_cool_file(f):
    # Process file object f
    ...

This is nice because when testing you can use things like io.StringIO objects to dynamically create test data. You can also open files with different encodings and pass them in to be processed just like normal. This is akin to a dependency injection scheme, where the creation of the thing to be processed is done outside of the process itself, allowing the exact format of the object to be swapped. There is a usability drawback, though: this way of processing files is onerous for the user. It turns single function calls into multi-line calls. This

data = read_my_cool_file("/path/to/my/important/data")

becomes this

with open("/path/to/my/important/data") as f:
    data = read_my_cool_file(f)

Functions like this also diverge from most other functions a user is familiar with. Forcing users to do things differently for your library is a surefire way to reduce adoption.

We need a way to accept both file paths (as strings) and file objects without having to write code to check which one we got in every I/O function we write.

What?

Enter file_or_name.

file_or_name introduces a decorator @file_or_name that solves this issue for us.

By decorating a function with @file_or_name we can accept both strings and file objects. Our example above becomes

@file_or_name
def read_my_cool_file(f):
    # Process file object f
    ...

As the writer of the function, we can write functions that assume they always get a file object as input. This means we can stop opening files inside functions, which makes them easier to test.

As a user, we can pass in either a path to a file (as a string), making the function easy to call, or an open file object, which lets us control exactly how the file is opened (encoding and whatnot).

Usage

The @file_or_name decorator will automatically open and close files when specified parameters have strings as their argument value. If you use the decorator with no arguments it will open the first argument as a file in read mode.

from file_or_name import file_or_name

@file_or_name
def read_json(f):
    return json.load(f)
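Conceptually, the bare decorator behaves like this simplified sketch (a re-implementation for illustration only, not the library's actual code): a string first argument is treated as a path and opened in read mode, while anything else is passed through untouched.

```python
import functools
import io

def file_or_name(function):
    # Simplified sketch: if the first argument is a string, treat it
    # as a path and open it in read mode; if it is already a file-like
    # object, pass it through untouched.
    @functools.wraps(function)
    def wrapper(first, *args, **kwargs):
        if isinstance(first, str):
            with open(first) as f:
                return function(f, *args, **kwargs)
        return function(first, *args, **kwargs)
    return wrapper

@file_or_name
def read_lines(f):
    return [line.rstrip("\n") for line in f]

# Works with an in-memory file object, which is handy in tests.
lines = read_lines(io.StringIO("a\nb\n"))
```

The real decorator does more than this sketch, such as accepting pathlib objects and keyword-specified modes.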

In order to handle multiple files and file writing, we can pass keyword arguments to the decorator in the form parameter=mode. This will open the file specified by the argument value for parameter using the mode specified by the keyword argument.

Writing to a file, for example: when the wf argument is a string, it will automatically be opened in write mode:

from file_or_name import file_or_name

@file_or_name(wf='w')
def write_json(data, wf):
    json.dump(data, wf, indent=2)

Reading and writing, for example: any argument values that are strings for either rf or wf will be opened in read mode and write mode, respectively:

from file_or_name import file_or_name

@file_or_name(rf='r', wf='w')
def convert_jsonl_to_yaml(rf, wf):
    for line in rf:
        wf.write(yaml.dump(json.loads(line)) + "\n")

File or Name lets you, the library developer, write functions that operate on file objects, making code cleaner and more testable, while letting your users interact with your code using simple file path string arguments. It will also automatically open pathlib.Path arguments.

Shadow Paging

I often have code that reads from a file with a generator; this lets me process chunks of data at a time without worrying about materializing the whole file in memory. The problem comes when I want to read data from a file, make changes to it, and then write back to that same file. You can’t open that file for writing, because that would destroy the data you are lazily reading from it with the generator. A common solution is to read the data in and keep it in memory, process the data, and write it all back. This defeats the purpose of using a generator in the first place; it also means an error during writing can leave you in a state where your data has disappeared.

This is why I introduced the shadow page to this library. Using a NamedTemporaryFile, you can write to this file as much as you want, and when you close the file it will automatically replace the file on disk in an atomic way. This means you can’t lose your data to a bug during writing, and it lets you write back to a file that you are reading from with a generator.

You can use this functionality by prefixing your write modes with an s:

from file_or_name import file_or_name

@file_or_name(f='r', wf='sw')
def reverse(f, wf):
    data = f.read()[::-1]
    if random.random() < 0.5:
        raise ValueError
    wf.write(data)

Without a shadow page, when you read in this data and try to write it back, a ValueError raised between when the file is opened for writing and when it is actually written could cause you to lose all your data. With the shadow page, if the error occurs your original data is left intact, and if it doesn’t, the data is reversed.