File Readers
Pyrokinetics can read many different file types, and often it can do so without the user specifying which kind of file they wish to read:
from pyrokinetics import read_equilibrium
# No need to specify that we're reading a G-EQDSK file!
eq = read_equilibrium("my_eq.geqdsk")
This tutorial provides information on the tools Pyrokinetics uses to achieve this, and how to extend Pyrokinetics to handle new file types.
Overview
There are typically two main classes involved in the process of reading data from disk:
Readables: These classes contain the data that is read and processed from files on disk.
Readers: These classes read and return ‘readables’, and can also validate file types.
For example, Equilibrium is a readable, and it is instantiated and returned by reader classes such as EquilibriumReaderGEQDSK. In this tutorial, we will set up a readable class Foo, along with a reader class that reads from .csv files.
To begin, we must mark Foo as being readable. This is achieved as follows:
# file: foo.py
import numpy as np
from numpy.typing import ArrayLike

# The base class ReadableFromFile is needed to mark
# a class as a 'readable'
from pyrokinetics.file_utils import ReadableFromFile


class Foo(ReadableFromFile):
    # __init__ should take raw data -- not a file path
    def __init__(self, x: ArrayLike, y: ArrayLike):
        if not np.array_equal(np.shape(x), np.shape(y)):
            raise ValueError("x and y must have the same shape")
        if np.ndim(x) != 1:
            raise ValueError("x and y must be 1D arrays")
        self._x = np.asarray(x)
        self._y = np.asarray(y)
In order to make a readable class, we have sub-classed ReadableFromFile. This adds the following classmethods:
from_file: Used as an alternative constructor. Creates a readable from a file.
supported_file_types: Returns a list of file types that from_file can read.
Having defined a ‘readable’ class, we can now define an associated reader:
# file: foo_csv_reader.py
from pyrokinetics.file_utils import FileReader

from .foo import Foo


class FooReaderCSV(FileReader, file_type="csv", reads=Foo):
    ...
Again, we have sub-classed a class from file_utils. We have also added two required keyword arguments to the class:
FileReader defines the abstract method read_from_file() and the method verify_file_type(). This means that sub-classes must provide a definition of read_from_file(), or else Python will throw an error. The former method is used to read/process data from files, while the latter is used to determine whether a file is of the correct type.
The keyword arguments are used to ‘register’ the reader class with its associated readable. In this case, it is given the key "csv" and is registered to Foo.
We’ll now demonstrate how we might implement these functions:
from pathlib import Path

import pandas as pd
from pyrokinetics.file_utils import FileReader

from .foo import Foo


class FooReaderCSV(FileReader, file_type="csv", reads=Foo):
    # read_from_file should take a file path as a positional argument,
    # and any number of keyword arguments. Keyword arguments can be
    # passed on to this function via the 'from_file' method of Foo.
    def read_from_file(self, path: Path, y_col: str = "y") -> Foo:
        # Use pandas to read a csv and extract two columns
        df = pd.read_csv(path)
        return Foo(df["x"], df[y_col])

    # verify_file_type should check that the file provided is of the
    # correct type. This may include making sure that the file contains
    # any essential data. If the file is of the wrong type, an Exception
    # should be raised. Otherwise, the function should end normally.
    def verify_file_type(self, path: Path) -> None:
        # Use pandas to read csv, but without loading all rows.
        # It will throw an exception if the file can't be found,
        # or if it isn't readable as a csv file.
        df = pd.read_csv(path, nrows=1)
        # Also check that any required data is present. In this
        # case, we only need to check for the presence of the
        # column 'x'
        if "x" not in df:
            raise RuntimeError("Foo csv needs an 'x' column")
        # If we get here, it's probably a Foo csv. Exit normally
        # without returning.
Real read_from_file methods are likely to be much more complicated, and will likely require further data processing. They may also require adding units to the readable’s input data. A good verify_file_type function should be very fast to run, and should load/process the minimum amount of data in order to ensure the file is of the correct type.
With these functions defined, and reader classes registered, we can now use the classmethods supported_file_types and from_file:
>>> foo = Foo.from_file("my_foo.csv", file_type="csv")
>>> foo = Foo.from_file("my_foo.csv")  # file_type isn't needed!
>>> print(Foo.supported_file_types())
['csv']
We’ll explain in the next section why the file_type argument isn’t strictly needed.
Internal Details
So how do the tools discussed in the previous section work to allow us to determine a file type automatically and read a file via a single call to Readable.from_file? Internally, this is managed using a specialised ‘factory’ class.
A factory is a function/class that allows users to create objects without specifying their exact types. Factories provide a common interface to the constructors of a collection of related types. The way they typically work is as follows:
A collection of related classes is defined: A1, A2, and A3. These may be related via a (possibly abstract) super class A, or they may be related simply by ‘duck typing’, i.e. they all have similar constructor/function signatures.
Each class we wish the factory to produce is assigned a ‘key’ by which it may be referenced: "A1", "A2", "A3". These classes are registered with the factory, e.g. my_factory.register("A1", A1).
The factory can then be used to create new instances of each class by providing the registered key. my_factory.create("A1", *args, **kwargs) may be used as an alternative to A1(*args, **kwargs).
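The three steps above can be sketched in a few lines of Python. This is only an illustration of the general pattern, not the factory that Pyrokinetics uses internally: the Factory class, its register and create methods, and the classes A1 and A2 are all names made up for this example.

```python
from typing import Any, Callable, Dict


class Factory:
    """Toy factory mapping string keys to classes (hypothetical example)."""

    def __init__(self) -> None:
        self._registry: Dict[str, Callable[..., Any]] = {}

    def register(self, key: str, cls: Callable[..., Any]) -> None:
        # Step 2: associate a key with a class
        self._registry[key] = cls

    def create(self, key: str, *args: Any, **kwargs: Any) -> Any:
        # Step 3: look up the class by key and forward the arguments
        try:
            cls = self._registry[key]
        except KeyError:
            raise KeyError(f"No class registered under key {key!r}") from None
        return cls(*args, **kwargs)


# Step 1: related classes with similar constructor signatures
class A1:
    def __init__(self, value: int) -> None:
        self.value = value


class A2:
    def __init__(self, value: int) -> None:
        self.value = 2 * value


my_factory = Factory()
my_factory.register("A1", A1)
my_factory.register("A2", A2)

# Equivalent to calling A2(21) directly
obj = my_factory.create("A2", 21)
```

The caller only needs my_factory and a key, so new classes can be added later without changing the calling code.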
Some of the benefits of using factories over using classes directly are:
The user doesn’t need to know exact class names, and doesn’t need to import each class they might want to build independently – they only need to import the factory.
We avoid long if...elif...else chains such as the following:
if condition_for_A1:
    return A1(*args, **kwargs)
elif condition_for_A2:
    return A2(*args, **kwargs)
elif condition_for_A3:
    return A3(*args, **kwargs)
else:
    ...
The factory can create objects based on other conditions instead of simply looking up a registered key, so in cases where it isn’t clear which type the user might want to return, the factory can figure this out and return a suitable class for them.
The factories used to link readers and readables don’t need to be imported directly, as they are stored as class-level attributes on each readable. The special method __init_subclass__ on ReadableFromFile is responsible for setting this up for each readable. Users don’t need to interact with these factories directly, as FileReader also makes use of __init_subclass__ to handle registration at the point that the class is defined.
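As a rough sketch of how registration at class-definition time can work, the example below uses __init_subclass__ on two stand-in base classes. The names Readable, Reader and the _readers attribute are assumptions made for illustration, with a plain dict playing the role of the factory; the real ReadableFromFile and FileReader may be organised differently.

```python
class Readable:
    """Stand-in for a 'readable' base class (hypothetical)."""

    def __init_subclass__(cls, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        # Every readable subclass gets its own, initially empty, registry
        cls._readers = {}

    @classmethod
    def supported_file_types(cls):
        return list(cls._readers)


class Reader:
    """Stand-in for a reader base class (hypothetical)."""

    def __init_subclass__(cls, file_type: str, reads: type, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        # Runs when the subclass is defined: register it with its readable
        reads._readers[file_type] = cls


class Foo(Readable):
    pass


class Bar(Readable):
    pass


# Defining the class is enough to register it; no explicit call is needed
class FooReaderCSV(Reader, file_type="csv", reads=Foo):
    pass
```

Because each readable owns a separate registry, registering a reader for Foo has no effect on Bar.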
The from_file(path, file_type) method handles the object creation process. For readers and readables, this is a two-step process:
Use a factory to create the correct type of reader. This is determined by the optional file_type argument.
Call that reader’s read_from_file function using the provided path.
The additional bit of magic in Pyrokinetics is provided by the verify_file_type functions defined by each reader class. If the user doesn’t pass file_type to from_file, the internal factory instead searches through each registered reader class and calls verify_file_type for each reader in turn. If, for some reader, this function exits normally without raising an exception, that reader is then used to read the provided file. This can take a long time if verify_file_type functions are slow to execute, so it is best for these functions to be very short and not to perform any unnecessary additional processing.
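The search described above might look something like the following. This is a simplified sketch using only the standard library: the READERS dict, the two toy reader classes, and the free functions infer_file_type and from_file are hypothetical stand-ins for the internal factory, not the Pyrokinetics implementation.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


class KeyValueReader:
    """Hypothetical reader for files made of 'key = value' lines."""

    def verify_file_type(self, path: Path) -> None:
        # Cheap check: only look at the first line
        with open(path) as f:
            first_line = f.readline()
        if "=" not in first_line:
            raise RuntimeError("not a key-value file")

    def read_from_file(self, path: Path) -> Dict[str, str]:
        with open(path) as f:
            pairs = [line.split("=", 1) for line in f if "=" in line]
        return {key.strip(): value.strip() for key, value in pairs}


class CSVReader:
    """Hypothetical reader for csv files with an 'x' column."""

    def verify_file_type(self, path: Path) -> None:
        # Cheap check: only parse the header row
        with open(path, newline="") as f:
            header = next(csv.reader(f), [])
        if "x" not in header:
            raise RuntimeError("csv needs an 'x' column")

    def read_from_file(self, path: Path) -> List[Dict[str, str]]:
        with open(path, newline="") as f:
            return [dict(row) for row in csv.DictReader(f)]


READERS = {"kv": KeyValueReader, "csv": CSVReader}


def infer_file_type(path: Path) -> str:
    # Try each registered reader in turn; the first whose
    # verify_file_type returns without raising is chosen
    for file_type, reader_cls in READERS.items():
        try:
            reader_cls().verify_file_type(path)
        except Exception:
            continue
        return file_type
    raise RuntimeError(f"Could not infer the file type of {path}")


def from_file(path: Path, file_type: Optional[str] = None):
    if file_type is None:
        file_type = infer_file_type(path)
    return READERS[file_type]().read_from_file(path)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.txt"  # the extension gives no hint
    path.write_text("x,y\n1.0,2.0\n")
    inferred = infer_file_type(path)
    rows = from_file(path)
```

In this sketch the readers are tried in registration order, so a verify_file_type check that is too permissive could claim files that belong to a later reader.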
GKInput: Both Reader and Readable
GKInput fits strangely into this scheme: while GKInput itself is a ‘readable’, its ‘readers’ are its own subclasses. This is because the reader classes fill in their attributes as a side effect of calling read_from_file. These readers should usually be retained after use, as they provide further functionality besides that offered by read_from_file. The way these readers are handled within Pyrokinetics differs from other reader/readable pairs, as Pyro makes direct use of the private factory object within GKInput to manage them. This implementation may change in a later release.
Caution
The subclasses of GKInput do not return self from read_from_file, but rather a dict-like object containing the raw data from the file they read. Remember to keep the reader class around if you want to call any other functions!
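The pattern described in the caution can be illustrated with a toy class. ToyInputReader, its 'key = value' file format, and its is_nonlinear method are invented for this sketch and are not part of the GKInput API; the point is only that read_from_file fills in the reader's own state and returns the raw data rather than self.

```python
import tempfile
from pathlib import Path
from typing import Dict


class ToyInputReader:
    """Hypothetical reader that keeps the data it has read."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def read_from_file(self, path: Path) -> Dict[str, str]:
        with open(path) as f:
            pairs = [line.split("=", 1) for line in f if "=" in line]
        # Side effect: the reader's own attributes are filled in
        self.data = {key.strip(): value.strip() for key, value in pairs}
        # The raw data is returned, not the reader itself
        return self.data

    def is_nonlinear(self) -> bool:
        # Extra functionality that only exists on the reader object
        return self.data.get("nonlinear", "false").lower() == "true"


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy.in"
    path.write_text("nonlinear = true\nngrid = 16\n")

    reader = ToyInputReader()  # keep a reference to the reader
    raw = reader.read_from_file(path)
    nonlinear = reader.is_nonlinear()
```

Writing ToyInputReader().read_from_file(path) on a single line would discard the reader and leave only the raw dict, which has no is_nonlinear method.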
Adding Plugins to Pyrokinetics
If you write your own file reader and wish to use it alongside those bundled with Pyrokinetics, there are two ways to achieve this. The first method is to ensure your file reader is imported somewhere within your Python session, even if it is never used directly:
import pyrokinetics
from my_project.my_module import MyEqReader  # This is not used directly!

# If MyEqReader reads Equilibrium and has file type "MyFileType":
eq = pyrokinetics.read_equilibrium(filename, "MyFileType")
Provided MyEqReader subclasses FileReader and provides the keyword arguments file_type and reads, it will be registered alongside the Pyrokinetics classes.
If you’re developing a packaged Python project, a cleaner way to bundle your own classes with Pyrokinetics is to assign them using entry points in your pyproject.toml file:
[project.entry-points."pyrokinetics.equilibrium"]
MyFileType = "my_project.my_module:MyEqReader"
Pyrokinetics makes use of a plugin system that will automatically register any classes in your Python environment that are declared this way. Note that here, "pyrokinetics.equilibrium" is an entry point group name, not a module. The group names for each Pyrokinetics file reader can be found at pyrokinetics.plugins.
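To check which plugins an environment exposes for a given group, the standard library's importlib.metadata can be queried directly. The helper below is a sketch written for this tutorial (discover_plugins is a made-up name), and it is an assumption that Pyrokinetics discovers its plugins in a similar way.

```python
import sys
from importlib.metadata import entry_points
from typing import Dict


def discover_plugins(group: str) -> Dict[str, str]:
    """Map entry point names to their 'module:object' strings for one group."""
    if sys.version_info >= (3, 10):
        found = entry_points(group=group)
    else:
        # Older interpreters return a dict keyed by group name
        found = entry_points().get(group, [])
    return {ep.name: ep.value for ep in found}


# Empty unless a package declaring this group is installed
plugins = discover_plugins("pyrokinetics.equilibrium")
```

With the pyproject.toml entry above installed, the result would be expected to contain the key "MyFileType". Calling load() on an entry point imports the class it refers to, and that import is the point at which a FileReader subclass is registered.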
When adding your own Pyrokinetics plugin classes, we recommend not importing them within your own project __init__.py files, and instead accessing them via the Pyrokinetics interface, as otherwise you may run into circular import problems.
For examples of how to implement your own plugins, please see the Python package pyrokinetics-plugin-examples, which is used to test the plugin systems in Pyrokinetics. Although this package does not implement useful plugins, it does demonstrate the necessary class signatures and which functions need to be implemented.
For more information, please see: