aiida.repository#
Module with resources dealing with the file repository.
Package Contents#
Classes#
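AbstractRepositoryBackend | Class that defines the abstract interface for an object repository.
DiskObjectStoreRepositoryBackend | Implementation of the AbstractRepositoryBackend using the disk-object-store as the backend.
File | Data class representing a file object.
FileType | Enumeration to represent the type of a file object.
Repository | File repository.
SandboxRepositoryBackend | Implementation of the AbstractRepositoryBackend using a sandbox folder on disk as the backend.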
API#
- class aiida.repository.AbstractRepositoryBackend#
Class that defines the abstract interface for an object repository.
The repository backend only deals with raw bytes, both when creating new objects as well as when returning a stream or the content of an existing object. The encoding and decoding of the byte content should be done by the client upstream. The file repository backend is also not expected to keep any kind of file hierarchy but must be assumed to be a simple flat data store. When files are created in the file object repository, the implementation will return a string-based key with which the content of the stored object can be addressed. This key is guaranteed to be unique and persistent. Persisting the key or mapping it onto a virtual file hierarchy is again up to the client upstream.
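The contract described above can be illustrated with a minimal in-memory sketch. It does not import AiiDA: the class `InMemoryFlatStore` and its UUID-based keys are assumptions made purely for illustration. It shows the store dealing only in bytes and returning an opaque key, while encoding and the virtual path mapping stay with the client.

```python
import io
import uuid
from typing import BinaryIO, Dict


class InMemoryFlatStore:
    """Hypothetical flat byte store; not part of aiida.repository."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put_object_from_filelike(self, handle: BinaryIO) -> str:
        """Store the bytes read from ``handle`` and return a new unique key."""
        content = handle.read()
        if not isinstance(content, bytes):
            raise TypeError('handle does not seem to be a byte stream')
        key = uuid.uuid4().hex  # assumption: any unique string would do
        self._objects[key] = content
        return key

    def get_object_content(self, key: str) -> bytes:
        """Return the raw bytes stored under ``key``."""
        try:
            return self._objects[key]
        except KeyError:
            raise FileNotFoundError(f'object with key `{key}` does not exist') from None


store = InMemoryFlatStore()
# Encoding is the client's job: the store only ever sees bytes.
key = store.put_object_from_filelike(io.BytesIO('some text'.encode('utf-8')))
# The virtual hierarchy also lives with the client, not in the store.
hierarchy = {'relative/path/file.txt': key}
text = store.get_object_content(hierarchy['relative/path/file.txt']).decode('utf-8')
```

The store never sees the path `relative/path/file.txt`; it only knows the key, which is what makes a flat data store sufficient.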
- abstract property key_format: Optional[str]#
Return the format for the keys of the repository.
This is important when migrating between backends (e.g. archive -> main): if the key formats are not equal, all the Node.base.repository.metadata must be re-computed before importing, otherwise they will not match the repository.
- abstract initialise(**kwargs) None #
Initialise the repository if it hasn’t already been initialised.
- Parameters:
kwargs – parameters for the initialisation.
- abstract erase() None #
Delete the repository itself and all its contents.
Note
This should not merely delete the contents of the repository but any resources it created. For example, if the repository is essentially a folder on disk, the folder itself should also be deleted, not just its contents.
- put_object_from_filelike(handle: BinaryIO) str #
Store the byte contents of a file in the repository.
- Parameters:
handle – filelike object with the byte content to be stored.
- Returns:
the generated fully qualified identifier for the object within the repository.
- Raises:
TypeError – if the handle is not a byte stream.
- put_object_from_file(filepath: Union[str, pathlib.Path]) str #
Store a new object with contents of the file located at filepath on this file system.
- Parameters:
filepath – absolute path of file whose contents to copy to the repository.
- Returns:
the generated fully qualified identifier for the object within the repository.
- Raises:
TypeError – if the handle is not a byte stream.
- abstract has_objects(keys: List[str]) List[bool] #
Return, for each of the given keys, whether the repository has an object with that key.
- Parameters:
keys – list of fully qualified identifiers for objects within the repository.
- Returns:
list of logicals, in the same order as the keys provided, with value True if the respective object exists and False otherwise.
- has_object(key: str) bool #
Return whether the repository has an object with the given key.
- Parameters:
key – fully qualified identifier for the object within the repository.
- Returns:
True if the object exists, False otherwise.
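The list-based and single-key methods pair up naturally. The sketch below assumes that the single-key variant simply delegates to the list-based one; the class `DictBackend` is hypothetical and only mirrors the two signatures documented here.

```python
from typing import Dict, List


class DictBackend:
    """Hypothetical backend storing objects in a dict; for illustration only."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self._objects = dict(objects)

    def has_objects(self, keys: List[str]) -> List[bool]:
        """Return one boolean per key, in the same order as ``keys``."""
        return [key in self._objects for key in keys]

    def has_object(self, key: str) -> bool:
        """Single-key convenience wrapper around ``has_objects``."""
        return self.has_objects([key])[0]


backend = DictBackend({'a': b'1', 'b': b'2'})
flags = backend.has_objects(['b', 'missing', 'a'])
```

With this layout a concrete backend would only need to implement the list-based method, which leaves room for batching lookups.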
- abstract list_objects() Iterable[str] #
Return iterable that yields all available objects by key.
- Returns:
An iterable for all the available object keys.
- abstract get_info(detailed: bool = False, **kwargs) dict #
Returns relevant information about the content of the repository.
- Parameters:
detailed – flag to enable extra information (detailed=False by default, only returns basic information).
- Returns:
a dictionary with the information.
- abstract maintain(dry_run: bool = False, live: bool = True, **kwargs) None #
Performs maintenance operations.
- Parameters:
dry_run – flag to only print the actions that would be taken without actually executing them.
live – flag to indicate to the backend whether AiiDA is live or not (i.e. if the profile of the backend is currently being used/accessed). The backend is expected then to only allow (and thus set by default) the operations that are safe to perform in this state.
- open(key: str) Iterator[BinaryIO] #
Open a file handle to an object stored under the given key.
Note
This should only be used to open a handle to read an existing file. To write a new file use the method put_object_from_filelike instead.
- Parameters:
key – fully qualified identifier for the object within the repository.
- Returns:
yield a byte stream object.
- Raises:
FileNotFoundError – if the file does not exist.
OSError – if the file could not be opened.
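The `Iterator[BinaryIO]` return type suggests a generator-based context manager. A minimal sketch of that pattern follows, assuming a hypothetical in-memory mapping `OBJECTS` and a free function `open_object` in place of a real backend.

```python
import contextlib
import io
from typing import BinaryIO, Dict, Iterator

# Hypothetical object store: key -> raw bytes.
OBJECTS: Dict[str, bytes] = {'abc123': b'line one\nline two\n'}


@contextlib.contextmanager
def open_object(key: str) -> Iterator[BinaryIO]:
    """Yield a read-only byte stream for ``key`` and close it on exit."""
    if key not in OBJECTS:
        raise FileNotFoundError(f'object with key `{key}` does not exist')
    handle = io.BytesIO(OBJECTS[key])
    try:
        yield handle
    finally:
        handle.close()


with open_object('abc123') as stream:
    first_line = stream.readline()
```

The handle is closed as soon as the `with` block exits, so any reading has to happen inside it.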
- get_object_content(key: str) bytes #
Return the content of an object identified by key.
- Parameters:
key – fully qualified identifier for the object within the repository.
- Raises:
FileNotFoundError – if the file does not exist.
OSError – if the file could not be opened.
- abstract iter_object_streams(keys: List[str]) Iterator[Tuple[str, BinaryIO]] #
Return an iterator over the (read-only) byte streams of objects identified by key.
Note
handles should only be read within the context of this iterator.
- Parameters:
keys – fully qualified identifiers for the objects within the repository.
- Returns:
an iterator over the object byte streams.
- Raises:
FileNotFoundError – if the file does not exist.
OSError – if a file could not be opened.
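The note about reading handles only within the context of the iterator can be made concrete with a small generator. This is a sketch under assumptions: `OBJECTS` is a hypothetical in-memory store, and each stream is closed as soon as the iteration moves on.

```python
import io
from typing import BinaryIO, Dict, Iterator, List, Tuple

# Hypothetical object store: key -> raw bytes.
OBJECTS: Dict[str, bytes] = {'k1': b'first', 'k2': b'second'}


def iter_object_streams(keys: List[str]) -> Iterator[Tuple[str, BinaryIO]]:
    """Yield ``(key, stream)`` pairs; each stream is closed once the loop moves on."""
    for key in keys:
        if key not in OBJECTS:
            raise FileNotFoundError(f'object with key `{key}` does not exist')
        with io.BytesIO(OBJECTS[key]) as handle:
            yield key, handle


contents = {}
handles = []
for key, stream in iter_object_streams(['k1', 'k2']):
    contents[key] = stream.read()  # read inside the loop body
    handles.append(stream)         # kept only to show they are closed afterwards
```

Once the loop has finished, every collected handle is closed, which is why the content has to be consumed inside the loop body.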
- get_object_hash(key: str) str #
Return the SHA-256 hash of an object stored under the given key.
Important
A SHA-256 hash should always be returned, to ensure consistency across different repository implementations.
- Parameters:
key – fully qualified identifier for the object within the repository.
- Raises:
FileNotFoundError – if the file does not exist.
OSError – if the file could not be opened.
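As an illustration of the hashing requirement, the sketch below computes a SHA-256 digest of a byte stream in chunks using the standard library. The helper name `sha256_of_stream`, the chunk size, and the choice of a hexadecimal string as the return format are assumptions, not taken from the AiiDA implementation.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of_stream(handle: BinaryIO, chunk_size: int = 64 * 1024) -> str:
    """Return the hexadecimal SHA-256 digest of the bytes read from ``handle``."""
    hasher = hashlib.sha256()
    # Read in fixed-size chunks so large objects need not fit in memory.
    for chunk in iter(lambda: handle.read(chunk_size), b''):
        hasher.update(chunk)
    return hasher.hexdigest()


digest = sha256_of_stream(io.BytesIO(b'abc'))
```

Because the digest depends only on the bytes, two backends that store the same content produce the same value regardless of how they generate their keys.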
- abstract delete_objects(keys: List[str]) None #
Delete the objects from the repository.
- Parameters:
keys – list of fully qualified identifiers for the objects within the repository.
- Raises:
FileNotFoundError – if any of the files does not exist.
OSError – if any of the files could not be deleted.
- delete_object(key: str) None #
Delete the object from the repository.
- Parameters:
key – fully qualified identifier for the object within the repository.
- Raises:
FileNotFoundError – if the file does not exist.
OSError – if the file could not be deleted.
- class aiida.repository.DiskObjectStoreRepositoryBackend(container: disk_objectstore.Container)#
Bases:
aiida.repository.backend.abstract.AbstractRepositoryBackend
Implementation of the AbstractRepositoryBackend using the disk-object-store as the backend.
Note
For certain methods, the container may create a session, which should be closed after the operation is done to make sure the connection to the underlying SQLite database is closed. The best way to accomplish this is to use the container as a context manager, which automatically calls the close method on exit and so ensures the session is closed. Not all methods open a session and therefore need closing, but to be on the safe side, every use of the container is placed in a context manager. If no session was created, the close method is essentially a no-op.
Initialization
- initialise(**kwargs) None #
Initialise the repository if it hasn’t already been initialised.
- Parameters:
kwargs – parameters for the initialisation.
- erase()#
Delete the repository itself and all its contents.
- _put_object_from_filelike(handle: BinaryIO) str #
Store the byte contents of a file in the repository.
- Parameters:
handle – filelike object with the byte content to be stored.
- Returns:
the generated fully qualified identifier for the object within the repository.
- Raises:
TypeError – if the handle is not a byte stream.
- open(key: str) Iterator[BinaryIO] #
Open a file handle to an object stored under the given key.
Note
This should only be used to open a handle to read an existing file. To write a new file use the method put_object_from_filelike instead.
- Parameters:
key – fully qualified identifier for the object within the repository.
- Returns:
yield a byte stream object.
- Raises:
FileNotFoundError – if the file does not exist.
OSError – if the file could not be opened.
- get_object_hash(key: str) str #
Return the SHA-256 hash of an object stored under the given key.
Important
A SHA-256 hash should always be returned, to ensure consistency across different repository implementations.
- Parameters:
key – fully qualified identifier for the object within the repository.
- Raises:
FileNotFoundError – if the file does not exist.
- maintain(dry_run: bool = False, live: bool = True, pack_loose: bool = None, do_repack: bool = None, clean_storage: bool = None, do_vacuum: bool = None) dict #
Performs maintenance operations.
- Parameters:
live – if True, will only perform operations that are safe to do while the repository is in use.
pack_loose – flag for forcing the packing of loose files.
do_repack – flag for forcing the re-packing of already packed files.
clean_storage – flag for forcing the cleaning of soft-deleted files from the repository.
do_vacuum – flag for forcing the vacuuming of the internal database when cleaning the repository.
- Returns:
a dictionary with information on the operations performed.
- class aiida.repository.File(name: str = '', file_type: aiida.repository.common.FileType = FileType.DIRECTORY, key: Union[str, None] = None, objects: Optional[Dict[str, aiida.repository.common.File]] = None)#
Data class representing a file object.
Initialization
Construct a new instance.
- Parameters:
name – The final element of the file path
file_type – Identifies whether the File is a file or a directory
key – A key to map the file to its contents in the backend repository (file only)
objects – Mapping of child names to child Files (directory only)
- Raises:
ValueError – If a key is defined for a directory, or objects are defined for a file
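The constructor rules above (a key only for files, child objects only for directories) can be sketched with a small dataclass. `SketchFile` and `SketchFileType` are hypothetical stand-ins written for this example; they are not the AiiDA classes and only mirror the documented parameters.

```python
import enum
from dataclasses import dataclass, field
from typing import Dict, Optional


class SketchFileType(enum.Enum):
    DIRECTORY = 0
    FILE = 1


@dataclass
class SketchFile:
    """Hypothetical stand-in mirroring the documented constructor parameters."""

    name: str = ''
    file_type: SketchFileType = SketchFileType.DIRECTORY
    key: Optional[str] = None
    objects: Dict[str, 'SketchFile'] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # A directory has children but no content key.
        if self.file_type is SketchFileType.DIRECTORY and self.key is not None:
            raise ValueError('a directory cannot define a key')
        # A file has a content key but no children.
        if self.file_type is SketchFileType.FILE and self.objects:
            raise ValueError('a file cannot define child objects')


readme = SketchFile('README', SketchFileType.FILE, key='abc123')
root = SketchFile(objects={'README': readme})
```

The default arguments produce an unnamed, empty directory, which matches the defaults shown in the signature.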
- classmethod from_serialized(serialized: dict, name='') aiida.repository.common.File #
Construct a new instance from a serialized instance.
- Parameters:
serialized – the serialized instance.
- Returns:
the reconstructed file object.
- serialize() dict #
Serialize the metadata into a JSON-serializable format.
Note
the serialization format is optimized to reduce the size in bytes.
- Returns:
dictionary with the content metadata.
- property file_type: aiida.repository.common.FileType#
Return the file type of the file object.
- property objects: Dict[str, aiida.repository.common.File]#
Return the objects of the file object.
- __repr__()#
- class aiida.repository.FileType#
Bases:
enum.Enum
Enumeration to represent the type of a file object.
- DIRECTORY = 0#
- FILE = 1#
- class aiida.repository.Repository(backend: Optional[aiida.repository.backend.AbstractRepositoryBackend] = None)#
File repository.
This class provides an interface to a backend file repository instance, but unlike the backend repository, this class keeps a reference to the virtual file hierarchy. This means that through this interface, a client can create files and directories within a file hierarchy, just as they would on a local file system, except that it is completely virtual: the files are stored by the backend, which may store them in a completely flat structure. This also means that the internal virtual hierarchy of a Repository instance does not necessarily represent all the files that are stored by the repository backend. The repository exposes a mere subset of all the file objects stored in the backend. This is why object deletion is implemented as a soft delete by default, where the files are just removed from the internal virtual hierarchy but not from the actual backend, because those objects can be referenced by other instances.
Initialization
Construct a new instance with empty metadata.
- Parameters:
backend – instance of repository backend to use to actually store the file objects. By default, an instance of the SandboxRepositoryBackend will be created.
- _file_cls = None#
- property uuid: Optional[str]#
Return the unique identifier of the repository backend or None if it doesn’t have one.
- classmethod from_serialized(backend: aiida.repository.backend.AbstractRepositoryBackend, serialized: Dict[str, Any]) aiida.repository.repository.Repository #
Construct an instance where the metadata is initialized from the serialized content.
- Parameters:
backend – instance of repository backend to use to actually store the file objects.
- serialize() Dict[str, Any] #
Serialize the metadata into a JSON-serializable format.
- Returns:
dictionary with the content metadata.
- classmethod flatten(serialized=Optional[Dict[str, Any]], delimiter: str = '/') Dict[str, Optional[str]] #
Flatten the serialized content of a repository into a mapping of path -> key or None (if folder).
Note, all folders are represented in the flattened output, and their path is suffixed with the delimiter.
- Parameters:
serialized – the serialized content of the repository.
delimiter – the delimiter to use to separate the path elements.
- Returns:
dictionary with the flattened content.
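The flattening idea can be sketched on a toy nested mapping. Note the assumption: the input format used here (a dict is a folder, a string is a file key) is invented for illustration and is not the actual serialized format of the repository; `flatten_tree` is likewise a hypothetical helper.

```python
from typing import Any, Dict, Optional


def flatten_tree(tree: Dict[str, Any], delimiter: str = '/', prefix: str = '') -> Dict[str, Optional[str]]:
    """Map each path to its key, or to None for folders (suffixed with the delimiter)."""
    flat: Dict[str, Optional[str]] = {}
    for name, value in tree.items():
        path = f'{prefix}{name}'
        if isinstance(value, dict):
            # Folders appear in the output too, with the delimiter appended.
            flat[path + delimiter] = None
            flat.update(flatten_tree(value, delimiter, path + delimiter))
        else:
            flat[path] = value
    return flat


flat = flatten_tree({'input.txt': 'key-a', 'results': {'out.dat': 'key-b', 'empty': {}}})
```

Suffixing folders with the delimiter keeps an empty folder such as `results/empty/` distinguishable from a file of the same name.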
- hash() str #
Generate a hash of the repository’s contents.
Warning
this will read the content of all file objects contained within the virtual hierarchy into memory.
- Returns:
the hash representing the contents of the repository.
- static _pre_process_path(path: Optional[aiida.repository.repository.FilePath] = None) pathlib.PurePosixPath #
Validate and convert the path to an instance of pathlib.PurePosixPath.
This should be called by every method of this class before doing anything, such that it can safely assume that the path is a pathlib.PurePosixPath object, which makes path manipulation a lot easier.
- Parameters:
path – the path as a pathlib.PurePosixPath object or None.
- Raises:
TypeError – if the type of path was not a str nor a pathlib.PurePosixPath instance.
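A path normalisation step of this kind might look like the sketch below. It is an assumption-laden illustration: the function name `pre_process_path` is hypothetical, and rejecting absolute paths here is inferred from the TypeError descriptions of the public methods rather than from this method's own documentation.

```python
import pathlib
from typing import Optional, Union


def pre_process_path(path: Optional[Union[str, pathlib.PurePosixPath]] = None) -> pathlib.PurePosixPath:
    """Return ``path`` as a relative ``PurePosixPath``; ``None`` maps to the root."""
    if path is None:
        return pathlib.PurePosixPath()
    if isinstance(path, str):
        path = pathlib.PurePosixPath(path)
    if not isinstance(path, pathlib.PurePosixPath):
        raise TypeError('path is not of type `str` nor `pathlib.PurePosixPath`.')
    if path.is_absolute():
        # Assumption: absolute paths are rejected at this stage.
        raise TypeError(f'path `{path}` is not a relative path.')
    return path


normalised = pre_process_path('relative/sub/file.txt')
```

After this step, callers can rely on attributes such as `parts` and `parent` without checking the input type again.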
- property backend: aiida.repository.backend.AbstractRepositoryBackend#
Return the current repository backend.
- Returns:
the repository backend.
- set_backend(backend: aiida.repository.backend.AbstractRepositoryBackend) None #
Set the backend for this repository.
- Parameters:
backend – the repository backend.
- Raises:
TypeError – if the type of the backend is invalid.
- _insert_file(path: pathlib.PurePosixPath, key: str) None #
Insert a new file object in the object mapping.
Note
this assumes the path is a valid relative path, so should be checked by the caller.
- Parameters:
path – the relative path where to store the object in the repository.
key – fully qualified identifier for the object within the repository.
- create_directory(path: aiida.repository.repository.FilePath) aiida.repository.common.File #
Create a new directory with the given path.
- Parameters:
path – the relative path of the directory.
- Returns:
the created directory.
- Raises:
TypeError – if the path is not a string or Path, or is an absolute path.
- get_file_keys() List[str] #
Return the keys of all file objects contained within this repository.
- Returns:
list of keys, which map a file to its content in the backend repository.
- get_object(path: Optional[aiida.repository.repository.FilePath] = None) aiida.repository.common.File #
Return the object at the given path.
- Parameters:
path – the relative path of the object within the repository.
- Returns:
the File representing the object located at the given relative path.
- Raises:
TypeError – if the path is not a string or Path, or is an absolute path.
FileNotFoundError – if no object exists for the given path.
- get_directory(path: Optional[aiida.repository.repository.FilePath] = None) aiida.repository.common.File #
Return the directory object at the given path.
- Parameters:
path – the relative path of the directory.
- Returns:
the File representing the object located at the given relative path.
- Raises:
TypeError – if the path is not a string or Path, or is an absolute path.
FileNotFoundError – if no object exists for the given path.
NotADirectoryError – if the object at the given path is not a directory.
- get_file(path: aiida.repository.repository.FilePath) aiida.repository.common.File #
Return the file object at the given path.
- Parameters:
path – the relative path of the file object.
- Returns:
the File representing the object located at the given relative path.
- Raises:
TypeError – if the path is not a string or Path, or is an absolute path.
FileNotFoundError – if no object exists for the given path.
IsADirectoryError – if the object at the given path is a directory and not a file.
- list_objects(path: Optional[aiida.repository.repository.FilePath] = None) List[aiida.repository.common.File] #
Return a list of the objects contained in this repository sorted by name, optionally in given sub directory.
- Parameters:
path – the relative path of the directory.
- Returns:
a list of File named tuples representing the objects present in directory with the given path.
- Raises:
TypeError – if the path is not a string or Path, or is an absolute path.
FileNotFoundError – if no object exists for the given path.
NotADirectoryError – if the object at the given path is not a directory.
- list_object_names(path: Optional[aiida.repository.repository.FilePath] = None) List[str] #
Return a sorted list of the object names contained in this repository, optionally in the given sub directory.
- Parameters:
path – the relative path of the directory.
- Returns:
a sorted list of the names of the objects present in the directory with the given path.
- Raises:
TypeError – if the path is not a string or Path, or is an absolute path.
FileNotFoundError – if no object exists for the given path.
NotADirectoryError – if the object at the given path is not a directory.
- put_object_from_filelike(handle: BinaryIO, path: aiida.repository.repository.FilePath) None #
Store the byte contents of a file in the repository.
- Parameters:
handle – filelike object with the byte content to be stored.
path – the relative path where to store the object in the repository.
- Raises:
TypeError – if the path is not a string or Path, or is an absolute path.
- put_object_from_file(filepath: aiida.repository.repository.FilePath, path: aiida.repository.repository.FilePath) None #
Store a new object under path with contents of the file located at filepath on the local file system.
- Parameters:
filepath – absolute path of file whose contents to copy to the repository
path – the relative path where to store the object in the repository.
- Raises:
TypeError – if the path is not a string and relative path, or the handle is not a byte stream.
- put_object_from_tree(filepath: aiida.repository.repository.FilePath, path: Optional[aiida.repository.repository.FilePath] = None) None #
Store the entire contents of filepath on the local file system in the repository under the given path.
- Parameters:
filepath – absolute path of the directory whose contents to copy to the repository.
path – the relative path where to store the objects in the repository.
- Raises:
- is_empty() bool #
Return whether the repository is empty.
- Returns:
True if the repository contains no file objects.
- has_object(path: aiida.repository.repository.FilePath) bool #
Return whether the repository has an object with the given path.
- Parameters:
path – the relative path of the object within the repository.
- Returns:
True if the object exists, False otherwise.
- Raises:
TypeError – if the path is not a string or Path, or is an absolute path.
- open(path: aiida.repository.repository.FilePath) Iterator[BinaryIO] #
Open a file handle to an object stored under the given path.
Note
This should only be used to open a handle to read an existing file. To write a new file use the method put_object_from_filelike instead.
- Parameters:
path – the relative path of the object within the repository.
- Returns:
yield a byte stream object.
- Raises:
TypeError – if the path is not a string or Path, or is an absolute path.
FileNotFoundError – if the file does not exist.
IsADirectoryError – if the object is a directory and not a file.
OSError – if the file could not be opened.
- get_object_content(path: aiida.repository.repository.FilePath) bytes #
Return the content of an object identified by path.
- Parameters:
path – the relative path of the object within the repository.
- Raises:
TypeError – if the path is not a string or Path, or is an absolute path.
FileNotFoundError – if the file does not exist.
IsADirectoryError – if the object is a directory and not a file.
OSError – if the file could not be opened.
- delete_object(path: aiida.repository.repository.FilePath, hard_delete: bool = False) None #
Soft delete the object from the repository.
Note
can only delete file objects, but not directories.
- Parameters:
path – the relative path of the object within the repository.
hard_delete – when true, not only remove the file from the internal mapping but also call through to the delete_object method of the actual repository backend.
- Raises:
TypeError – if the path is not a string or Path, or is an absolute path.
FileNotFoundError – if the file does not exist.
IsADirectoryError – if the object is a directory and not a file.
OSError – if the file could not be deleted.
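The difference between a soft and a hard delete can be shown with a toy model in which two repositories share one flat store. Everything here is hypothetical: `SketchRepository` keeps a plain path-to-key dict and uses a shared dict as its backend, which only approximates the behaviour described above.

```python
from typing import Dict


class SketchRepository:
    """Hypothetical repository: a path -> key mapping on top of a shared flat store."""

    def __init__(self, backend: Dict[str, bytes]) -> None:
        self.backend = backend           # key -> bytes, possibly shared
        self.paths: Dict[str, str] = {}  # virtual path -> key

    def delete_object(self, path: str, hard_delete: bool = False) -> None:
        if path not in self.paths:
            raise FileNotFoundError(f'no object at path `{path}`')
        key = self.paths.pop(path)  # soft delete: forget the path only
        if hard_delete:
            del self.backend[key]   # hard delete: also drop the stored bytes


shared = {'k1': b'data'}
repo_a = SketchRepository(shared)
repo_b = SketchRepository(shared)
repo_a.paths['a.txt'] = 'k1'
repo_b.paths['copy.txt'] = 'k1'

repo_a.delete_object('a.txt')  # soft delete in one repository
still_readable = shared[repo_b.paths['copy.txt']]
```

After the soft delete, the second repository can still read the content, which is the reason a soft delete is the safer default when objects may be shared.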
- clone(source: aiida.repository.repository.Repository) None #
Clone the contents of another repository instance.
- walk(path: Optional[aiida.repository.repository.FilePath] = None) Iterable[Tuple[pathlib.PurePosixPath, List[str], List[str]]] #
Walk over the directories and files contained within this repository.
Note
The order of the dirname and filename lists that are returned is not necessarily sorted. This is in line with the os.walk implementation, where the order depends on the underlying file system used.
- Parameters:
path – the relative path of the directory within the repository whose contents to walk.
- Returns:
tuples of root, dirnames and filenames just like os.walk, with the exception that the root path is always relative with respect to the repository root, instead of an absolute path, and it is an instance of pathlib.PurePosixPath instead of a normal string.
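The shape of the yielded tuples can be sketched on a toy nested mapping. As an assumption for this example, a dict stands for a directory and a string for a file key, and `walk_tree` is a hypothetical helper rather than the repository method itself.

```python
import pathlib
from typing import Any, Dict, Iterator, List, Tuple

WalkStep = Tuple[pathlib.PurePosixPath, List[str], List[str]]


def walk_tree(tree: Dict[str, Any], root: pathlib.PurePosixPath = pathlib.PurePosixPath('.')) -> Iterator[WalkStep]:
    """Yield ``(root, dirnames, filenames)`` top-down, with ``root`` relative to the tree."""
    dirnames = [name for name, value in tree.items() if isinstance(value, dict)]
    filenames = [name for name, value in tree.items() if not isinstance(value, dict)]
    yield root, dirnames, filenames
    for name in dirnames:
        yield from walk_tree(tree[name], root / name)


tree = {'input.txt': 'key-a', 'results': {'out.dat': 'key-b'}}
steps = list(walk_tree(tree))
```

Joining `root` with a filename gives the relative path of that file within the tree, for example `results/out.dat`.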
- copy_tree(target: Union[str, pathlib.Path], path: Optional[aiida.repository.repository.FilePath] = None) None #
Copy the contents of the entire node repository to another location on the local file system.
Note
If path is specified, only its contents are copied, and the relative path with respect to the root is discarded. For example, if path is relative/sub, the contents of sub will be copied directly to the target, without the relative/sub directory.
- Parameters:
target – absolute path of the directory where to copy the contents to.
path – optional relative path whose contents to copy.
- Raises:
TypeError – if target is of incorrect type or not absolute.
NotADirectoryError – if path does not reference a directory.
- class aiida.repository.SandboxRepositoryBackend(filepath: str | None = None)#
Bases:
aiida.repository.backend.abstract.AbstractRepositoryBackend
Implementation of the AbstractRepositoryBackend using a sandbox folder on disk as the backend.
Initialization
Construct a new instance.
- Parameters:
filepath – The path to the directory in which the sandbox folder should be created.
- __del__()#
Delete the entire sandbox folder if it was instantiated and still exists.
- property uuid: str | None#
Return the unique identifier of the repository.
Note
A sandbox folder does not have the concept of a unique identifier and so always returns None.
- initialise(**kwargs) None #
Initialise the repository if it hasn’t already been initialised.
- Parameters:
kwargs – parameters for the initialisation.
- property sandbox#
Return the sandbox instance of this repository.
- erase()#
Delete the repository itself and all its contents.
- _put_object_from_filelike(handle: BinaryIO) str #
Store the byte contents of a file in the repository.
- Parameters:
handle – filelike object with the byte content to be stored.
- Returns:
the generated fully qualified identifier for the object within the repository.
- Raises:
TypeError – if the handle is not a byte stream.