Workflow IO
- class workflow_utils.workflow_io.DatasetFetch(*, uri_rest: str, uri_flight: str, dataset_id: str)
- dataset_id: str
- get_flight_ticket()
- uri_flight: str
- uri_rest: str
- class workflow_utils.workflow_io.PackageItemFetch(*, uri_rest: str, uri_flight: str, presigned_id: str, dataset_id: str)
- dataset_id: str
- get_flight_ticket()
- presigned_id: str
- uri_flight: str
- uri_rest: str
- class workflow_utils.workflow_io.PackageOptions(type: dataclasses.InitVar[str], format: dataclasses.InitVar[str | None] = None, content_type: dataclasses.InitVar[str | None] = None, name: dataclasses.InitVar[str | None] = None, prio: dataclasses.InitVar[int | None] = None, query_params: str | None = None, flight_path: list[str] | None = None)
Options for a package item.
- Parameters:
type (str) – “DATASET” or “FILE”
format (str, optional) – “arrow” for datasets
content_type (str, optional) – When uploading files, the Content-Type header is set to this value
name (str, optional) – The name of the uploaded package item
prio (int, optional) – The priority of the uploaded package item; lower numbers mean higher priority
- content_type: dataclasses.InitVar[str | None] = None
- flight_path: list[str] | None = None
- format: dataclasses.InitVar[str | None] = None
- name: dataclasses.InitVar[str | None] = None
- prio: dataclasses.InitVar[int | None] = None
- query_params: str | None = None
- type: dataclasses.InitVar[str]
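Because type, format, content_type, name and prio are declared as dataclasses.InitVar, they are accepted by the constructor but are not ordinary stored fields. What the real class does with them is not described here. The sketch below uses a hypothetical stand-in, PackageOptionsSketch, with the same constructor signature; its settings attribute and the body of __post_init__ are assumptions made only so the example runs on its own.

```python
from dataclasses import InitVar, dataclass, field
from typing import Dict, List, Optional


@dataclass
class PackageOptionsSketch:
    """Hypothetical stand-in mirroring the PackageOptions constructor."""

    type: InitVar[str]
    format: InitVar[Optional[str]] = None
    content_type: InitVar[Optional[str]] = None
    name: InitVar[Optional[str]] = None
    prio: InitVar[Optional[int]] = None
    query_params: Optional[str] = None
    flight_path: Optional[List[str]] = None
    # Assumption: keep the init-only values in a dict; the real class may differ.
    settings: Dict[str, object] = field(init=False, default_factory=dict)

    def __post_init__(self, type, format, content_type, name, prio):
        # InitVar values arrive here as arguments instead of becoming fields.
        given = {"type": type, "format": format, "content_type": content_type,
                 "name": name, "prio": prio}
        self.settings = {key: value for key, value in given.items() if value is not None}


# A file item with an explicit Content-Type and a high priority (low number).
file_options = PackageOptionsSketch("FILE", content_type="text/csv",
                                    name="report.csv", prio=1)

# A dataset item in Arrow format.
dataset_options = PackageOptionsSketch("DATASET", format="arrow")
```

With the real class, the same positional and keyword arguments would be passed to PackageOptions and the result handed to a return_* method through package_options.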
- class workflow_utils.workflow_io.WorkflowIo(config_path: str = '/app/config.json')
- argument(parameter: int) Argument | None
Get the argument of an input parameter.
- Parameters:
parameter (int) – The parameter number of the input (1 or greater)
- Returns:
The argument of the input parameter.
- Return type:
Argument | None
- argument_type(parameter: int) ArgumentType
Get the type of an input parameter.
- Parameters:
parameter (int) – The parameter number of the input (1 or greater)
- Returns:
The type of the input parameter, such as INTEGER, STRING, etc.
- Return type:
ArgumentType
- arrow_value(*, parameter_name: str | None = None, parameter: int | None = None) Table | None
Resolve an input as a PyArrow Table, either from the parameter or the name of the input. Return None if it cannot be resolved.
- Parameters:
parameter_name (str, optional) – Either None if not used or the name of the input
parameter (int, optional) – Either None if not used or the parameter number of the input
- Returns:
The value as PyArrow Table or None if not found
- Return type:
Optional[pyarrow.Table]
- dataset_value(*, parameter_name: str | None = None, parameter: int | None = None, converters: dict[str, Any] | None = None, has_header: bool = True) Dataset | None
Resolve an input as a Dataset, either from the parameter or the name of the input. Return None if it cannot be resolved.
- Parameters:
parameter_name (str, optional) – Either None if not used or the name of the input
parameter (int, optional) – Either None if not used or the parameter number of the input
converters (dict[str, Any], optional) – None, or a mapping that overrides how a column is parsed, for example {"My Column": str}
has_header (bool, optional) – True if the first row is the header
- Returns:
The value as a Dataset or None if not found
- Return type:
Dataset | None
- enum_value(*, parameter_name: str | None = None, parameter: int | None = None) str | None
Get the input enum value as a string, either from the parameter or the name of the input.
- Parameters:
parameter_name (str, optional) – The name of the input parameter. If provided, the value is retrieved by parameter name.
parameter (int, optional) – The parameter number of the input. If provided, the value is retrieved by parameter number.
- Returns:
The enum value as a string, or None if not found.
- Return type:
str or None
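Since enum_value returns a plain string or None, callers may want to map it onto a Python Enum before branching on it. A minimal sketch of that step, assuming hypothetical member names FAST and ACCURATE (the real enum values depend on the workflow definition); the raw argument stands for the result of enum_value:

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    """Hypothetical enum; the names are placeholders for this example."""
    FAST = "FAST"
    ACCURATE = "ACCURATE"


def parse_mode(raw: Optional[str], default: Mode = Mode.FAST) -> Mode:
    """Convert the string from enum_value into a Mode, using default for None."""
    if raw is None:
        return default
    try:
        return Mode(raw)  # lookup by value
    except ValueError:
        raise ValueError(f"unsupported mode {raw!r}") from None


selected = parse_mode("ACCURATE")
fallback = parse_mode(None)
```

Rejecting unknown strings early keeps a misspelled value from silently selecting a default branch later in the workflow.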
- file_content(*, parameter_name: str | None = None, parameter: int | None = None) bytes | None
Get the file content as bytes, either from the parameter or the name of the input.
- Parameters:
parameter_name (str, optional) – Either None if not used or the name of the input
parameter (int, optional) – Either None if not used or the parameter of the input (1 or greater)
- Returns:
The file content as bytes or None if not found
- Return type:
Optional[bytes]
- file_like(*, parameter_name: str | None = None, parameter: int | None = None, suffix: str | None = None) str
Get the input as a file-like object and save it to disk.
- Parameters:
parameter_name (str, optional) – Either None if not used or the name of the input
parameter (int, optional) – Either None if not used or the parameter number of the input
suffix (str, optional) – The file suffix to use for the temporary file, defaults to None
- Returns:
The path to the saved file
- Return type:
str
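file_like returns a path rather than an open handle, so the caller opens the file itself. The sketch below shows that usage pattern with a hypothetical stand-in, save_input_to_disk, in place of file_like (the real method fetches the input from the workflow). Whether the platform removes the temporary file afterwards is not stated, so the sketch deletes it explicitly.

```python
import os
import tempfile
from typing import Optional


def save_input_to_disk(content: bytes, suffix: Optional[str] = None) -> str:
    """Hypothetical stand-in for file_like: write bytes to a temp file, return its path."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as handle:
        handle.write(content)
        return handle.name


path = save_input_to_disk(b"a,b\n1,2\n", suffix=".csv")
try:
    # The suffix lets tools that sniff the extension treat the file as CSV.
    with open(path, "rb") as source:
        first_line = source.readline()
finally:
    os.remove(path)
```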
- float_value(*, parameter_name: str | None = None, parameter: int | None = None) float | None
Get the input float value, either from the parameter or the name of the input.
- Parameters:
parameter_name (str, optional) – Either None if not used or the name of the input
parameter (int, optional) – Either None if not used or the parameter number of the input
- Returns:
The value as float or None if not found
- Return type:
float | None
- int_value(*, parameter_name: str | None = None, parameter: int | None = None) int | None
Get the input integer value, either from the parameter or the name of the input.
- Parameters:
parameter_name (str, optional) – Either None if not used or the name of the input
parameter (int, optional) – Either None if not used or the parameter number of the input
- Returns:
The value as integer or None if not found
- Return type:
Optional[int]
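The typed getters return None when an input cannot be found, so a default has to be applied by the caller. A minimal sketch, using a hypothetical FakeWorkflowIo that only mimics the documented int_value signature; the input names batch_size, retries and timeout are invented for the example, and the lookup logic is an assumption rather than the library's behaviour:

```python
from typing import List, Optional, Tuple


class FakeWorkflowIo:
    """Hypothetical stand-in exposing int_value with the documented signature."""

    def __init__(self, inputs: List[Tuple[str, object]]) -> None:
        self._inputs = inputs  # ordered (name, value) pairs; parameter 1 is the first

    def int_value(self, *, parameter_name: Optional[str] = None,
                  parameter: Optional[int] = None) -> Optional[int]:
        for number, (name, value) in enumerate(self._inputs, start=1):
            if name == parameter_name or number == parameter:
                return int(value)
        return None  # nothing matched


io = FakeWorkflowIo([("batch_size", 0), ("retries", "3")])

# Compare against None explicitly: `or 32` would wrongly replace a valid 0.
batch_size = io.int_value(parameter_name="batch_size")
if batch_size is None:
    batch_size = 32

retries = io.int_value(parameter=2)
timeout = io.int_value(parameter_name="timeout")  # not configured
```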
- json_value(*, parameter_name: str | None = None, parameter: int | None = None) Any
Resolve the input value as a JSON dictionary, either from the parameter or the name of the input. Return None if it cannot be resolved.
- Parameters:
parameter_name (str, optional) – Either None if not used or the name of the input
parameter (int, optional) – Either None if not used or the parameter number of the input
- Returns:
The value as a JSON dictionary or None if not found
- Return type:
Any
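The declared return type is Any, so it can be worth checking the shape of the result before indexing into it. The sketch below assumes a hypothetical stand-in, json_value_sketch, that parses text with the standard json module, and a made-up "threshold" key; neither is taken from the library.

```python
import json
from typing import Any, Optional


def json_value_sketch(raw: Optional[str]) -> Any:
    """Hypothetical stand-in for json_value: parse text, None if unresolved."""
    if raw is None:
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


def read_threshold(config: Any, default: float = 0.5) -> float:
    """Accept only a JSON object with a numeric 'threshold' key."""
    if isinstance(config, dict):
        value = config.get("threshold")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return float(value)
    return default


from_object = read_threshold(json_value_sketch('{"threshold": 0.8}'))
from_list = read_threshold(json_value_sketch("[1, 2, 3]"))
from_missing = read_threshold(json_value_sketch(None))
```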
- num_inputs() int
Get the number of input parameters.
- Returns:
The number of input parameters.
- Return type:
int
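Parameter numbers start at 1, so iterating over all inputs means counting from 1 to num_inputs() inclusive. A minimal sketch with a hypothetical FakeWorkflowIo that models only num_inputs and the numbered form of value:

```python
from typing import List, Optional


class FakeWorkflowIo:
    """Hypothetical stand-in; only the numbered lookup is modelled."""

    def __init__(self, values: List[Optional[str]]) -> None:
        self._values = values

    def num_inputs(self) -> int:
        return len(self._values)

    def value(self, *, parameter_name: Optional[str] = None,
              parameter: Optional[int] = None) -> Optional[str]:
        if parameter is None or not 1 <= parameter <= len(self._values):
            return None
        return self._values[parameter - 1]


io = FakeWorkflowIo(["alpha", None, "gamma"])

# The range ends at num_inputs() + 1 because parameter numbers are 1-based.
resolved = {number: io.value(parameter=number)
            for number in range(1, io.num_inputs() + 1)}
missing = [number for number, text in resolved.items() if text is None]
```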
- num_outputs() int
Get the number of output parameters.
- Returns:
The number of output parameters.
- Return type:
int
- pandas_value(*, parameter_name: str | None = None, parameter: int | None = None, converters: dict[Any, Any] | None = None, has_header: bool = True, index_col: int | None = 0) DataFrame | None
Resolve an input as a Pandas DataFrame, either from the parameter or the name of the input, regardless of whether it is a file or a dataset. Return None if it cannot be resolved.
- Parameters:
parameter_name (str, optional) – Either None if not used or the name of the input
parameter (int, optional) – Either None if not used or the parameter number of the input
converters (dict[Any, Any], optional) – None, or a mapping that overrides how a column is parsed, for example {"My Column": str}
has_header (bool, optional) – True if the first row is the header
index_col (int, optional) – Column to use as the row labels of the DataFrame
- Returns:
The value as Pandas DataFrame or None if not found
- Return type:
pd.DataFrame | None
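The converters and has_header arguments suggest per-column parsing overrides of the kind found in CSV readers such as pandas.read_csv; whether pandas_value forwards them unchanged is an assumption. The standard-library sketch below only illustrates the idea: values are guessed as integers unless a converter is given for the column. The function parse_csv_sketch and the column names are hypothetical.

```python
import csv
import io
from typing import Any, Callable, Dict, List, Optional


def parse_csv_sketch(text: str,
                     converters: Optional[Dict[str, Callable[[str], Any]]] = None,
                     has_header: bool = True) -> List[Dict[str, Any]]:
    """Parse CSV text, guessing integers unless a converter overrides the column."""
    converters = converters or {}
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    if has_header:
        header, data = rows[0], rows[1:]
    else:
        # Without a header row, fall back to positional column names.
        header, data = [str(i) for i in range(len(rows[0]))], rows

    def convert(column: str, raw: str) -> Any:
        if column in converters:
            return converters[column](raw)  # explicit override wins
        try:
            return int(raw)
        except ValueError:
            return raw

    return [{c: convert(c, v) for c, v in zip(header, row)} for row in data]


text = "id,zip\n1,00501\n2,10001\n"
guessed = parse_csv_sketch(text)                                # zip loses leading zeros
kept_as_text = parse_csv_sketch(text, converters={"zip": str})  # zip stays a string
no_header = parse_csv_sketch("1,00501\n", has_header=False)
```

The typical reason to pass a converter is the one shown here: identifiers such as postal codes look numeric but must keep their leading zeros.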
- polars_value(*, parameter_name: str | None = None, parameter: int | None = None, converters: dict[str, Any] | None = None, has_header: bool = True) DataFrame | None
Resolve an input as a Polars DataFrame, either from the parameter or the name of the input, regardless of whether it is a file or a dataset. Return None if it cannot be resolved.
- Parameters:
parameter_name (str, optional) – Either None if not used or the name of the input
parameter (int, optional) – Either None if not used or the parameter number of the input
converters (dict[str, Any], optional) – None, or a mapping that overrides how a column is parsed, for example {"My Column": str}
has_header (bool, optional) – True if the first row is the header
- Returns:
The value as Polars DataFrame or None if not found
- Return type:
polars.DataFrame | None
- python_module(*, parameter_name: str | None = None, parameter: int | None = None) ModuleType | None
Resolve the input as a Python module, that can be run.
Example:
    input = io.python_module(parameter=1)
    input.hello_test()
will call the function hello_test from the input module.
- Parameters:
parameter_name (str, optional) – Either None if not used or the name of the input
parameter (int, optional) – Either None if not used or the parameter number of the input
- Returns:
The imported Python module
- Return type:
Optional[ModuleType]
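How python_module loads the input is not documented here. As an illustration of what resolving source code into a ModuleType can look like, the sketch below uses the standard importlib machinery; load_module_from_source and the module name workflow_input are hypothetical, and the real method may work differently.

```python
import importlib.util
import os
import tempfile
from types import ModuleType


def load_module_from_source(source: str, module_name: str = "workflow_input") -> ModuleType:
    """Write source to a temporary .py file and import it as a module."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, module_name + ".py")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(source)
        spec = importlib.util.spec_from_file_location(module_name, path)
        if spec is None or spec.loader is None:
            raise ImportError(f"cannot load {path}")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the module's top-level code
        return module


module = load_module_from_source("def hello_test():\n    return 'hello'\n")
result = module.hello_test()
```

Importing a module executes its top-level code, so inputs loaded this way should come from a trusted source.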
- read_vector(file: IO[bytes]) list[float]
Reads a vector of floats from the given file.
- Parameters:
file (IO[bytes]) – The file object to read from.
- Returns:
The vector of floats read from the file.
- Return type:
list[float]
- return_arrow(*, parameter_name: str | None = None, parameter: int | None = None, table: DataFrame | DataFrame | Table | Dataset | None = None, content: DataFrame | DataFrame | Table | Dataset | None = None, package_options: PackageOptions | None = None) None
Writes back an Arrow table to a file and uploads it to the output URL.
- Parameters:
parameter_name (str, optional) – The name of the output parameter. Defaults to None.
parameter (int, optional) – The index of the output parameter. Defaults to None.
table (pd.DataFrame | pl.DataFrame | pa.Table | dataset.Dataset, optional) – Deprecated; use ‘content’ instead
content (pd.DataFrame | pl.DataFrame | pa.Table | dataset.Dataset) – The Arrow table or dataframe to be written to the file.
package_options (PackageOptions, optional) – Parameters for when uploading files to a package
- Returns:
None
- return_bytes(*, parameter_name: str | None = None, parameter: int | None = None, content: bytes | str, package_options: PackageOptions | None = None) None
Write back data bytes.
- Parameters:
parameter_name (str, optional) – Either None if not used or the name of the return parameter.
parameter (int, optional) – Either None if not used or the parameter number to return, 1 or greater.
content (bytes | str) – The bytes or string to write back.
package_options (PackageOptions, optional) – Parameters for when uploading files to a package
- Returns:
None
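return_bytes accepts either bytes or str, which makes it a natural fit for small serialized results such as JSON. The sketch below records the calls with a hypothetical RecordingIo so it can run without the platform; the UTF-8 encoding of str content is an assumption, and package_options is left out of the stand-in for brevity.

```python
import json
from typing import Dict, Optional, Union


class RecordingIo:
    """Hypothetical stand-in that records what return_bytes receives."""

    def __init__(self) -> None:
        self.outputs: Dict[Union[str, int], bytes] = {}

    def return_bytes(self, *, parameter_name: Optional[str] = None,
                     parameter: Optional[int] = None,
                     content: Union[bytes, str]) -> None:
        key = parameter_name if parameter_name is not None else parameter
        if key is None:
            raise ValueError("pass parameter_name or parameter")
        # Assumption: str content is encoded as UTF-8 before upload.
        data = content.encode("utf-8") if isinstance(content, str) else content
        self.outputs[key] = data


io = RecordingIo()
summary = {"rows": 120, "status": "ok"}

# sort_keys gives a stable byte representation across runs.
io.return_bytes(parameter=1, content=json.dumps(summary, sort_keys=True))
io.return_bytes(parameter_name="raw", content=b"\x00\x01")
```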
- return_csv(*, parameter_name: str | None = None, parameter: int | None = None, content: DataFrame | DataFrame | Series | Series | Table | Dataset, index_label: str | None = None, package_options: PackageOptions | None = None) None
Write back a DataFrame as a CSV file.
- Parameters:
parameter_name (str, optional) – Either None if not used or the name of the return parameter.
parameter (int, optional) – Either None if not used or the parameter number to return, 1 or greater.
content (pl.DataFrame | pd.DataFrame | pl.Series | pd.Series | pa.Table | dataset.Dataset) – The data to write back as a CSV file.
index_label (str, optional) – The label of the index column in the CSV file.
package_options (PackageOptions, optional) – Parameters for when uploading files to a package
- Raises:
ValueError – If the content type is invalid.
- Returns:
None
- return_dataset(*, parameter_name: str | None = None, parameter: int | None = None, content: DataFrame | DataFrame | Table | Dataset, package_options: PackageOptions | None = None) None
Writes the content to a dataset and uploads it to the output URL.
- Parameters:
parameter_name (str, optional) – The name of the output parameter. Defaults to None.
parameter (int, optional) – The index of the output parameter. Defaults to None.
content (pd.DataFrame | pl.DataFrame | pa.Table | dataset.Dataset) – The dataset to be written to the file.
package_options (PackageOptions, optional) – Parameters for when uploading files to a package
- Returns:
None
- return_excel(*, parameter_name: str | None = None, parameter: int | None = None, content: DataFrame | DataFrame | Series | Series | Table | Dataset, index_label: str | None = None, package_options: PackageOptions | None = None) None
Write back a DataFrame as an Excel file.
This method writes the provided DataFrame (content) to an Excel file and returns it as an output. The Excel file is temporarily saved as a named temporary file with the .xlsx extension.
- Parameters:
parameter_name (str, optional) – Either None if not used or the name of the return parameter.
parameter (int, optional) – Either None if not used or the parameter number to return, 1 or greater.
content (pl.DataFrame | pd.DataFrame | pl.Series | pd.Series | pa.Table | dataset.Dataset) – The data to write back as an Excel file.
index_label (str, optional) – The label of the index column in the Excel file.
package_options (PackageOptions, optional) – Parameters for when uploading files to a package
- Raises:
ValueError – If the content type is invalid.
- Returns:
None. The Excel file is saved and written back as the output.
- Return type:
None
- return_file(file_name: str, *, parameter_name: str | None = None, parameter: int | None = None, content_type: str | None = None, package_options: PackageOptions | None = None) None
Write back a file.
- Parameters:
file_name (str) – The path of the file to return.
parameter_name (str, optional) – Either None if not used or the name of the return parameter.
parameter (int, optional) – Either None if not used or the parameter number to return, 1 or greater.
content_type (str, optional) – Mime-type of the file, will be added as value for the ‘Content-Type’ header
package_options (PackageOptions, optional) – Parameters for when uploading files to a package
- Returns:
None
- return_stream(content: DataFrame | DataFrame | Series | Series | bytes | list, *, parameter_name: str | None = None, parameter: int | None = None, schema: TableSchema | None = None) None
Write back data to a data stream.
- Parameters:
parameter_name (str, optional) – Either None if not used or the name of the return parameter.
parameter (int, optional) – Either None if not used or the parameter number to return, 1 or greater.
content (pd.DataFrame | pl.DataFrame | pl.Series | pd.Series | bytes | list[pd.DataFrame] | list[pl.DataFrame] | list[pl.Series] | list[pd.Series] | list[bytes]) – A Polars or Pandas DataFrame or Series, binary data, or a list of these, to publish on the queue.
- Returns:
None
- return_type(parameter: int) ReturnType
Get the type of an output parameter.
- Parameters:
parameter (int) – The parameter number of the output (1 or greater)
- Returns:
The type of the output parameter, such as INTEGER, STRING, etc.
- Return type:
ReturnType
- return_vector(*, parameter_name: str | None = None, parameter: int | None = None, content: list[str]) None
Writes the vector to a file and uploads it to the output URL.
- Parameters:
parameter_name (str, optional) – The name of the output parameter. Defaults to None.
parameter (int, optional) – The index of the output parameter. Defaults to None.
content (list[str]) – The vector content to be written to the file.
- Returns:
None
- stream_values(*, parameter_name: str | None = None, parameter: int | None = None, count: int | None = None) list[Message]
Get input messages from a data stream, either from the parameter or the name of the input.
- Parameters:
parameter_name (str, optional) – Either None if not used or the name of the input
parameter (int, optional) – Either None if not used or the parameter number of the input
count (int, optional) – None or an override how many messages to fetch from the stream
- Returns:
The list of Message objects or an empty list if no messages are available
- Return type:
list[Message]
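Because stream_values returns an empty list when nothing is available, a consumer can read in batches until the list comes back empty. The sketch below assumes a hypothetical Message shape with a payload field and a FakeStreamIo in which fetched messages are consumed; the real Message class and delivery semantics are not described in this section.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Message:
    """Hypothetical message shape used only for this example."""
    payload: bytes


class FakeStreamIo:
    """Hypothetical stand-in exposing stream_values with the documented signature."""

    def __init__(self, payloads: List[bytes]) -> None:
        self._queue: Deque[Message] = deque(Message(p) for p in payloads)

    def stream_values(self, *, parameter_name: Optional[str] = None,
                      parameter: Optional[int] = None,
                      count: Optional[int] = None) -> List[Message]:
        # Assumption: fetching removes messages from the queue.
        limit = len(self._queue) if count is None else min(count, len(self._queue))
        return [self._queue.popleft() for _ in range(limit)]


io = FakeStreamIo([b"a", b"b", b"c", b"d", b"e"])

batch_sizes = []
received = []
while True:
    batch = io.stream_values(parameter=1, count=2)
    if not batch:  # an empty list signals that nothing is waiting
        break
    batch_sizes.append(len(batch))
    received.extend(message.payload for message in batch)
```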
- string_value(*, parameter_name: str | None = None, parameter: int | None = None) str | None
Get the input string value, either from the parameter or the name of the input.
- Parameters:
parameter_name (str, optional) – Either None if not used or the name of the input
parameter (int, optional) – Either None if not used or the parameter number of the input
- Returns:
The value as string value or None if not found
- Return type:
str or None
- usp_file(*, parameter_name: str | None = None, parameter: int | None = None) str
Get the input as a USP file and save it to disk.
- Parameters:
parameter_name (str, optional) – Either None if not used or the name of the input
parameter (int, optional) – Either None if not used or the parameter number of the input
- Returns:
The path of the USP file
- Return type:
str
- value(*, parameter_name: str | None = None, parameter: int | None = None) str | None
Get the value as a string, either from the parameter number or the name of the input.
- Parameters:
parameter_name (str, optional) – Either None if not used or the name of the input
parameter (int, optional) – Either None if not used or the parameter of the input (1 or greater)
- Returns:
The value as a string or None if not found
- Return type:
Optional[str]