Workflow IO

class workflow_utils.workflow_io.DatasetFetch(*, uri_rest: str, uri_flight: str, dataset_id: str)
dataset_id: str
get_flight_ticket()
uri_flight: str
uri_rest: str
class workflow_utils.workflow_io.PackageItemFetch(*, uri_rest: str, uri_flight: str, presigned_id: str, dataset_id: str)
dataset_id: str
get_flight_ticket()
presigned_id: str
uri_flight: str
uri_rest: str
class workflow_utils.workflow_io.PackageOptions(type: dataclasses.InitVar[str], format: dataclasses.InitVar[str | None] = None, content_type: dataclasses.InitVar[str | None] = None, name: dataclasses.InitVar[str | None] = None, prio: dataclasses.InitVar[int | None] = None, query_params: str | None = None, flight_path: list[str] | None = None)

Options for a package item.

Parameters:
  • type (str) – “DATASET” or “FILE”

  • format (str, optional) – “arrow” for datasets

  • content_type (str, optional) – When uploading files, the Content-Type header is set to this value

  • name (str, optional) – The name of the uploaded package item

  • prio (int, optional) – The priority of the uploaded package item; lower numbers mean higher priority

content_type: dataclasses.InitVar[str | None] = None
flight_path: list[str] | None = None
format: dataclasses.InitVar[str | None] = None
name: dataclasses.InitVar[str | None] = None
prio: dataclasses.InitVar[int | None] = None
query_params: str | None = None
type: dataclasses.InitVar[str]
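
The InitVar annotations in the signature mean that type, format, content_type, name and prio are init-only values: the constructor accepts them and passes them to __post_init__, but they are not stored as dataclass fields. The sketch below uses a stand-in class, PackageOptionsSketch, to show that mechanism. Folding the values into query_params is purely an assumption for illustration; this page does not document what the real class does with them.

```python
from dataclasses import InitVar, dataclass, fields
from typing import List, Optional
from urllib.parse import parse_qs, urlencode


@dataclass
class PackageOptionsSketch:
    """Hypothetical stand-in that mirrors the documented signature."""

    type: InitVar[str]
    format: InitVar[Optional[str]] = None
    content_type: InitVar[Optional[str]] = None
    name: InitVar[Optional[str]] = None
    prio: InitVar[Optional[int]] = None
    query_params: Optional[str] = None
    flight_path: Optional[List[str]] = None

    def __post_init__(self, type, format, content_type, name, prio):
        # Assumption: fold the init-only values into query_params
        # unless the caller supplied query_params explicitly.
        if self.query_params is None:
            pairs = {
                "type": type,
                "format": format,
                "content_type": content_type,
                "name": name,
                "prio": prio,
            }
            self.query_params = urlencode(
                {key: value for key, value in pairs.items() if value is not None}
            )


options = PackageOptionsSketch(
    type="FILE", content_type="text/csv", name="report.csv", prio=1
)
# Only the two regular fields are stored on the instance.
stored_fields = [field.name for field in fields(options)]
decoded = parse_qs(options.query_params)
```

Here stored_fields lists only query_params and flight_path, which matches the way the attribute listing above marks the other five entries as InitVar.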
class workflow_utils.workflow_io.WorkflowIo(config_path: str = '/app/config.json')
argument(parameter: int) Argument | None

Get the argument of an input parameter.

Parameters:

parameter (int) – The parameter number of the input (1 or greater)

Returns:

The argument of the input parameter.

Return type:

Argument

argument_type(parameter: int) ArgumentType

Get the type of an input parameter.

Parameters:

parameter (int) – The parameter number of the input (1 or greater)

Returns:

The type of the input parameter, such as INTEGER, STRING, etc.

Return type:

ArgumentType

arrow_value(*, parameter_name: str | None = None, parameter: int | None = None) Table | None

Resolve an input as a PyArrow Table, either from the parameter or the name of the input, if possible. Return None if it cannot be resolved.

Parameters:
  • parameter_name (str, optional) – Either None if not used or the name of the input

  • parameter (int, optional) – Either None if not used or the parameter number of the input

Returns:

The value as a PyArrow Table or None if not found

Return type:

Optional[pyarrow.Table]

dataset_value(*, parameter_name: str | None = None, parameter: int | None = None, converters: dict[str, Any] | None = None, has_header: bool = True) Dataset | None

Resolve an input as a Dataset, either from the parameter or the name of the input. Return None if it cannot be resolved.

Parameters:
  • parameter_name (str, optional) – Either None if not used or the name of the input

  • parameter (int, optional) – Either None if not used or the parameter number of the input

  • converters (dict[str, Any], optional) – None, or a mapping that overrides how a column is parsed, for example {"My Column": str}

  • has_header (bool, optional) – True if the first row is the header

Returns:

The value as a Dataset or None if not found

Return type:

Dataset | None

enum_value(*, parameter_name: str | None = None, parameter: int | None = None) str | None

Get the input enum value as a string, either from the parameter or the name of the input.

Parameters:
  • parameter_name (str, optional) – The name of the input parameter. If provided, the value is retrieved by parameter name.

  • parameter (int, optional) – The parameter number of the input. If provided, the value is retrieved by parameter number.

Returns:

The enum value as a string, or None if not found.

Return type:

str or None
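
Because enum_value returns a plain string or None, callers usually need to check the result themselves. The following is a minimal sketch of such a check. FakeIo is a hypothetical stand-in for WorkflowIo, and the helper require_enum, the input name "mode" and its values are illustrative assumptions, not part of the library.

```python
class FakeIo:
    """Hypothetical stand-in for WorkflowIo, for illustration only."""

    def __init__(self, enums):
        self._enums = dict(enums)

    def enum_value(self, *, parameter_name=None, parameter=None):
        # Only lookup by name is modelled in this sketch.
        return self._enums.get(parameter_name)


def require_enum(io, parameter_name, allowed):
    """Return the enum input, or raise if it is missing or unexpected."""
    value = io.enum_value(parameter_name=parameter_name)
    if value is None:
        raise ValueError(f"input {parameter_name!r} was not found")
    if value not in allowed:
        raise ValueError(
            f"input {parameter_name!r} has unexpected value {value!r}"
        )
    return value


io = FakeIo({"mode": "FAST"})
mode = require_enum(io, "mode", allowed={"FAST", "ACCURATE"})
```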

file_content(*, parameter_name: str | None = None, parameter: int | None = None) bytes | None

Get the file content as bytes, either from the parameter or the name of the input.

Parameters:
  • parameter_name (str, optional) – Either None if not used or the name of the input

  • parameter (int, optional) – Either None if not used or the parameter number of the input (1 or greater)

Returns:

The file content as bytes or None if not found

Return type:

Optional[bytes]
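
Since file_content returns raw bytes or None, text inputs have to be decoded by the caller. The sketch below shows one way to do that. FakeIo is a hypothetical stand-in for WorkflowIo, and both the helper read_text_input and the UTF-8 default are assumptions made for this example.

```python
class FakeIo:
    """Hypothetical stand-in for WorkflowIo, for illustration only."""

    def __init__(self, files):
        self._files = dict(files)

    def file_content(self, *, parameter_name=None, parameter=None):
        # Only lookup by parameter number is modelled in this sketch.
        return self._files.get(parameter)


def read_text_input(io, parameter, encoding="utf-8"):
    """Decode a file input to text, failing loudly when it is missing."""
    raw = io.file_content(parameter=parameter)
    if raw is None:
        raise FileNotFoundError(f"input {parameter} has no file content")
    return raw.decode(encoding)


io = FakeIo({1: b"name,score\nada,3\n"})
text = read_text_input(io, 1)
rows = text.splitlines()
```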

file_like(*, parameter_name: str | None = None, parameter: int | None = None, suffix: str | None = None) str

Get the input as a file-like object and save it to disk.

Parameters:
  • parameter_name (str, optional) – Either None if not used or the name of the input

  • parameter (int, optional) – Either None if not used or the parameter number of the input

  • suffix (str, optional) – The file suffix to use for the temporary file, defaults to None

Returns:

The path to the saved file

Return type:

str
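
The return value of file_like is a path, so the caller opens the file itself. The sketch below illustrates that flow with a hypothetical stand-in, FakeIo, which writes the payload to a named temporary file. Whether the real implementation removes the file afterwards is not documented here, so the example deletes it explicitly.

```python
import os
import tempfile


class FakeIo:
    """Hypothetical stand-in for WorkflowIo, for illustration only."""

    def __init__(self, payload):
        self._payload = payload

    def file_like(self, *, parameter_name=None, parameter=None, suffix=None):
        # Assumption: the input is written to a named temporary file
        # whose name ends with the requested suffix.
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as handle:
            handle.write(self._payload)
            return handle.name


io = FakeIo(b"a,b\n1,2\n")
path = io.file_like(parameter=1, suffix=".csv")
try:
    with open(path, "rb") as handle:
        saved = handle.read()
finally:
    os.remove(path)  # the caller cleans up in this sketch
```

Passing a suffix such as ".csv" is useful when the file is later handed to a tool that infers the format from the file extension.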

float_value(*, parameter_name: str | None = None, parameter: int | None = None) float | None

Get the input float value, either from the parameter or the name of the input.

Parameters:
  • parameter_name (str, optional) – Either None if not used or the name of the input

  • parameter (int, optional) – Either None if not used or the parameter number of the input

Returns:

The value as float or None if not found

Return type:

float | None

int_value(*, parameter_name: str | None = None, parameter: int | None = None) int | None

Get the input integer value, either from the parameter or the name of the input.

Parameters:
  • parameter_name (str, optional) – Either None if not used or the name of the input

  • parameter (int, optional) – Either None if not used or the parameter number of the input

Returns:

The value as integer or None if not found

Return type:

Optional[int]
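
A missing input is reported as None, which must not be confused with a legitimate value of 0. The sketch below shows a default-handling helper that keeps the two apart. FakeIo is a hypothetical stand-in for WorkflowIo, and int_or_default and the input names are illustrative assumptions.

```python
class FakeIo:
    """Hypothetical stand-in for WorkflowIo, for illustration only."""

    def __init__(self, values):
        self._values = dict(values)

    def int_value(self, *, parameter_name=None, parameter=None):
        # Only lookup by name is modelled in this sketch.
        return self._values.get(parameter_name)


def int_or_default(io, parameter_name, default):
    """Return the integer input, or the default when it is not found."""
    value = io.int_value(parameter_name=parameter_name)
    # Compare with None explicitly: `value or default` would discard 0.
    return default if value is None else value


io = FakeIo({"retries": 0})
retries = int_or_default(io, "retries", default=3)
timeout = int_or_default(io, "timeout", default=30)
```

The same pattern applies to float_value, where 0.0 is equally falsy.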

json_value(*, parameter_name: str | None = None, parameter: int | None = None) Any

Resolve the input value as a JSON dictionary, either from the parameter or the name of the input. Return None if it cannot be resolved.

Parameters:
  • parameter_name (str, optional) – Either None if not used or the name of the input

  • parameter (int, optional) – Either None if not used or the parameter number of the input

Returns:

The value as a JSON dictionary or None if not found

Return type:

Any

num_inputs() int

Get the number of input parameters.

Returns:

The number of input parameters.

Return type:

int
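
Parameter numbers start at 1, so iterating over all inputs means counting from 1 to num_inputs() inclusive. The sketch below shows that loop. FakeIo is a hypothetical stand-in for WorkflowIo and returns plain strings where the real argument_type returns ArgumentType values.

```python
class FakeIo:
    """Hypothetical stand-in for WorkflowIo, for illustration only."""

    def __init__(self, input_types):
        self._input_types = list(input_types)

    def num_inputs(self):
        return len(self._input_types)

    def argument_type(self, parameter):
        # Parameter numbers are 1-based, as documented for argument_type.
        if parameter < 1:
            raise ValueError("parameter numbers start at 1")
        return self._input_types[parameter - 1]


def describe_inputs(io):
    """Map each 1-based input number to its argument type."""
    return {
        parameter: io.argument_type(parameter)
        for parameter in range(1, io.num_inputs() + 1)
    }


summary = describe_inputs(FakeIo(["INTEGER", "STRING"]))
```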

num_outputs() int

Get the number of output parameters.

Returns:

The number of output parameters.

Return type:

int

pandas_value(*, parameter_name: str | None = None, parameter: int | None = None, converters: dict[Any, Any] | None = None, has_header: bool = True, index_col: int | None = 0) DataFrame | None

Resolve an input as a Pandas DataFrame, either from the parameter or the name of the input, regardless of whether it is a file or a dataset, if possible. Return None if it cannot be resolved.

Parameters:
  • parameter_name (str, optional) – Either None if not used or the name of the input

  • parameter (int, optional) – Either None if not used or the parameter number of the input

  • converters (dict[Any, Any], optional) – None, or a mapping that overrides how a column is parsed, for example {"My Column": str}

  • has_header (bool, optional) – True if the first row is the header

  • index_col (int, optional) – Column to use as the row labels of the DataFrame

Returns:

The value as a Pandas DataFrame or None if not found

Return type:

pd.DataFrame | None

polars_value(*, parameter_name: str | None = None, parameter: int | None = None, converters: dict[str, Any] | None = None, has_header: bool = True) DataFrame | None

Resolve an input as a Polars DataFrame, either from the parameter or the name of the input, regardless of whether it is a file or a dataset, if possible. Return None if it cannot be resolved.

Parameters:
  • parameter_name (str, optional) – Either None if not used or the name of the input

  • parameter (int, optional) – Either None if not used or the parameter number of the input

  • converters (dict[str, Any], optional) – None, or a mapping that overrides how a column is parsed, for example {"My Column": str}

  • has_header (bool, optional) – True if the first row is the header

Returns:

The value as a Polars DataFrame or None if not found

Return type:

polars.DataFrame | None

python_module(*, parameter_name: str | None = None, parameter: int | None = None) ModuleType | None

Resolve the input as a Python module that can be run.

Example:

    input = io.python_module(parameter=1)
    input.hello_test()

This calls the function hello_test from the input module.

Parameters:
  • parameter_name (str, optional) – Either None if not used or the name of the input

  • parameter (int, optional) – Either None if not used or the parameter number of the input

Returns:

The imported Python module

Return type:

Optional[ModuleType]
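
To make the idea of an input that arrives as a runnable module concrete, the sketch below builds a module object from source text using only the standard library. It is not the library's implementation, which is not shown on this page. The helper module_from_source and the module name "workflow_input" are assumptions for illustration; hello_test is the function name used in the example above.

```python
import types


def module_from_source(name, source):
    """Create a module object and execute the source inside its namespace."""
    module = types.ModuleType(name)
    exec(compile(source, f"<{name}>", "exec"), module.__dict__)
    return module


SOURCE = '''
def hello_test():
    return "hello from the input module"
'''

loaded = module_from_source("workflow_input", SOURCE)
result = loaded.hello_test()
```

Executing arbitrary source this way runs whatever code the input contains, so it should only be done with inputs from a trusted origin.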

read_vector(file: IO[bytes]) list[float]

Reads a vector of floats from the given file.

Parameters:

file (IO[bytes]) – The file object to read from.

Returns:

The vector of floats read from the file.

Return type:

list[float]

return_arrow(*, parameter_name: str | None = None, parameter: int | None = None, table: DataFrame | DataFrame | Table | Dataset | None = None, content: DataFrame | DataFrame | Table | Dataset | None = None, package_options: PackageOptions | None = None) None

Writes back an Arrow table to a file and uploads it to the output URL.

Parameters:
  • parameter_name (str, optional) – The name of the output parameter. Defaults to None.

  • parameter (int, optional) – The index of the output parameter. Defaults to None.

  • table (pd.DataFrame | pa.Table | dataset.Dataset) – Deprecated, use ‘content’

  • content (pd.DataFrame | pl.DataFrame | pa.Table | dataset.Dataset) – The Arrow table or dataframe to be written to the file.

  • package_options (PackageOptions, optional) – Parameters for when uploading files to a package

Returns:

None

return_bytes(*, parameter_name: str | None = None, parameter: int | None = None, content: bytes | str, package_options: PackageOptions | None = None) None

Write back data bytes.

Parameters:
  • parameter_name (str, optional) – Either None if not used or the name of the return parameter.

  • parameter (int, optional) – Either None if not used or the parameter number to return, 1 or greater.

  • content (bytes | str) – The bytes or string to write back.

  • package_options (PackageOptions, optional) – Parameters for when uploading files to a package

Returns:

None
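
All arguments of return_bytes are keyword-only, and the output is addressed either by number or by name. The sketch below shows both call styles against a hypothetical stand-in, FakeIo, which records outputs in a dictionary instead of uploading them. Encoding strings as UTF-8 is an assumption of the stand-in; the page does not state which encoding the library uses.

```python
class FakeIo:
    """Hypothetical stand-in for WorkflowIo, for illustration only."""

    def __init__(self):
        self.outputs = {}

    def return_bytes(self, *, parameter_name=None, parameter=None,
                     content, package_options=None):
        # Assumption: strings are encoded as UTF-8 before being written back.
        data = content.encode("utf-8") if isinstance(content, str) else content
        key = parameter_name if parameter_name is not None else parameter
        self.outputs[key] = data


io = FakeIo()
io.return_bytes(parameter=1, content=b"\x00\x01")          # by number
io.return_bytes(parameter_name="summary", content="done")  # by name
```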

return_csv(*, parameter_name: str | None = None, parameter: int | None = None, content: DataFrame | DataFrame | Series | Series | Table | Dataset, index_label: str | None = None, package_options: PackageOptions | None = None) None

Write back a DataFrame as a CSV file.

Parameters:
  • parameter_name (str, optional) – Either None if not used or the name of the return parameter.

  • parameter (int, optional) – Either None if not used or the parameter number to return, 1 or greater.

  • content (pl.DataFrame | pd.DataFrame | pl.Series | pd.Series | pa.Table | dataset.Dataset) – The DataFrame, Series, Arrow table or dataset to write back as a CSV file.

  • index_label (str, optional) – The label of the index column in the CSV file.

  • package_options (PackageOptions, optional) – Parameters for when uploading files to a package

Raises:

ValueError – If the content type is invalid.

Returns:

None

return_dataset(*, parameter_name: str | None = None, parameter: int | None = None, content: DataFrame | DataFrame | Table | Dataset, package_options: PackageOptions | None = None) None

Writes the content to a dataset and uploads it to the output URL.

Parameters:
  • parameter_name (str, optional) – The name of the output parameter. Defaults to None.

  • parameter (int, optional) – The index of the output parameter. Defaults to None.

  • content (pd.DataFrame | pl.DataFrame | pa.Table | dataset.Dataset) – The dataset to be written to the file.

  • package_options (PackageOptions, optional) – Parameters for when uploading files to a package

Returns:

None

return_excel(*, parameter_name: str | None = None, parameter: int | None = None, content: DataFrame | DataFrame | Series | Series | Table | Dataset, index_label: str | None = None, package_options: PackageOptions | None = None) None

Write back a DataFrame as an Excel file.

This method writes the provided DataFrame (content) to an Excel file and returns it as an output. The Excel file is temporarily saved as a named temporary file with the .xlsx extension.

Parameters:
  • parameter_name (str, optional) – Either None if not used or the name of the return parameter.

  • parameter (int, optional) – Either None if not used or the parameter number to return, 1 or greater.

  • content (pl.DataFrame | pd.DataFrame | pl.Series | pd.Series | pa.Table | dataset.Dataset) – The DataFrame, Series, Arrow table or dataset to write back as an Excel file.

  • index_label (str, optional) – The label of the index column in the Excel file.

  • package_options (PackageOptions, optional) – Parameters for when uploading files to a package

Raises:

ValueError – If the content type is invalid.

Returns:

This method does not return any value directly. Instead, it saves the Excel file and returns it as an output.

Return type:

None

return_file(file_name: str, *, parameter_name: str | None = None, parameter: int | None = None, content_type: str | None = None, package_options: PackageOptions | None = None) None

Write back a file.

Parameters:
  • file_name (str) – The path of the file to return.

  • parameter_name (str, optional) – Either None if not used or the name of the return parameter.

  • parameter (int, optional) – Either None if not used or the parameter number to return, 1 or greater.

  • content_type (str, optional) – MIME type of the file; used as the value of the ‘Content-Type’ header

  • package_options (PackageOptions, optional) – Parameters for when uploading files to a package

Returns:

None
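
Unlike the other return methods, return_file takes the file path as a positional argument, followed by keyword-only arguments. The sketch below writes a small file and hands its path over. FakeIo is a hypothetical stand-in for WorkflowIo that reads the file and records it rather than uploading it, and the file name and content are invented for the example.

```python
import os
import tempfile


class FakeIo:
    """Hypothetical stand-in for WorkflowIo, for illustration only."""

    def __init__(self):
        self.uploads = []

    def return_file(self, file_name, *, parameter_name=None, parameter=None,
                    content_type=None, package_options=None):
        # Record what would be uploaded instead of contacting a server.
        with open(file_name, "rb") as handle:
            self.uploads.append((parameter, content_type, handle.read()))


io = FakeIo()
with tempfile.TemporaryDirectory() as workdir:
    report_path = os.path.join(workdir, "report.txt")
    with open(report_path, "wb") as handle:
        handle.write(b"all checks passed\n")
    # file_name is positional; the remaining arguments are keyword-only.
    io.return_file(report_path, parameter=1, content_type="text/plain")
```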

return_stream(content: DataFrame | DataFrame | Series | Series | bytes | list, *, parameter_name: str | None = None, parameter: int | None = None, schema: TableSchema | None = None) None

Write back data to a data stream.

Parameters:
  • parameter_name (str, optional) – Either None if not used or the name of the return parameter.

  • parameter (int, optional) – Either None if not used or the parameter number to return, 1 or greater.

  • content (pd.DataFrame | pl.DataFrame | pl.Series | pd.Series | bytes | list[pd.DataFrame] | list[pl.DataFrame] | list[pl.Series] | list[pd.Series] | list[bytes]) – A Polars or Pandas DataFrame or Series, binary data, or a list of these, to publish on the queue.

Returns:

None

return_type(parameter: int) ReturnType

Get the type of an output parameter.

Parameters:

parameter (int) – The parameter number of the output (1 or greater)

Returns:

The type of the output parameter, such as INTEGER, STRING, etc.

Return type:

ReturnType

return_vector(*, parameter_name: str | None = None, parameter: int | None = None, content: list[str]) None

Writes the vector to a file and uploads it to the output URL.

Parameters:
  • parameter_name (str, optional) – The name of the output parameter. Defaults to None.

  • parameter (int, optional) – The index of the output parameter. Defaults to None.

  • content (list[str]) – The vector content to be written to the file.

Returns:

None

stream_values(*, parameter_name: str | None = None, parameter: int | None = None, count: int | None = None) list[Message]

Get input messages from a data stream, either from the parameter or the name of the input.

Parameters:
  • parameter_name (str, optional) – Either None if not used or the name of the input

  • parameter (int, optional) – Either None if not used or the parameter number of the input

  • count (int, optional) – None, or an override for how many messages to fetch from the stream

Returns:

The list of Message objects or an empty list if no messages are available

Return type:

list[Message]
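
Because stream_values returns an empty list when no messages are available, a consumer can fetch in batches until a call comes back empty. The sketch below shows that loop. FakeIo is a hypothetical stand-in for WorkflowIo backed by an in-memory queue, and plain strings stand in for Message objects, whose structure is not documented on this page.

```python
from collections import deque


class FakeIo:
    """Hypothetical stand-in for WorkflowIo, for illustration only."""

    def __init__(self, messages):
        self._queue = deque(messages)

    def stream_values(self, *, parameter_name=None, parameter=None, count=None):
        # Return at most `count` messages, or everything when count is None.
        limit = len(self._queue) if count is None else count
        batch = []
        while self._queue and len(batch) < limit:
            batch.append(self._queue.popleft())
        return batch


def drain(io, parameter, batch_size):
    """Fetch batches until the stream reports no more messages."""
    collected = []
    while True:
        batch = io.stream_values(parameter=parameter, count=batch_size)
        if not batch:  # an empty list means nothing is available
            return collected
        collected.extend(batch)


io = FakeIo(["m1", "m2", "m3", "m4", "m5"])
messages = drain(io, parameter=1, batch_size=2)
```

On a live stream an empty result may only mean that nothing is available at that moment, so a long-running consumer would typically wait and poll again rather than stop.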

string_value(*, parameter_name: str | None = None, parameter: int | None = None) str | None

Get the input string value, either from the parameter or the name of the input.

Parameters:
  • parameter_name (str, optional) – Either None if not used or the name of the input

  • parameter (int, optional) – Either None if not used or the parameter number of the input

Returns:

The value as a string or None if not found

Return type:

str or None

usp_file(*, parameter_name: str | None = None, parameter: int | None = None) str

Get the input as a USP file and save it to disk.

Parameters:
  • parameter_name (str, optional) – Either None if not used or the name of the input

  • parameter (int, optional) – Either None if not used or the parameter number of the input

Returns:

The path of the USP file

Return type:

str

value(*, parameter_name: str | None = None, parameter: int | None = None) str | None

Get the value as a string, either from the parameter number or the name of the input.

Parameters:
  • parameter_name (str, optional) – Either None if not used or the name of the input

  • parameter (int, optional) – Either None if not used or the parameter number of the input (1 or greater)

Returns:

The value as a string or None if not found

Return type:

Optional[str]