Input Output

Code is generated from the input and output definitions of a Studio workflow function. For one dataset input and one dataset output, the code editor would add:

@dataclass(kw_only=True)
class Inputs(BaseInputs):
    dsi: In.Dataset  # "dsi"

@dataclass(kw_only=True)
class Outputs(BaseOutputs):
    dso: Out.Dataset  # "dso"

The Inputs and Outputs classes define every input and output as an attribute. The trailing comments show the original names of the inputs and outputs as defined in the workflow function.

An example of a function using these generated classes:

def f(inputs: Inputs) -> Outputs:
  arrow_table = inputs.dsi.arrow()

  outputs = Outputs(
    dso = Out.Dataset(arrow_table)
  )

  return outputs

Here the input dsi is used to create a pyarrow.Table with the .arrow() method, and the output dso is returned as a Studio Dataset using the Out.Dataset class.

The dataset input and output have an attribute named multiple. Enabling it generates In.Datasets and Out.Datasets instead:

@dataclass(kw_only=True)
class Inputs(BaseInputs):
    mdsi: In.Datasets  # "mdsi"

@dataclass(kw_only=True)
class Outputs(BaseOutputs):
    mdso: Out.Datasets  # "mdso"

Multi-inputs and multi-outputs (also called Packages) can be used when a function needs to consume or produce several files or datasets.

An example of a function using these generated classes:

def f(inputs: Inputs) -> Outputs:
  tables = []

  for item in inputs.mdsi.iter():
    inp = item.load()
    tables.append(inp.arrow())

  outputs = Outputs(
    mdso = tables
  )

  return outputs

class workflow_utils.input_output.BaseOutputTransport

Bases: object

Base class for output transport definitions used by the user function to upload files and datasets.

Example

The Outputs class can be reconfigured to inherit from BaseOutputTransport:

@dataclass(init=False)
class Outputs(BaseOutputTransport):
    out_file: Upload.File
    out_files: Upload.Files
    out_dataset: Upload.Dataset
    out_datasets: Upload.Datasets

Used like this:

@dataclass(kw_only=True)
class Inputs(BaseInputs):
    in_file: In.File
    in_dataset: In.Dataset

def f(inputs: Inputs, outputs: Outputs) -> None:
    outputs.out_file.upload(inputs.in_file)
    outputs.out_files.upload(inputs.in_file)
    outputs.out_dataset.upload(inputs.in_dataset)
    outputs.out_datasets.upload(inputs.in_dataset)

Note that outputs has to be initialized in main:

def main(wio: WorkflowIo):
    outputs = Outputs()
    outputs.init_members(wio)
    inputs = Inputs(**collect_inputs(input_cls=Inputs, io_instance=wio))
    f(inputs, outputs)

init_members(wio: WorkflowIo) → None

Initialize the members of this class.
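How init_members creates the members is not described here. The following is a minimal sketch of one way such a method could work, assuming it walks the class annotations and builds each member from (parameter, field_name, wio), the same argument order as the Upload constructors documented further down. FakeWio, FakeUpload and SketchOutputTransport are hypothetical stand-ins, and using the field position as the parameter index is an assumption.

```python
from dataclasses import dataclass
from typing import get_type_hints


class FakeWio:
    """Hypothetical stand-in for WorkflowIo; it only records uploads."""

    def __init__(self) -> None:
        self.uploads = []


class FakeUpload:
    """Hypothetical stand-in for an Upload.* member."""

    def __init__(self, parameter: int, field_name: str, wio: FakeWio) -> None:
        self.parameter = parameter
        self.field_name = field_name
        self.wio = wio

    def upload(self, value: object) -> None:
        self.wio.uploads.append((self.field_name, value))


class SketchOutputTransport:
    """Assumed behaviour: create one member per annotated field."""

    def init_members(self, wio: FakeWio) -> None:
        hints = get_type_hints(type(self))
        for parameter, (field_name, member_cls) in enumerate(hints.items()):
            # Assumption: the field position doubles as the parameter index.
            setattr(self, field_name, member_cls(parameter, field_name, wio))


@dataclass(init=False)
class Outputs(SketchOutputTransport):
    out_file: FakeUpload
    out_dataset: FakeUpload


wio = FakeWio()
outputs = Outputs()        # init=False: no members exist yet
outputs.init_members(wio)  # members are created here
outputs.out_file.upload("report.txt")
```

Under this assumption, init=False matters because the members cannot be passed to the constructor: they need the WorkflowIo instance, which is only available inside main.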

class workflow_utils.input_output.In

Bases: object

Class used by the Studio workflow builder when generating code from user input in the function authoring tool.

class Dataset(value: Table)

Bases: object

Class for passing a dataset value as argument to a Studio User Function.

arrow() → Table

Return the dataset as a pyarrow Table.

dataset() → Dataset

Return the dataset as a workflow_utils Dataset.

pandas() → DataFrame

Convert the dataset to a pandas DataFrame.

polars() → DataFrame

Convert the dataset to a polars DataFrame.

value: Table
class Datasets(value: PackageInfo, presigned_base_url: str)

Bases: Package

Class generated for a Package Dataset Input.

Use the iter() method to iterate over the package content information represented by PackageItem. Use PackageItem.load() to load the dataset as an In.Dataset object like regular dataset inputs.
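The iterate-then-load pattern can be sketched without Studio by using stand-in classes. FakeDataset, FakeItem, FakePackage and collect_tables are hypothetical names that only mimic the documented iter(), load() and arrow() calls, and plain lists of rows replace pyarrow tables. Since load() is typed as possibly returning None, the sketch skips such items; when the real method returns None is not stated here.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class FakeDataset:
    """Hypothetical stand-in for In.Dataset; rows replace the pyarrow Table."""

    rows: list

    def arrow(self) -> list:
        return self.rows


@dataclass
class FakeItem:
    """Hypothetical stand-in for PackageItem."""

    name: str
    rows: Optional[list] = None

    def load(self) -> Optional[FakeDataset]:
        return None if self.rows is None else FakeDataset(self.rows)


@dataclass
class FakePackage:
    """Hypothetical stand-in for In.Datasets."""

    items: list

    def iter(self) -> Iterator[FakeItem]:
        return iter(self.items)


def collect_tables(package: FakePackage) -> dict:
    """Load every item and key the resulting tables by item name."""
    tables = {}
    for item in package.iter():
        loaded = item.load()
        if loaded is None:  # nothing could be loaded for this item
            continue
        tables[item.name] = loaded.arrow()
    return tables


package = FakePackage([FakeItem("a", [{"x": 1}]), FakeItem("b")])
tables = collect_tables(package)
```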

class Enum(value: str)

Bases: object

Class generated for an enum input.

value: the enum value as a string.

value: str
class File(file_name: str, content_type: str | None = None)

Bases: object

Class for passing a file value as argument to a Studio User Function.

Parameters:
  • file_name (str) – The path to the downloaded payload.

  • content_type (str) – The Content-Type header value or “application/octet-stream” if not set.

arrow(**kwargs: Any) → Table

Parse the file content as a pyarrow Table.

bytes() → bytes

Read a binary file and return the content as bytes.

content_type: str | None = None
dataset(**kwargs: Any) → Dataset

Parse the file content as a workflow_utils.dataset.Dataset.

file_name: str
json() → Any

Read file as JSON and return the content as a dictionary.

pandas(**kwargs: Any) → DataFrame

Parse the file content as a pandas DataFrame.

polars(**kwargs: Any) → DataFrame

Parse the file content as a polars DataFrame.

text() → str

Read file as text and return the content as a string.
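As an illustration of how the reader methods relate to content_type, here is a minimal sketch that picks a reader based on the content type. SketchFile is a hypothetical stand-in that mirrors only the documented file_name and content_type fields and the text(), json() and bytes() methods; read_payload is an illustrative helper, not part of workflow_utils, and the real implementations may differ.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional


@dataclass
class SketchFile:
    """Hypothetical stand-in mirroring the documented In.File fields."""

    file_name: str
    content_type: Optional[str] = None

    def text(self) -> str:
        return Path(self.file_name).read_text(encoding="utf-8")

    def json(self) -> Any:
        return json.loads(self.text())

    def bytes(self):
        return Path(self.file_name).read_bytes()


def read_payload(file: SketchFile) -> Any:
    """Choose a reader from the content type (illustrative helper)."""
    # Fall back to the documented default when no content type is set.
    content_type = file.content_type or "application/octet-stream"
    if content_type.startswith("application/json"):
        return file.json()
    if content_type.startswith("text/"):
        return file.text()
    return file.bytes()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    path.write_text('{"threshold": 3}', encoding="utf-8")
    parsed = read_payload(SketchFile(str(path), "application/json"))
    raw = read_payload(SketchFile(str(path)))
```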

class Files(value: PackageInfo, presigned_base_url: str)

Bases: Package

Class generated for a Package File Input.

Use the iter() method to iterate over the package content information represented by PackageItem. Use PackageItem.load() to load the file as an In.File object like regular file inputs.

class JSON(value: dict[Any, Any])

Bases: object

Class generated for a JSON input.

value: The JSON as a dictionary.

value: dict[Any, Any]
class Secret(secret: str)

Bases: object

Class generated for a secret input.

Use reveal() on this object to get the secret plain text.

reveal()

Reveal the secret. Use this function to pass the secret to the applicable credential input.

Returns:

The secret in clear text
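A common way to use such a wrapper is to pass the secret object around and call reveal() only at the point where the credential is consumed. The sketch below illustrates that idea with a hypothetical SketchSecret class and a hypothetical build_auth_header helper. The masked repr is an assumption made for the example and is not documented behaviour of In.Secret.

```python
class SketchSecret:
    """Hypothetical stand-in for In.Secret."""

    def __init__(self, secret: str) -> None:
        self._secret = secret

    def __repr__(self) -> str:
        # Assumption for this sketch: never show the plain text in logs.
        return "SketchSecret(****)"

    def reveal(self) -> str:
        return self._secret


def build_auth_header(token: SketchSecret) -> dict:
    # Reveal as late as possible, right where the credential is needed.
    return {"Authorization": f"Bearer {token.reveal()}"}


secret = SketchSecret("s3cr3t")
header = build_auth_header(secret)
```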

class Stream(_parameter: int, _wio: WorkflowIo)

Bases: object

Class generated for a data stream input.

values(count: int | None = None) → list[Message]

Retrieve messages from the data stream.

class workflow_utils.input_output.Out

Bases: object

Class generated to handle outputs from a workflow function.

class Arrow(value: DataFrame | DataFrame | Series | Series | Table | Dataset, content_type: str | None = None, name: str | None = None, prio: int | None = None)

Bases: File

Class for returning a dataframe value serialized as an Apache Arrow IPC stream as a Studio File.

content_type: str | None = 'application/vnd.apache.arrow.stream'
value: DataFrame | DataFrame | Series | Series | Table | Dataset
class Bytes(value: bytes, content_type: str | None = None, name: str | None = None, prio: int | None = None)

Bases: File

Class for returning an arbitrary byte value as a Studio File.

content_type: str | None = 'application/octet-stream'
value: bytes
class CSV(value: DataFrame | DataFrame | Series | Series | Table | Dataset, content_type: str | None = None, name: str | None = None, prio: int | None = None)

Bases: File

Class for returning a dataframe value serialized as CSV as a Studio File.

content_type: str | None = 'text/csv;charset=UTF-8'
value: DataFrame | DataFrame | Series | Series | Table | Dataset
class Dataset(value: DataFrame | DataFrame | Series | Series | Table | Dataset, name: str | None = None, prio: int | None = None)

Bases: object

Class for returning a dataframe value as a Studio Dataset.

Parameters:
  • name (str, optional) – The name of the uploaded dataset. If not set, the output port name is used.

  • prio (int, optional) – For package items: the priority of the uploaded package item. Lower numbers are higher priority.
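The "lower numbers are higher priority" rule can be illustrated with a small ordering sketch. Item and by_priority are hypothetical names used only for this example, and placing items without a prio last is an assumption, since the handling of an unset prio is not stated here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """Hypothetical stand-in carrying the documented name and prio fields."""

    name: str
    prio: Optional[int] = None


def by_priority(items: list) -> list:
    # Lower prio sorts first; items without a prio go last (an assumption).
    return sorted(items, key=lambda item: (item.prio is None, item.prio or 0))


ordered = by_priority([Item("c"), Item("b", prio=2), Item("a", prio=1)])
names = [item.name for item in ordered]
```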

name: str | None = None
prio: int | None = None
value: DataFrame | DataFrame | Series | Series | Table | Dataset
class Datasets(values: list[Dataset | DataFrame | DataFrame | Series | Series | Table | Dataset])

Bases: object

Class for returning a number of dataframe values as a Studio Package.

Each Item can have additional info such as name, prio and position if values are passed as Out.Dataset with package_options: Out.PkgOpt member.

values: list[Dataset | DataFrame | DataFrame | Series | Series | Table | Dataset]
class File(value: DataFrame | DataFrame | Series | Series | Table | Dataset | str | bytes, content_type: str | None = None, name: str | None = None, prio: int | None = None)

Bases: object

Class for returning a value as a Studio File.

Parameters:
  • value – The payload to upload. Can be a dataset/dataframe reference, a file name, or bytes.

  • content_type (str, optional) – When uploading files, the Content-Type header is set to this value.

  • name (str, optional) – The name of the uploaded file. If not set, the output port name is used.

  • prio (int, optional) – For package items: the priority of the uploaded package item. Lower numbers are higher priority.

content_type: str | None = None
name: str | None = None
prio: int | None = None
value: DataFrame | DataFrame | Series | Series | Table | Dataset | str | bytes
class Files(values: list[File | DataFrame | DataFrame | Series | Series | Table | Dataset | str | bytes])

Bases: object

Class for returning a number of file values as a Studio Package.

values: list[File | DataFrame | DataFrame | Series | Series | Table | Dataset | str | bytes]
class Stream(value: DataFrame | DataFrame | Series | Series | bytes | list)

Bases: object

Class for returning a dataframe value as a Studio Stream.

value: DataFrame | DataFrame | Series | Series | bytes | list
class Text(value: str, content_type: str | None = None, name: str | None = None, prio: int | None = None)

Bases: File

Class for returning a string value as a UTF-8 encoded Studio File.

content_type: str | None = 'text/plain;charset=UTF-8'
value: str
class XLSX(value: DataFrame | DataFrame | Series | Series | Table | Dataset, content_type: str | None = None, name: str | None = None, prio: int | None = None)

Bases: File

Class for returning a dataframe value serialized as Excel (xlsx) as a Studio File.

content_type: str | None = 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
value: DataFrame | DataFrame | Series | Series | Table | Dataset
class workflow_utils.input_output.Package(value: PackageInfo, presigned_base_url: str)

Bases: object

Class generated for a Package Input.

By contrast, package outputs just take a list of items to add.

value: PackageInfo

iter() → Iterator[PackageItem]
presigned_base_url: str
value: PackageInfo
class workflow_utils.input_output.PackageItem(*, order: int, position: int, prio: int | None, type: str, name: str, version_number: int, size: int, mime_type: str, _uri_rest: str, _uri_flight: str, _presigned_id: str, _item_id: str)

Bases: object

load() → File | Dataset | None
mime_type: str
name: str
order: int
position: int
prio: int | None
size: int
type: str
version_number: int
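Because each item exposes metadata such as name, mime_type and size, a function can decide which items to load before fetching any payload. The following is a minimal sketch of that idea. FakeItem and load_small_csv_items are hypothetical names, and treating size as a byte count is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FakeItem:
    """Hypothetical stand-in with a few of the documented PackageItem fields."""

    name: str
    mime_type: str
    size: int

    def load(self) -> str:
        # The real load() returns a File or Dataset; a string keeps this runnable.
        return f"loaded {self.name}"


def load_small_csv_items(items: list, max_size: int) -> list:
    """Load only CSV items whose size does not exceed max_size."""
    return [
        item.load()
        for item in items
        if item.mime_type.startswith("text/csv") and item.size <= max_size
    ]


items = [
    FakeItem("a.csv", "text/csv;charset=UTF-8", 100),
    FakeItem("b.bin", "application/octet-stream", 10),
    FakeItem("c.csv", "text/csv", 5000),
]
loaded = load_small_csv_items(items, max_size=1000)
```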
class workflow_utils.input_output.PackageUploader(parameter: int, field_name: str, wio: workflow_utils.workflow_io.WorkflowIo)

Bases: object

field_name: str
parameter: int
upload_dataset(value: Any, name: str | None = None, prio: int | None = None, format: str | None = None) → bool
upload_file(value: Any, name: str | None = None, prio: int | None = None, format: str | None = None) → bool
wio: WorkflowIo
class workflow_utils.input_output.Upload

Bases: object

class Dataset(parameter: int, field_name: str, wio: workflow_utils.workflow_io.WorkflowIo)

Bases: PackageUploader

upload(value)
class Datasets(parameter: int, field_name: str, wio: workflow_utils.workflow_io.WorkflowIo)

Bases: PackageUploader

upload(value, name: str | None = None, prio: int | None = None)
class File(parameter: int, field_name: str, wio: workflow_utils.workflow_io.WorkflowIo)

Bases: PackageUploader

upload(value)
class Files(parameter: int, field_name: str, wio: workflow_utils.workflow_io.WorkflowIo)

Bases: PackageUploader

upload(value, name: str | None = None, prio: int | None = None)