Spark Application Templates

Spark application templates define reusable configurations for Spark applications. When many applications share similar configurations, templates help you avoid duplication by grouping common settings together. Application templates are available for the v1alpha1 version of the SparkApplication custom resource and have the same structure as the SparkApplication resource, but the operator handles them differently:

  1. Application templates are namespace-scoped resources, just like Spark applications. This means that a SparkApplication can only reference templates from its own namespace.

  2. Application templates are not reconciled by the operator, but must be referenced from a SparkApplication resource to be applied. This means that changes to an application template will not automatically trigger updates to SparkApplication resources that reference it.

  3. An application can reference multiple application templates, and the settings from these templates are merged together. Templates are merged in the order of their index in the reference list: the template at index 0 has the lowest precedence, and each subsequent template overrides conflicting settings from the ones before it. The application's own fields have the highest precedence and override any conflicting settings from the templates. This allows you to keep common settings in a base template and override specific settings in the application resource as needed.

  4. Application template references are immutable in the sense that once applied to an application they cannot be changed again. Currently templates are applied upon the creation of the application, and any changes to the template references after that will be ignored.

  5. Application and template CRDs must have the exact same versions. Currently only v1alpha1 is supported.
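
The precedence rules in point 3 can be illustrated with a small sketch. The template names base-template and large-template are hypothetical, and the specs are trimmed to the required fields plus one conflicting setting:

```yaml
---
# Hypothetical template at index 0: lowest precedence.
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplicationTemplate
metadata:
  name: base-template
spec:
  sparkImage:
    productVersion: "4.1.1"
  mode: cluster
  mainClass: com.example.Main
  mainApplicationFile: "placeholder"
  executor:
    replicas: 1
---
# Hypothetical template at index 1: overrides conflicting values from index 0.
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplicationTemplate
metadata:
  name: large-template
spec:
  sparkImage:
    productVersion: "4.1.1"
  mode: cluster
  mainClass: com.example.Main
  mainApplicationFile: "placeholder"
  executor:
    replicas: 2
---
# The application's own fields have the highest precedence.
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
  name: app
  annotations:
    spark-application.template.merge: "true"
    spark-application.template.0.name: "base-template"
    spark-application.template.1.name: "large-template"
spec:
  sparkImage:
    productVersion: "4.1.1"
  mode: cluster
  mainClass: com.example.Main
  mainApplicationFile: "/examples.jar"
```

Under the rules above, the application would be expected to run with executor.replicas set to 2 (large-template overrides base-template) and mainApplicationFile set to /examples.jar (the application overrides both templates). All three resources live in the same namespace.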

Migrating from cluster-scoped templates

Application templates were cluster-wide resources when they were first released. This was a mistake: many users do not have the access rights to create cluster-scoped resources, so templates are now namespace-scoped.

If you are migrating from older installations where templates were treated as cluster-wide resources, account for the following:

  1. Recreate each template in every namespace where SparkApplications use it.

  2. Keep template names identical across namespaces if you want existing application annotations to continue working unchanged.

  3. Cross-namespace template references are no longer resolved; templates and applications must be in the same namespace.

  4. Update GitOps/automation manifests to create templates as namespace-targeted resources before reconciling dependent SparkApplications.

Examples
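
As a sketch of steps 1 and 2 of the migration list, a template that was previously cluster-wide can be declared once per namespace under an identical name. The namespaces team-a and team-b are hypothetical, and the spec is trimmed to the required fields:

```yaml
---
# Copy of the template for applications in the (hypothetical) team-a namespace.
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplicationTemplate
metadata:
  name: app-template
  namespace: team-a
spec:
  sparkImage:
    productVersion: "4.1.1"
  mode: cluster
  mainClass: com.example.Main
  mainApplicationFile: "placeholder"
---
# Same name, second namespace: applications in team-b keep their annotations as-is.
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplicationTemplate
metadata:
  name: app-template
  namespace: team-b
spec:
  sparkImage:
    productVersion: "4.1.1"
  mode: cluster
  mainClass: com.example.Main
  mainApplicationFile: "placeholder"
```

Because each application resolves templates only in its own namespace, the two copies are independent and can diverge later if the teams need different settings.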

Applications use metadata.annotations to reference application templates as shown below:

---
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
  name: app
  annotations:
    spark-application.template.merge: "true" (1)
    spark-application.template.0.name: "app-template" (2)
    spark-application.template.upgradeStrategy: "onCreate" (3)
    spark-application.template.applyStrategy: "enforce" (4)
spec: (5)
  sparkImage:
    productVersion: "4.1.1"
  mode: cluster
  mainClass: com.example.Main
  mainApplicationFile: "/examples.jar"
1 Enable application template merging for this application.
2 Name of the application template to reference.
3 Optional. The upgrade strategy for the application template. Currently only onCreate is supported. This means that the application template will only be applied when the application is created, and any changes to the template after that will be ignored.
4 Optional. The apply strategy for the application template. Currently only enforce is supported. This means that any errors that appear during the application of the template will be treated as errors for the application resource, and the application will not be created or updated until the errors are resolved.
5 Application specification. The fields sparkImage, mode, mainClass, and mainApplicationFile are required for the application to be valid, but the rest of the fields are optional and can be defined in the application template.

The application template referenced in the example above is defined as follows:

---
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplicationTemplate (1)
metadata:
  name: app-template (2)
spec:
  sparkImage:
    productVersion: "4.1.1"
    pullPolicy: IfNotPresent
  mode: cluster
  mainClass: com.example.Main
  mainApplicationFile: "placeholder" (3)
  sparkConf:
    spark.kubernetes.file.upload.path: "s3a://my-bucket"
  s3connection:
    reference: spark-history-s3-connection
  logFileDirectory:
    s3:
      prefix: eventlogs/
      bucket:
        reference: spark-history-s3-bucket
  driver:
    config:
      logging:
        enableVectorAgent: false
  executor:
    replicas: 1
    config:
      logging:
        enableVectorAgent: false
1 The kind of the resource is SparkApplicationTemplate to indicate that this is an application template.
2 Name of the application template.
3 The value of mainApplicationFile is a placeholder that the application resource overrides. As with the application, the fields sparkImage, mode, mainClass, and mainApplicationFile are required for the template to be valid.
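
To make the merge concrete, the following sketch shows what the effective specification of the app resource might look like once app-template is applied. It is an illustration only: it assumes that nested fields are merged recursively, which is not stated explicitly here, and the operator does not necessarily expose the result in this form.

```yaml
# Sketch of the merged result (not a resource you create yourself).
spec:
  sparkImage:
    productVersion: "4.1.1"               # set by both application and template
    pullPolicy: IfNotPresent              # from the template (assumes recursive merge)
  mode: cluster
  mainClass: com.example.Main
  mainApplicationFile: "/examples.jar"    # application value replaces "placeholder"
  sparkConf:
    spark.kubernetes.file.upload.path: "s3a://my-bucket"   # from the template
  s3connection:
    reference: spark-history-s3-connection                 # from the template
  logFileDirectory:
    s3:
      prefix: eventlogs/
      bucket:
        reference: spark-history-s3-bucket
  driver:
    config:
      logging:
        enableVectorAgent: false
  executor:
    replicas: 1
    config:
      logging:
        enableVectorAgent: false
```

Only mainApplicationFile differs between the two resources in this example, so it is the only field where the application's higher precedence is visible.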

An application can reference multiple application templates as shown below:

---
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
  name: app
  annotations:
    spark-application.template.merge: "true" (1)
    spark-application.template.0.name: "app-template-0" (2)
    spark-application.template.1.name: "app-template-1"
    spark-application.template.2.name: "app-template-2"
spec: (3)
  sparkImage:
    productVersion: "4.1.1"
  mode: cluster
  mainClass: com.example.Main
  mainApplicationFile: "/examples.jar"
1 Enable application template merging for this application.
2 Names of the application templates to reference. The settings from these templates are merged in the order they are referenced, with app-template-0 having the lowest precedence and app-template-2 the highest. The application fields have the highest overall precedence and override any conflicting settings from the templates.