Data Synchronization

Data Synchronization uses MirrorMaker2 to configure this Kafka instance as a target cluster for replicating data from source Kafka clusters. MirrorMaker2 preserves topics, consumer groups, and offsets while maintaining partitioning during the replication process.

Prerequisites
  • The target Kafka cluster (this instance) should be empty before you configure replication, to avoid conflicts.
  • The source and target Kafka clusters must both be running.
  • The clusters must have network connectivity to each other.
  • Appropriate ACLs must be configured if security is enabled.
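
The connectivity prerequisite can be checked before creating any resources. The following is a minimal sketch, not an official procedure: the function names are hypothetical, and it assumes the `nc` (netcat) utility is installed and that the command runs from a location that can resolve the source cluster's bootstrap address (for an in-cluster service name, that usually means a pod inside the Kubernetes cluster).

```shell
# Hypothetical helpers: split a bootstrap address of the form host:port.
bootstrap_host() { printf '%s\n' "${1%:*}"; }
bootstrap_port() { printf '%s\n' "${1##*:}"; }

# Assumption: nc (netcat) is available; -z only probes the port, -w 5 sets a 5 second timeout.
check_bootstrap() {
  host=$(bootstrap_host "$1")
  port=$(bootstrap_port "$1")
  if nc -z -w 5 "$host" "$port"; then
    echo "reachable: $1"
  else
    echo "unreachable: $1" >&2
    return 1
  fi
}
```

For example, `check_bootstrap source-cluster-kafka-bootstrap.yh-test:9093` probes the source address used in the configuration example on this page. A successful TCP probe only shows that the port is open; it does not verify TLS settings or ACLs.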

Configure Data Synchronization

Enable synchronization from the CLI by creating an RdsMirrorMaker2 resource:

cat << EOF | kubectl -n default create -f -
apiVersion: middleware.alauda.io/v1
kind: RdsMirrorMaker2
metadata:
 labels:
   middleware.alauda.io/cluster: my-cluster
 name: rdsmm2-sample
spec:
 replicas: 2
 clusters:
   - bootstrapServers: source-cluster-kafka-bootstrap.yh-test:9093
 mirrors:
     - topicsPattern: ".*"
       groupsPattern: ".*"
       sourceConnector:
         tasksMax: 5
         config:
           replication.factor: '1'
           offset-syncs.topic.replication.factor: '1'
           sync.topic.acls.enabled: "false"
           refresh.topics.interval.seconds: '60'
           replication.policy.class: "org.apache.kafka.connect.mirror.IdentityReplicationPolicy"
       checkpointConnector:
         config:
           checkpoints.topic.replication.factor: '1'
           refresh.groups.interval.seconds: '600'
           sync.group.offsets.enabled: 'true'
           sync.group.offsets.interval.seconds: '60'
           emit.checkpoints.interval.seconds: '60'
           replication.policy.class: "org.apache.kafka.connect.mirror.IdentityReplicationPolicy"
 resources:
   limits:
     cpu: 1
     memory: 2Gi
   requests:
     cpu: 1
     memory: 2Gi

EOF
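
After the resource is created, it can be inspected from the CLI as well. The sketch below is illustrative only: the function names are hypothetical, and the resource name `rdsmirrormaker2` is an assumption derived from the kind `RdsMirrorMaker2`. The first helper lists the resource names that the `middleware.alauda.io` API group actually registers, so the assumed name can be confirmed or corrected.

```shell
# List the resource types registered under the API group used by the manifest.
mm2_resource_names() {
  kubectl api-resources --api-group=middleware.alauda.io
}

# Show the stored RdsMirrorMaker2 object, including any status the operator reports.
# Assumption: the CRD accepts "rdsmirrormaker2" as a resource name.
# $1 = namespace, $2 = resource name
mm2_show() {
  kubectl -n "$1" get rdsmirrormaker2 "$2" -o yaml
}
```

With the manifest on this page, the call would be `mm2_show default rdsmm2-sample`.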

Monitoring Mirroring

After successful creation, you can monitor the data synchronization in the Web Console:

  1. Navigate to the Data synchronization tab of your Kafka instance
  2. View details in two sections:
    • MirrorMaker2 Configuration: Shows all configuration details including:
      • Source and target cluster information
      • Replication settings
      • Resource allocation
    • Topology: Displays the transmission status
  3. For detailed metrics:
    • Click the Monitoring button next to "Transmission status"
    • View real-time metrics in the monitoring dashboard

Important Notes
  1. Monitor the replication lag metrics in the monitoring dashboard.
  2. During upgrades, plan for brief pauses in replication.

Key Configuration Parameters

Source Connector Configuration

Parameter | Description | Default
replication.factor | Replication factor for mirrored topics (-1 uses cluster default) | -1
offset-syncs.topic.replication.factor | Replication factor for offset sync topic (-1 uses cluster default) | -1
sync.topic.acls.enabled | Whether to sync ACLs (fixed to false when user-operator is enabled) | false
refresh.topics.interval.seconds | Frequency to check for new topics (in seconds) | 600
replication.policy.class | Replication policy class | org.apache.kafka.connect.mirror.IdentityReplicationPolicy

Checkpoint Connector Configuration

Parameter | Description | Default
checkpoints.topic.replication.factor | Replication factor for checkpoints topic | 3
refresh.groups.interval.seconds | Frequency to check for new consumer groups (in seconds) | 600
sync.group.offsets.enabled | Whether to sync consumer group offsets | true
sync.group.offsets.interval.seconds | Frequency to sync consumer group offsets (in seconds) | 60
emit.checkpoints.interval.seconds | Frequency to emit offset tracking checkpoints (in seconds) | 60

Important Notes
  1. sync.topic.acls.enabled is forced to false when user-operator is enabled.
  2. replication.policy.class is set to IdentityReplicationPolicy so that mirrored topics keep their original names.
  3. A replication factor of -1 uses the cluster's default setting.
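
To illustrate the effect of the replication policy: MirrorMaker2's DefaultReplicationPolicy names a mirrored topic by prefixing the source cluster alias and a separator (a dot by default), whereas IdentityReplicationPolicy leaves the name unchanged. The sketch below only mimics that naming rule in shell; the function name, the policy keywords, and the alias `source` are hypothetical and are not part of any Kafka tooling.

```shell
# Hypothetical helper: derive the topic name as it appears on the target cluster.
# $1 = policy ("identity" or "default"), $2 = source cluster alias, $3 = topic name
mirrored_topic_name() {
  case "$1" in
    identity) printf '%s\n' "$3" ;;          # name is preserved
    default)  printf '%s.%s\n' "$2" "$3" ;;  # alias prefix is added
    *)        echo "unknown policy: $1" >&2; return 1 ;;
  esac
}

mirrored_topic_name identity source orders   # prints: orders
mirrored_topic_name default  source orders   # prints: source.orders
```

Because names are preserved under IdentityReplicationPolicy, consumers can switch to the target cluster without changing their topic subscriptions, which is also why the target cluster should not already contain topics with the same names.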