Recover from an out-of-sync passive site

This describes the automatic and operational procedures necessary

This guide describes the procedures required to synchronize the secondary site with the primary site in a setup as outlined in Concepts for active-passive deployments together with the blueprints outlined in Building blocks active-passive deployments.

When to use this procedure

Use this procedure after a temporary disconnection between sites during which the Infinispan clusters lost contact and the contents of the caches became out-of-sync.

At the end of the procedure, the session contents on the secondary site have been discarded and replaced by the session contents of the primary site. All caches in the secondary site have been cleared to prevent invalid cached contents.

See the Multi-site deployments guide for other operational procedures.

Procedures

Infinispan Cluster

For the context of this guide, Site-A is the primary site and is active, and Site-B is the secondary site and is passive.

Network partitions may happen between the sites, and when they do, the replication between the Infinispan clusters stops. These procedures bring both sites back in sync.

Transferring the full state may impact the Infinispan cluster performance by increasing the response time, the resource usage, or both.

The first procedure is to delete the stale data from the secondary site.

  1. Log in to your secondary site.

  2. Shut down Keycloak. This clears all Keycloak caches and prevents the state of Keycloak from being out-of-sync with Infinispan.

    When deploying Keycloak using the Keycloak Operator, change the number of Keycloak instances in the Keycloak Custom Resource to 0.

  3. Connect to the Infinispan cluster using the Infinispan CLI tool:

    Command:
    kubectl -n keycloak exec -it pods/infinispan-0 -- ./bin/cli.sh --trustall --connect https://127.0.0.1:11222

    The CLI asks for the username and password of the Infinispan cluster. These credentials are the ones set in the configuring credentials section of the Deploy Infinispan for HA with the Infinispan Operator guide.

    Output:
    Username: developer
    Password:
    [infinispan-0-29897@ISPN//containers/default]>

    The pod name depends on the cluster name defined in the Infinispan CR. The connection can be done with any pod in the Infinispan cluster.
  4. Disable the replication from the secondary site to the primary site by running the following command. It prevents the clear request from reaching the primary site and deleting all the correct cached data.

    Command:
    site take-offline --all-caches --site=site-a
    Output:
    {
      "offlineClientSessions" : "ok",
      "authenticationSessions" : "ok",
      "sessions" : "ok",
      "clientSessions" : "ok",
      "work" : "ok",
      "offlineSessions" : "ok",
      "loginFailures" : "ok",
      "actionTokens" : "ok"
    }
  5. Check that the replication status is offline.

    Command:
    site status --all-caches --site=site-a
    Output:
    {
      "status" : "offline"
    }

    If the status is not offline, repeat the previous step.

    Make sure the replication is offline; otherwise, the clear commands will clear the data on both sites.
  6. Clear all the cached data in the secondary site using the following commands:

    Command:
    clearcache actionTokens
    clearcache authenticationSessions
    clearcache clientSessions
    clearcache loginFailures
    clearcache offlineClientSessions
    clearcache offlineSessions
    clearcache sessions
    clearcache work

    These commands do not print any output.

  7. Re-enable the cross-site replication from the secondary site to the primary site.

    Command:
    site bring-online --all-caches --site=site-a
    Output:
    {
      "offlineClientSessions" : "ok",
      "authenticationSessions" : "ok",
      "sessions" : "ok",
      "clientSessions" : "ok",
      "work" : "ok",
      "offlineSessions" : "ok",
      "loginFailures" : "ok",
      "actionTokens" : "ok"
    }
  8. Check that the replication status is online.

    Command:
    site status --all-caches --site=site-a
    Output:
    {
      "status" : "online"
    }
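
The repetitive parts of the steps above can be sketched as two small shell helpers. This is a minimal sketch, not part of the official procedure: the function names `has_status` and `print_clearcache_commands` are hypothetical, and the status check assumes the CLI output has exactly the format shown in this guide. The helpers only inspect text and print commands; they do not connect to Infinispan.

```shell
# Cache names exactly as listed in this guide.
CACHES="actionTokens authenticationSessions clientSessions loginFailures offlineClientSessions offlineSessions sessions work"

# Succeeds only if the CLI output on stdin reports the expected status,
# in the format shown in this guide, for example: "status" : "offline"
# $1: expected status (offline or online). Hypothetical helper.
has_status() {
  grep -Eq "\"status\"[[:space:]]*:[[:space:]]*\"$1\""
}

# Prints one clearcache command per cache, ready to paste into the CLI.
# Hypothetical helper; it does not run the commands itself.
print_clearcache_commands() {
  for cache in $CACHES; do
    printf 'clearcache %s\n' "$cache"
  done
}
```

For example, after saving the output of the `site status` command to a file (here the hypothetical `status.txt`), `has_status offline < status.txt && print_clearcache_commands` prints the clear commands only when the offline check passes.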

Now we are ready to transfer the state from the primary site to the secondary site.

  1. Log in to your primary site.

  2. Connect to the Infinispan cluster using the Infinispan CLI tool:

    Command:
    kubectl -n keycloak exec -it pods/infinispan-0 -- ./bin/cli.sh --trustall --connect https://127.0.0.1:11222

    The CLI asks for the username and password of the Infinispan cluster. These credentials are the ones set in the configuring credentials section of the Deploy Infinispan for HA with the Infinispan Operator guide.

    Output:
    Username: developer
    Password:
    [infinispan-0-29897@ISPN//containers/default]>

    The pod name depends on the cluster name defined in the Infinispan CR. The connection can be done with any pod in the Infinispan cluster.
  3. Trigger the state transfer from the primary site to the secondary site.

    Command:
    site push-site-state --all-caches --site=site-b
    Output:
    {
      "offlineClientSessions" : "ok",
      "authenticationSessions" : "ok",
      "sessions" : "ok",
      "clientSessions" : "ok",
      "work" : "ok",
      "offlineSessions" : "ok",
      "loginFailures" : "ok",
      "actionTokens" : "ok"
    }
  4. Check that the replication status is online for all caches.

    Command:
    site status --all-caches --site=site-b
    Output:
    {
      "status" : "online"
    }
  5. Wait for the state transfer to complete by checking the output of the push-site-status command for each cache.

    Command:
    site push-site-status --cache=actionTokens
    site push-site-status --cache=authenticationSessions
    site push-site-status --cache=clientSessions
    site push-site-status --cache=loginFailures
    site push-site-status --cache=offlineClientSessions
    site push-site-status --cache=offlineSessions
    site push-site-status --cache=sessions
    site push-site-status --cache=work
    Output:
    {
      "site-b" : "OK"
    }
    {
      "site-b" : "OK"
    }
    {
      "site-b" : "OK"
    }
    {
      "site-b" : "OK"
    }
    {
      "site-b" : "OK"
    }
    {
      "site-b" : "OK"
    }
    {
      "site-b" : "OK"
    }
    {
      "site-b" : "OK"
    }

    See the table in this section of the Cross-Site Documentation for the possible status values.

    If an error is reported, repeat the state transfer for that specific cache.

    Command:
    site push-site-state --cache=<cache-name> --site=site-b
  6. Clear/reset the state transfer status with the following commands:

    Command:
    site clear-push-site-status --cache=actionTokens
    site clear-push-site-status --cache=authenticationSessions
    site clear-push-site-status --cache=clientSessions
    site clear-push-site-status --cache=loginFailures
    site clear-push-site-status --cache=offlineClientSessions
    site clear-push-site-status --cache=offlineSessions
    site clear-push-site-status --cache=sessions
    site clear-push-site-status --cache=work
    Output:
    "ok"
    "ok"
    "ok"
    "ok"
    "ok"
    "ok"
    "ok"
    "ok"

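The per-cache commands and the completion check in the steps above can be sketched in shell as follows. This is an illustrative sketch under stated assumptions: the function names `print_per_cache` and `push_completed` are hypothetical, and the check assumes the push-site-status output has the format shown in this guide, with one `"<site>" : "<status>"` line per cache. The helpers only print commands and inspect text; they do not connect to Infinispan.

```shell
# Cache names exactly as listed in this guide.
CACHES="actionTokens authenticationSessions clientSessions loginFailures offlineClientSessions offlineSessions sessions work"

# Prints "site <subcommand> --cache=<name>" for every cache.
# $1: push-site-status or clear-push-site-status. Hypothetical helper.
print_per_cache() {
  for cache in $CACHES; do
    printf 'site %s --cache=%s\n' "$1" "$cache"
  done
}

# Reads push-site-status output on stdin. Succeeds only if the site named
# in $1 is reported at least once and every report for it is "OK".
# Hypothetical helper; assumes lines of the form: "site-b" : "OK"
push_completed() {
  awk -v site="\"$1\"" '
    $1 == site { seen++; if ($3 != "\"OK\"") bad++ }
    END { exit (seen > 0 && bad == 0) ? 0 : 1 }
  '
}
```

For example, `print_per_cache push-site-status` prints the eight status commands of step 5, and piping their collected output into `push_completed site-b` reports whether every cache has finished transferring.
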
Now that the state is available in the secondary site, Keycloak can be started again:

  1. Log in to your secondary site.

  2. Start Keycloak.

    When deploying Keycloak using the Keycloak Operator, change the number of Keycloak instances in the Keycloak Custom Resource to the original value.
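
When Keycloak is deployed with the Keycloak Operator, changing the number of instances for the shutdown and startup steps could be done with a merge patch on the Keycloak Custom Resource. The following is a minimal sketch: the helper name `keycloak_instances_patch` is hypothetical, `<cr-name>` stands for the name of your Keycloak CR, and the `keycloak` namespace is assumed from the other commands in this guide.

```shell
# Prints a JSON merge patch that sets spec.instances on the Keycloak CR.
# $1: desired number of Keycloak instances (0 shuts Keycloak down).
# Hypothetical helper; it only prints the patch and does not call kubectl.
keycloak_instances_patch() {
  printf '{"spec":{"instances":%d}}\n' "$1"
}

# Example invocation, not executed here. Replace <cr-name> with the name
# of your Keycloak CR; the namespace is an assumption:
#   kubectl -n keycloak patch keycloak/<cr-name> --type=merge -p "$(keycloak_instances_patch 0)"
```

To start Keycloak again, the same patch can be applied with the original number of instances instead of 0.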

AWS Aurora Database

No action required.

Route53

No action required.

Further reading

See Concepts to automate Infinispan CLI commands for ways to run the CLI commands in this guide without an interactive session.
