© 2018-2024 mod_cluster contributors
1. Developer Resources
These resources were converted from the read-only developer.jboss.org, which is being retired, and are thus out-of-date on various topics. We’re working on bringing them up to date.
2. Design
2.1. ModClusterDesign
2.1.1. Next Generation Web Tier Load Balancing Design
Design document for the next generation JBoss AS web tier load balancing architecture.
The initial document is derived from discussions held in Neuchâtel on December 6, 2007. Participants were Bela Ban, Jean-Frederic Clere, Jason Greene, Sacha Labourey, Mircea Markus, Remy Maucherat, Brian Stansberry, Manik Surtani, Mladen Turk, Jimmy Wilson and Galder Zamarreno.
2.1.2. Key Design Goals
-
Dynamic registration of AS instances and context mountings; no need to statically configure the JBossWeb "workers" or context mountings on the Apache httpd side.
-
Cluster-wide load balance calculations maintained on the AS side, with appropriate load factors sent to the httpd side as circumstances change.
-
Pluggable policies for calculating the load balance factors.
-
AS instances send lifecycle notifications to the httpd side, equivalent to the disable ('D') and stop ('S') values available with the mod_proxy parameter definitions (see http://httpd.apache.org/docs/2.4/mod/mod_proxy.html#proxypass).
-
More fine-grained; individual contexts can be disabled/stopped, not just entire server instances.
2.1.3. Basic Architecture
A new Apache module, "mod_cluster" will be created, based on the existing mod_proxy module. See JBNATIVE-53 for details.
Normal web request traffic will be sent from mod_cluster to the JBossWeb instances' AJP connector using the AJP protocol. No changes to the AJP protocol are necessary.
On the AS side, a new ModClusterService will be created. It will send cluster configuration and load balance weighting information to the httpd side via HTTP/HTTPS.
The ModClusterService will be made aware of how to contact the httpd side via static configuration; there will be no dynamic discovery of httpd instances. (The list of httpd instances could, however, be changed at runtime via a management tool.)
2.1.4. Modes
We discussed two possible modes in which the ModClusterService could operate:
"Non-Clustered" AS instances
Here the AS instances do not exchange information amongst themselves. Think in terms of a group of AS instances running the "default" config, with no JGroups channel open.
In this mode, each AS instance directly communicates with each server on the Apache httpd side. The ModClusterService would be able to send messages to the httpd side regarding:
-
Registration and configuration information
-
Mounting information
-
Lifecycle events (disable/stop the instance or one of its contexts)
-
However, any load balance factor sent from the AS instances would be a static configuration value, equivalent to the worker.xxx.lbfactor in a mod_jk workers.properties file
-
Minor "nice-to-have" is the ability for the load balance factor to be updated at runtime (e.g. via a management tool) with the new value passed to the httpd side.
-
A related "nice-to-have" is the ability for a node to update it’s load balance factor itself. For example, if the node feels it is overloaded, it can reduce it’s factor. If all nodes did that simultaneously, it would have no effect, which is OK.
Question: In this mode, can mod_cluster still make load balance decisions a la the mod_jk Request, Session and Traffic methods? Or would the load balance factor from the static configuration be the sole factor in the load balance decision?
Answer: The load balance factor would be the sole factor in the decision; don’t want to maintain load balance policy code in mod_cluster. See above "nice-to-have" on allowing each node to dynamically update its load balance factor to reflect its own appraisal of its status.
"Clustered" AS instances
The AS instances would form a cluster (using JGroups) and could thus exchange various metrics that would be used in the load balance factor calculation. ModClusterService would include an HASingleton component such that one member of the cluster would be responsible for collating these metrics, deriving the load balance factor for all group members, and sending that consolidated information to each server on the Apache httpd side.
Question: In this mode, would messages other than load balance factors be transmitted via the HASingleton member? My (BES) initial take on this is it seems simpler to restrict the HASingleton messages to load balance information.
Answer: That is probably enough. But it could be interesting to see whether a node of the cluster could "ping" httpd.
Normal Request Handling
Normal request traffic is passed from mod_cluster to the AJP connector in the normal fashion, i.e. via AJP over a pool of long-lasting connections.
2.1.5. AS to mod_cluster Communications – ModCluster Management Protocol (MCMP)
The specification of the communication protocol between the ModClusterService and httpd is the main detail area that needs to be hammered out between the AS clustering team and the mod_cluster side. Once that is done each side can proceed fairly independently.
Communication from ModClusterService to httpd will be done via HTTP or HTTPS. There is ongoing discussion of what HTTP method is most appropriate. GET can be fairly human-readable and is easiest to use via a CLI (e.g. telnet). But GET requests are limited by the roughly 8 Kbyte URL length limitation, and even if it is unlikely that some requests would need to go beyond that length, POST seems like a reasonable choice; there is also currently discussion of using a WebDAV-like approach where we define custom request types (corresponding to the message types below). Passing parameter values as HTTP headers is not practical because the httpd header-parsing logic will merge them (two app: headers carrying /myapp and /hisapp are combined while parsing into app: /myapp, /hisapp).
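As a purely illustrative sketch of the custom-method idea (the exact method names, parameters and encoding are specified in the MCMP section later in this document; the values below are placeholders), such a request could carry its parameters URL-encoded in the body rather than in headers:
CONFIG / HTTP/1.0
Content-Length: 51

JVMRoute=node1&Host=192.168.1.10&Port=8009&Type=ajp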
This communication would be over the regular port that httpd is listening on,
with the request being internally handled by mod_cluster_manager based on a
special context path mount in httpd.conf. Basically, something like
SetHandler mod-cluster
. What port to use depends on how httpd is configured.
Issue: Securing this mount is a bit trickier than the jkstatus case, which could often just be set to only "Allow from: 127.0.0.1". Such a simple approach won’t work here as a large number of AS instances will need to be able to communicate.
Solution: It is possible to have the SetHandler directive in a VirtualHost where SSL is mandatory.
Example of Secure SetHandler:
Listen 9443
<VirtualHost _default_:9443>
SSLEngine on
SSLCipherSuite ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP:+eNULL
SSLCertificateFile conf/server.crt
SSLCertificateKeyFile conf/server.key
SSLCACertificateFile conf/server-ca.crt
SSLVerifyClient require
SSLVerifyDepth 10
SetHandler mod-cluster
</VirtualHost>
Basic categories of messages:
Configuration Information
Per node. Initially provided by each node (or perhaps by the HASingleton) during the startup process for the node.
The "Connection Directives" and "Advanced Worker Directives" sections of the Apache Tomcat Connector - Reference Guide give a good description of the various options supported by mod_jk.
Question/TODO: Which if any of these are not available, given that the code base is mod_proxy not mod_jk?
Answer: See node configuration for a proposal for that stuff.
Other configuration items mentioned in the Neuchâtel discussions that are not directly mentioned in the reference guide:
-
Authentication information (not sure what was meant here)
-
Max sessions
Question: My assumption is we’ll support updating these values after the initial registration of a worker.
Answer: Those values will be stored in shared memory and should be used while processing new connections and new requests.
Load Balancing Factors
Either a single load balance factor (in "non-clustered" mode) or a set of factors (in "clustered" mode).
General Load Balancing Configurations
Things like the mod_jk 'sticky-session' and 'sticky-session-force' directives. See the "Load Balancing Directives section in the Apache Tomcat Connector - Reference Guide for others.
Issue: If the ModClusterService is operating in "non-clustered" mode, it isn’t clear who configures these.
Answer: In "non-clustered" mode, each AS instance will independently send this information, with any new data overriding the older. It is the responsibility of the user to ensure that each AS instance has the same configuration for these global values.
Management Message Types
Requests from ModClusterService notify the httpd side of lifecycle events: startup/shutdown of JBossWeb instances; deploy/undeploy of webapps.
Requests are sent via HTTP/HTTPS (80, 443); the exact HTTP request method is a subject of ongoing discussion.
-
CONFIG: Send configuration information for a node or set of nodes.
-
ENABLE-APP: Send requests and assign new sessions to the specified app. Use of a wildcard to identify the app means enable all apps on the given node.
-
DISABLE-APP: Apache should not create new sessions for this webapp, but should continue serving existing sessions on this node. Use of a wildcard to identify the app means disable all apps on the given node.
-
STOP-APP: New requests for this webapp should not be sent to this node. Use of a wildcard to identify the app means stop all apps on the given node.
-
REMOVE-APP: No requests for this webapp should be sent to this node. Use of a wildcard to identify the app means the node has been removed from the cluster. In this case all other configuration information for the node will be removed and any open connection between httpd and the node will be closed.
-
STATUS: Send the current load balance factor for this node (or a set of nodes). Periodically sent. mod_cluster_manager responds with a STATUS-RSP. An interesting suggestion is to support sending a different load balance factor per webapp.
-
INFO: Request configuration info from mod_cluster_manager. The response would include information on what virtual hosts are configured (so per-webapp commands can specify the correct virtual host) and other info that ModClusterService can make available to management tools (e.g. what addresses/ports httpd is listening on). mod_cluster_manager responds with an INFO-RSP message.
-
DUMP: Request a text dump of the current configuration seen by mod_cluster_manager. mod_cluster_manager responds with a DUMP-RSP containing raw ASCII text corresponding to the current configuration.
-
PING: Request a check of the availability of httpd itself or of a cluster node from httpd (identified by the node name (JVMRoute) or by Scheme, Host and Port). mod_cluster_manager will respond with a PING-RSP, which has a format similar to STATUS-RSP. (Since version 0.0.1 of the protocol.)
A previous iteration also had ENABLE/DISABLE/STOP commands that applied to all apps on a node. This usage can be handled by passing a wildcard as the webapp name. A STOP message may still be useful as a signal to mod_cluster_manager to completely remove all configuration information for a node from memory. Perhaps a different name than STOP, e.g. REMOVE.
A detailed protocol proposal can be found in Mod-Cluster_Management_Protocol.
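For illustration, registering a node and then enabling one of its webapps might look like the following curl invocations against the httpd MCMP mount (a sketch; the address and all values are placeholders, and the exact keys and wire format are specified in the MCMP implementation section below):
# Register the node and its connection parameters
curl -XCONFIG -d "JVMRoute=node1" -d "Host=192.168.1.10" -d "Port=8009" -d "Type=ajp" http://localhost:6666
# Tell httpd to route requests and new sessions for /myapp to that node
curl -XENABLE-APP -d "JVMRoute=node1" -d "Context=/myapp" -d "Alias=localhost" http://localhost:6666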
Responses to the above requests will contain something like:
-
"HTTP/1.1 200 OK" When command has been processed correctly.
-
"HTTP/1.1 500 VERSION 1.2.3" if something about the request was not understood. Version number helping the ModClusterService understand how to tailor future requests.
-
"HTTP/1.1 200 OK" and the response for the request (for STATUS and DUMP requests at least).
It could be interesting to have the following for some requests:
-
List any excluded nodes (nodes that mod_cluster regards as failed due to problems responding to requests)
-
Metrics (open connections, number of retries, etc) that ModClusterService may wish to use in load balancing calculations.
Virtual Hosts
Messages pertaining to particular webapps will need to qualify the webapp’s
context name with virtual host information. This virtual host information
needs to be in terms httpd can understand rather than in the terms JBossWeb
uses. E.g., if httpd has a virtual host labs.jboss.org and JBossWeb has a
server.xml
host element named "labs", the communication to
mod_cluster_manager must qualify the relevant webapps with "labs.jboss.org".
The purpose of the INFO message is to acquire the necessary information to understand the virtual hosts on the httpd side. ModClusterService will need to analyze the names and aliases of the Host instances running in JBossWeb and correlate them to the appropriate httpd virtual hosts.
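For example, assuming the JBossWeb Host named "labs" corresponds to the httpd virtual host labs.jboss.org, a per-webapp command would be qualified like this (a sketch; the address and JVMRoute are placeholders):
curl -XENABLE-APP -d "JVMRoute=node1" -d "Context=/myapp" -d "Alias=labs.jboss.org" http://localhost:6666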
2.1.6. ModClusterService Design
The ModClusterService will be based on a modular architecture, with as many points as possible pluggable and extendable. Major components include:
-
A pluggable adapter for interfacing with the mod_cluster_manager. The details of the interaction (POST vs GET vs WebDAV like commands, even whether mod_cluster_manager is the load balancer) should be completely abstracted away from the rest of the service.
-
Group communication module for coordinating gathering of metrics, managing the HASingleton, etc.
-
Metrics gathering module, for gathering needed metrics from the local node. Likely will include pluggable submodules for interfacing with various AS subsystems (e.g. JBossWeb for web tier usage statistics, transaction subsystem, general core server metrics like CPU and memory usage, etc.).
-
Load balancing manager for coordination of metrics gathering.
-
Load balance policy which calculates the current load balance factors.
-
Configuration module for determining information about the runtime environment, e.g. what port the AJP connector is listening on, what Tomcat Host instances are running, etc. Perhaps this module will read a configuration file for other ModClusterService-specific static information, although my general preference would be to configure that sort of thing via -beans.xml property injection.
-
Management module for exposing an interface to external management tools.
2.1.7. Clustering Issues
Domains
We want full support for domains.
A domain is a way to group nodes that share sessions.
However, there are a couple different ways users might implement these; we need to think through how to handle both. In both cases a JGroups channel is used for session replication, with group membership limited to the members of the domain. The question is how the JGroups channel used for intra-cluster ModClusterService traffic is set up:
-
The channel includes all members. In this case, there is one HASingleton which manages things for all domains.
-
There is a channel per domain, in which case there are multiple HASingleton instances, one per domain.
The former seems pretty simple, and can generate more accurate load balancing factors, but the latter is probably preferable for users to configure. To support the latter, we need to ensure the message protocol doesn’t result in messages from one domain accidentally affecting another domain. For example:
-
An HASingleton sends a CONFIG message with data for a set of nodes. mod_cluster_manager should not treat the absence of a particular node from the message as meaning that node should be dropped from memory. Rather, once a node is configured it should require a specific message to remove it.
-
Same thing for load balance factors. If a message is received that says A has factor 2, that remains A’s factor until specifically changed. A STATUS message changing B, C and D’s factor with no mention of A doesn’t somehow set A to 0.
Split-Brain Syndrome
Problem here is if there is a network partition disrupting intra-cluster JGroups traffic. Assume traffic between the httpd boxes and the AS instances is unaffected. This will result in a situation where more than one HASingleton will be running, with each feeling the nodes in the other subcluster have died. We need to avoid a situation where each HASingleton tells mod_cluster_manager to stop sending traffic to the other subcluster’s nodes, with the effect that no nodes are available.
Perhaps the way to deal with this is by having the HASingleton send a STATUS or some other message to mod_cluster_manager before handling what it sees as a node failure. If mod_cluster_manager regards the node as still being healthy, the singleton can regard this as a sign of a split-brain condition and defer telling mod_cluster_manager to remove the node.
2.1.8. Use cases
-
JBoss AS is started
-
Send CONFIG message to httpd, httpd adds information to internal tables, but does not yet connect to JBoss via AJP
-
CONFIG contains
-
Contents of workers.properties: IP address and port of JBoss
-
uriworkermap.properties
-
Changes to the JBoss config are also sent via CONFIG, which overwrites the existing entry at httpd
-
Apache does not yet connect
-
Send ENABLE-APP (with list of all deployed webapps) to httpd
-
This would happen at the end of the startup phase, after the JBossWeb connectors are started. Need an internal notification to know when the connectors are started.
-
Webapp is deployed on a started JBoss AS
-
Send ENABLE-APP to Apache
-
Apache adds webapp to its table and forwards requests to one of the JBoss instances which host this webapp
-
Tables need to maintain information about webapps like stopped, started, enabled, disabled etc
-
If we support different load balance factors per webapp, a CONFIG message with the initial factor would need to be sent before the ENABLE-APP
-
Webapp is undeployed
-
(Possibly) send DISABLE-APP to Apache; Apache disables the app in its tables (see the example message sequence after these use cases):
-
Requests with existing sessions are still sent to the node
-
Maybe wait until all sessions are drained
-
More sophisticated things can be done as well, such as waiting until no requests have come in within a configurable or dynamically determined period of time (e.g. 15 secs). Idea is to allow the webapp to be stopped on the node as soon as it is reasonable to assume any previous requests' session state has been replicated.
-
Send STOP-APP to Apache
-
Apache removes webapp from its tables
-
JBoss is stopped (gracefully)
-
(Possibly) send DISABLE-APP with a wildcard parameter to Apache; Apache disables all apps for the node in its tables
-
Requests with existing sessions are still sent to the node
-
Maybe wait until all sessions are drained
-
More sophisticated things can be done as well, such as waiting until no requests have come in within a configurable or dynamically determined period of time (e.g. 15 secs). Idea is to allow the webapp to be stopped on the node as soon as it is reasonable to assume any previous requests' session state has been replicated.
-
Send STOP-APP with a wildcard parameter to Apache; Apache stops all apps for the node in its tables
-
Issue: The above causes mod_cluster to stop routing requests to the node, but it still maintains all configuration information for the node in memory. Perhaps an additional STOP or REMOVE command is needed to signal mod_cluster to remove all configuration information.
-
JBoss sends load (STATUS) information to Apache
-
Sent regularly, in configurable intervals.
-
Either single or clustered: multiple or one value, e.g. for multiple: A:1, B:4, C:2, D:4. Same as load balance type 'R' currently
-
Response is used to get info from httpd
-
workers mod_cluster sees as being in error state
-
If ModClusterService doesn’t believe a listed worker has failed, it can send messages to mod_cluster telling it to try to recover the worker (see below).
-
any httpd-side metrics being tracked by the AS for management or load balancing purposes
-
If in non-clustered mode and we don’t send dynamic load information, we can also simply not send this message
-
Issue: If we don’t send a STATUS message and mod_cluster regards a node as being in error state, the node will never know that and will never try to recover itself. As a solution 1) we could have each node periodically send a STATUS to avoid this, or 2) perhaps mod_cluster could do what mod_jk does, and run a background thread that tries to resurrect nodes in error state.
-
JBoss crashes
-
Two possible mechanisms for detecting the problem:
-
If clustered, the HA Singleton may detect the crashed node via the JGroups failure detection protocols.
-
Whether the JBoss AS nodes are clustered or not, mod_cluster may detect the failed node before JGroups does (e.g. via CPING/CPONG). mod_cluster marks the worker as being in error state.
-
In clustered mode, in the response to its next STATUS request the HA Singleton will be made aware of the fact that mod_cluster sees the node as failed
-
If ModClusterService still doesn’t see the node as failed (i.e., JGroups FD/VERIFY_SUSPECT timeouts have not elapsed), it will send another CONFIG to mod_cluster_manager, setting the node status back to UP
-
mod_cluster will attempt to use the node again, and will fail
-
process repeats until JGroups detects the node failure
-
No matter which of the above paths is followed, once ModClusterService regards the node as failed it sends STOP-APP with a wildcard parameter to Apache; Apache stops all apps for the node in its tables
-
As in 4 above, we need a mechanism for telling Apache to remove the worker config from memory
-
When the JBoss instance comes back up, it’ll go through use case 1: a CONFIG message is sent which adds the node’s configuration, then ENABLE-APP to signal that requests can be sent.
-
JBoss instance hangs
-
Very similar to 5 above, only difference is it is possible the node will recover before JGroups removes it from the group.
-
Either way, when instance has rejoined the cluster, we will need to send another CONFIG to Apache, so Apache adds the JBoss instance to its tables
-
Connectivity is lost between mod_cluster and a node (non-clustered case)
-
This is conceptually similar to 7 above. mod_cluster cannot successfully connect to an AS instance, so it adds it to its error table.
-
If we decide that the non-clustered nodes will periodically send STATUS messages, the node will learn it is in the error list and try to recover itself (new CONFIG + ENABLE-APP)
-
Otherwise, a background process on the httpd side will need to periodically try to recover the node
-
Connectivity is lost between mod_cluster and a node (clustered case, node is not HASingleton)
-
Similar to 8 above. Here the HASingleton will for sure periodically send STATUS messages and will send a new CONFIG + ENABLE-APP to try to recover the node.
-
Remark: Without an asynchronous ping/pong this will cause QoS problems: the node will be marked UP and mod_cluster will forward new requests to it; if the connectivity between mod_cluster and the node is still lost, all those requests will time out on connect and fall back to another node. See this document.
-
Connectivity is lost between mod_cluster and a node (node is the HASingleton)
-
Tricky situation, as the singleton is basically non-functional if it cannot talk to the httpd side. This will need to be handled with an extension to the normal HASingleton handling whereby a master can force an election of a new singleton master if it detects it cannot contact httpd, with the election policy ensuring the problem node is not elected.
-
This perhaps can be done by storing a Boolean in the DRM for each node (rather than the usual meaningless String). The boolean indicates whether the node can send to mod_cluster_manager; election policy excludes nodes with 'false'. Node updates the DRM with a boolean 'false' when it detects a problem; this update should trigger a new election.
-
Perhaps we need a PING message that each node can use to check its ability to send to mod_cluster_manager?
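As a sketch of the message sequences above, the webapp-undeploy use case could translate into the following calls for a single context (the address and values are placeholders; the wildcard form covering all apps on a node is described in the protocol section):
# Stop assigning new sessions to the webapp; existing sessions are still served
curl -XDISABLE-APP -d "JVMRoute=node1" -d "Context=/myapp" -d "Alias=localhost" http://localhost:6666
# Once the sessions have drained, stop sending any requests for the webapp to this node
curl -XSTOP-APP -d "JVMRoute=node1" -d "Context=/myapp" -d "Alias=localhost" http://localhost:6666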
2.1.9. Misc
-
All requests to apache are sent to the apache default port (80), or whichever port is configured
-
There’s a dummy app (like 'status') which processes those requests (provided by mod-cluster-manager)
-
The AJP configuration can be completely set with a CONFIG message from the JBoss side
-
Anything that can be configured via the existing mod_jk 'status' webapp can be configured via MCMP
2.1.10. Other Considerations
-
It would be nice if an implementation of the ModClusterService could be deployed in AS 4.x.
-
Any code that interacts with JBossWeb to gather metrics would need to be pluggable to support any interface differences.
-
JGroups or HAPartition usage could be different, as could HASingleton usage.
-
So, a "nice-to-have".
-
Where should the code live. Who will use it (see issue above), what will the dependencies be, etc.
2.2. ModCluster ping pong
The ping/pong is an AJP feature that allows checking the availability of a node quickly. It would be possible to add the same feature to http/https (there is already some work done on the ASF httpd dev-list).
It is possible to have 2 kinds of ping/pong:
-
a synchronous ping/pong that corresponds to a user request.
-
an asynchronous one that corresponds to a health-check mechanism.
A synchronous ping/pong prevents using connections broken due to timeouts, bad firewalls, etc. An asynchronous one checks whether a node is up at the time of the check.
2.2.1. Why do we need an asynchronous ping/pong.
If a node is connected to 2 Ethernet boards it could happen that the connection between httpd and the node is broken but not the connection to the rest of the cluster.
A STATUS message from the cluster would then put the node back in the UP state with a good load factor (as it is not being used), so all the new incoming requests will go to that node, the connections will fail, and mod_cluster will have to fail over to another node. That is very bad for the quality of service.
If we do a ping/pong before setting the node to the UP state we could easily prevent this.
2.2.2. Other ways to get the feature:
There are some other ways to get the feature.
Using a STATUS message from the node to httpd:
To check for a broken connection it is possible to send a STATUS message from the node to httpd. If httpd answers, the connection is back.
If not, the cluster should consider the node still DOWN.
Using mod_rewrite and http Connect:
It needs some configuration in httpd something like:
ProxyPass / balancer://cluster stickysession=;jsessionid
NameVirtualHost *:8000
# Virtual host for the cluster manager
<VirtualHost *:8000>
RewriteEngine On
RewriteRule /ping/(.*) /*;jsessionid=0.$1 [PT]
</VirtualHost>
A simple request over HTTP for a node, like:
GET /ping/node1 HTTP/1.0
Ping: On
"Ping: On" is custom header from administrator app that will cause balancer to select node1 and create a request and 200 if the request to node1 is successful in this case the administrator application can then inform the cluster that node1 state can be changed to ENABLE
2.3. ModCluster node conf
The first mod_cluster spec documentation is mod_jk oriented in the node description of the mod-cluster design document.
mod_cluster should be based on mod_proxy, and logic to guess the cluster configuration is not needed. Let’s go through the worker ajp13 parameters and see what to do.
The worker parameters are listed on http://tomcat.apache.org/connectors-doc/reference/workers.html; only the ajp13-type worker directives make sense for us.
The parameters in bold will be added to the existing mod_jk worker parameters.
2.3.1. General node definition:
-
host that is host of BalancerMember http://host:port
-
port that is the port of BalancerMember http://host:port
-
type that is the scheme used: ajp, http or https (type already exists in mod_jk but it is not the scheme; only the ajp scheme was supported).
-
route that is the JvmRoute. In mod_jk that is the name of the node.
-
socket_timeout that could be ProxyTimeout or timeout of proxy backend parameters.
-
socket_keepalive that is keepalive of proxy backend parameters.
2.3.2. pool/opened connection directives.
The mod_proxy directives are more accurate for mod_cluster than the mod_jk ones:
-
connection_pool_size that is max of proxy backend parameters.
-
connection_pool_minsize that is min of proxy backend parameters.
-
connection_pool_timeout that is ttl of proxy backend parameters
We will use:
-
min: Minimum number of connections that will always be kept open to the backend server.
-
max: We probably shouldn’t document it because it often causes problems at customers.
-
smax: Up to the soft maximum number of connections will be created on demand. Any connections above smax are subject to the ttl.
-
acquire: Maximum time to wait for a free connection in the connection pool. If there are no free connections in the pool, httpd will reject the connection.
-
ttl: Time To Live for the inactive connections above the smax connections, in seconds. httpd will close all connections that have not been used within that time period.
2.3.3. timeout directives
-
connect_timeout that is timeout of proxy backend parameters (Defaults to ProxyTimeout and then to the server TimeOut directive).
-
prepost_timeout that is ping of proxy backend parameter.
-
reply_timeout like connect_timeout.
2.3.4. error handling directives
-
retries NOT USED.
-
recovery_options NOT USED.
-
fail_on_status NOT USED.
Existing mod_proxy code will be changed to allow one retry: if the ping/pong fails it will use the next node; if the ping/pong still fails, a 503 will be returned when the request can’t be forwarded to any node.
2.3.5. packet size definition
max_packet_size that is iobuffersize of proxy backend parameter.
2.3.6. other directives:
-
max_reply_timeouts NOT USED.
-
secret NOT USED; it doesn’t increase the security of the connection, only SSL would be a safe option.
2.3.7. routing directives:
-
domain The logic has to be added in mod_cluster code (See CONFIG in ModCluster Management Protocol)
-
distance NOT USED. The logic in the cluster to calculate the load factor should allow for that.
-
redirect that will be the redirect proxy backend parameter (failover node). Even if the logic in the cluster could trigger it via the load factor in a domain, it is a useful feature.
-
sticky_session whether sticky sessions should be used
-
sticky_session_force whether failover should be disallowed once a session is established
-
sticky_id That is the stickysession parameter of mod_proxy, which allows changing the name of the parameter or the cookie containing the sessionid information.
2.4. ModCluster ReverseExtension
There is a "Reverse Connection Function in JBoss AS" feature request. The concept is to get the AJP/http/https connections created in Tomcat instead in httpd.
The first idea is to implement it is to add a feature in mod_cluster to fill the worker connection pool with the sockets resulting of an accept() on the httpd side.
Of course the Tomcat connector needs to be modified too.
2.4.1. How does that work?
When a CONFIG containing "Reversed" = "yes" is received, the mod_cluster logic will listen for a connection from Tomcat instead of trying to connect to it.
Due to the multi-process logic in httpd a range of ports needs to be used. It is the responsibility of the node configuration to provide a port number that won’t collide with any other listener in the httpd box. The port and local address are provided in the CONFIG message by the ModClusterService (via Host: and Port:).
This will work only with non-forked (or limited-fork) httpd MPM models.
2.4.2. Trying to work-around the multi-process problems:
Problem: several processes. Use a range of ports (start to start+n, where n is given by a property -D global.bla=n) in AprEndpoint.Acceptor:
Socket.connect(socket, serverAddress);
Well, just change the serverAddress to use a different port (easy, no?).
The IP and starting port come from the Connector in server.xml:
port="$" address="$"
Let’s see how it works:
-
when a worker is created it binds to one available port in the range. It stays bound for its whole lifetime.
On the TC side the poller notices it is running out of connections, so it will open new connections to httpd.
-
TC logic should distribute the connections on all the ports.
Notes:
-
the connections are created with keep-alive to prevent firewall problems
-
the socket timeout is also set to a reasonable value.
-
size of receive buffer (APR_SO_RCVBUF) is set.
-
APR_TCP_NODELAY is also set (like the best connection parameters mod_proxy normally uses).
-
when an httpd process finishes, the connections are closed… so the poller in TC will notice it. It will try to connect and will reach the newly created httpd process.
In the proxy_worker struct there is an:
void *opaque; /* per scheme worker data */
The local socket we are bound to is stored there. The accept() is done using it.
3. Implementation details
3.1. ModCluster Management Protocol (MCMP)
This document describes elements of the ModCluster Management Protocol for communication between a container (AS) and a load balancer (Apache httpd / Undertow).
3.1.1. Messages
Message type / Request method | Description
---|---
CONFIG | Send configuration information for a node or set of nodes.
INFO | Request configuration info from mod_cluster-manager.
DUMP | Request a text dump of the current configuration seen by mod_cluster_manager.
STATUS | Send the current load balance factor for this node (or a set of nodes). Periodically sent.
PING | Request a ping to httpd or a node. The node can be defined by JVMRoute or by Scheme, Host, and Port (since version 0.1.0).
VERSION | Request information about the software version used and the supported MCMP version.
ENABLE-APP | Send requests and assign new sessions to the specified app. Use of a wildcard to identify the app means enable all apps on the given node.
DISABLE-APP | Apache should not create new sessions for this webapp, but should continue serving existing sessions on this node. Use of a wildcard to identify the app means disable all apps on the given node.
STOP-APP | New requests for this webapp should not be sent to this node. Use of a wildcard to identify the app means stop all apps on the given node.
REMOVE-APP | Remove the information about this webapp from the mod_cluster tables.
Message payload
The information is in ASCII and URL encoded if needed.
All numbers are integers, string representation is used.
The command is a string in the first part of a message followed by / HTTP/1.0<CR><LF>.
Example:
DISABLE-APP / HTTP/1.0<CR><LF>
All parameters are sent as <PARAMETER_NAME>=<VALUE> and the separator between their pairs is <&>
<SPACE> in value is escaped as <+>
<&> in value is escaped as %26
<=> in value is escaped as %3D
<CR><LF> means end of <COMMAND>
For certain values you can specify a list; in such cases separate the individual values by commas (e.g., Alias=localhost,demo).
All parameter names are case insensitive, but their values are NOT.
Example:
DISABLE-APP / HTTP/1.0<CR><LF>
Content-length: 44<CR><LF>
<CR><LF>
Jvmroute=node1&Context=/myapp&Alias=demo<CR><LF>
You can send the command using curl by executing the following:
curl -XDISABLE-APP -d "JVMRoute=node1" -d "Context=/myapp" -d "Alias=demo" http://localhost:6666
3.1.2. Message Types
Following are the descriptions of individual message types with their parameters.
CONFIG
This message adds a new node or updates an existing one.
Key | Description | Default | Required
---|---|---|---
Alias | List of virtual hosts (see Alias) | |
Balancer | The name of the balancer in httpd (for example mycluster in ProxyPass /myapp balancer://mycluster/) | mycluster |
Context | List of the contexts the virtual host list supports (e.g. "/myapp,/ourapp") | |
Domain | The domain corresponding to the node | |
Host | The IP address (or hostname) where the node is going to receive requests from httpd | localhost |
JVMRoute | What is after the "." in the JSESSIONID cookie or in the jsessionid parameter | | Yes
Port | The port on which the node expects to receive requests | 8009 |
Type | http/https/ajp; the protocol to use between httpd and the AS to process requests | ajp |
Reversed | Reverse connection | no |
Flushpackets | Enables/disables packet flushing (values: on/off/auto) | off |
Flushwait | Time to wait before flushing packets in milliseconds. A value of -1 means wait forever | PROXY_FLUSH_WAIT |
Ping | Time (in seconds) to wait for a pong answer to a ping | 10 seconds |
Smax | Soft maximum idle connection count (that is the smax in the worker mod_proxy documentation). The maximum value depends on the httpd thread configuration (ThreadsPerChild or 1) | -1 |
Ttl | Time to live (in seconds) for idle connections above smax | 60 seconds |
Timeout | Timeout (in seconds) for proxy connections to a node. That is the time mod_cluster will wait for the back-end response before returning an error. It corresponds to timeout in the worker mod_proxy documentation. A value of -1 indicates no timeout. Note that mod_cluster always uses a cping/cpong before forwarding a request, and the connectiontimeout value used by mod_cluster is the Ping value | 0 |
StickySession | Indicates whether subsequent requests for a given session should be routed to the same node, if possible | yes |
StickySessionCookie | Name of the session cookie | JSESSIONID |
StickySessionPath | Name of the session path parameter | jsessionid |
StickySessionRemove | Indicates whether the httpd proxy should remove session stickiness in the event that the balancer is unable to route a request to the node to which it is stuck. Ignored if StickySession is false | no |
StickySessionForce | Indicates whether the httpd proxy should return an error in the event that the balancer is unable to route a request to the node to which it is stuck. Ignored if StickySession is false | yes |
WaitWorker | Number of seconds to wait for a worker to become available to handle a request. When no workers of a balancer are usable, mod_cluster will retry after a while (workerTimeout/100). That is timeout in the balancer mod_proxy documentation. A value of -1 indicates that httpd will not wait for a worker to be available and will return an error if none is available | 0 |
MaxAttempts | Maximum number of failover attempts before giving up. The minimum value is 0, i.e. no failover. The default value is 1, i.e. do one failover attempt | 1 |
There are some limitations with regard to the maximal lengths of some values. See the Limitations section for more information.
The response in case of success is empty with HTTP code 200. In case of an error, an HTTP 500 response is sent with the header values Type and Mess set, containing the type of the error and its description respectively. See the Error handling section for more information.
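For example, registering an AJP node with explicit sticky-session and connection settings could look like the following (a sketch; the address and all values are placeholders):
curl -XCONFIG -d "JVMRoute=node1" -d "Host=192.168.1.10" -d "Port=8009" -d "Type=ajp" \
     -d "Balancer=mycluster" -d "StickySession=yes" -d "Smax=20" -d "Ttl=60" -d "Timeout=20" \
     http://localhost:6666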
INFO
This message doesn’t expect parameters, but you can supply an Accept header specifying the content type of the response. If "text/xml" is specified, the response will contain XML. Otherwise a plain text response is sent.
The plain text format has the following structure:
<Nodes>
<Hosts>
<Contexts>
where <Nodes> are 0 or more node records separated by a newline where each node record has following structure:
Node: [<number>],Name: <JVMRoute value>,Balancer: <Balancer name>,LBGroup: <LBGroup>,Host: <Host name>,Port: <port value>,Type: <scheme/protocol to use>,Flushpackets: <value>,Flushwait: <value>,Ping: <value>,Smax: <value>,Ttl: <value>,Elected: <value>,Read: <value>,Transfered: <value>,Connected: <value>,Load: <value>
For definitions of the individual values see the corresponding documentation section describing related directives.
<Hosts> are 0 or more records separated by a newline where each host record has following structure:
Vhost: [<number>:<number>:<number>], Alias: <alias value>
and <Contexts> are 0 or more records separated by a newline where each record has following structure:
Context: [<number>:<number>:<number>], Context: <context value>, Status: <one of ENABLED, STOPPED, DISABLED>
The first field in each of the described records is intended for debugging purposes and is present only in the text representation.
Example:
Node: [0],Name: spare,Balancer: mycluster,LBGroup: ,Host: localhost,Port: 8888,Type: ws,Flushpackets: Off,Flushwait: 10,Ping: 10,Smax: -1,Ttl: 60,Elected: 0,Read: 0,Transfered: 0,Connected: 0,Load: 0
Node: [1],Name: test,Balancer: mycluster,LBGroup: ,Host: localhost,Port: 8889,Type: ws,Flushpackets: Off,Flushwait: 10,Ping: 10,Smax: -1,Ttl: 60,Elected: 0,Read: 0,Transfered: 0,Connected: 0,Load: -1
Vhost: [1:1:0], Alias: localhost
Context: [1:1:0], Context: test, Status: STOPPED
TODO: Describe the XML format.
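The request itself carries no parameters; as a sketch (the address is a placeholder, and the Accept header selects the XML form described above):
curl -XINFO -H "Accept: text/xml" http://localhost:6666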
DUMP
This message doesn’t expect parameters, but you can supply an Accept header specifying the content type of the response. If "text/xml" is specified, the response will contain XML. Otherwise a plain text response is sent.
The plain text format has the following structure:
<Balancers>
<Nodes>
<Hosts>
<Contexts>
where the individual sections contain 0 or more records separated by a newline. The structure is similar to the corresponding records of the INFO response; however, there are a few differences, such as missing commas in most cases.
The balancer records have the following structure:
balancer: [<number>] Name: <balancer name> Sticky: <value> [<Sticky session cookie>]/[<Sticky session path>] remove: <value> force: <value> Timeout: <value> maxAttempts: <value>
The structure of node records is the following:
node: [<number>:<number>],Balancer: <balancer name>,JVMRoute: <value>,LBGroup: [<value>],Host: <value>,Port: <value>,Type: <value>,flushpackets: <value>,flushwait: <value>,ping: <value>,smax: <value>,ttl: <value>,timeout: <value>
The host structure:
host: <number> [<host/alias value>] vhost: <number - host id> node: <number - node id>
and finally the context structure:
context: <number> [<context value>] vhost: <number - host id> node: <number - node id> status: <1 for ENABLED, 2 for DISABLED, 3 for STOPPED>
The first field in the described records is intended for debugging purposes and is present only in the text representation.
Example:
balancer: [0] Name: mycluster Sticky: 1 [JSESSIONID]/[jsessionid] remove: 0 force: 1 Timeout: 0 maxAttempts: 1
node: [0:0],Balancer: mycluster,JVMRoute: spare,LBGroup: [],Host: localhost,Port: 8888,Type: ws,flushpackets: 0,flushwait: 10,ping: 10,smax: -1,ttl: 60,timeout: 0
node: [1:1],Balancer: mycluster,JVMRoute: test,LBGroup: [],Host: localhost,Port: 8889,Type: ws,flushpackets: 0,flushwait: 10,ping: 10,smax: -1,ttl: 60,timeout: 0
host: 0 [localhost] vhost: 1 node: 1
context: 0 [test] vhost: 1 node: 1 status: 3
TODO: Describe the XML output.
STATUS
The STATUS command requires a single JVMRoute parameter specifying the node for which we want to know the status. Optionally, you can supply a Load parameter with a numerical value that will set the load factor for the target node.
In case of success, HTTP response with code 200 is sent with following parameters:
-
Type with value STATUS-RSP
-
JVMRoute corresponding to the value sent
-
State with value OK or NOK
-
id with a numerical value that is the generation id of the httpd process; if it changes (increases), httpd has been restarted and its view of the cluster configuration could be incorrect. In this case ModClusterService should send a new CONFIG ASAP so the information can be updated.
In case of an error, an HTTP 500 response is sent with the headers Type and Mess set to the type and description of the error.
Example:
Type=STATUS-RSP&JVMRoute=spare&State=OK&id=698675605
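For example, a node could report its current load factor with a request like the following (a sketch; the address and values are placeholders):
curl -XSTATUS -d "JVMRoute=node1" -d "Load=50" http://localhost:6666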
PING
The PING command does not require any parameters, but there are a few optional parameters you can use to change the command's behavior. See the table below.
Key | Description | Required
---|---|---
JVMRoute | What is after the "." in the JSESSIONID cookie or in the jsessionid parameter | No
Host | The IP address (or hostname) where the node is going to receive requests from httpd | Yes if Scheme or Port is specified
Port | The port on which the node expects to receive requests | Yes if Host or Scheme is specified
Scheme | http/https/ajp; the protocol to use between httpd and the AS to process requests | Yes if Host or Port is specified
If no parameter is supplied, then the PING checks whether the proxy is alive. In case JVMRoute is specified, then the corresponding node is checked. When Host, Port, and Scheme are used, then it is checked whether httpd can reach a possible node using Scheme://Host:Port.
In case all parameters are specified, only JVMRoute is used and the behavior is the same as if the other ones were not present.
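For example (sketches; the address and values are placeholders), checking a registered node by its JVMRoute, or probing a possible node by scheme, host and port:
curl -XPING -d "JVMRoute=node1" http://localhost:6666
curl -XPING -d "Scheme=ajp" -d "Host=192.168.1.10" -d "Port=8009" http://localhost:6666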
VERSION
This command requests the information about the used version and supported MCMP version. The HTTP 200 response has following format:
release: <software version>, protocol: <supported MCMP version>
so for example this is a valid response:
release: mod_cluster/1.3.20.Final, protocol: 0.2.1
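The request takes no parameters, so a simple invocation is enough (a sketch; the address is a placeholder):
curl -XVERSION http://localhost:6666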
ENABLE-APP
This command enables an application under the corresponding context and alias. If the application doesn’t exist, an existing virtual host is updated or a new one is created (that depends on the context/alias values).
The insert/update logic works as follows: the proxy server goes through all received aliases and if any of them matches an existing virtual host (first match), an update occurs. If there is no match, then a new virtual host is created.
Key | Description | Required
---|---|---
JVMRoute | JVMRoute on which we enable the application | Yes
Context | List of contexts under which the application should be deployed | Yes if the request path is not the wildcard
Alias | List of aliases for the corresponding virtual host | Yes if the request path is not the wildcard
In case of success, an empty HTTP response with code 200 is sent. When an error occurs, an HTTP 500 response is sent with the Type and Mess headers containing the details.
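For example, enabling an application under two contexts of a virtual host with the aliases localhost and demo (a sketch; the address and values are placeholders):
curl -XENABLE-APP -d "JVMRoute=node1" -d "Context=/myapp,/ourapp" -d "Alias=localhost,demo" http://localhost:6666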
DISABLE-APP
Same as ENABLE-APP only sets the app status to DISABLED.
STOP-APP
Same as ENABLE-APP only sets the app status to STOPPED.
REMOVE-APP
Same as ENABLE-APP but removes the app from the proxy.
3.1.3. Error handling
Once an error occurs during the MCMP communication, an HTTP response with code 500 is returned. The response contains headers with more details about the nature of the error, namely the Type and Mess header fields.
For example
HTTP/1.1 500 Internal Server Error
Date: Wed, 11 Sep 2024 13:45:44 GMT
Server: Apache/2.4.62 (Unix) mod_cluster/2.0.0.Alpha1-SNAPSHOT
Version: 0.2.1
Type: SYNTAX
Mess: Can't parse MCMP message. It might have contained illegal symbols or unknown elements.
Content-Length: 528
Connection: close
Content-Type: text/html; charset=iso-8859-1
<some html>
where
-
Version is the supported version of the ModCluster Management Protocol.
-
Type specifies the type of the error (e.g., SYNTAX when the message is not formed correctly, or MEM when the data cannot be updated in or read from the shared memory).
-
Mess is the message describing the error in more detail.
3.1.4. mod_cluster-manager handler
The mod_cluster-manager handler allows performing operations like ENABLE_APP/DISABLE_APP through a web interface. The format of the request string is the following:
Nonce:<nonce>&Cmd:<cmd>&Range:<range>&<MCMP String>
where:
-
<nonce> Is a string like e17066b4-0cb1-4e58-93e3-cdc9efb6be9 corresponding to a unique id of httpd.
-
<cmd> Is the command: one of ENABLE_APP, DISABLE_APP etc.
-
<range> Is a "NODE" or "CONTEXT". "NODE" means that the _APP command is a wildcard command.
-
<MCMP String> is a string containing a command described above.
Example:
http://localhost:8000/mod_cluster-manager?nonce=e17066b4-0cb1-4e58-93e3-cdc9efb6be9c&Cmd=DISABLE-APP&Range=CONTEXT&JVMRoute=jvm1&Alias=
3.1.5. Miscellaneous
(ModCluster Design suggests that ModClusterManager should wait until all sessions have finished, but that requires a tool yet to be written. The idea is that this is an administrator-initiated step, similar to what people do now by changing workers.properties to quiesce a node in mod_jk, but it could be initiated from the JBoss side via a management tool.) If a request arrives for a context corresponding to this node, 500 will be returned to the client.
An additional utility could be written to send a REMOVE-APP once the JBoss node is stopped. REMOVE-APP will remove all the node information from the mod_cluster table and any socket between httpd and the node will be closed. (For a more complete description see ModCluster Internals.) If a request arrives for a context corresponding to this node, 404 will be returned to the client: in fact mod_proxy will not be called for the request and an httpd error page could be displayed. A REMOVE-APP /, for example, will just clean the mod_cluster table entry corresponding to the application defined in the payload.
3.2. ModCluster Internals
This page describes the internal logic of mod_cluster on the httpd side. mod_cluster is in fact just a sophisticated proxy balancer provider. It uses a provider for "shared" memory handling, a post_read_request hook and one handler.
3.2.1. Structure of the httpd part
mod_cluster is made of 3 modules and a patch:
-
mod_sharedmem: Shared memory and persistence handler.
-
mod_proxy_cluster: A mod_proxy balancer that allows dynamic creation of balancers and workers.
-
mod_manager: A handler that processes messages coming from the ModClusterService and fills the shared memory according to the messages.
-
a patch to the actual 2.2.x mod_proxy code.
The modules use providers (via ap_lookup_provider/ap_register_provider) to interact. For the first version only AJP will be supported.
3.2.2. Normal request processing (mod_proxy_cluster)
-
The translate_name hook will check that the URL and vhost correspond to a mappable location according to the data from the shared memory.
-
If it is mappable, r→handler will be set to "proxy-server" and r→filename to proxy:cluster://URI, so that the mod_proxy logic can process the request and give it to our balancer.
-
The proxy_handler will process the request (because of the value of r→handler).
-
In our find_best_bytraffic the following is done:
-
Update the workers according to the table in shared memory (create or remove workers)
-
While not ok and can retry. (only on valid workers).
-
start proxying request.
-
ap_proxy_pre_request will call our balancer that will return the worker to use.
-
proxy_run_scheme_handler will forward the request to our balancer logic (via the canon_handler hook). That will find the cluster node and use the scheme_handler again to get it to send the request and read the response.
-
Done
-
When a request is received, mod_cluster will try to map it to a host in the virtual hosts table; if that fails, the request is DECLINED so the rest of httpd may process it. If the request corresponds to a virtual host, then the URL of the request is used to see if it corresponds to a context in the Contexts table; if that fails, DECLINED is returned (or do we want 404?); otherwise the request is processed according to the status of the context in the table. If the status is ENABLED, the request is marked to be forwarded to the cluster. If the status is DISABLED and the request contains a sessionid, the request is forwarded to the cluster; if it doesn’t contain a sessionid, 500 is returned. If the status is STOPPED, 500 is returned. That is done in a handler to prevent useless processing of requests that can’t be processed by the cluster.
When choosing a node (worker) in cluster_bytraffic the host and corresponding context information are checked to prevent sending a request to a node that can’t process it (wrong host or application not deployed etc).
3.2.3. Asynchronous requests processing (mod_manager)
The asynchronous requests are processed by the mod_manager part of mod_cluster. As the protocol uses "special" method names, the translate_name hook will set r→handler to "mod-cluster" when it detects those methods.
The asynchronous requests are used to fill the shared area information according to the cluster information. The STATUS messages are handled in a special way: they validate the information according to the result of an asynchronous ping/pong. Once the information is validated it is stored in the shared memory.
3.2.4. Tables in the shared memory
The CONFIG messages allow filling the tables in the shared memory:
-
nodes
That is a part of the CONFIG messages. An id is added to this description to identify the corresponding virtual hosts.
JvmRoute: <JvmRoute> Domain: <Domain> <Host: <Node IP> Port: <Connector Port> Type: <Type of the connector> (<node conf>) nodeid balancer name
-
virtual hosts
That is a part of the CONFIG messages. A node id and vhost id are added to this description to identify the context and the node. For each virtual host of the Alias: an entry is created in the virtual hosts table.
host nodeid vhostid
-
Contexts
That is a part of the CONFIG messages. A host id and status are added to this description. The status can have the following values: DISABLED/ENABLED/STOPPED. The contexts are created in the state STOPPED. For each context of the Context: an entry is created in the Contexts table.
context vhostid status
-
Balancers
That is a part of the CONFIG or ManagerBalancerName directives.
balancer balancerid (<balancer conf>)
3.2.5. Directives of mod_cluster
MemManagerFile filename. Base name to access to shared memory.
ManagerBalancerName name NoMapping. Name of the balancer to use with the ProxyPass directive. Without NoMapping the balancer will be created automatically and mod_cluster will automatically map the contexts deployed in the nodes.
Maxcontext number. Max number of contexts mod_cluster can handle.
Maxnode number. Max number of nodes mod_cluster can handle.
Maxhost number. Max number of aliases (virtual hosts) mod_cluster can handle.
Maxbalancer. Max number of balancer mod_cluster can handle.
Examples
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_ajp_module modules/mod_proxy_ajp.so
LoadModule sharedmem_module modules/mod_sharedmem.so
LoadModule proxy_cluster_module modules/mod_proxy_cluster.so
LoadModule manager_module modules/mod_manager.so
ProxyPass /myapp balancer://mycluster/myapp lbmethod=cluster_bytraffic
ManagerBalancerName mycluster NoMapping
The nodes information of the balancer mycluster is filled by the mod_cluster logic corresponding to mycluster.
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_ajp_module modules/mod_proxy_ajp.so
LoadModule sharedmem_module modules/mod_sharedmem.so
LoadModule proxy_cluster_module modules/mod_proxy_cluster.so
LoadModule manager_module modules/mod_manager.so
The balancer is created with default values and the mapping to the contexts is done dynamically according to the CONFIG messages.
3.2.6. Hooks of mod_sharedmem
-
post_config: Register a cleanup for the pools and memory.
-
pre_config: Create a global pool for shared memory handling.
-
provides a slotmem_storage_method (do (call a callback routine on each existing slot), create, attach and mem (returns a pointer to a slot)).
3.2.7. Hooks for mod_proxy_cluster
-
post_config: Find the providers that handle access to balancers, hosts, contexts and node (from mod_manager).
-
child_init: Create the maintenance task. The maintenance task checks regularly that the child's balancers and workers correspond to the shared memory information and creates/deletes/recreates balancers or workers if needed. Additionally, from time to time it checks the connection to the node and forces the cleaning of connections whose TTL has elapsed (by cleaning the one it has used to test the node).
-
translate_name: Check if the request corresponds to a URL mod_cluster could handle; if yes, set r→handler to "proxy-server" and r→filename to "proxy:cluster://balancer_name" so that our pre_request hook will process it.
-
pre_request (proxy_hook_pre_request): It finds the worker to use and rewrites the URL to give the request to the corresponding scheme handler. This hook is called by proxy_handler() of mod_proxy.
-
canon_handler: It processes the canonicalising of the URL.
-
provides a proxy_cluster_isup() to check that a node is reachable. (Using a "asynchronous" ping/pong for example).
3.2.8. Hooks of mod_manager
-
post_config: Create shared memory for balancers, hosts, contexts and nodes (using mod_sharedmem).
-
translate_name: Check the method of the request and if it is one defined by the protocol it sets r→handler to "mod-cluster".
-
handler: It processes the commands received and updates the shared memory.
-
provides a storage_method for balancers, hosts, contexts and nodes. A storage_method contains a read(), ids_used(), get_max_size() and remove().
The read accepts 2 parameters, an id and a key value, and does 2 things: it reads a record using the slot number in the shared memory, or using the key corresponding to the second parameter.
3.2.9. Processing REMOVE-APP
REMOVE-APP requires "special" handling in mod_manager: the context and host corresponding to the node will be removed from the shared memory and the node will be marked as "removed". The logic to remove the node from the shared memory will be the following:
-
remove = mark removed (can’t be updated in future operations).
-
Any CONFIG corresponding to this node will now insert a new one = create a new id (use it to create the worker).
-
the maintenance threads will remove the information of the "mark removed" workers.
-
the maintenance threads will create the new worker.
-
After a while the "mark removed" entry will be removed by one of the maintenance threads; only at that point is the node information removed from the shared memory.
3.3. ModCluster node balancer
This page describes the node and balancer parts of the CONFIG message.
3.3.1. Node
Key | Description | Default | Max size
JvmRoute | See CONFIG in ModCluster Management Protocol | |
Domain | See CONFIG in ModCluster Management Protocol | |
Port | See CONFIG in ModCluster Management Protocol | |
Type | See CONFIG in ModCluster Management Protocol | |
flushpackets | How to flush the packets. On: send immediately; Auto: wait for flushwait before sending; Off: don't flush. | Off |
flushwait | Time in milliseconds to wait before flushing. | 10 |
ping | Time in seconds to wait for a pong answer to a ping. 0 means we don't try to ping before sending. | 10 |
smax | Soft max: inactive connections above this limit are closed once ttl expires. | MPM configuration |
ttl | Max time in seconds a connection above smax may live. | 60 |
Timeout | Max time in seconds httpd will wait for the backend connection. | 0 |
When a field is not present in the CONFIG message the default value is used in mod_cluster.
3.3.2. BALANCER
Key | Description | Default | Max size
Balancer | See CONFIG in ModCluster Management Protocol | |
StickySession | Yes: use JVMRoute to stick a request to a node; No: ignore JVMRoute. | Yes | 3
StickySessionCookie | Name of the cookie containing the "sessionid". | JSESSIONID | 30
StickySessionPath | Name of the parameter containing the "sessionid". | jsessionid | 30
StickySessionRemove | Yes: remove the sessionid (cookie or parameter) when the request can't be routed to the right node; No: send it anyway. | No | 3
StickySessionForce | Yes: return an error if the request can't be routed according to the JVMRoute; No: route it to another node. | Yes | 3
WaitWorker | Time in seconds to wait for an available worker ("0" means no wait). | 0 |
Maxattempts | Number of attempts to send the request to the backend server. | 1 |
When a field is not present in the CONFIG message the default value is used in mod_cluster.
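For illustration only, a CONFIG message carrying a few of the fields above might look roughly like the following (HTTP headers omitted, values made up; the authoritative wire format and exact field names are defined by the ModCluster Management Protocol page):
CONFIG / HTTP/1.1

JVMRoute=node1&Balancer=mycluster&Domain=dom1&Host=10.33.144.5&Port=8009&Type=ajp&StickySession=Yes&flushpackets=Off&flushwait=10&ping=10&ttl=60&Timeout=0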
Notes on node in httpd internals
These notes describe how the httpd internal tables are filled from the node description above.
reslist parameters
Parameter | Description | Default
hmax | Size of the connection pool (hard max). Use ap_mpm_query(AP_MPMQ_MAX_THREADS). |
smax | Soft max: inactive connections above this limit are closed once ttl expires. | hmax
min | Minimum number of connections to keep available. Use 0. |
acquire | Time in milliseconds to wait for an available connection. Use 0 (no wait, return an error). |
ttl | Max time in seconds a connection above smax may live. | 60
These are the parameters passed to apr_reslist_create(), which handles the pool of connections.
keepalive behavior
We force keepalive on.
Others
Other fields of the proxy_worker structure are filled with the default values from mod_proxy.
proxy_worker_stat parameters
Parameter | Updated by mod_proxy_cluster | Updated by mod_manager
status | yes | yes
error_time | yes | no
retries | yes | no
lbstatus | no | no
lbfactor | no | yes
transferred | yes | no
read | yes | no
elected | yes | no
route | no | no
redirect | no | yes
busy | no | no
lbset | no | no
status is filled by STATUS commands.
route is the JVMRoute; it is filled by CONFIG commands.
This information is in shared memory and the proxy_worker_stat uses a part of this shared memory:
/* proxy_worker_stat structure: */
int status;
apr_time_t error_time; /* time of the last error */
int retries; /* number of retries on this worker */
int lbstatus; /* Current lbstatus */
int lbfactor; /* dynamic lbfactor */
apr_off_t transferred;/* Number of bytes transferred to remote */
apr_off_t read; /* Number of bytes read from remote */
apr_size_t elected; /* Number of times the worker was elected */
char route[PROXY_WORKER_MAX_ROUTE_SIZ+1];
char redirect[PROXY_WORKER_MAX_ROUTE_SIZ+1];
void *context; /* general purpose storage */
apr_size_t busy; /* busyness factor */
int lbset; /* load balancer cluster set */
3.3.3. Notes on balancer in httpd internals
These notes describe how the httpd internal tables are filled from the balancer description above.
The information from the CONFIG message is packed in the shared memory:
StickySessionCookie and StickySessionPath are stored in sticky, separated by a '|'.
StickySession, StickySessionRemove and StickySessionForce are stored in sticky_force.
StickySessionForce only forces routing within the domain (to nodes belonging to the same domain) when the node corresponding to the sessionid belongs to a domain.
max_attempts_set is set if Maxattempts is present in the CONFIG message and its value differs from 1.
The fields sticky, sticky_force, timeout, max_attempts and max_attempts_set are what is needed to create the balancer and be able to use it.
/* proxy_balancer structure extract: */
const char *sticky; /* sticky session identifier */
int sticky_force; /* Disable failover for sticky sessions */
apr_interval_time_t timeout; /* Timeout for waiting on free connection */
int max_attempts; /* Number of attempts before failing */
char max_attempts_set;
3.4. ModCluster INFO-RSP
3.4.1. INFO-RSP
INFO-RSP is the response to an INFO command. The response includes information on what virtual hosts are configured (so per-webapp commands can specify the correct virtual host) and other info that ModClusterService can make available to management tools (e.g. what addresses/ports httpd is listening on).
The elements are prefixed with their key id. The ids of Context and Vhost are the node, the virtual host and the record number.
For example:
-
Balancer[1]… The balancer corresponding to the nodes,
-
Node: [1:1]… The first node in the node table.
-
Vhost: [1:1:1]… An Alias of the Virtual Host belonging to the first node and first Virtual Host.
-
Context: [1:1:2]… A Context deployed/mapped in the first node and first Virtual Host. That is the second Context in the Context table.
The INFO-RSP can be displayed using the mod_cluster_manager handler; note that in this case the time values are in seconds, except flushwait which is in milliseconds.
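The handler itself is typically enabled in httpd.conf with something like the following (the location name and access restrictions are illustrative; adjust them to your setup):
<Location /mod_cluster_manager>
  SetHandler mod_cluster-manager
  Order deny,allow
  Deny from all
  Allow from 127.0.0.1
</Location>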
Server (Not the actual version)
Server corresponds to the server record in httpd, that is, an httpd VirtualHost description. The following elements are dumped:
-
Name: (String) Name of the server.
-
Root: (String) Root directory.
-
ThreadsPerChild: (Number) Number of threads per child process.
-
MaxRequestsPerChild: (Number) Max requests a child will handle before stopping.
-
MaxClient: (Number) Max simultaneous requests supported.
-
LimitReqFieldsize: (Number) max size of any request header field.
-
LimitReqFields: (Number) max on number of request header fields.
-
Scheme: (String) The server request scheme (http/https).
-
KeepAlive: (Boolean) Server keeps connections open between requests.
-
KeepAliveTimeout (Number) Amount of time the connection between httpd and the browser is kept open waiting for requests.
-
MaxKeepAliveRequests (Number) Max number of requests from a browser that httpd will process before closing the connection.
-
TimeOut (Number) Max time in seconds httpd will wait for a backend server.
Balancer
Balancer corresponds to a balancer in the mod_proxy logic. The following elements are dumped:
-
Name: (String) Name of the balancer.
-
StickySession: (Boolean) Use JVMRoute to stick a session to a node.
-
StickySessionCookie: (String) Name of the cookie containing the sessionid.
-
StickySessionPath: (String) Name of the parameter containing the session when it is URL encoded.
-
StickySessionRemove: (Boolean) Remove sessionid if the node doesn’t correspond to the JVMRoute.
-
StickySessionForce: (Boolean) Return error if no node correspond to the JVMRoute.
-
Timeout: (Number) Time to wait for an available worker. (seconds in mod_cluster_manager)
-
Maxattempts: (Number) Number of attempts to send the request to the backend server.
Example for mod_cluster_manager:
balancer: [1] Name: mycluster Sticky: 1 [JSESSIONID]/[jsessionid] remove: 0 force: 0 Timeout: 0 Maxtry: 1
Node
Node corresponds to a node of a cluster in the AS (to a <Connector/> in server.xml). The following elements are dumped:
-
Name: (String) That is the JvmRoute.
-
Balancer: (String) The name of balancer that processes the loadbalancing for the JvmRoute.
-
Domain: (String) Used to group nodes into a buddy-replication group.
-
Host: (String) Hostname where the cluster node runs.
-
Port: (Number) Port on which the connector is waiting for requests.
-
Type: (String) Protocol used by the connector (AJP/http/https).
-
Flushpackets: (String): Tells when the buffer between httpd and the browser should be flushed.
-
Flushwait: (Number): Time to wait before flushing (when flushpackets is 'Auto'). (milliseconds in mod_cluster_manager)
-
Ping: (Number): Time to wait after a ping to receive a pong from a node. (seconds in mod_cluster_manager)
-
Smax: (Number): Max connections to keep open between httpd and the back-end server.
-
Ttl: (Number): Max time a connection may live when there are more than smax connections open. (seconds in mod_cluster_manager)
-
Elected: (Number): Number of times the worker was chosen by the balancer logic.
-
Read: (Number): Number of bytes read from the back-end.
-
Transferred: (Number): Number of bytes sent to the back-end.
-
Connected: (Number): Number of opened connections.
-
Load: (Number) Load factor received via the STATUS messages.
Example from mod_cluster_manager:
node: [1:1],Balancer: mycluster,JVMRoute: neo3,Domain: [neuchdom],Host: 10.33.144.5,Port: 8009,Type: ajp,flushpackets: 0,flushwait: 10,ping: 10,smax: 11,ttl: 60,timeout: 0
Vhost
Vhost corresponds to the hosts and Aliases defined in server.xml.
-
Alias: (String) Corresponding alias.
Context
Context corresponds to a context deployed on the node.
-
Context: (String) URL to be mapped.
-
Status: (String) Status of the application: ENABLED, DISABLED or STOPPED.
3.5. ModCluster AS Integration
3.5.1. ModCluster Integration in JBoss AS6
Overall container JIRA for this: MODCLUSTER-6. A number of child issues are defined below; several of these child issues also have subtasks.
Elements
-
Configuration
-
Things to configure
-
Node
-
Balancer
-
HASingleton election
-
load balance policy (what metrics drive the decision)
-
-
ClusterListener handles the first 2 of these; is that how we want it?
-
Doing the rest requires MC-based config
-
-
Lifecycle/Deployment events (MODCLUSTER-3)
-
ClusterListener handles this well
-
-
Periodic events
-
ClusterListener uses JBossWeb background thread from Engine
-
-
Communication with httpd side
-
ClusterListener has good code for this; would be very nice to re-use (MODCLUSTER-8)
-
Extract from the ClusterListener class into a separate class?
-
Use an interface so the HASingleton-based version can add behavior
-
-
-
mod_cluster advertisement
-
JBossWeb has code for this
-
-
HASingleton (MODCLUSTER-4)
-
HASingletonSupport subclass
-
Primary task is polling nodes for load information, aggregating, feeding to mod_cluster
-
Also detects intra-cluster comm failures, alternate mechanism to notify mod_cluster
-
Gateway to mod_cluster for all other messages (lifecycle/deployment)?
-
When not functioning as singleton master, it acts as RPC handler for requests from master
-
-
HASingleton master election policy (MODCLUSTER-5)
-
Allow a master per domain
-
Node that can’t reach mod_cluster shouldn’t be master
-
Handle the above by putting a small data object in DRM instead of simple string
-
-
Load Manager (MODCLUSTER-9)
-
Gathers information from various sources of metrics
-
Aggregates into an overall metric
-
Includes a time decay function
-
-
Load metrics (MODCLUSTER-12)
-
Interface; impl’s plug into the load manager
-
Each impl handles a metric or set of related metrics (e.g. web requests, web sessions, VM/machine metrics, transactions)
-
Need to figure how to integrate into various subsystems that provide data
-
Questions
-
Individual nodes communicate lifecycle/deployments events directly to mod_cluster, or via HASingleton?
-
We want all nodes to be able confirm their ability to communicate with mod_cluster, so even with HASingleton nodes need to be able to contact mod_cluster themselves
-
Answer: A STATUS message sent by the HASingleton can trigger a ping/pong check of a different node, so there is no need for each node to communicate with mod_cluster. So, route all messages through the singleton.
-
High Level Design
Explanation:
-
The four XXXConfiguration interfaces basically package the getters of all the configuration properties exposed by the JBoss Web ClusterListener class. NodeConfiguration is the info about a particular node that gets passed to mod_cluster; BalancerConfiguration is non-node-specific stuff like sticky session configuration that gets passed to mod_cluster. SSLConfiguration configures any JSSESocketFactory; its subinterface MCMPHandlerConfiguration includes the other properties that govern how to discover (via static config or an AdvertiseListener) and communicate with mod_cluster. I've divided these into different interfaces so different clients (e.g. JSSESocketFactory) only see the necessary properties. Use interfaces so different classes can implement them (e.g. ModClusterService here, but possibly the standalone ClusterListener). See ModCluster Listener for a description of the relevant configuration properties.
-
ModClusterService forms the core. Its function is pretty simple:
-
Expose the 4 XXXConfiguration interfaces; expose setters to allow centralized configuration via a -beans.xml file.
-
Create (or have dependency injected) the other components; wire them together as needed.
-
Expose the main management interface (JMX, embedded console)
-
-
BasicClusterListener implements the Tomcat LifecycleListener and ContainerListener interfaces. It contains the core event listener code from the JBoss Web ClusterListener class. When it receives events it calls into the JBossWebEventHandler.
-
JBossWebEventHandler exposes the various operations that are implemented as protected methods in the JBoss Web ClusterListener class. I make this an interface to support different implementations in a clustered, non-clustered or standalone environment.
-
ClusterCoordinator is the HASingleton. It implements JBossWebEventHandler so it receives events from BasicClusterListener. When it receives events it either makes RPC calls to whichever of its peers is the coordinator, or, if it is the coordinator, it drives the process of gathering load data from around the cluster and sending status messages to mod_cluster. Not shown on the diagram is the expectation that ClusterCoordinator is a subclass of the HASingletonSupport class and has injected into it an HASingletonElectionPolicy impl and an HAPartition.
-
Each node’s ClusterCoordinator gets the current load balance factor from a LoadBalanceFactorProvider. The implementation of that interface is a whole subsystem.
-
If a ClusterCoordinator is the HASingleton master, it needs to communicate with the http side. It does this through an implementation of the MCMPHandler interface.
-
DefaultMCMPHandler is the standard implementation of MCMPHandler. It basically encapsulates the proxy management and request sending code in the JBoss Web ClusterListener. If so configured, it creates an AdvertiseListener to listen for multicast service advertisements by mod_cluster instances.
-
MCMPRequest (not shown except as a param or return type) is a simple data object that encapsulates an enum identifying the request type (CONFIG, ENABLE-APP, STATUS, etc), the wildcard boolean, and the Map<String, String> of parameters.
-
ResetRequestSource is a bit of an oddity that came with factoring the proxy management code in DefaultMCMPHandler out of ClusterListener. During periodic status checks the proxy management code checks whether any of its proxies to mod_cluster are in error state; if so, it would tell the listener (via reset(int pos)) to send a set of messages to that mod_cluster instance to reestablish configuration state. That reset(int pos) call was problematic in decoupling the proxy management code, as it exposes internal details of the proxy manager (the pos). So, instead I created an interface ResetRequestSource, an instance of which is injected into an MCMPHandler. When the MCMPHandler discovers it needs to reset a proxy, it asks the ResetRequestSource to provide a List<MCMPRequest> of commands that need to be invoked on mod_cluster to reset the node's state. Any class with access to the Tomcat Server object and to the NodeConfiguration and BalancerConfiguration could play this role; a sketch of these two types follows.
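A rough sketch of the two types described above. The names follow the text, but the signatures and field layout are illustrative only, not the actual mod_cluster API:
import java.util.List;
import java.util.Map;

enum MCMPRequestType { CONFIG, ENABLE_APP, DISABLE_APP, STOP_APP, REMOVE_APP, STATUS }

class MCMPRequest {
    final MCMPRequestType type;            // which MCMP command to send
    final boolean wildcard;                // apply to all contexts of the node
    final Map<String, String> parameters;  // e.g. JVMRoute, Context, Alias

    MCMPRequest(MCMPRequestType type, boolean wildcard, Map<String, String> parameters) {
        this.type = type;
        this.wildcard = wildcard;
        this.parameters = parameters;
    }
}

interface ResetRequestSource {
    /* Commands needed to re-establish this node's configuration on a proxy found in error state. */
    List<MCMPRequest> getResetRequests();
}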
Other Uses of the Same Abstractions?
The following shows how many of the same abstractions can be used in standalone JBoss Web. Not surprising since they mostly came from JBoss Web’s ClusterListener. This diagram basically represents a possible factoring of the current ClusterListener into separate classes to allow code reuse in the AS. Ignore the package names below; they can be changed:
-
The XXXConfiguration interfaces are same as described above. Here they are implemented by ClusterListener instead of ModClusterService.
-
ClusterListener plays the role played by ModClusterService in the AS, since there is no -beans.xml.
-
Expose the 4 XXXConfiguration interfaces; expose setters to allow centralized configuration, here via the Listener element in server.xml.
-
Create the other components; wire them together as needed.
-
Expose any management interface (JMX)
-
-
ClusterListener is actually a subclass of BasicClusterListener (discussed above), which contains the actual event listener implementation.
-
ClusterListener provides to its BasicClusterListener superclass the JBossWebEventHandler impl to use. Here it is DefaultJBossWebEventHandler which basically just encapsulates the relevant methods that are in the current ClusterListener impl.
-
MCMPHandler interface and DefaultMCMPHandler impl are the same as is used in the AS cluster discussion above.
-
Here the ClusterListener acts as the ResetRequestSource.
3.6. Using a hardware load balancer in combination with mod_cluster server-side load metrics
When using a hardware load balancer you can leverage the ability of mod_cluster to calculate server-side load metrics to determine how best to balance requests.
By default, these metrics are not accessible from outside of mod_cluster, but a minor code change to the distribution allows you to access this so called “loadbalance factor” through JMX. A healthcheck servlet could access this information and provide this to the hardware load balancer.
Follow the steps below to modify your mod_cluster distribution to enable additional JMX information:
1. Download the mod_cluster source (version 1.0.10.GA was used in combination with JBoss EAP 5.1.x):
$ svn co http://anonsvn.jboss.org/repos/mod_cluster/tags/1.0.10.GA
2. Update the pom.xml to include an additional repository for trove (see this post: https://community.jboss.org/message/625343):
<repository>
<id>maven-nuxeo</id>
<name>Maven Nuxeo Repository</name>
<url>https://maven.nuxeo.org/nexus/content/groups/public/</url>
<layout>default</layout>
<releases>
<enabled>true</enabled>
<updatePolicy>never</updatePolicy>
</releases>
<snapshots>
<enabled>true</enabled>
<updatePolicy>never</updatePolicy>
</snapshots>
</repository>
3. Verify the non-patched mod_cluster build process; skip the tests as you might run into issues with your local firewall:
$ mvn -P dist package -Dmaven.test.skip=true
4. You will find the mod_cluster SAR under the target/ directory:
mod-cluster.sar/
├── META-INF
│ └── mod-cluster-jboss-beans.xml
└── mod-cluster-1.0.10.GA.jar
1 directory, 2 files
5. Edit src/main/java/org/jboss/modcluster/load/impl/DynamicLoadBalanceFactorProviderMBean.java and add to the MBean interface:
/**
* Returns the loadbalance factor
* @return a positive integer
*/
int getLoadBalanceFactor();
6. Run the Maven task again:
$ mvn -P dist package -Dmaven.test.skip=true
Last, but not least, you can get the load balance factor through JMX by browsing to: jboss.web:LoadbalancerProvider.
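As suggested above, a minimal health-check servlet could then read that MBean and return the value as plain text to the hardware load balancer. This is a sketch only; the ObjectName and attribute name below are assumptions, so check the JMX console of your installation for the actual names:
import java.io.IOException;
import java.util.List;
import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class LoadFactorServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        try {
            /* Locate the MBeanServer the AS registered its MBeans with. */
            List<MBeanServer> servers = MBeanServerFactory.findMBeanServer(null);
            MBeanServer server = servers.get(0);
            /* ObjectName and attribute name are assumptions; adjust to your deployment. */
            ObjectName name = new ObjectName("jboss.web:service=LoadBalanceFactorProvider");
            Integer factor = (Integer) server.getAttribute(name, "LoadBalanceFactor");
            resp.setContentType("text/plain");
            resp.getWriter().println(factor);
        } catch (Exception e) {
            throw new ServletException("Unable to read the load balance factor", e);
        }
    }
}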
3.7. Encrypting connection between httpd and TC
The only way to encrypt the data between Apache httpd and Tomcat is to use mod_proxy with https.
In httpd.conf you need something like:
SSLProxyEngine On
SSLProxyVerify require
SSLProxyCACertificateFile conf/cacert.pem
SSLProxyMachineCertificateFile conf/proxy.pem
ProxyPass / https://127.0.0.1:8443/
ProxyPassReverse / https://127.0.0.1:8443/
conf/proxy.pem should contain both the key and the certificate (and the certificate must be trusted by Tomcat).
conf/cacert.pem must contain the CA that signed the Tomcat certificate.
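On the Tomcat/JBossWeb side this corresponds to an HTTPS connector with client certificate authentication enabled, for example something like the following in server.xml (port, paths and passwords are illustrative; the truststore must contain the CA that signed the httpd proxy certificate):
<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
           scheme="https" secure="true" clientAuth="true" sslProtocol="TLS"
           keystoreFile="conf/tomcat.keystore" keystorePass="changeit"
           truststoreFile="conf/truststore.jks" truststorePass="changeit"/>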
3.8. Forwarding SSL environment when using http/https proxy:
The variables supported by the servlet interface are the following:
-
javax.servlet.request.X509Certificate
-
javax.servlet.request.cipher_suite
-
javax.servlet.request.ssl_session
-
javax.servlet.request.key_size
To get the client certificate or any SSL information from the browser you have to use mod_headers to add the SSL information to the request headers. To do that, add the following to the httpd.conf of Apache httpd:
RequestHeader set SSL_CLIENT_CERT "%{SSL_CLIENT_CERT}s"
RequestHeader set SSL_CIPHER "%{SSL_CIPHER}s"
RequestHeader set SSL_SESSION_ID "%{SSL_SESSION_ID}s"
RequestHeader set SSL_CIPHER_USEKEYSIZE "%{SSL_CIPHER_USEKEYSIZE}s"
Then you need a valve in Tomcat to extract the information from the request headers. See the original code. The valve has been integrated in Tomcat and in JBossWeb (since 2007).
Once you have built the valves.jar, copy it into server/lib/ and edit server.xml to add <Valve className="SSLValve"/> in the <Engine/> part of the file.
4. JBossWeb listener
4.1. ClusterListener
The ClusterListener allows a standalone JBoss Web to work with a mod_cluster proxy. It also works in JBoss AS.
To enable it, just add the Listener element in server.xml, for example:
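A minimal example is shown below; the class name assumes the ClusterListener shipped with JBoss Web (org.jboss.web.cluster.ClusterListener), so adjust it if your distribution packages the listener elsewhere:
<Listener className="org.jboss.web.cluster.ClusterListener" proxyList="localhost:6666"/>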
The following parameters are supported:
-
proxyList: list of proxy front-ends that will use us as a backend; a comma-separated list of host:port entries, for example:
-
proxyList="localhost:6666,neo:6666,localhost:7666"
-
-
proxyURL: URL prefix to send with the commands; the default is no prefix
-
proxyURL="/bla"
-
-
socketTimeout: Socket timeout for connections to the httpd servers; default 20000 ms
-
socketTimeout="20000"
-
-
domain: domain
-
domain="domain1"
-
-
flushPackets: How mod_proxy should flush the packets.
-
flushPackets="on"
-
-
flushWait: Time in milliseconds before flushing a packet
-
flushWait="500"
-
-
ping: Max time to wait for a cpong after a cping (Asynchronous and Synchronous ping/pong)
-
ping="10"
-
-
smax: Max number of connections a process will handle in one worker
-
smax="100"
-
-
ttl: Max time an unused connection is allowed to live in mod_cluster.
-
ttl="600"
-
-
nodeTimeout: Max time mod_cluster is going to wait for a node to answer. Value in seconds.
-
nodeTimeout="10"
-
-
balancer: Name of the balancer.
-
balancer="cluster1"
-
-
stickySession: Use sticky session.
-
stickySession="true"
-
-
stickySessionRemove: Remove the session id if the JVMRoute can't be used to route the request.
-
stickySessionRemove="true"
-
-
stickySessionForce: Return an error if the worker corresponding to the JVMRoute can't be used.
-
stickySessionForce="true"
-
-
workerTimeout: Max time to wait for a free worker. Note that this is a kind of poll to try to find the best worker. Value in seconds.
-
workerTimeout="1"
-
-
maxAttempts: Number of retries before giving up (and returning an error)
-
maxAttempts="3"
-
4.1.1. Using mod_advertise
mod_advertise is a small httpd module that allows the nodes to discover the httpd front-ends instead of defining them in proxyList. The following parameters are supported:
-
Advertise: Default is true if proxyList is not filled.
-
AdvertiseGroupAddress: Address of the multicast to join. Must be the same value as the mod_advertise directive AdvertiseGroup.
-
AdvertiseGroupAddress="232.0.0.2"
-
-
AdvertisePort: Port of the multicast to join. Must be the same value as the mod_advertise directive (mod_advertise default 23364).
-
AdvertisePort="23364"
-
-
AdvertiseSecurityKey: Key the front-end is going to send. Default: no key, no check.
4.1.2. Using SSL
-
ssl: Use SSL to connect to httpd, default false
-
ssl="true"
-
-
sslCiphers: Ciphers to be used for the SSL connection
-
sslCiphers="cipher1,cipher2"
-
-
sslProtocol: SSL protocol to be used for connection, default "TLS"
-
sslProtocol="name"
-
-
sslCertificateEncodingAlgorithm: Encoding algorithm used for certificates
-
sslCertificateEncodingAlgorithm="alg"
-
-
sslKeyStore: Certificate store, defaults to "~/.keystore"
-
sslKeyStore="myCertificate"
-
-
sslKeyStorePass: Password for the certificate, default "changeit"
-
sslKeyStorePass="changeit"
-
-
sslKeyStoreType: Certificate store type, default "JKS"
-
sslKeyStoreType="type"
-
-
sslKeyStoreProvider: Certificate store provider
-
sslKeyStoreProvider="provider"
-
-
sslKeyAlias: Alias name for the key
-
sslKeyAlias="alias"
-
-
sslTrustAlgorithm: Encoding algorithm used for the trust certificates
-
sslTrustAlgorithm="alg"
-
-
sslCrlFile: Certificate revocation list
-
sslCrlFile="file"
-
-
sslTrustMaxCertLength: Maximum certificate chain length, default 5
-
sslTrustMaxCertLength="6"
-
-
sslTrustStore: Trust certificate store
-
sslTrustStore="myTrustStore"
-
-
sslTrustStorePassword: Password for the trust store, default is to use the main certificate password
-
sslTrustStorePassword="pass"
-
-
sslTrustStoreType: Certificate store type
-
sslTrustStoreType="type"
-
-
sslTrustStoreProvider: Certificate store provider
-
sslTrustStoreProvider="provider"
-
5. Future directions
5.1. Consistent-hashing based JBC data partitioning w/ mod_cluster
5.1.1. Overview
Documentation of design discussion at our October 2008 Brno clustering dev meetings.
JBoss Cache is considering moving to a data partitioning approach based on consistent hashing. The idea is that a particular key (i.e. Fqn) under which cached data is stored would be hashed using an algorithm that incorporates the cluster view. Which nodes in the cluster the data is stored upon would be derived from the hash function. This approach is a logical alternative to buddy replication; both seek to improve memory and network utilization by not requiring data to be redundantly stored on every node in the cluster.
Data partitioning differs from buddy replication in not requiring data ownership, i.e. that access to the data should only occur from one node. However, the more access to the data can be confined to a node where that data is locally cached, the more performant the overall system would be.
Storing session data in a consistent-hash based data partition can theoretically be highly performant, because sessions are meant to be sticky – i.e., accessed only via one node. The trick is determining how to ensure that the node to which web requests are sent is the one where the data is locally stored. What I outline below is how an enhanced mod_cluster could help ensure this occurs.
5.1.2. Assumption
The mod_cluster code on the httpd side has available to it the same consistent hashing algorithm as is used on the JBoss side, as well as the same inputs (i.e., the JBoss-side cluster topology). Based on this, mod_cluster can always determine the primary node for a session (session id being the value being hashed) as well as all backup nodes. With this information, mod_cluster can properly route requests both before and after failover.
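To illustrate this assumption, a minimal hash-ring sketch is shown below. This is illustrative Java only, not the actual mod_cluster or JBoss Cache code; a real implementation would place many virtual points per node on the ring and would have to use exactly the same hash function and topology view on both sides.
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

final class SessionRouter {
    private final SortedMap<Integer, String> ring = new TreeMap<Integer, String>();

    SessionRouter(List<String> topology) {
        for (String node : topology) {
            ring.put(hash(node), node);   // one point per node, for brevity; assumes a non-empty topology
        }
    }

    /* Primary owner of a session: first ring point at or after the hash of the session id. */
    String primaryNode(String sessionId) {
        SortedMap<Integer, String> tail = ring.tailMap(hash(sessionId));
        Integer key = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(key);
    }

    private static int hash(String s) {
        return s.hashCode() & 0x7fffffff;   // keep it non-negative
    }
}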
5.1.3. Request Handling
This is the approach discussed in Brno:
-
Initial request comes to node 1. This is for a new session, so node 1 has the freedom to specify the session id. It generates an id that hashes to node 1. Effect is the session data will be cached locally.
-
Second request arrives, mod_cluster hashes the session id, determines the data is local on node 1, so routes the request there.
-
Node 1 crashes but another request comes in. mod_cluster checks the session hash, sees the secondary node for the session is node 3, so it routes the request to node 3, which handles it using its locally cached backup data.
-
The AS side recognizes the failure of node 1, so the topology information that is an input to the hashing function changes. Following such changes, some portion of cached data (no more than 1/n, where n is the number of cluster members) will need to be moved to a new host. In this case, the session in question needs to be moved and ends up on node 2.
-
Another request comes in, but mod_cluster is not yet aware of the new topology having changed the hashing function input, so it routes the request to node 3. Node 3 handles the request by making remote calls to node 2 to read/write the session data.
-
The AS side informs mod_cluster of the change to the topology.
-
Next request comes in, mod_cluster uses the new topology in its hashing function and routes the request to node 2, where the data resides.
The above is a deliberately complex scenario where the session data is moving around the AS-side cluster and requests are coming in while the topology info is inconsistent between the mod_cluster side and the AS side.
5.1.4. Possible Simpler Approach
A simpler approach is to continue to use the jvmRoute suffix in the session id to control routing on the httpd side, but allow JBoss AS nodes to alter an emitted session cookie to point to a node other than themselves.
Example:
-
Initial request comes in to node 1 which, as above, generates a session id that points to itself. Session cookie is 123xxx.node1.
-
Node 1 fails, so a subsequent request is randomly sent by mod_cluster to node 2. The 123xxx session data is not stored locally on node 2, but rather on node 3. Node 2 handles the request by making remote calls to node 3 to read/write the session data.
-
Node 2 recognizes that making remote calls to handle requests is inefficient, so in its response to the failover request, it alters the session cookie, not to 123xxx.node2 but to 123xxx.node3.
-
Next request that comes in is routed by mod_cluster to node 3, where the data is local.
This approach eliminates the need for mod_cluster to understand the consistent hashing algorithm.
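A sketch of the cookie rewrite described above (illustrative only; the class, method and cookie handling are assumptions, not the actual AS code):
import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServletResponse;

final class JvmRouteRewriter {
    /* Re-issue the session cookie so its jvmRoute suffix points at the node owning the data. */
    static void reroute(HttpServletResponse response, String sessionId, String ownerJvmRoute) {
        int dot = sessionId.lastIndexOf('.');
        String bareId = (dot >= 0) ? sessionId.substring(0, dot) : sessionId;   // e.g. "123xxx"
        Cookie cookie = new Cookie("JSESSIONID", bareId + "." + ownerJvmRoute); // e.g. "123xxx.node3"
        cookie.setPath("/");
        response.addCookie(cookie);
    }
}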
6. FAQ
6.1. Questions and Answers on mod_cluster webinar
Want to know more? Watch the webinar by the developers (http://www.vimeo.com/13180921) and read the resulting Q&A:
Q: Is the demo application available?
A: Yes, it’s part of the mod-cluster download (under demo/client). The
SessionDemo itself is not available, but it’s a simple demo
adding data to an HTTP session. I can make it available if
necessary…
Q: Are the slides available?
A: www.jboss.org/webinars
Q: Is this a direct competition to Terracotta’s offering?
A: No; mod-cluster is about (1) dynamic discovery of workers, (2) web
applications, and (3) intelligent load balancing. Clustering is an
orthogonal aspect; as a matter of fact, mod-cluster could be used
with a number of workers which are not clustered.
Q: Is the clustering between jboss instances within a domain done @ JVM level?
A: No; we use JGroups (www.jgroups.org) and JBossCache
(jboss.org/jbosscache) to replicate sessions. In JBoss 6, we’ve
replaced JBossCache with Infinispan (infinispan.org) to replicate
and/or distribute sessions among a cluster.
Q: Why should the deployment topology use httpd? Can’t the tomcat (bundled in
JBoss) use APR.
A: Yes, JBossWeb can use APR, and as a matter of fact does use it if
the shared APR lib is found on the library path. However, using APR
and httpd are orthogonal issues; while the mod-cluster module could
theoretically be used in JBossWeb directly, we haven’t tried it
out, as many deployments still use httpd in production.
Note that JBossWeb cannot be used as a reverse proxy.
Q: What are the steps involved to migrate a setup which is on mod_jk to
mod_cluster?
A: There are only a few steps involved (more details can be found on
jboss.org/mod_cluster):
- Use the httpd modules downloadable from jboss.org/mod_cluster
- Configure httpd.conf accordingly
- Drop workers.properties and uriworkermap.properties
- Configure JBoss AS to include the addresses of the httpd
daemon(s) running
- (Optional) Configure the domain for the JBoss AS instance
The steps are described in detail in
http://docs.jboss.org/mod_cluster/1.1.0/html/mod_jk.html
Q: Is there a separate logging mechanism for mod_cluster like we used to have
for mod_jk?
A: No; mod-cluster uses the normal httpd log, and this is configured in
httpd.conf (similar
to mod-jk / mod-proxy). On the JBoss AS side, the normal AS logging
is used (e.g. conf/log4j.xml)
Q: Is the mod_cluster the same as mod_proxy_balancer?
A: No; mod_proxy_balancer requires manual configuration
(e.g. hosts to be balanced over). Also, web applications have to be
present on all hosts, and don’t register themselves
automatically. Plus, mod_proxy_balancer doesn’t have any notion of
load balance factors sent to it by the workers.
Q: I have an application that uses an HASingleton(ejbtimer). In case of a
multidomains architecture, my application would fail because I would have an
ejbtimer in each domain. How would you get a large cluster to work in this
scenario?
A: If one singleton timer per domain is not desired, then one could
place the singleton timer into a separate cluster, which spans
multiple domains. Note that an HASingleton ejb timer and
distributed cache will use separate channels by default.
Q: Is it not efficient to avoid sticky-sessions? If we avoided sticky sessions,
then we could use hardware-based load balancers which do load balancing at the
Transport (TCP/IP) layer rather than Application layer.
A: Making sessions non-sticky means that access to sessions can be random, ie.
requests for an HTTP session can go to any node within a domain. However, this
means that we should not use asynchronous replication, as a write to an
attribute followed by an immediate read of the same attribute but on a
different node might lead to the reading of stale data. However, using
synchronous replication is slower because every write incurs a round trip to
the cluster, and the caller blocks until all responses have been received. Our
recommendation is to use sticky sessions and asynchronous replication, for the
best performance.
Q: Is it possible to configure mod_cluster or mod_jk in a way that certain IPs
requests go to just a particular domain
A: Not easily. One could configure virtual hosts in httpd.conf, and
workers connect to certain virtual hosts only, but there is no
enforcement of which domains are hit from the httpd side.
Q: We used appliance for load balancing. Can we use mod-cluster for dynamic
configuration instead of using static properties?
A: No, mod-cluster requires the httpd to run. We intend to talk to load
balancer vendors and get them to implement the MCMP protocol, so that their
balancers could be used with mod-cluster enabled workers.
Q: Is mod_cluster delivered as a native module in Apache just as mod_proxy?
A: Yes, on the httpd side. On the JBoss AS side, we use a service archive
(mod_cluster.sar), in /deploy
Q: A little more general clustering question. What about distributing jboss
servers across datacenters but that belong to the same cluster?
A: This is possible, however, in most cases IP multicasting would not be
available over a WAN. Therefore, the configuration of JBoss AS should use a
TCP based stack rather than a UDP based stack.
Q: Can you suggest the pattern to cluster the Apache server for Fail over when
acting as Load balancer for Jboss Cluster
A: This is very simple: just start multiple httpds and add them to JBoss AS,
e.g. mod_cluster.proxyList=host1:8000,Host2:8000 etc
Workers (JBoss instances) will then register themselves and their
applications with all httpds in the list.
Q: Is mod_cluster available with JBoss AS (community) or JBoss Enterprise
Application Platform from Red Hat?
A: Currently, mod-cluster 1.1.0.CR3 will ship as part of JBoss AS 6. The
mod-cluster functionality is part of EAP 5.0.1 and will also be part of JBoss EAP 5.1.
Q: Can the worker nodes be configured from JON?
A: Not yet (with respect to mod-cluster configuration). This is on the roadmap.
Q: What is the configuration for dynamically adding nodes as load increases?
A: This feature is not available. It might be available as part of our
Deltacloud product. Currently, third party vendor’s products, such as
RightScale, could be used to do this.
Q: Which version of mod_cluster do you use? In my version I cannot see the
sessions.
A: To see sessions in mod_cluster_manager, the following entry has to be added
to httpd.conf:
<IfModule mod_manager.c>
MaxsessionId 50
</IfModule>
Note that sessions are by default not shown in mod_cluster_manager. Refer to the documentation at jboss.org/mod_cluster for details.
Q: Can you quickly show the config for how mod_cluster automatically detects new hosts?
A: When a new JBoss instance is started, as soon as the mod_cluster.sar service
is deployed, the host and all of its applications will be registered with all
httpds, so this happens immediately.
Q: What do you advise in a multi-datacenter setup? Can we use mod_cluster, and
won't this cause an event storm when one of the datacenters goes down?
A: When you have domains across multiple data centers, and one data center goes
down, then the other data center has to accommodate the traffic from the data
center which is down. This causes more traffic to the surviving data center, so
when doing capacity planning this should be taken into account. If the nodes in
a domain run in the cloud, then one could envisage automatically starting new
virtualized instances to accommodate the handling of this increased traffic.
Q: Is there a separate logging mechanism for mod_cluster like we used to have for
mod_jk?
A: mod-cluster is configured through the usual mechanism in httpd.conf
Q: Do we need a mod_cluster manager on all nodes [in the cluster]?
A: Note that mod_cluster_manager is only available on the httpd side
Q: Is GossipRouter highly available?
A: Yes, multiple GossipRouters can be started. Note that, if running only on
EC2, then a protocol called S3_PING can be used as an alternative. It uses an
S3 bucket to store cluster topology information.
Q: For the group of HTTP daemons in front of the clusters, I assume those can
be round robin’d DNS, or any other method of load balancing them?
A: Yes, DNS round robin (or a hardware load balancer fronting the httpds) works.
When using sticky sessions, the jsessionid is sent with each request (cookie or
URL rewriting) and it is suffixed with the jvmRoute of the node which hosts a
given session.
Q: Does JBoss support UNICAST messaging?
A: Yes; JGroups would have to be configured appropriately to do that. When
using TCP, this is done automatically. When using UDP, ip_mcast="false" would
have to be set.
Q: Is there support for mount point exclusions like JkUnMount in mod-jk?
A: Yes, use
<property name="excludedContexts">jmx-console,web-admin,ROOT</property>
in
/deploy/mod_cluster.sar/META-INF/mod_cluster-jboss-beans.xml
Q: What are the steps involved to migrate a setup which is on mod_jk to
mod_cluster?
A: See the previous answer above
(http://docs.jboss.org/mod_cluster/1.1.0/html/mod_jk.html)
Q: There is, implicitly, a concept of starting connections from the JBoss
"backend" to the "frontend"; this seems odd to me?
A: This is only conceptual; workers will not create a socket connection to
httpd. Instead httpd connects to the workers (ie. JBoss AS instances) and
the workers use the same channel to send status updates, registration of web
applications etc.
Q: Can you use a buddy list to replicate sessions across domains?
A: Yes, that can be done, as a domain doesn’t need to have the same scope as a
cluster; a cluster can span multiple domains. However, for scalability
purposes, we recommend to restrict a cluster to a domain
Q: How does full replication in each domain compare to using buddy replication
and just one cluster/domain?
A: The scalability of full replication is a function of cluster size and
average data size, so if we have many nodes and/or large data sets, then we
hit a scalability ceiling.
If DATA_SIZE * NUMBER_OF_HOSTS is smaller than the memory available to each
host, then full replication is preferred, as reads are always local. If this
is not the case, then we can use multiple domains, or we can use one single
cluster, but switch from full replication to either buddy replication
(JBossCache) or distribution (Infinispan). Distribution only stores N copies
of a session, therefore scales much better than full replication.
Q: Is there any tutorial provided?
A: There’s a quick start guide available at jboss.org/mod_cluster
Q: Is it possible to limit which hosts are allowed to join the cluster easily?
A: Yes. This can be done at the JGroups level, by using a protocol called AUTH
(http://community.jboss.org/wiki/JGroupsAUTH). It provides passwords, X.509
certificates, host lists and simple MD5 hashes as authentication, but it is
pluggable, so other mechanisms can be included. Post questions on AUTH to
the JGroups mailing list (jgroups.org).
Q: To upgrade without downtime you have to have at least two domains for each
application, right?
A: Yes
Q: Is there any method/workaround to avail Session Replication across Domains?
A: A cluster isn’t restricted in scope to a domain, it can span multiple
domains. However, that defeats the purpose of a domain (divide-and-conquer),
and makes rolling upgrade more difficult. For instance, if a cluster spans 2
domains, then it is better to club the 2 domains together into one.
Q: I missed some of the demo - I saw the session replication/migration in the
demo, but wanted to know if I have 2 apache servers in front of the jboss
cluster and a network load balancer doing round robins will mod_cluster
maintain the session across them?
A: Yes. The jvmRoute is appended to the jsessionid and identifies the node in a
given domain uniquely. See also the question above on DNS round robin.
Q: On the Apache side, which versions are required? 2.2, or also 2.0?
A: 2.2.8 or higher
Q: I'm using JBoss 4.2.2.GA. Should I migrate to JBoss 6?
A: JBoss 5 or higher. You can use mod_cluster with JBoss 4.2.2 - but
you’d need to configure it as you would for JBoss Web standalone
(or Tomcat) - and consequently has slightly limited functionality,
e.g. no HA-mode, limited to 1 load metric.
Q: UDP broadcast?
A: The ability to send a packet to all hosts on a given subnet. IP multicasting
is more efficient because a packet is only sent to subscribed hosts. IP
multicasting is more efficient than TCP in large clusters, because the
switch copies the packet to all recipients, whereas with TCP a packet has to
be sent N-1 times (where N is the cluster size)
Q: Normally, how much time does it take for a new node to be detected by
mod_cluster? Is it configurable?
A: No, it is not configurable. As soon as the JBoss instance is started, it
(and its webapps) will get registered.
The time required to do this depends on how the node finds out
about the proxy. If you’ve configured mod_cluster with a static
proxy list, then it registers with the httpd proxy upon startup. If
you configured mod_cluster server-side to use an HASingleton (via
HAModClusterService), then it knows about the proxy upon joining
the cluster - also upon startup. Otherwise, you are relying on the
advertise mechanism - so the time required to register with the
proxy is a product of the advertise interval (AdvertiseFrequency,
configured in httpd.conf), and the status interval
(Engine.backgroundProcessorDelay, configured in server.xml)
Q: How do the newly added servers pick up the sessions? Are they new or existing
sessions?
A: The new servers use a mechanism provided by JGroups called state transfer
(see http://www.jgroups.org/manual/html/user-channel.html#GetState), which
copies the existing sessions into a new server. This way, the new server can
be failed over to should an existing server crash.
Note that state transfer is not needed if we use distribution instead of
replication (see above).
Q: When performing rolling upgrades, how do you mitigate issues where the
database schema changes? So certain domains may be using JNDI to hook into
one core db - if another domain is upgraded in a roll out then hibernate
will update / alter those tables?
A: Schema migration is a difficult topic, outside the scope of mod-cluster. One
possible way could be to have a separate DB in the new domain, drain the old
domain, and - when the old domain is shut down - transfer the data from the
old to the new DB. But, again, this is very application dependent, and
generic advice is moot.
Q: Is mod_cluster also working with JBoss 5.1 with the same power, or does it
require JBoss 6?
A: mod-cluster works with 5.1, but is already integrated into AS 6 out
of the box.
The latest mod_cluster 1.1.0.CR3 release will work with JBoss 5.1
with no configuration changes - just drop in the mod_cluster.sar
into the $JBOSS_HOME/server/all/deploy directory.
Q: How do nodes identify other nodes within their cluster? In other words how
do EC2 nodes only cluster with EC2 nodes etc.?
A: Nodes find other nodes through JGroups (www.jgroups.org). On EC2, we can
either use a GossipRouter, which is a separate lookup process, or S3_PING
which is based on S3 buckets.
A cluster is defined via (a) the same configuration and (b) the same cluster
name. All nodes which have (a) and (b) form a cluster. Nodes which have (a)
but a different cluster name form a different cluster.
Q: Is it possible to shutdown and drain a single web app?
A: Yes. The steps are:
- Disable the app
- Wait until the sessions for the app have drained
- Undeploy the app
- Deploy the new app
Note that the old and new webapp need to be compatible, ie. classes
cannot change between redeployments.
If there is an incompatible change, I recommend to drain all webapps of the same type
(context) in a domain.
Note that undeploy of a web application will perform the above operations automatically! Use the stopContextTimeout/stopContextTimeoutUnit config properties to control the default drain timeout. If you’re using session replication, then you don’t need to wait for all sessions to drain - just all current requests to complete, since those session will be available elsewhere. The method of draining is determined by whether or not the target web application is distributable or not. Additionally, the sessionDrainingStrategy config property can be used to always force session draining, even for distributable web applications.
Alternatively, you can stop a single context manually in one step via the stopContext(…) JMX operation.
Q: Is mod_cluster delivered as a native module in Apache, just as mod_proxy?
A: Yes
Q: Does the "load balancer demo app" come with mod_cluster?
A: Yes, under /demo/client
Q: Can you configure the jboss nodes to announce themselves to the httpd
servers over a local/private network keeping that communication private and
seperate from the public access to the application?
A: Yes. You can - since a separate connection is used, provided these
routes exist. This private network address/port would be provided by
the advertise mechanism or via the server-side proxyList.
The private and public network could be created in httpd.conf,
using virtual hosts.
Q: When a new version of a web app is deployed, how does JBoss/mod_cluster
know how to replicate between old versions and new versions?
A: The webapp needs to be compatible to existing versions. If it isn’t, deploy
it into a new domain, or redeploy all existing webapps of the same type.
Q: Can mod_cluster be enabled when we use/configure Elastic Load Balancing?
A: Yes, but this doesn’t make much sense. Compared to ELB, mod-cluster is (1)
cloud independent (ELB only exists in EC2), (2) allows for dynamic
registration of workers (this is static in ELB), (3) allows for dynamic
registration/de-registration of webapps (ELB doesn’t) and (4) sends dynamic
load balancer information back to httpd (ELB has some built-in LB
functionality, but it is not extensible).
Q: What about the performance when we divide one large cluster into small
clusters?
A: Performance is probably better, for various reasons. For example, if we use
TCP, cluster wide calls (RPCs) have a cost of N-1. With smaller N’s, these
calls become less costly.