Configuring ClickHouse Parameters

This article describes how to modify the configuration of a
ClickHouse Cluster using the Instaclustr Console. It assumes that you have already provisioned a ClickHouse cluster in accordance with the steps in the article on Creating a ClickHouse Cluster. 

You can alternatively choose to update the cluster configuration via the Instaclustr Provisioning API or Terraform. Please note that using these methods you can only modify the values of a limited set of parameters. If you’re interested in modifying other parameters, please reach out to our support team.  

The available parameters are a mix of the global server settings and query-level settings.  

  • The query-level parameters are applied to the “Default” user profile, and will not apply to users under different profiles, if you’ve created any.  
  • Global server parameters instead will apply to all users. 

You can check the relevant documentation to determine the scope of a parameter. 

Changing Values Away from Defaults  

You can “override” the available parameters with values of your choice to alter cluster behaviour to suit your requirements. Some parameters may be constrained by available compute resources, shard-replica configuration etc. and you may wish to review such details before overriding them.

  1. To get started, after signing into the console, click on the dropdown button with the name of the ClickHouse cluster you want to modify.
  2. To modify the configuration on the cluster, simply click on the Configuration button that appears in the drop down list. This will take you to a page which displays all configuration parameters available.
  3. Once on the Configuration page, change the default value of an available configuration parameter to your desired value. The Status will change to “Override Pending”, indicating the setting has not been applied yet.
  4. Once you’ve updated each parameter you’d like to update with the desired values then scroll to the bottom of the page and click “Apply Pending Changes”.
  5. Applying the changes updates the ClickHouse configuration to the cluster and can take up to a couple of minutes. During this time, you are unable to apply further pending changes to the cluster.
  6. The information box will confirm when the cluster has been updated. 
  7. All overridden parameters now show the “Overridden” status and their respective new values. 

Resetting Parameter Values to Defaults 

  1. The “Reset To Default” option is enabled allowing the parameter to be reset to the default value. 
  2. Pressing “Reset To Default” sets the parameter status to “Reset Pending”, then the reset can be applied by selecting “Apply Pending Changes”.
  3. The “Clear Pending Change” button restores the current value of a parameter, either “Default” or “Overridden”. Similarly the “Clear Pending Changes” button at the bottom of the page restores the current value of all parameters. 
  4. The “Restore All” button will reset all parameters to the default value. This button requires confirmation via a pop up window and once “Proceed” is clicked the configuration will be loaded onto the cluster. 

Available Parameters

The following section provides details on the parameters whose values you can modify. Each parameter has specific constraints on allowable values. Ensure that the values set for each parameter fall within the specified constraints. Please note that, certain parameters have constraints that depend on cluster configuration (e.g., insert_quorum). For the parameters listed below an application restart is not required unless otherwise specified.

  • distributed_ddl_task_timeout
    Sets the timeout (in seconds) for DDL query responses from all hosts in a cluster.Default Value: The default value is 180 seconds.
    Allowed Range: This parameter can be set within the range of -1 to 1800. A value of -1 indicates an infinite timeout, while a value of 0 indicates async mode.
  • insert_quorum
    Determines the number of replicas that must successfully write data for an INSERT query to be considered successful. Valid values are: 0 (disabled), 2 or higher (enabled), auto (majority of replicas).

    Default Value: The default is set to 0.
    Allowed Range: For a single replica cluster, this parameter cannot be changed from 0. For a cluster with two replicas, it can be set to 0, 2, or auto. For a cluster with three replicas, it can be set to 0, 2, 3, or auto. The set value is interpreted as: 0 (disabled), 2 or higher (enabled), auto (majority of replicas). 
  • insert_quorum_parallel
    Enables or disables parallelism for quorum INSERT queries.Default Value: The default value is 1, which enables the feature.
    Allowed Range: It can be set to either 0 (disabled) or 1 (enabled). 
  • insert_quorum_timeout
    Maximum wait time (in milliseconds) for a quorum write to complete before throwing an exception.Default Value: The default value is 600000 ms (10 minutes).
    Allowed Range: This parameter can be set within the range of 0 to 1,000,000,000,000.
  • log_queries
    Enables or disables query logging.Default Value: The default value is 1, which enables query logging.
    Allowed Range: This parameter can be set to 0 (disabled) or 1 (enabled).
  • max_bytes_before_external_group_by
    Maximum RAM for GROUP BY before using external memory.Default Value: The default value is 0 (disabled).
    Allowed Range: This parameter can be set to any value from 0 up to the memory size.
  • max_bytes_before_external_sort
    Maximum RAM for ORDER BY before using external memory.Default Value: The default value is 0 (disabled).
    Allowed Range: This parameter can be set to any value from 0 up to the memory size.
  • max_execution_time
    Maximum query execution time in seconds.Default Value: The default value is 0 (unlimited).
    Allowed Range: This parameter can be set within the range of 0 to 1,000,000,000,000.
  • max_insert_threads
    Maximum number of threads to execute the INSERT SELECT query. Where 0 (or 1) is no parallel execution. Set to a value greater than 1 for parallel execution.
     
    Default Value: The default value is 0.
    Allowed Range: This parameter can be set to any value within the range of 0 to 1,000,000,000,000.
  • max_partition_size_to_drop
    Maximum size of partition that can be dropped.Default Value: The default value is 50000000000 bytes.
    Allowed Range: This parameter can be set within the range of 0 (unrestricted) to 1,000,000,000,000.
  • max_threads
    Maximum number of query processing threads, excluding threads for retrieving data from remote servers. This parameter applies to threads that perform the same stages of the query processing pipeline in parallel.Default Value: The default value is set to auto(2).
    Allowed Range: The allowable range for this parameter is between 0 and 1,000,000,000,000.
  • optimize_on_insert
    Controls merging of data parts during insert.Default Value: The default value is 1, which enables optimization on insert.
    Allowed Range: This parameter can be set to either 0 (disabled) or 1 (enabled).
  • use_skip_indexes
    Enables or disables use of skip indexes.Default Value: The default value is 1, which enables the feature.
    Allowed Range: This parameter can only be set to 0 (disabled) or 1 (enabled).

 

Configs that we override ClickHouse defaults

Instaclustr modify some ClickHouse configurations to optimise the application performance based on usages and VM sizes being offered on our platform. Please note that many configurations could also be set on a query level using the SET keyword. Details of it can be found in the ClickHouse GitHub documentation.   

  • use_query_cache
    If enabled, SELECT queries may use query cache. Default value: 0 (disabled)
    Instacluster overriden value: 1 (enabled)
    Allowed range: 1 (enabled) or 0 (disabled) 
  • mark_cache_size
    Maximum size of cache for marks. Default value: 5GB
    Instaclustr overriden value: Dynamic to node size. For nodes with less than 32GB RAM, the parameter is set to 15.625% of total RAM; for nodes with 32GB or more RAM, it’s set to 5GB. (15.625% was calculated as 5/32)
    Allowed range: Greater than 0. 

Configs changes that require application restart to take effect

Some configuration parameters require ClickHouse application to restart to be applied. These parameters could be modified via support tickets to Instaclustr.   

  • max_thread_pool_size
    ClickHouse uses threads from the Global Thread pool to process queries. If there is no idle thread to process a query, then a new thread is created in the pool. max_thread_pool_size limits the maximum number of threads in the pool. Default value: 10000
    Allowed range: Greater than 0. 
  • old_parts_lifetime
    The time (in seconds) of storing inactive parts to protect against data loss during spontaneous server reboots. Default value: 480
    Allowed range: Greater than 0. 
  • number_of_free_entries_in_pool_to_lower_max_size_of_merge
    When there is less than the specified number of free entries in pool (or replicated queue), start to lower maximum size of merge to process (or to put in queue). This is to allow small merges to process – not filling the pool with long running merges. Default value: 8
    Allowed range: Greater than 0. 
  • merge_selecting_sleep_ms
    Minimum time to wait before trying to select parts to merge again after no parts were selected. A lower setting will trigger selecting tasks in background_schedule_pool frequently which result in large amount of requests to zookeeper in large-scale clusters Default value: 5000
    Allowed range: Greater than 0. 
  • mark_cache_size
    Maximum size of cache for marks. Refer to the section Config Changes that require application restart to take effect for details. Default value: see above 
    Allowed range: see above 
  • background_pool_size
    Sets the number of threads performing background merges and mutations for tables with MergeTree engines. Default value: 16
    Allowed range: 1, 2 4, 8, 16 and so on, with the maximum allowed dependent on the number of vCPUs on the node.  

 

Configs that are not customer configurable but can be changed via support request

Aside from the parameters that require restart to be applied, there are a few other parameters that Instaclustr identified to be customisable, but in rarer cases. They therefore also require support tickets to be modified.  

  • Max_concurrent_queries
    Limit on total number of concurrently executed queries. Note that limits on INSERT and SELECT queries, and on the maximum number of queries for users must also be considered. Default value: 0 (unlimited)
    Allowed range: We allow any value greater or equal to 0 for this parameter, where 0 means unlimited and any other value means that exact number