# Best Practices

> This guide describes high-level best practices for running a Stream Node as a node operator.

## Node Operations Best Practices

Whether operating nodes in the public cloud or on-premise, the following best practices are recommended to ensure availability and reliability of your Stream Nodes.

### Maintenance

<Steps>
  <Step title="Regular Updates">
    * Keep your node software up to date with the latest security patches and feature updates. Regularly check for new releases.
    * Using the deployment scripts, you can bring up a new FE instance in validation mode. Once you are confident it is passing health checks, switch it to be your new running node and de-provision the prior FE.
    * Storage schema updates are applied as database migrations using [golang-migrate](https://github.com/golang-migrate/migrate) and run automatically as part of Node FE upgrades.
    * If you run blue-green deployments behind a proxy or network load balancer, consider setting `STANDBYONSTART` to `true` in your environment variables. This brings up a new FE instance in standby mode; once it is passing health checks, it automatically switches to primary.

    <br />

    <Note>If your node operator or your node has not yet been fully registered, you can test an unattached Node FE to confirm that health checks pass by setting `STANDBYONSTART=true`. A node started this way does not attach to the Storage layer, and it does not shut down if it is not yet registered on the Towns Chain.</Note>
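
    The health-check gate described above can be sketched as a small polling loop. This is a minimal illustration, not the deployment scripts' actual logic: the `waitHealthy` helper, the `/status` path, and the "three consecutive passes" threshold are all assumptions made for the example, and a local test server stands in for the new node instance.

    ```go
    package main

    import (
    	"context"
    	"errors"
    	"fmt"
    	"net/http"
    	"net/http/httptest"
    	"sync/atomic"
    	"time"
    )

    // waitHealthy polls url until it returns HTTP 200 `required` times in a row,
    // or until ctx is done. Any error or non-200 response resets the streak.
    // (Hypothetical helper; not part of the node software.)
    func waitHealthy(ctx context.Context, client *http.Client, url string, required int, interval time.Duration) error {
    	if required < 1 {
    		return errors.New("required must be at least 1")
    	}
    	streak := 0
    	for {
    		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
    		if err != nil {
    			return err
    		}
    		resp, err := client.Do(req)
    		if err == nil {
    			resp.Body.Close()
    			if resp.StatusCode == http.StatusOK {
    				streak++
    			} else {
    				streak = 0
    			}
    		} else {
    			streak = 0
    		}
    		if streak >= required {
    			return nil
    		}
    		select {
    		case <-ctx.Done():
    			return ctx.Err()
    		case <-time.After(interval):
    		}
    	}
    }

    func main() {
    	// Stand-in for a new node instance: fails its first two checks, then passes.
    	var calls atomic.Int32
    	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    		if calls.Add(1) <= 2 {
    			w.WriteHeader(http.StatusServiceUnavailable)
    			return
    		}
    		w.WriteHeader(http.StatusOK)
    	}))
    	defer srv.Close()

    	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    	defer cancel()

    	// "/status" is a hypothetical health path used only for this demo.
    	if err := waitHealthy(ctx, srv.Client(), srv.URL+"/status", 3, 10*time.Millisecond); err != nil {
    		fmt.Println("not healthy:", err)
    		return
    	}
    	fmt.Println("healthy: safe to shift traffic")
    }
    ```

    Requiring several consecutive passes, rather than a single one, guards against shifting traffic to an instance that is still flapping during startup.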
  </Step>

  <Step title="Backups">
    * Implement routine backups of your Node Storage to prevent data loss.
    * Establish an operating procedure for automated or manual data restoration from backups in the event of an outage.
    * Use monitoring tools to track your node's performance and health.
    * If your node storage crashes, you can restore from a backup and then catch up from the peers responsible for the same chunks of data.

    <Warning>As of October 2024, stream replication has not yet been implemented. Until it is, data cannot be recovered from peers. It is therefore important that node operators back up their node storage regularly and ensure high availability through their cloud provider's database setup.</Warning>
  </Step>

  <Step title="Monitoring">
    * Regularly review your node's performance. Use the logs, metrics, and profiling tools exposed by the node to tune your observability stack.
    * Adjust resource allocation and network settings as needed to optimize for throughput and reliability.

    <Note>By setting `METRICS__ENABLED=true` in your node's environment, you can enable detailed metrics collection for your node. Metrics are instrumented using [OpenTelemetry](https://opentelemetry.io/docs/specs/otel/metrics/) and can be used to monitor your node's performance and health by navigating to the metrics endpoint at `https://<node-hostname>/metrics`. See [node observability](/node-operator/tutorials/node-observability) for more information.</Note>
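
    For a quick spot check outside a full observability stack, the metrics endpoint can be read programmatically. The sketch below assumes the endpoint serves the Prometheus text exposition format, which this guide does not confirm; the `parseSamples` helper and the metric names in the sample are made up for illustration.

    ```go
    package main

    import (
    	"bufio"
    	"fmt"
    	"io"
    	"strconv"
    	"strings"
    )

    // parseSamples reads Prometheus-style text exposition lines and returns a map
    // from series (metric name plus any label set) to its value. Comment lines
    // (# HELP, # TYPE) and blank lines are skipped; timestamps are ignored.
    // (Hypothetical helper; assumes the Prometheus text format.)
    func parseSamples(r io.Reader) (map[string]float64, error) {
    	out := make(map[string]float64)
    	sc := bufio.NewScanner(r)
    	for sc.Scan() {
    		line := strings.TrimSpace(sc.Text())
    		if line == "" || strings.HasPrefix(line, "#") {
    			continue
    		}
    		var series string
    		var rest []string
    		if i := strings.LastIndex(line, "}"); i >= 0 {
    			// Labelled series: everything up to the closing brace is the key.
    			series, rest = line[:i+1], strings.Fields(line[i+1:])
    		} else {
    			f := strings.Fields(line)
    			series, rest = f[0], f[1:]
    		}
    		if len(rest) == 0 {
    			return nil, fmt.Errorf("malformed sample line: %q", line)
    		}
    		v, err := strconv.ParseFloat(rest[0], 64)
    		if err != nil {
    			return nil, fmt.Errorf("bad value in %q: %w", line, err)
    		}
    		out[series] = v
    	}
    	return out, sc.Err()
    }

    func main() {
    	// Hypothetical scrape output; real metric names will differ.
    	sample := "# TYPE example_requests_total counter\n" +
    		"example_requests_total{method=\"GET\"} 42\n" +
    		"example_open_connections 7\n"

    	samples, err := parseSamples(strings.NewReader(sample))
    	if err != nil {
    		fmt.Println("parse error:", err)
    		return
    	}
    	fmt.Println(samples["example_open_connections"])
    }
    ```

    Against a live node, the same function could be given the response body of an HTTP GET to `https://<node-hostname>/metrics` instead of the in-memory sample.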
  </Step>
</Steps>

### Troubleshooting

Using logs, metrics and profiling tools, you can identify and resolve issues with your node. Some issues may have specific resolutions on the [Towns Issue Tracker](https://github.com/towns-protocol/towns/issues?q=is%3Aissue+is%3Aclosed).

If you find a new bug or security vulnerability, please file an issue on the [Towns Issue Tracker](https://github.com/towns-protocol/towns/issues/new/choose) for the core development team to address.

<Steps>
  <Step title="Common Issues">
    * Address typical problems such as connectivity issues, slow transaction
      processing, or database errors with targeted troubleshooting steps provided
      in the network's documentation.
  </Step>

  <Step title="Diagnostic Tools">
    Stream nodes are equipped with a variety of diagnostic tools to help you troubleshoot issues.

    You can run pprof endpoints securely to collect and analyze profiling data emitted by your node. This is recommended if you are experiencing performance issues or suspect memory leaks.

    Set `DEBUGENDPOINTS__PPROF=true` in your environment variables to enable pprof endpoints.

    Then either set `DEBUGENDPOINTS__PRIVATEDEBUGSERVERADDRESS` to a specific address and port (e.g. `127.0.0.1:8080`), or set `DEBUGENDPOINTS__MEMPROFILEDIR` to a directory where memory profile files are saved periodically.

    `pprof` endpoints available:

    * /debug/pprof/
    * /debug/pprof/cmdline
    * /debug/pprof/profile
    * /debug/pprof/symbol
    * /debug/pprof/trace

    See [debug.go](https://github.com/towns-protocol/river/blob/main/core/node/rpc/debug.go) for more information and full capabilities.
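
    As an illustration of collecting a profile from the private debug server, the sketch below downloads a CPU profile to a local file. It assumes the debug server speaks plain HTTP on the configured address and follows Go's standard `net/http/pprof` behavior, where the `seconds` query parameter sets the sampling window. The `cpuProfileURL` and `download` helpers are hypothetical, and a local test server stands in for a live node.

    ```go
    package main

    import (
    	"fmt"
    	"io"
    	"net/http"
    	"net/http/httptest"
    	"os"
    	"path/filepath"
    )

    // cpuProfileURL builds the CPU profile URL for a debug server listening on addr.
    // Plain HTTP and the "seconds" parameter are assumptions of this sketch.
    func cpuProfileURL(addr string, seconds int) string {
    	return fmt.Sprintf("http://%s/debug/pprof/profile?seconds=%d", addr, seconds)
    }

    // download streams the response body at url into the file at path and
    // returns the number of bytes written.
    func download(client *http.Client, url, path string) (int64, error) {
    	resp, err := client.Get(url)
    	if err != nil {
    		return 0, err
    	}
    	defer resp.Body.Close()
    	if resp.StatusCode != http.StatusOK {
    		return 0, fmt.Errorf("unexpected status %s", resp.Status)
    	}
    	f, err := os.Create(path)
    	if err != nil {
    		return 0, err
    	}
    	n, copyErr := io.Copy(f, resp.Body)
    	if closeErr := f.Close(); copyErr == nil {
    		copyErr = closeErr
    	}
    	return n, copyErr
    }

    func main() {
    	// Stand-in debug server so the sketch runs without a live node.
    	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    		if r.URL.Path != "/debug/pprof/profile" {
    			http.NotFound(w, r)
    			return
    		}
    		w.Write([]byte("fake-profile-bytes"))
    	}))
    	defer srv.Close()

    	addr := srv.Listener.Addr().String() // on a real node, e.g. 127.0.0.1:8080
    	out := filepath.Join(os.TempDir(), "cpu.pprof")
    	n, err := download(srv.Client(), cpuProfileURL(addr, 30), out)
    	if err != nil {
    		fmt.Println("download failed:", err)
    		return
    	}
    	fmt.Printf("wrote %d bytes to %s\n", n, out)
    }
    ```

    A profile saved this way can be inspected with `go tool pprof cpu.pprof`. When profiling a real node, make sure any client timeout is longer than the sampling window.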

    <Note>If you save pprof results to the node's local storage, for example when running in a containerized environment, mount a volume at the output directory or copy the files off before the node restarts. Otherwise the results are lost on restart.</Note>
  </Step>
</Steps>

### Security

<Steps>
  <Step title="Access Control">
    * Implement strict access controls for administrative operations. Use secure
      authentication methods to protect against unauthorized access.
  </Step>

  <Step title="Encryption and Network Security">
    * Secure data in transit and at rest using encryption. Apply network
      security best practices, such as firewalls and secure protocols, to protect
      against external attacks.
  </Step>

  <Step title="Regular Security Audits">
    * Conduct regular security audits to identify and mitigate potential
      vulnerabilities. Stay informed about the latest security threats and apply
      recommended countermeasures.
  </Step>
</Steps>
