Best Practices
This guide describes high-level best practices for running a Stream Node as a node operator.
Node Operations Best Practices
Whether you operate nodes in the public cloud or on-premises, the following best practices help ensure the availability and reliability of your Stream Nodes.
Maintenance
Regular Updates
- Keep your node software up to date with the latest security patches and feature updates. Regularly check for new releases.
- Using the deployment scripts, you can bring up a new FE instance in validation mode. Once you are confident it is passing health checks, you can switch it to be your new Running node and de-provision your prior FE.
- Storage schema updates are applied as database migrations using golang-migrate and run automatically within releases as part of Node FE upgrades.
- You are encouraged to set STANDBYONSTART to true in your environment variables if you are running blue-green deployments behind a proxy or network load balancer. This allows you to bring up a new instance of the FE in standby mode, and once it is passing health checks, it will automatically switch to primary.
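The health-check gate described above can also be scripted on the operator side. The following is a minimal sketch, not part of the node software: it polls a check until it passes several times in a row, which is one conservative way to decide that a new instance is ready before traffic is moved. The function names, thresholds, and health-check URL are all hypothetical; substitute whatever health check your deployment actually exposes.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection errors, timeouts, and non-2xx responses all count as unhealthy.
        return False


def wait_until_healthy(
    check: Callable[[], bool],
    required_passes: int = 3,
    max_attempts: int = 30,
    delay_s: float = 10.0,
) -> bool:
    """Poll check() until it passes required_passes times in a row.

    Returns False if max_attempts is exhausted first. Any failure
    resets the streak, so a flapping instance is never promoted.
    """
    streak = 0
    for attempt in range(max_attempts):
        if check():
            streak += 1
            if streak >= required_passes:
                return True
        else:
            streak = 0
        if attempt < max_attempts - 1:
            time.sleep(delay_s)
    return False
```

For example, `wait_until_healthy(lambda: http_ok(health_url))`, where `health_url` is a placeholder for your new instance's health-check address, returns `True` only after three consecutive successful checks.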
A node started with STANDBYONSTART=true will neither attach to the Storage layer nor shut down if it is not yet registered on the Towns Chain.
Backups
- Implement routine backups of your Node Storage to prevent data loss.
- Establish an operating procedure for automated or manual data restoration from backups in the event of an outage.
- Use monitoring tools to track your node’s performance and health.
- If your node storage crashes, you can restore from a backup and then catch up from the peers responsible for the same chunks of data.
Monitoring
- Regularly review your node’s performance. Use the logs, metrics, and profiling tools exposed by the node to tune your observability stack.
- Adjust resource allocation and network settings as needed to optimize for throughput and reliability.
By setting METRICS__ENABLED=true in your node’s environment, you can enable detailed metrics collection for your node. Metrics are instrumented using OpenTelemetry and can be used to monitor your node’s performance and health by navigating to the metrics endpoint at https://<node-hostname>/metrics. See node observability for more information.
Troubleshooting
Using logs, metrics and profiling tools, you can identify and resolve issues with your node. Some issues may have specific resolutions on the Towns Issue Tracker.
If there’s a new bug or security vulnerability found, please file an issue on the Towns Issue Tracker for the core development team to address.
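When troubleshooting, it can help to pull a snapshot of the metrics endpoint described under Monitoring and compare values over time. The sketch below assumes the endpoint returns the Prometheus text exposition format, which this guide does not state. The function names are hypothetical, and the sample in the usage note uses made-up metric names, not metrics the node is known to emit.

```python
import urllib.request
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse Prometheus-style text exposition into {sample: value}.

    The key is the metric name including any {label="..."} set.
    Comment lines (# HELP, # TYPE) and malformed lines are skipped.
    """
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Split after the closing brace of the label set.
            end = line.rindex("}") + 1
            key, rest = line[:end], line[end:].split()
        else:
            parts = line.split()
            key, rest = parts[0], parts[1:]
        if not rest:
            continue
        try:
            samples[key] = float(rest[0])
        except ValueError:
            continue
    return samples


def fetch_metrics(url: str, timeout: float = 10.0) -> Dict[str, float]:
    """Download a metrics page and parse it (assumes UTF-8 text)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling `parse_metrics` on a line such as `demo_up 1` yields `{"demo_up": 1.0}`. Taking two snapshots a few minutes apart and diffing the dictionaries is a quick way to spot counters that have stalled or gauges that keep growing.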
Common Issues
- Address typical problems such as connectivity issues, slow transaction processing, or database errors with targeted troubleshooting steps provided in the network’s documentation.
Diagnostic Tools
Stream nodes are equipped with a variety of diagnostic tools to help you troubleshoot issues.
You can run pprof endpoints securely to collect and analyze profiling data emitted by your node. This is recommended if you are experiencing performance issues or suspect memory leaks.
Set DEBUGENDPOINTS__PPROF=true in your environment variables to enable pprof endpoints.
Then either set DEBUGENDPOINTS__PRIVATEDEBUGSERVERADDRESS to a specific address and port (e.g. ‘127.0.0.1:8080’), or set DEBUGENDPOINTS__MEMPROFILEDIR to a directory where memory profile files are saved periodically.
pprof endpoints available:
- /debug/pprof/
- /debug/pprof/cmdline
- /debug/pprof/profile
- /debug/pprof/symbol
- /debug/pprof/trace
See debug.go for more information and full capabilities.
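As a rough illustration of collecting a profile from the endpoints listed above, the sketch below builds the endpoint URLs and saves a response to disk. It assumes the private debug server is reachable over plain HTTP at the address you configured and that the endpoints accept the `seconds` query parameter used by Go's standard `net/http/pprof` handlers for `profile` and `trace`. Neither is confirmed by this guide, and the helper names are hypothetical.

```python
import shutil
import urllib.request
from typing import Optional
from urllib.parse import urlencode

# Paths taken from the endpoint list in this guide.
PPROF_PATHS = {
    "index": "/debug/pprof/",
    "cmdline": "/debug/pprof/cmdline",
    "profile": "/debug/pprof/profile",
    "symbol": "/debug/pprof/symbol",
    "trace": "/debug/pprof/trace",
}


def pprof_url(base: str, name: str, seconds: Optional[int] = None) -> str:
    """Build the URL for a pprof endpoint on the debug server at base."""
    url = base.rstrip("/") + PPROF_PATHS[name]
    if seconds is not None:
        # Duration of the CPU profile or execution trace, in seconds.
        url += "?" + urlencode({"seconds": seconds})
    return url


def save_profile(url: str, dest: str, timeout: float = 120.0) -> None:
    """Stream a profile to a local file.

    The timeout must be longer than the requested profile duration,
    because the server only responds once sampling has finished.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```

For example, `save_profile(pprof_url("http://127.0.0.1:8080", "profile", seconds=30), "cpu.pprof")` would write a 30-second CPU profile that can then be opened with `go tool pprof cpu.pprof`.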
Security
Access Control
- Implement strict access controls for administrative operations. Use secure authentication methods to protect against unauthorized access.
Encryption and Network Security
- Secure data in transit and at rest using encryption. Apply network security best practices, such as firewalls and secure protocols, to protect against external attacks.
Regular Security Audits
- Conduct regular security audits to identify and mitigate potential vulnerabilities. Stay informed about the latest security threats and apply recommended countermeasures.