Installation readme
We ship a new software release every 3 months. Each release adds new features requested by clients (change requests) and includes any available fixes. The goal is to provide our customers with a fully operating software solution that needs no additional manual configuration after installation. This document describes how to prepare for and perform an installation on a customer's environment.
Before any installation on the PROD environment, the new software version must be tested, so we first install it on the customer's TEST environment. After the installation on TEST, the customer tests different features and imports to confirm that everything works properly and that no Blockers or other Critical Errors will appear after the installation on PROD. For this reason it is crucial to install on PROD exactly the version that was tested on the TEST environment. A dedicated Jira issue should be created for each installation to track all information and any changes.
How to make installation
Test environment
First, generate a version with Jenkins if needed. Go to http://10.0.1.17:8180/jenkins/view and enter your login and password. Select the environment for which the new version is to be built (Asgard/Midgard | Test/Production), then select the required customer (each customer has different settings, so it is important to select the correct one) and check that a Stable version has been built. An Unstable version can cause issues during testing, and some things might not work properly. Once the Stable version is built, it appears on the server automatically. Next, transfer the installation files to the /apps/tmp/ folder and uncompress the packages. Then follow the checklist to perform the installation correctly. The checklist can be found in the attachment to this document.
Production environment
Installation on production is similar to installation on test, with a few exceptions. We do not build a new version in Jenkins, because it would be untested; instead we use the version that was installed on the Test environment and tested there for several weeks or months.
How the version can be transferred:
Refer to the packages-transfer documentation.
Then set the downtime for the period when the installation will take place. Add the installation event to the Installation calendar and indicate who will do the smoke tests. After the installation is done, check all logs for Exceptions and monitor performance for the rest of the day in case any issues appear.
Asgard/Midgard installation/adjustment checklist
- Pre-conditions
  - Jira issue in place
  - Topic and changes are known
  - All sides informed and confirmed (Customer, BP, TSQA, STBY, PM)
  - Advise the Customer not to schedule any Imports/Jobs within 5h before the planned downtime
- Preparations
  - Check for scheduled Jobs/Processes, so that they do not interfere with the installation time
  - Check that there is enough disk space for files and installation
  - Check whether changes were made after the current version that must be applied to the application after installation
  - Add note to installation calendar
  - Schedule Downtime in Nagios for affected services
  - Customer announcement Backup/Revert policy
  - Escalation
- Steps of action
  - Check running processes and Jobs
  - Save crontab
  - Delete crontab
  - Run backup key config script
  - Check all processes are finished
  - Run installation
- Post installation check
  - Check that the FLD application is available and login is possible
  - Check the logs for Exceptions and disruptions
  - Check whether the crontab differs (before installation <> after installation)
  - Check cycle status
  - Check job machine status
  - Check job/qcjob table for errors/schedule deviations
  - Check exports/imports
- Notification
  - Ready for smoke test announcement to BP
  - Smoke test: OK received from all sides
  - Customer announcement
  - Comment on the related Jira issue with the progress/done status
- Monitoring
  - Monitor the system for the next 1h for abnormal behaviour
Pre-Conditions:
- There is a Jira ticket with the agreed dates and agreements.
- The Jira ticket lists all changes and topics that the customer expects to be installed.
- All sides know that there will be an installation and have confirmed that it can take place.
- The customer was informed that no new Jobs/Imports should be scheduled for at least 5h before the installation.
Preparations:
- Check whether any scheduled jobs/processes might interfere with the installation and, if needed, reschedule them for a later time.
- Check that the environment has enough disk space for the installation files [df -h]
- Put the files in the tmp folder and unzip them
  a. [mv/cp filename dir]
  b. [tar -xzvf filename]
- Check whether any changes were made after the update to fix issues; add those files to the newly created IN folder
- Schedule the installation event, indicating the Customer/Version and who performs the installation. Example: A3-Jinjer/dduhs
- Go to:
  a. We host the customer - http://10.100.1.52/nagios/
  b. The customer hosts the environment themselves - http://nagios.qc1/nagios/
  Log in with your personal user, go to XXXXXX and set XXXXXX to schedule downtime for the installation and avoid getting notifications that the server is down.
- Send a mail notification to the customer and all involved parties that the installation will start in a few minutes.
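The disk space check and the unpacking step above can be sketched as a few small shell functions. This is only an illustration under assumptions: the function names free_kb, has_space and stage_package are hypothetical helpers and are not part of the installation package.

```shell
#!/bin/sh
# Sketch only: the function names below are hypothetical helpers.

# Print the free space (in KB) on the filesystem that holds directory $1.
# df -P keeps each filesystem on one line, so the value is field 4 of line 2.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if directory $1 has at least $2 KB free.
has_space() {
    [ "$(free_kb "$1")" -ge "$2" ]
}

# Copy archive $1 into directory $2 and extract it there.
stage_package() {
    cp "$1" "$2"/ && tar -xzf "$2/$(basename "$1")" -C "$2"
}
```

For example, [has_space /apps/tmp 2097152 && stage_package filename /apps/tmp] would unpack the package only if about 2 GB is free. The 2 GB threshold is an assumed value; use the size the installation actually needs.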
Start of action:
- Save crontab [cronsave]
- Remove crontab [crontab -r]
- Check all running processes and Jobs and wait until everything is finished [watch ps fx]
- Run backup key config [/apps/qcsupport/support/backup_key_config.sh]
- Run installation [./install.sh]
- Wait until it finishes
- Send the command to start writing logs [logstart]
- NOTE: This should be done after all tests! If everything works correctly, reduce the size of the previous installation to 0 [truncate -s 0 filename]
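Because [crontab -r] deletes the crontab without asking, it can help to guard it with a check that a saved copy really exists. The sketch below illustrates this with the standard crontab command only. It does not show what the site-specific [cronsave] command does, and the function names and the backup file path are assumptions.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers built on the standard crontab command.

# Save the current crontab to file $1; fail if nothing was saved.
save_crontab() {
    crontab -l > "$1" && [ -s "$1" ]
}

# Remove the crontab only when a non-empty backup exists at $1.
remove_crontab_safely() {
    if [ -s "$1" ]; then
        crontab -r
    else
        echo "no crontab backup at $1, refusing to remove" >&2
        return 1
    fi
}

# Reinstall the crontab from backup file $1 (for a revert).
restore_crontab() {
    crontab "$1"
}
```

A possible use is [save_crontab "$HOME/crontab.before" && remove_crontab_safely "$HOME/crontab.before"], where the file name crontab.before is only an example.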
Post installation check:
- Go to FLD and try to log in, to check that the application is working
- Check logs for Exceptions and disruptions
  a. Grep for Exception
- Check that the crontab has reappeared [crontab -l]
- Check cycle and job machine status
- Check job/qcjob for errors/schedule deviations (this can be checked in the DB)
- Check exports/imports, to confirm that everything works properly.
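The log check and the crontab check above could look roughly like this. The sketch assumes that the crontab was saved to a plain file before the installation. The function names are hypothetical, and the log file to inspect depends on the environment.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for the post installation checks.

# Print the number of lines in log file $1 that mention "Exception".
# grep exits with 1 when nothing matches; only a status above 1 is an error.
count_exceptions() {
    grep -c 'Exception' "$1"
    [ "$?" -le 1 ]
}

# Succeed if the installed crontab is identical to the saved copy in file $1.
crontab_matches() {
    crontab -l | diff "$1" - > /dev/null
}
```

A count of 0 from count_exceptions and a successful crontab_matches cover two of the checks; the FLD login, cycle status and job tables still need to be checked by hand.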
Notifications:
- Notify BP to start Smoke test (can be done once FLD is checked)
- Receive notification from all parties that smoke test is done and everything works
- Notify customers and all related parties that installation is done and environment is available again
- Leave a comment in Jira issue with the progress/done/issues (if any found)
Monitoring:
- Monitor the system for Errors and Exceptions for a few hours (at least 1-2 hours)
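Monitoring can be partly automated by polling a log file and printing only the Exception lines that were added since the last poll. This is a minimal sketch, not an existing tool: the function names, the polling interval and the log file are all assumptions.

```shell
#!/bin/sh
# Sketch only: hypothetical polling helper for the monitoring period.

# Print the number of lines currently in log file $1.
line_count() {
    wc -l < "$1" | tr -d ' '
}

# Poll log $1 every $2 seconds, $3 times, printing new "Exception" lines.
monitor_log() {
    seen=$(line_count "$1")
    i=0
    while [ "$i" -lt "$3" ]; do
        sleep "$2"
        now=$(line_count "$1")
        if [ "$now" -gt "$seen" ]; then
            # Only look at the lines added since the previous poll.
            sed -n "$((seen + 1)),${now}p" "$1" | grep 'Exception' || true
        fi
        # If the log shrank (for example after rotation), start again from its new length.
        seen=$now
        i=$((i + 1))
    done
}
```

For example, [monitor_log logfile 60 120] would check the file once a minute for two hours; logfile stands for whichever application log is relevant.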