Service management of Sage - D
07-12-2016, 05:54 PM
(This post was last modified: 11-23-2019 11:41 PM by lingu.)
Post: #1
Service management of Sage - D
The src of the sage program is cod://sage/src/service/bin/sage

goal
1. The script can be used on sage_user@sage_portal to start/stop/status sage.
2. The script should not be coupled with GLAD.
The script would replace the current start.sh/stop.sh/status.sh.

sage main logic
Code:
modname = sage

sage start
Start a sage_screen to run the script start_sage. The start_sage script looks like this:
Code:
trap ctrl_c
Use auntie/ps to check whether sage has started correctly. If it started successfully, the wake_sage script is executed in a screen. wake_sage periodically writes the tasks graph info into $sage_base/stdout/tasks_graph.tmp.<timestamp>.

sage stop
Use 'dt slay' to stop DT. Kill the sage_screen. Use ps to check whether sage has stopped correctly.

sage status
Use "dexer auntie" to check the status of sage.

sage ps
Use "screen -ls" to get the sage_service screens info. Use "dexer 'ps aux' | grep -E 'vpc|nrc|scheduler|mem_home|atpd|ncat|xfer|sgl|: ps aux'" to get the processes info.

sage tasks
sage tasks invokes the function tasks_sage():
Code:
local tasks_graph_file=$base/stdout/tasks_graph.tmp

---------
20191123/lingu: move tasks out.
20190511/cwt: Add auntie call.
20190510/lingu: timeout default 3.
20190509/cwt: Add common var import. Add timeout, IP count etc.
20190508/lingu: add src location.
20160927/yxj: decouple sage service script with glad.
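
The subcommand layout described in this post (start, stop, status, ps, tasks behind one script) can be sketched as a small dispatcher. This is only an illustration: the function names and their bodies are placeholders made up for this sketch, and the real script at cod://sage/src/service/bin/sage is not reproduced here.

```shell
# Sketch of a single entry point replacing start.sh/stop.sh/status.sh.
# Every function body below is a placeholder (assumption), not the real logic.

sage_start()  { echo "start: would launch start_sage inside a screen session"; }
sage_stop()   { echo "stop: would run 'dt slay', kill the screen, then verify with ps"; }
sage_status() { echo "status: would run 'dexer auntie'"; }
sage_ps()     { echo "ps: would run 'screen -ls' and a filtered 'ps aux'"; }
sage_tasks()  { echo "tasks: would print the latest tasks graph file"; }

# Dispatch on the first argument; unknown subcommands return status 2.
sage_main() {
    local cmd=${1:-}
    case "$cmd" in
        start|stop|status|ps|tasks)
            "sage_$cmd"
            ;;
        *)
            echo "usage: sage {start|stop|status|ps|tasks}" >&2
            return 2
            ;;
    esac
}
```

Calling `sage_main status` runs only the status branch; keeping one entry point means there is one way, not several, to perform each operation.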
09-07-2016, 02:00 PM
Post: #2
RE: Service management of Sage
(07-12-2016 05:54 PM)YU_Xinjie Wrote: This thread is the design of service management of Sage
Saved a copy before change.
09-07-2016, 02:47 PM
Post: #3
RE: Service management of Sage
@lingu
Please review the design, so that I can implement it.
09-07-2016, 03:53 PM
Post: #4
RE: Service management of Sage
It seems to be quite complex. Do we really need to pay for that complexity? If so, please let me know and I'll review the design in detail.
Here is what I think we can do:
1. Start/stop sage using the current way (start.sh and kill).
2. An operative user may not be able to run start.sh and kill as the user 'sage'. Hence, there should be some mechanism for this. For example, we could create a user 'gladsage' whose login shell is a command that uses a key file to log in as sage and run start.sh or kill.
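
A minimal sketch of what such a 'gladsage' login command might look like, assuming it accepts a small whitelist of actions and never passes caller-supplied text to the remote side. The variable names, the key path, and the mapping of actions to stop.sh/status.sh are assumptions for illustration only; the thread itself only mentions start.sh and kill.

```shell
# Hypothetical login command for a restricted 'gladsage' account.
# SAGE_KEY, SAGE_LOGIN and the script paths are illustrative assumptions.
SAGE_KEY=${SAGE_KEY:-/path/to/sage_key}
SAGE_LOGIN=${SAGE_LOGIN:-sage_user@sage_portal}

# Map a requested action to the one fixed remote command it may run.
# Anything not on the whitelist is rejected with a non-zero status.
gladsage_remote_cmd() {
    case "${1:-}" in
        start)  echo "./start.sh" ;;
        stop)   echo "./stop.sh" ;;
        status) echo "./status.sh" ;;
        *)      return 1 ;;
    esac
}

# Entry point: the caller picks an action name, never the command text.
gladsage_login() {
    local remote
    remote=$(gladsage_remote_cmd "${1:-}") || {
        echo "gladsage: only start, stop, status are allowed" >&2
        return 1
    }
    ssh -i "$SAGE_KEY" "$SAGE_LOGIN" "$remote"
}
```

With this shape, an op user who can log in as gladsage can trigger the three actions but cannot read the key or run arbitrary commands as sage, provided the key file itself is readable only by the gladsage account.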
09-07-2016, 03:55 PM
Post: #5
RE: Service management of Sage
We still need to handle the race condition of running multiple sage thinkers. But I think thinkers already have such conflict-reporting capability. We just need to track the return status of start.sh, report an error, and stop glad when errors occur.
If start.sh does not report the return status correctly, we may need to improve start.sh.
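
Tracking the return status could look roughly like the sketch below. It assumes start.sh exits non-zero on failure, which the post notes may not hold yet; `run_start` is a hypothetical helper name, and stopping glad is left to the caller.

```shell
# run_start CMD [ARGS...]
# Runs the given start command and propagates its exit status.
# Assumption: the start command exits non-zero when sage fails to start.
run_start() {
    local rc=0
    "$@" || rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "sage start failed with status $rc" >&2
        return "$rc"
    fi
    echo "sage started"
}
```

A caller would then write something like `run_start ./start.sh || stop_glad`, where `stop_glad` stands for whatever mechanism halts glad.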
09-07-2016, 03:58 PM
Post: #6
RE: Service management of Sage
(07-12-2016 05:54 PM)YU_Xinjie Wrote: The brief idea is to use auntie to send a file to glad_portal. Then check the file content.
Directly running auntie without arguments can already check if sage is present.
Quote: 1.
This is a key design part. I agree with your design as a quick implementation. But it has a security hazard -- if an op user can access the key, he/she can run anything as sage. Hence, please consider the 'gladsage' solution I wrote about when you have time to refine the solution.
09-07-2016, 04:11 PM
Post: #7
RE: Service management of Sage
(09-07-2016 03:53 PM)lingu Wrote: It seems to be quite complex. Do we need to really pay that complexity? If so, please let me know and I'll review the design in detail.
The code I posted is actually the script I developed for shbio. It has worked well for more than one week. It is complex but robust. I hope you can check it, so that I can put it into our glad/sage code tree. You do not need to check every option of every command, because they work. But you'd better check the overall method. In brief, they are just a remote cmd wrapper for start.sh/stop.sh/status.sh, except for the function wait_sage_start, which uses auntie to check sage.
(09-07-2016 03:55 PM)lingu Wrote: We still need to handle race condition for running multiple sage thinkers. But I think thinkers already have such conflict reporting capability. We just need to track the return status of start.sh and report an error as well as stop glad when errors occur.
Good point. I will check it.
(09-07-2016 03:58 PM)lingu Wrote: (07-12-2016 05:54 PM)YU_Xinjie Wrote: The brief idea is to use auntie to send a file to glad_portal. Then check the file content.
It does not work. I already use the latest auntie. I reported it in this thread: http://tab.d-thinker.org/showthread.php?tid=6506
09-07-2016, 05:06 PM
(This post was last modified: 09-07-2016 05:07 PM by lingu.)
Post: #8
RE: Service management of Sage
(09-07-2016 04:11 PM)YU_Xinjie Wrote: (09-07-2016 03:53 PM)lingu Wrote: It seems to be quite complex. Do we need to really pay that complexity? If so, please let me know and I'll review the design in detail.
I will review. But this is TERRIBLE practice. Please make sure to get endorsement from another engineer before starting to implement an important function. If you ask me to review but I don't respond, please remind me for an important review.
If you need to spend half a day to implement some code, you should be prudent in writing the code directly. Instead, invite reviews. We may find a way to solve the problem in 2 hours. If there is no response after reminders, you may decide to go ahead with implementing an important piece of code.
If it takes only 30min to implement something, you can go ahead and implement it without endorsement if you don't want to wait. If the change is insignificant, you may also skip the review.
It is generally not practical to define what is "important" or "significant". We rely on engineers' judgement. This one -- how to start/stop sage and make it work with glad -- is certainly an important piece of design.
09-07-2016, 05:18 PM
Post: #9
RE: Service management of Sage
(09-07-2016 05:06 PM)lingu Wrote: I will review. But this is TERRIBLE practice. Please make sure to get endorsement from another engineer before starting to implement an important function. If you ask me to review but I don't respond, please remind me for an important review.
Okay. I agree with these rules and will follow them.
09-07-2016, 05:35 PM
Post: #10
RE: Service management of Sage
(07-12-2016 05:54 PM)YU_Xinjie Wrote: goal
Sage needs to be started automatically by glad for a batch of samples (dataflows). It is okay to have a user manually start it at present. But this is not the way we want for the future. It adds a human operation. An extra human operation makes the software a bit harder to use and a whole lot more error-prone. So, when designing, you can make compromises now, but keep in mind what we want in the long run.
Quote: The script would replace the current start.sh/stop.sh/status.sh in the future.
This is good thinking -- using one way, not two ways, to do one task.
Quote:
This is too heavyweight -- it may even disrupt Sage if it is busy. Sending a file takes 2-3s. This is considered long/heavyweight. OK to keep the current design if it has worked. But please be aware of latency in a pathetic way -- otherwise our software would be as slow as Hadoop in 2 years.
Quote: 1. This is fine if all are sage_user.
Quote: copy private key of sage_user@sage_portal into $gb/conf/sage_key
$gb is owned by glad_user and all operative users can access it. So this is a security vulnerability. OK now. But make a TODO to fix it later.
Quote: Then glad operative user & sage_user can use sage_key to execute remote command to sage_user@sage_portal.
How (a convar like glad_portal?) Have we made sure such a variable must exist in $gb/conf/config.sh? If not, please add a note there "need implement" so that we know to create a TODO item and implement this later after this design is endorsed.
Quote: 3. I prefer to create a convar like glad_sage_user and store it in $gb/conf/config.sh
Quote: 4. let sage_user@sage_portal can password-less login glad_user@glad_portal.
That's fine -- we assume sage_user is not a human operator and can be a powerful guy.
Quote: start sage
What if the wait_sage_start fails? We need a way to check Sage has started. I thought this was what you were doing. But it looks like if auntie continues to fail, the wait still completes without an error halt.
Quote: stop sage
OK.
Quote: status sage
OK. In the future, we should consider using auntie to directly query Sage to get the status.
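
One way to make the wait halt with an error is a bounded polling loop that returns non-zero when the check never succeeds. The sketch below is generic: `wait_until` is a hypothetical helper, not the actual wait_sage_start, and it assumes the check command (for example an auntie call) exits 0 only when Sage is up.

```shell
# wait_until TIMEOUT_SECONDS CHECK_CMD [ARGS...]
# Re-runs CHECK_CMD once per second until it succeeds.
# Returns 0 on success, 1 if TIMEOUT_SECONDS elapse without a success.
wait_until() {
    local timeout=$1
    shift
    local waited=0
    while ! "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "check did not succeed within ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

A start routine could then do `wait_until 3 dexer auntie || exit 1`, so that a Sage that never comes up is reported as a failure instead of a silent success. Whether `dexer auntie` returns a usable exit status is an assumption that would need checking.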