(09-07-2016 05:35 PM)lingu Wrote: [ -> ]This is too heavyweight -- it may even disrupt Sage if it is busy.
Sending a file takes 2-3s. This is considered long/heavyweight.
I tested sending a small file from s188 to s188. It only took 0m0.009s. I think that is good enough.
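For reference, the 0m0.009s figure looks like output of the shell's time builtin; a measurement could be reproduced roughly like this (the auntie arguments follow the wait_sage_start snippet quoted later in this thread and may not match the exact test that was run on s188):
Code:
# Hypothetical reproduction of the timing test; paths and arguments mirror
# the wait_sage_start snippet below and are assumptions.
echo "This is test content." > "$curdir/${USER}_sage_testfile"
time sage_user=$sage_user "$curdir/auntie" "^$curdir/${USER}_sage_testfile" "$glad_portal" 667 sagetest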
Quote:Quote:Then the glad operative user & sage_user can use sage_key to execute remote commands on sage_user@sage_portal.
2. The glad_portal IP can be obtained from $gb/conf/config.sh
How (a convar like glad_portal?) Have we made sure such a variable must exist in $gb/conf/config.sh? If not, please add a note there saying "need implement" so that we know to create a TODO item and implement it later, after this design is endorsed.
We already have a $glad_portal that can be used directly. It is set by util/install.sh during installation.
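For illustration, a minimal sketch of how a script might pick up $glad_portal, assuming util/install.sh writes it into $gb/conf/config.sh as a plain shell variable:
Code:
# Assumed layout: util/install.sh writes a line such as
#   glad_portal=<portal-ip-or-hostname>
# into $gb/conf/config.sh, so a script only needs to source the file.
. "$gb/conf/config.sh"
echo "glad portal: $glad_portal"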
Quote:What if the wait_sage_start fails?
We need a way to check that Sage has started. I thought this is what you were doing. But it looks like, if auntie keeps failing, the wait still completes without halting with an error.
This is a bug. I will fix it.
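A minimal sketch of what the missing error halt could look like, placed right after the retry loop in wait_sage_start; the message and exit status are illustrative:
Code:
# If the test file never arrived with the expected content after all
# retries, report an error and stop instead of returning silently.
if [[ (! -f "$test_file") || ("$(cat $test_file)" != "$content") ]]; then
    echo "ERROR: Sage did not start within the timeout" >&2
    exit 1
fi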
(09-07-2016 05:53 PM)YU_Xinjie Wrote: [ -> ] (09-07-2016 05:35 PM)lingu Wrote: [ -> ]This is too heavyweight -- it may even disrupt Sage if it is busy.
Sending a file takes 2-3s. This is considered long/heavyweight.
I tested sending a small file from s188 to s188. It only took 0m0.009s. I think that is good enough.
The auntie command may complete in 0.009s, but the transfer should take longer because I added a 1s wait somewhere. Maybe it is not in the critical path...
Anyway, it is okay to keep it this way for a while.
Quote:We already have a $glad_portal that can be used directly. It is set by util/install.sh during installation.
OK. Then it's fine.
Quote:Quote:What if the wait_sage_start fails?
We need a way to check that Sage has started. I thought this is what you were doing. But it looks like, if auntie keeps failing, the wait still completes without halting with an error.
This is a bug. I will fix it.
Please revise the design of this part first, as a reply, and then I can reply to endorse the overall design.
The real "fix" should come after the design is endorsed.
(09-07-2016 06:18 PM)lingu Wrote: [ -> ]Please revise the design of this part first, as a reply, and then I can reply to endorse the overall design.
The real "fix" should come after the design is endorsed.
Updated the code of "start sage" in the headpost.
Please take a look again.
(09-07-2016 06:37 PM)YU_Xinjie Wrote: [ -> ] (09-07-2016 06:18 PM)lingu Wrote: [ -> ]Please revise the design of this part first, as a reply, and then I can reply to endorse the overall design.
The real "fix" should come after the design is endorsed.
Updated the code of "start sage" in the headpost.
Please take a look again.
Okayed the design.
But you ignored my words "as a reply". Please make sure to follow the instructions precisely.
(09-07-2016 05:35 PM)lingu Wrote: [ -> ]Quote:
wait_sage_start () {
    content="This is test content."
    echo $content > $curdir/${USER}_sage_testfile
    test_file=/thinker/bin/ephemeral/667.sagetest
    # two minutes timeout
    try_cnt=40
    while [[ ("$try_cnt" -gt "0") && ((! -f "$test_file") || ("$(cat $test_file)" != "$content")) ]]; do
        timeout -s SIGINT 1 bash -c "sage_user=$sage_user $curdir/auntie ^$curdir/${USER}_sage_testfile $glad_portal 667 sagetest" || echo -n ''
        echo "Trying to start Sage..."
        sleep 2
        try_cnt=$((try_cnt-1))
    done
    rm -f $test_file
}
The wait is too long. If it does not work in 30 seconds, report an error and stop.
1. While implementing, I found that "sage start" takes 21 seconds on average in my dev cluster. The performance of my dev cluster is not good, since the nodes are VMs.
Sometimes it takes more than 30 seconds; Sage has actually started by then, but my script reports a failure.
Therefore I suggest we still use a 2-minute timeout here.
2. The sleep time is 2 seconds after each failed trial.
But I find that even though the auntie command finishes quickly when it succeeds, the received file content only appears after 2~3 seconds. So even when auntie sends the file successfully, another auntie trial is triggered, and the script prints an ERROR message that confuses the user, although the Sage & service scripts still succeed in the end.
Therefore I suggest we use "sleep 4" after each failed trial, so the confusing ERROR message does not appear as often.
I will implement the above changes before your review, since I think they are minor; the revised loop is sketched below.
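A sketch of the revised wait under the points above: roughly a 2-minute budget, "sleep 4" after a failed trial, and an error halt (from the earlier bug discussion) when the retries run out. The trial count and the error-handling details are assumptions:
Code:
wait_sage_start () {
    content="This is test content."
    echo "$content" > "$curdir/${USER}_sage_testfile"
    test_file=/thinker/bin/ephemeral/667.sagetest
    # ~2 minutes: 24 trials * (1s timeout + 4s sleep)
    try_cnt=24
    while [[ ("$try_cnt" -gt "0") && ((! -f "$test_file") || ("$(cat $test_file)" != "$content")) ]]; do
        timeout -s SIGINT 1 bash -c "sage_user=$sage_user $curdir/auntie ^$curdir/${USER}_sage_testfile $glad_portal 667 sagetest" || echo -n ''
        echo "Trying to start Sage..."
        sleep 4
        try_cnt=$((try_cnt-1))
    done
    # Error halt when the retries are exhausted without a valid test file.
    if [[ (! -f "$test_file") || ("$(cat $test_file)" != "$content") ]]; then
        echo "ERROR: Sage did not start within the timeout" >&2
        return 1
    fi
    rm -f "$test_file"
}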
For the private key, I encountered an issue:
The ssh command checks the permissions of the private key file. If the key is readable by others, ssh simply ignores the key. One way to skip the check is to set the owner of the key to a user that will never use the key. For example:
Code:
[0][15:40:42] xinjie@devmac0e0:/thinker/dstore/gene/glad/conf
$ ll sage_key
-rw-r--r--. 1 dstore dstore 1679 Sep 8 14:57 sage_key
I set the owner to dstore. Then the users gene, sage, and xinjie can use the key to access sage_portal. Only user dstore cannot.
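For the record, a sketch of the setup described above; "sage_portal_private_key" is a placeholder for wherever the key is copied from, and the assumption is that ssh skips its strict-permission check when the key file is not owned by the calling user:
Code:
# Install the shared key readable by everyone, but owned by dstore, a user
# that never uses it, so the owner-based permission check does not reject
# it for gene/sage/xinjie. (Run with sufficient privileges, e.g. during install.)
cp sage_portal_private_key "$gb/conf/sage_key"
chown dstore:dstore "$gb/conf/sage_key"
chmod 644 "$gb/conf/sage_key"

# Any operative user other than dstore can then run, for example:
ssh -i "$gb/conf/sage_key" "$sage_user@$sage_portal" hostname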
Committed in df99edcb4f9ae617c098714f8036393fbdf8e413 .
(09-08-2016 04:48 PM)YU_Xinjie Wrote: [ -> ]For the private key, I encountered an issue:
The ssh command checks the permissions of the private key file. If the key is readable by others, ssh simply ignores the key. One way to skip the check is to set the owner of the key to a user that will never use the key. For example:
Code:
[0][15:40:42] xinjie@devmac0e0:/thinker/dstore/gene/glad/conf
$ ll sage_key
-rw-r--r--. 1 dstore dstore 1679 Sep 8 14:57 sage_key
I set the owner to dstore. Then the users gene, sage, and xinjie can use the key to access sage_portal. Only user dstore cannot.
If the user is not the owner but can read the file, and other users can read the key file too, why does SSH allow the user to use the key? This seems to be a bug in SSH.
Is the behavior dependable? If the current behavior is not well defined, we may not want to rely on it although it works now. In the future, if the behavior changes, our programs will break.
Can we use multiple key files of the same content, i.e., key-sage, key-xinjie, ..., so that the key file's access can be restricted to that particular user?
In general, avoid hacks unless there is no other way out. Keep things simple, stupid. Hacks are fun, but they are not dependable; when the underlying behavior changes, all that is left behind is a mess.
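A sketch of the per-user alternative suggested above, assuming the install step knows the list of operative users (the user list and source file name below are illustrative):
Code:
# One copy of the same private key per operative user, each readable only
# by its owner, so the normal ssh permission check applies.
for u in gene sage xinjie; do
    cp sage_portal_private_key "$gb/conf/sage_key-$u"
    chown "$u:$u" "$gb/conf/sage_key-$u"
    chmod 600 "$gb/conf/sage_key-$u"
done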
(09-07-2016 05:35 PM)lingu Wrote: [ -> ]Quote:1.
require that sage_user can log in to sage_user@sage_portal without a password
This is fine if all are sage_user.
Quote:copy the private key of sage_user@sage_portal into $gb/conf/sage_key
$gb is owned by glad_user and all operative users can access it, so this is a security vulnerability. It is OK for now, but make a TODO to fix it later.
I think I may not have really understood what we are doing here. Are we trying to let glad_user ssh to sage_user@sage_portal? Or are we trying to let sage_user ssh back to sage_user@sage_portal?
For the latter, we can perhaps let all sage_users have the same key on all nodes? But the keys should not be stored in $gb.
For the former, we should perhaps add glad_user's pub key to sage_user@portal's authorized_keys during the installation of glad?
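For the former case, a sketch of what the glad installation step might do, assuming it runs as glad_user and can authenticate to sage_user@sage_portal once (e.g. with a password); the key type and paths are assumptions:
Code:
# Possible install-time step: authorize glad_user's own public key on the
# sage portal instead of distributing sage_user's private key.
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
ssh-copy-id "$sage_user@$sage_portal"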