BIBO on limbo305 - zhihao - 08-08-2021 12:35 PM
BIBO is deployed on limbo305
RE: BIBO on limbo305 - zhihao - 08-08-2021 12:38 PM
I started bibo on limbo305-1 and sage on limbo305-3, then tested delete, but it failed.
Code:
[root@limbo305-1 test]# ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]# ./test_delete_zeng.sh
sending delete request /thinker/local/soft/bibo/app/test/processed/del_zengxingliang.json to localhost:62818
delete FAILED
[root@limbo305-1 test]#
The log reports:
Code:
[root@limbo305-1 sage]# less procone-49.log
[2021-08-08 13:50:33] procone starts
tycano: 49
more args
keys in /thinker/globe/soft/bibo/procuratorate/tests/typical/49/insert.json
There is error when opening file /thinker/globe/soft/bibo/procuratorate/tests/typical/49/insert.json
procone-49.log (END)
Then I updated toneroot in procone.py to /thinker/globe/soft/bibo/procuratorate/cases, and the delete test reported that it could not open a file:
Code:
[2021-08-08 14:11:31] procone starts
tycano: 51
more args
keys in /thinker/globe/soft/bibo/procuratorate/cases/typical/51/content
keys ready
tyca json loaded.
opening intermediate file /thinker/globe/soft/bibo/procuratorate/cases/res2/res_51.txt-tmp
There was no res2 dir. I created it manually, and then case 52 could be deleted:
Code:
[sage@limbo305-1 test]$ /thinker/local/soft/bibo/plug/procone.py --tycano 52 --reqtype delete --reqcaseid "67be261c9e6311eabceb005056c00001"
[sage@limbo305-1 test]$ /
Code:
[root@limbo305-1 test]# cat /thinker/local/soft/bibo/app/test/processed/del_zengxingliang.json; sleep 18 | ncat localhost 62818
{"type":"delete","caseid":"67be261c9e6311eabceb005056c00001"
}
[root@limbo305-1 test]# echo $?
0
Later I added the following to procone.py:
Code:
toneroot = "/thinker/globe/soft/bibo/procuratorate/cases"
bibofast = "/thinker/fastdata/bibo"
bibe = bibofast + "/e"
reslocal = bibe + "/res2"
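The missing res2 dir above had to be created by hand. A minimal sketch, written for Python 3, of how a script could create the directory before opening the intermediate file. `ensure_dir` and `intermediate_path` are hypothetical helpers and are not taken from procone.py; only the `res_<tycano>.txt-tmp` file name comes from the log above.

```python
import os


def ensure_dir(path):
    """Create path (and any missing parents); do nothing if it exists."""
    os.makedirs(path, exist_ok=True)
    return path


def intermediate_path(resdir, tycano):
    """Build the res_<tycano>.txt-tmp name seen in the log, under resdir."""
    return os.path.join(ensure_dir(resdir), "res_%d.txt-tmp" % tycano)
```

With the variables above, `intermediate_path(reslocal, 51)` would create /thinker/fastdata/bibo/e/res2 if it is missing and return the path of the temporary result file inside it.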
RE: BIBO on limbo305 - zhihao - 08-08-2021 07:56 PM
inquire gets stuck because yfs has some inactive PGs:
Code:
[root@limbo305-1 test]# ./test_inquire_zeng.sh
sending inquire request ./processed/zengxingliang-inquire.json to localhost:62818
^C
[root@limbo305-1 test]# ls /thinker/globe/soft/bibo/procuratorate/cases/sum2/
^C
[root@limbo305-1 test]#
[root@limbo305-1 test]# ceph -s
cluster:
id: 33d940a8-7e68-44f3-bc37-305aaaabbbbc
health: HEALTH_ERR
1 clients failing to respond to capability release
1 MDSs report slow metadata IOs
1 MDSs report slow requests
mons limbo305-1,limbo305-2,limbo305-3 are low on available space
1 monitors have not enabled msgr2
2/510 objects unfound (0.392%)
Reduced data availability: 36 pgs inactive, 33 pgs incomplete
Possible data damage: 1 pg recovery_unfound
Degraded data redundancy: 6/1530 objects degraded (0.392%), 1 pg degraded
32 pgs not deep-scrubbed in time
32 pgs not scrubbed in time
195 slow ops, oldest one blocked for 81807 sec, daemons [osd.5,osd.7] have slow ops.
services:
mon: 3 daemons, quorum limbo305-1,limbo305-2,limbo305-3 (age 23h)
mgr: limbo305-1(active, since 32m)
mds: cephfs:1 {0=limbo305-3=up:active} 1 up:standby
osd: 12 osds: 12 up (since 22h), 12 in (since 22h)
task status:
scrub status:
mds.limbo305-3: idle
data:
pools: 4 pools, 97 pgs
objects: 510 objects, 182 MiB
usage: 13 GiB used, 1.1 TiB / 1.1 TiB avail
pgs: 3.093% pgs unknown
34.021% pgs not active
6/1530 objects degraded (0.392%)
2/510 objects unfound (0.392%)
60 active+clean
32 creating+incomplete
3 unknown
1 active+recovery_unfound+degraded
1 incomplete
The 3 unknown PGs are in cephfs_metadata. I then reinstalled yotta on limbo305-1.
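Since inquire hangs while PGs are inactive, a test script could check the health text first. The sketch below only parses text in the shape of the "Reduced data availability" line shown above; `inactive_pg_count` is a hypothetical helper, and how the status text is obtained is left open.

```python
import re


def inactive_pg_count(ceph_status_text):
    """Return the number of inactive PGs reported in `ceph -s` text, or 0."""
    match = re.search(r"(\d+)\s+pgs\s+inactive", ceph_status_text)
    return int(match.group(1)) if match else 0


# Sample line copied from the ceph -s output above.
sample = "Reduced data availability: 36 pgs inactive, 33 pgs incomplete"
print(inactive_pg_count(sample))
```

A non-zero result would be a reason to skip the inquire test instead of letting it block.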
RE: BIBO on limbo305 - zhihao - 08-10-2021 12:50 AM
Installing sage gets stuck:
Code:
[root@limbo305-3 ~]# decent_init=True /thinker/local/forest/util/utilib/installx sage
/thinker/local/shed/installation/limbo305-tc
Installing /thinker/local/shed/installation/limbo305-tc/sage.tar.gz ...
The log shows that some files were not found:
Code:
...
10.36.3.51:3124 (slot 19)
Start to run the program with 72 VPCs.
Program execution completed.
protocol error: filename does not match request
protocol error: filename does not match request
protocol error: filename does not match request
cat: '/home/sage/think/run/results/result-72-sage/stdout-*': No such file or directory
cat: '/home/sage/think/run/results/result-72-sage/stdout-*': No such file or directory
...
=== Begin Sage Service Test Set ===
./sage_service/sage-service-start.sh
cat: /home/sage/sage/run/service.pid: No such file or directory
cat: /home/sage/sage/run/aide.pid: No such file or directory
Sage is stopped.
lockstep mark /thinker/globe/.think/lockstep//limbo305-3/sage/tested
lockstep mark /thinker/globe/.think/lockstep//limbo305-3/sage/decent/workdone
sage start reports that Sage needs to be configured first:
Code:
[sage@limbo305-3 ~]$ ./sage/bin/sage start
ERROR: Sage has not been configured. Please run /home/sage/sage/bin/configure at sage_portal first.
[sage@limbo305-3 ~]$
[sage@limbo305-3 correctness]$ /home/sage/sage/bin/configure
====== Generating config.pcf ======
inferred and exported:
sage_base: /home/sage/sage
sage_stdout: /home/sage/sage/stdout/sage.out
sage_debug: 0
helper_portal: 10.36.1.49
sage_ipx: /thinker/etc/ips.cfg
sage_atp_cnt: 60
generated /home/sage/sage/config.pcf
====== Generating config.node, sage-svc.inc & hosts.ips ======
-- generating config.node & sage-svc.inc by gen_sage_config
10.36.1.49
10.36.2.50
10.36.3.51
HELPER_CNT: 6
HEAVEN_CNT: 1
ATP_CNT: 60
backup old config.node: /home/sage/sage/config.node.105847
backup old sage-svc.inc: /home/sage/sage/sage-svc.inc.105847
Generating walkers from /thinker/etc/ips.cfg
New node: 10.36.1.49
New node: 10.36.2.50
New node: 10.36.3.51
Add fillers
10.36.1.49:1
10.36.1.49:51
10.36.1.49:2
10.36.1.49:51
10.36.1.49:3
10.36.1.49:51
10.36.1.49:4
...
10.36.1.49:51
Adding sage_atp_staff_space_cnt (3), sage_atp_staff_space_start (20) & sage_atp_head_space (14) to config.pcf
Arguments (Part 2):
WALKER_CNT: 3
FILLER_CNT: -1
sage_atp_staff_space_start: 20
sage_atp_staff_space_cnt: 3
sage_atp_head_space: 14
config.node is generated in /home/sage/sage/config.node successfully.
The old config.node is backuped in /home/sage/sage/config.node.105847.
sage-svc.inc is generated in /home/sage/sage/sage-svc.inc successfully.
The old svc_inc is backuped in /home/sage/sage/sage-svc.inc.105847.
-- generating hosts.ips from config.node
====== Begin to do the original load.sh ======
Parsing config.node
Configuring for 72 containers
10.36.3.51 slots: 52
10.36.1.49 slots: 54 55 56 57 58 59 60 52
10.36.2.50 slots: 52
10.36.3.51 slots: 53
10.36.1.49 slots: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
10.36.2.50 slots: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
10.36.3.51 slots: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
10.36.1.49 slots: 51 50Obtaining config info
Generating config
Sage is going to format the persistent memory of DT after 10 seconds. Please type Ctrl-C if you do not want to do it.
WARNING: Seems the thinker is running. Refused to format the persistent memory.
This may happen because:
a) Someone else is using the thinker.
b) Your program fails or you canceled you program by 'Ctrl-C'.
If you are the one who reserve the thinker for the current time range,
or you are sure that the thinker is running your program, you can
kill the thinker by:
$ dt slay
[sage@limbo305-3 correctness]$ dt slay
...
[sage@limbo305-3 correctness]$ /home/sage/sage/bin/configure
...
10.36.3.51:3116 (slot 17)
10.36.3.51:3120 (slot 18)
10.36.3.51:3124 (slot 19)
Start to run the program with 72 VPCs.
Program execution completed.
protocol error: filename does not match request
protocol error: filename does not match request
protocol error: filename does not match request
cat: '/home/sage/think/run/results/result-72-sage/stdout-*': No such file or directory
cat: '/home/sage/think/run/results/result-72-sage/stdout-*': No such file or directory
====== configure is done ====== [0810-11:17:45]
[sage@limbo305-3 correctness]$ echo $?
0
[sage@limbo305-3 correctness]$
sage start reports that Sage is unresponsive:
Code:
[sage@limbo305-3 correctness]$ ~/sage/bin/sage start
Sage is irresponsive. Trying to stop it before start it again.
cat: /home/sage/sage/run/aide.pid: No such file or directory
The prior service instance is perhaps 267779
Sage is stopped.
Starting Sage ..............
The log shows:
Code:
10.36.1.49 "killall wake_sage detect_listen auntie atpd atpa 2> /dev/null":
10.36.2.50 "killall wake_sage detect_listen auntie atpd atpa 2> /dev/null":
10.36.3.51 "killall wake_sage detect_listen auntie atpd atpa 2> /dev/null":
Sage is stopped.
sage is to run nohup bash -c 'cd /home/sage/sage/bin; ./start_sage 2>&1 | tee -a /home/sage/sage/stdout/sage.out'
stty: 'standard input': Inappropriate ioctl for device
store sage_service pid 271612
start_sage: Tue Aug 10 11:21:10 CST 2021: Sage starts
detect_listen.sh: no process found
[Tue Aug 10 11:21:13 CST 2021] status sage 10.36.1.49 7
sage state:Sage is stopped or irresponsive. vs. started state:Sage is started.
[Tue Aug 10 11:21:20 CST 2021] status sage 10.36.1.49 7
sage state:Sage is stopped or irresponsive. vs. started state:Sage is started.
RE: BIBO on limbo305 - lingu - 08-13-2021 09:30 AM
(08-08-2021 12:38 PM)zhihao Wrote: I start bibo on limbo305-1 and start sage on limbo305-3, then I test delete but failed
Code:
[root@limbo305-1 test]# ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]# ./test_delete_zeng.sh
sending delete request /thinker/local/soft/bibo/app/test/processed/del_zengxingliang.json to localhost:62818
delete FAILED
[root@limbo305-1 test]#
We should use the standardized way -- 'make test'.
I added a cases variable so that we can specify which test to run.
Code:
[sage@limbo305-1 test]$ make test cases=delete_zeng
Testing delete_zeng
run test ./test_delete_zeng.sh
sending delete request /thinker/local/soft/bibo/app/test/processed/del_zengxingliang.json to localhost:62818
Quote: then I update toneroot of procone.py to /thinker/globe/soft/bibo/procuratorate/cases, delete test reports can't open file
You should update the module if you do it on limbo305. File copying is okay on wp289, but it is not recommended in general.
RE: BIBO on limbo305 - lingu - 08-13-2021 09:37 AM
insert is not very stable.
Code:
[sage@limbo305-1 test]$ make test cases=insert_zeng
Testing insert_zeng
run test ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 416
insert failed
make: *** [Makefile:22: test] Error 93
[sage@limbo305-1 test]$ make test cases=insert_zeng
Testing insert_zeng
run test ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[sage@limbo305-1 test]$
zhihao, please investigate this issue.
RE: BIBO on limbo305 - lingu - 08-13-2021 09:39 AM
delete fails
Code:
[sage@limbo305-1 test]$ make test cases=delete_zeng
Testing delete_zeng
run test ./test_delete_zeng.sh
sending delete request /thinker/local/soft/bibo/app/test/processed/del_zengxingliang.json to localhost:62818
delete FAILED
make: *** [Makefile:22: test] Error 255
[sage@limbo305-1 test]$
and sage crashes.
Code:
10.36.3.51:3112 (slot 16)
10.36.3.51:3116 (slot 17)
10.36.3.51:3120 (slot 18)
10.36.3.51:3124 (slot 19)
10.36.3.51:3128 (slot 20)
Start to run the program with 82 VPCs.
Greppy service is running, You can use client tools for searching now
ERROR: VPC reports ABORT_ERROR
ERROR: VPC reports ABORT_ERROR
ERROR: VPC 0x24023204 (10.36.2.50:3064) reports ABORT_ERROR:
Panic: addr out of bound.
ERROR: VPC 0x2402320e (10.36.2.50:3104) reports ABORT_ERROR:
Panic: addr out of bound.
[1628815250.887261s]
ERROR: VPC 0x2402320e (10.36.2.50:3104) reports ABORT_ERROR:
Panic: addr out of bound.
[1628815250.887309s] ERROR: VPC reports ABORT_ERROR
ERROR: VPC reports ABORT_ERROR
ERROR: VPC 0x24023208 (10.36.2.50:3080) reports ABORT_ERROR:
Panic: addr out of bound.
[1628815250.929026s]
ERROR: VPC 0x24023208 (10.36.2.50:3080) reports ABORT_ERROR:
Panic: addr out of bound.
[1628815250.929104s] ERROR: VPC reports ABORT_ERROR
ERROR: VPC reports ABORT_ERROR
ERROR: VPC 0x2402320d (10.36.2.50:3100) reports ABORT_ERROR:
Panic: addr out of bound.
[1628815250.935846s]
ERROR: VPC 0x2402320d (10.36.2.50:3100) reports ABORT_ERROR:
Panic: addr out of bound.
[1628815250.935908s] ERROR: VPC reports ABORT_ERROR
ERROR: VPC reports ABORT_ERROR
ERROR: VPC 0x24023201 (10.36.2.50:3052) reports ABORT_ERROR:
Panic: addr out of bound.
I changed sshd_config, rebooted the VM and mounted yfs manually. After that sage no longer crashes, but delete still fails.
Code:
[sage@limbo305-1 test]$ make test cases=delete_zeng
Testing delete_zeng
run test ./test_delete_zeng.sh
sending delete request /thinker/local/soft/bibo/app/test/processed/del_zengxingliang.json to localhost:62818
delete FAILED
make: *** [Makefile:22: test] Error 255
[sage@limbo305-1 test]$
The atpa log shows:
Code:
[root@limbo305-2 sage]# pwd
/thinker/local/today/users/sage
[root@limbo305-2 sage]# tail atpa.log-10
2021.08.13 9:0:1 (ffffffa2): atpa.runx creating ..- /thinker/fastdata/bibo/e/res2//0.. 0x0 0xa 0x1
2021.08.13 9:0:1 (ffffffa2): atpa.runx creating ..- /thinker/fastdata/bibo/e/res2//0/10.. 0x0 0xa 0x1
2021.08.13 9:0:1 (ffffffa2): atpa::runx() kissing : 10
2021.08.13 9:0:1 (ffffffa2): atpd::KissFromSage: k i len, st_size: 10 6693
2021.08.13 9:0:1 (ffffffa2): atpd::KissFromSage: Get ret : 10 1
2021.08.13 9:0:1 (ffffffa2): KissFromSage: varda5 format, len..- {"typ.. 0x1 0x1a25 0x0
2021.08.13 9:0:1 (ffffffa2): creating dir /thinker/fastdata/bibo/e/res2//0
2021.08.13 9:0:1 (ffffffa2): mkdir ..- /thinker/fastdata/bibo/e/res2//0.. 0xffffffffffffffff 0x0 0x0
2021.08.13 9:0:1 (ffffffa2): creating dir /thinker/fastdata/bibo/e/res2//0/10
2021.08.13 9:0:1 (ffffffa2): mkdir returns - ffffffffffffffff 0 0 0
[root@limbo305-2 sage]#
The efs dir is not installed on limbo305-2:
Code:
[root@limbo305-2 sage]# dexer 'ls /thinker/bin/ephemeral/'
10.36.1.49: ls /thinker/bin/ephemeral/
bibo
10.36.2.50: ls /thinker/bin/ephemeral/
10.36.3.51: ls /thinker/bin/ephemeral/
bibo
[root@limbo305-2 sage]#
I re-ran startbibo, and the dir was created.
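A note on the atpa log above: mkdir is logged as returning ffffffffffffffff. Assuming the logger prints the return value as an unsigned 64-bit hex number (an assumption about atpa's logging, not something the thread states), that value is -1, which is the failure return of mkdir(2). A small sketch of the conversion; `as_signed64` is a hypothetical helper:

```python
def as_signed64(hex_text):
    """Interpret a hex string as a two's-complement signed 64-bit integer."""
    value = int(hex_text, 16) & 0xFFFFFFFFFFFFFFFF
    return value - (1 << 64) if value >= (1 << 63) else value


print(as_signed64("ffffffffffffffff"))  # -1
```

Read this way, both mkdir calls in the log failed, which fits the missing directory on limbo305-2.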
RE: BIBO on limbo305 - zhihao - 08-13-2021 11:30 AM
(08-13-2021 09:37 AM)lingu Wrote: insert is not very stable.
Code:
[sage@limbo305-1 test]$ make test cases=insert_zeng
Testing insert_zeng
run test ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 416
insert failed
make: *** [Makefile:22: test] Error 93
[sage@limbo305-1 test]$ make test cases=insert_zeng
Testing insert_zeng
run test ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[sage@limbo305-1 test]$
zhihao pls investigate this issue.
OK, but I have not gotten a wrong return code yet:
Code:
[root@limbo305-1 test]# make test cases=insert_zeng
Testing insert_zeng
run test ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]# make test cases=insert_zeng
Testing insert_zeng
run test ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]# ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]# ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]# ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]# ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]#
[root@limbo305-1 test]# ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]#
RE: BIBO on limbo305 - lingu - 08-13-2021 11:33 AM
(08-13-2021 11:30 AM)zhihao Wrote: ok, but not get wrong return code yet
You need to run a stress test to reproduce such issues.
Don't do it now, as we are making it correct first. Do it later.
a1. reproduce the problem
a1.1 make a stress test for insert
a1.2 run the stress test to reproduce the problem
a2. fix the problem.
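Steps a1.1 and a1.2 could start from something like the sketch below, which runs one command repeatedly and tallies its exit statuses. It assumes the test script exits non-zero on failure, as the "Error 93" from make suggests; `stress` is a hypothetical helper and not an existing part of the bibo test suite.

```python
import collections
import subprocess
import sys


def stress(command, runs):
    """Run command `runs` times and count how often each exit status occurs."""
    counts = collections.Counter()
    for _ in range(runs):
        result = subprocess.run(command,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        counts[result.returncode] += 1
    return counts


# Self-contained demo using the Python interpreter as the command under test.
demo = stress([sys.executable, "-c", "pass"], 3)
print(dict(demo))
```

For the insert case the call might be `stress(["./test_insert_zeng.sh"], 100)`, run from the test directory. Any key other than 0 in the result means the intermittent failure was reproduced.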
RE: BIBO on limbo305 - zhihao - 08-13-2021 11:42 AM
make test failed
Code:
[root@limbo305-1 test]# make test cases=delete_testcase
Testing delete_testcase
run test ./test_delete_testcase.sh
sending delete request /thinker/local/soft/bibo/app/test/processed/del_test_case.json to localhost:62818
delete FAILED
make: *** [Makefile:22: test] Error 255
[root@limbo305-1 test]#
I found a utilib error: it can't find the file "/thinker/etc/soft/sites/limbo305-tc/bibo.pcf"
Code:
[root@limbo305-2 ~]# su sage
[sage@limbo305-2 root]$ /thinker/local/soft/bibo/plug/procone2.py --qid 0 --tycano 19
Traceback (most recent call last):
File "/thinker/local/soft/bibo/plug/procone2.py", line 23, in <module>
rb, gmic = worksite.learnSiteMic(modname="bibo")
File "/thinker/local/forest/util/utilib/worksite.py", line 88, in learnSiteMic
print >> sys.stderr, "Unable to locate configuration file " + micpfn
NameError: global name 'micpfn' is not defined
[sage@limbo305-2 root]$
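The traceback shows the error report itself failing: line 88 of worksite.py references `micpfn` where it is not defined, so the intended warning turns into a NameError. A minimal Python 3 sketch of the pattern, assuming a lookup over candidate paths; `locate_config` is hypothetical and is not the actual learnSiteMic code.

```python
import os
import sys


def locate_config(candidates):
    """Return the first existing path in candidates, or None after a warning."""
    micpfn = None  # assigned up front so the error path can always use it
    for micpfn in candidates:
        if os.path.isfile(micpfn):
            return micpfn
    # Reaching here means nothing matched; report the last path tried.
    print("Unable to locate configuration file %s" % micpfn, file=sys.stderr)
    return None
```

Because the name is bound before the loop, a missing file produces a warning and a None result instead of killing the process.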
RE: BIBO on limbo305 - lingu - 08-13-2021 11:43 AM
(08-13-2021 09:39 AM)lingu Wrote: I re-ran startbibo, and the dir was created.
another problem
Code:
[sage@limbo305-1 test]$ make test cases=delete_zeng
Testing delete_zeng
run test ./test_delete_zeng.sh
sending delete request /thinker/local/soft/bibo/app/test/processed/del_zengxingliang.json to localhost:62818
./test_delete_zeng.sh: line 22: /thinker/local/soft/bibo/app/test/processed/response-2021-08-13.txt: Permission denied
rm: remove write-protected regular empty file '/thinker/local/soft/bibo/app/test/processed/response-2021-08-13.txt'? ^Cmake: *** [Makefile:22: test] Interrupt
[sage@limbo305-1 test]$ ll /thinker/local/soft/bibo/app/test/processed/response-2021-08-13.txt
ls: cannot access '/thinker/local/soft/bibo/app/test/processed/response-2021-08-13.txt': No such file or directory
[sage@limbo305-1 test]$
But it doesn't happen any more.
RE: BIBO on limbo305 - zhihao - 08-13-2021 12:33 PM
I can't find the log of procone2.py for a case like 27 on limbo305:
Code:
[root@limbo305-1 ~]# grep -nir "opmode" /thinker/bin/ephemeral/
[root@limbo305-1 ~]# grep -nir "opmode" /thinker/local/today/users | tail
/thinker/local/today/users/sage/atpash.log-15:4: opmode: dedicate
/thinker/local/today/users/sage/atpash.log-15:25: opmode: dedicate
/thinker/local/today/users/sage/atpash.log-15:46: opmode: dedicate
/thinker/local/today/users/sage/atpash.log-15:69: opmode: dedicate
/thinker/local/today/users/sage/atpash.log-15:90: opmode: dedicate
/thinker/local/today/users/sage/atpash.log-15:111: opmode: dedicate
/thinker/local/today/users/sage/atpash.log-15:132: opmode: dedicate
/thinker/local/today/users/sage/atpash.log-15:153: opmode: dedicate
/thinker/local/today/users/sage/atpash.log-15:181: opmode: dedicate
/thinker/local/today/users/root/procone2-19.log:3: opmode: dedicate
[root@limbo305-1 ~]#
[root@limbo305-1 ~]# dexer 'ls -al /thinker/local/today/users/sage/| grep atpash'
10.36.1.49: ls -al /thinker/local/today/users/sage/| grep atpash
-rw-rw-rw- 1 sage sage 7959 Aug 13 11:19 atpash.log
-rwxrwxrwx 1 sage sage 10403 Aug 13 11:19 atpash.log-12
-rwxrwxrwx 1 sage sage 10403 Aug 13 11:19 atpash.log-15
-rwxrwxrwx 1 sage sage 10403 Aug 13 11:19 atpash.log-18
-rw------- 1 root root 12288 Aug 13 11:28 .atpash.log-21.swp
-rwxrwxrwx 1 sage sage 3351 Aug 13 11:19 atpash.log-24
-rwxrwxrwx 1 sage sage 18067 Aug 13 11:19 atpash.log-3
-rwxrwxrwx 1 sage sage 18067 Aug 13 11:19 atpash.log-6
-rwxrwxrwx 1 sage sage 15491 Aug 13 11:19 atpash.log-9
10.36.2.50: ls -al /thinker/local/today/users/sage/| grep atpash
-rw-rw-rw- 1 sage sage 6565 Aug 13 11:19 atpash.log
-rwxrwxrwx 1 sage sage 7637 Aug 13 11:19 atpash.log-10
-rwxrwxrwx 1 sage sage 7430 Aug 13 11:19 atpash.log-13
-rwxrwxrwx 1 sage sage 7430 Aug 13 11:19 atpash.log-16
-rwxrwxrwx 1 sage sage 8668 Aug 13 11:19 atpash.log-19
-rwxrwxrwx 1 sage sage 3979 Aug 13 11:19 atpash.log-25
-rwxrwxrwx 1 sage sage 7591 Aug 13 11:19 atpash.log-4
-rwxrwxrwx 1 sage sage 7591 Aug 13 11:19 atpash.log-7
10.36.3.51: ls -al /thinker/local/today/users/sage/| grep atpash
-rw-rw-rw- 1 sage sage 19862 Aug 13 11:19 atpash.log
-rwxrwxrwx 1 sage sage 4454 Aug 13 11:19 atpash.log-11
-rwxrwxrwx 1 sage sage 4454 Aug 13 11:19 atpash.log-14
-rwxrwxrwx 1 sage sage 513 May 21 18:51 atpash.log-15
-rwxrwxrwx 1 sage sage 4454 Aug 13 11:19 atpash.log-17
-rwxrwxrwx 1 sage sage 3128 May 22 19:24 atpash.log-18
-rwxrwxrwx 1 sage sage 3128 May 22 19:24 atpash.log-21
-rwxrwxrwx 1 sage sage 1488 Aug 13 11:19 atpash.log-26
-rwxrwxrwx 1 sage sage 25746 Aug 8 18:35 atpash.log-32
-rwxrwxrwx 1 sage sage 24491 Aug 8 18:35 atpash.log-35
-rwxrwxrwx 1 sage sage 21974 Aug 8 18:35 atpash.log-38
-rwxrwxrwx 1 sage sage 21974 Aug 8 18:35 atpash.log-41
-rwxrwxrwx 1 sage sage 7306 Aug 8 18:35 atpash.log-44
-rwxrwxrwx 1 sage sage 6264 Aug 8 18:35 atpash.log-47
-rwxrwxrwx 1 sage sage 7325 Aug 13 11:19 atpash.log-5
-rwxrwxrwx 1 sage sage 5222 Aug 8 18:35 atpash.log-50
-rwxrwxrwx 1 sage sage 4176 Aug 8 18:35 atpash.log-53
-rwxrwxrwx 1 sage sage 2084 Aug 8 18:35 atpash.log-56
-rwxrwxrwx 1 sage sage 2084 Aug 8 18:35 atpash.log-59
-rwxrwxrwx 1 sage sage 1042 Aug 8 18:35 atpash.log-62
[root@limbo305-1 ~]#
RE: BIBO on limbo305 - lingu - 08-13-2021 01:15 PM
(08-13-2021 11:42 AM)zhihao Wrote: find utilib error, because it can't find file "/thinker/etc/soft/sites/limbo305-tc/bibo.pcf"
Code:
[root@limbo305-2 ~]# su sage
[sage@limbo305-2 root]$ /thinker/local/soft/bibo/plug/procone2.py --qid 0 --tycano 19
Traceback (most recent call last):
File "/thinker/local/soft/bibo/plug/procone2.py", line 23, in <module>
rb, gmic = worksite.learnSiteMic(modname="bibo")
File "/thinker/local/forest/util/utilib/worksite.py", line 88, in learnSiteMic
print >> sys.stderr, "Unable to locate configuration file " + micpfn
NameError: global name 'micpfn' is not defined
[sage@limbo305-2 root]$
That's strange. I believe I installed bibo on limbo305-2.
The file should perhaps be /thinker/etc/soft/sites/limbo305-tc/bibo/mic.pcf,
but that error should not kill the process.
utilib is perhaps old on limbo305. I reinstalled utilib to update it on all 3 nodes.
Code:
[root@limbo305-2 root]# dexer ' /thinker/local/forest/util/utilib/installx utilib'
10.36.1.49: /thinker/local/forest/util/utilib/installx utilib
/thinker/globe/udata/root/Limbo/Packages/limbo305-tc
Installing /thinker/globe/udata/root/Limbo/Packages/limbo305-tc/utilib.tar.gz ... success
10.36.2.50: /thinker/local/forest/util/utilib/installx utilib
/thinker/globe/udata/root/Limbo/Packages/limbo305-tc
Installing /thinker/globe/udata/root/Limbo/Packages/limbo305-tc/utilib.tar.gz ... success
10.36.3.51: /thinker/local/forest/util/utilib/installx utilib
/thinker/globe/udata/root/Limbo/Packages/limbo305-tc
Installing /thinker/globe/udata/root/Limbo/Packages/limbo305-tc/utilib.tar.gz ... success
[root@limbo305-2 root]#
Thanks for spotting the problem, but please work on wp289 so that we can make progress on both systems.
RE: BIBO on limbo305 - zhihao - 08-13-2021 01:22 PM
I tried to update bibo on limbo305, but installx failed on all 3 nodes:
Code:
[root@limbo305-1 ~]# dexer "/thinker/local/forest/util/utilib/installx bibo"
10.36.1.49: /thinker/local/forest/util/utilib/installx bibo
/thinker/globe/udata/root/Limbo/Packages/limbo305-tc
Installing /thinker/globe/udata/root/Limbo/Packages/limbo305-tc/bibo.tar.gz ... fail
installx is unable to install bibo [0813-12:20:36]
10.36.2.50: /thinker/local/forest/util/utilib/installx bibo
/thinker/globe/udata/root/Limbo/Packages/limbo305-tc
Installing /thinker/globe/udata/root/Limbo/Packages/limbo305-tc/bibo.tar.gz ... fail
installx is unable to install bibo [0813-12:20:58]
10.36.3.51: /thinker/local/forest/util/utilib/installx bibo
/thinker/globe/udata/root/Limbo/Packages/limbo305-tc
Installing /thinker/globe/udata/root/Limbo/Packages/limbo305-tc/bibo.tar.gz ... fail
installx is unable to install bibo [0813-12:21:20]
[root@limbo305-1 ~]#
RE: BIBO on limbo305 - lingu - 08-13-2021 03:18 PM
bibo seems to be flapping.
Code:
session returns 0
listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:03]
session returns 0
listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:03]
session returns 0
recycling at reqno 7
simpleserve v2 for reqno 0 [0813-14:17:03]
listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:06]
session returns 0
listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:06]
session returns 0
listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:06]
session returns 0
listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:06]
session returns 0
listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:06]
session returns 0
listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:06]
session returns 0
listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:07]
The reason was probably another simpleserv running. I killed all simpleserv and ncat processes, then started bibo again. There is no more flapping.
zhihao - please design a solution for this issue. If the port is occupied, we should report the accurate error.
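One possible shape for that check, as a sketch only: try to bind the port before starting the listener and report EADDRINUSE explicitly. `port_status` is a hypothetical helper, not part of simpleserv, and the default host is an assumption.

```python
import errno
import socket


def port_status(port, host="127.0.0.1"):
    """Return 'free', 'in use', or an error description for a TCP port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return "free"
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return "in use"
        return "error: %s" % exc.strerror
    finally:
        sock.close()
```

`port_status(62818)` would return "in use" while another listener holds the port. The bind is done without SO_REUSEADDR, so connections lingering in TIME_WAIT may also be reported as "in use"; the check is a hint for a clearer error message, not a guarantee.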
RE: BIBO on limbo305 - lingu - 08-13-2021 04:07 PM
Strangely, atpa always terminates after creating 10.seen in task 10.
Code:
2021.08.13 14:59:52 (0f): creating dir /thinker/fastdata/bibo/e/res2//0/10
2021.08.13 14:59:52 (0f): mkdir returns - ffffffffffffffff 0 0 0
2021.08.13 14:59:52 (0f): atpa.runx kissed back - 1 0 0 a
2021.08.13 14:59:52 (0f): atpa.runx runs ..- /bin/sh /thinker/globe/.think/run/atpa.sh 0 10.. 0x0 0xa 0x1
2021.08.13 14:59:54 (0f): atpa.runx creating ..- /thinker/fastdata/bibo/e/res2//0/10.seen.. 0x0 0xa 0x1
--- That's because the static logging is not output to atpa.log-10.
I am adding logging info to http://rar.shufangkeji.com:60380/showthread.php?tid=9227
RE: BIBO on limbo305 - zhihao - 08-13-2021 04:16 PM
(08-13-2021 03:18 PM)lingu Wrote: bibo seems to be flapping.
Code:
session returns 0
listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:03]
session returns 0
listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:03]
session returns 0
recycling at reqno 7
simpleserve v2 for reqno 0 [0813-14:17:03]
listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:06]
session returns 0
listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:06]
session returns 0
listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:06]
session returns 0
listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:06]
session returns 0
listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:06]
session returns 0
listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:06]
session returns 0
listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:07]
The reason was probably another simpleserv running. i killed all simpleserv and ncat then start bibo again. There is no more flapping.
zhihao - pls design to solve this issue. If the port is occupied, we'd better report the accurate error.
OK, RR in BIBO (D)
RE: BIBO on limbo305 - lingu - 08-13-2021 06:15 PM
(08-13-2021 12:33 PM)zhihao Wrote: I can't find log of procone2.py for case like 27 in limbo305
I cannot find it, either. It should be in the normal log location, /thinker/local/today/users/$USER
I am debugging it.
--- I found it: it is in atpash.log
Code:
[sage@limbo305-2 sage]$ /bin/sh /thinker/globe/.think/run/atpa.sh 32 10
blologing.....
using log_pfn from envar None
pfn /thinker/local/today/users/sage/atpash.log-10
So it is expected behavior. I will update the log info in rar9227.
RE: BIBO on limbo305 - lingu - 08-16-2021 11:19 AM
Testing on limbo305.
Code:
[sage@limbo305-1 test]$ make test cases=insert_zeng
Testing insert_zeng
run test ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[sage@limbo305-1 test]$
Then I ran sage_restart on gm305-3 and tested inquire_zeng.
Code:
[sage@limbo305-1 test]$ make test cases=inquire_zeng
Testing inquire_zeng
run test ./test_inquire_zeng.sh
sending inquire request ./processed/zengxingliang-inquire.json to localhost:62818
{
"code": 404,
"msg": "not found",
"children": [
]
}
RE: BIBO on limbo305 - lingu - 08-16-2021 05:53 PM
It seems malloc() has an issue and the kissup function does not work in Filer.read().
When we read into ccc, which is a local array, it is fine:
Code:
llRead = fread(ccc/*pDest*/, 1, Bize(), pfiMain_);
But reading into pDest, which is malloc'ed, hangs.
--- I am not able to pinpoint the problem. It is quite cryptic, so I filed a bug -- http://tab.d-thinker.org/showthread.php?tid=5734&pid=134800#pid134800
I worked around the problem by adding a 1MB margin.
RE: BIBO on limbo305 - zhihao - 08-18-2021 05:38 PM
I tried to download a file, but nothing was downloaded, and then list did not work either:
Code:
[sage@limbo305-1 root]$ /thinker/globe/.think/run/auntie down 11 ./
[sage@limbo305-1 root]$
[sage@limbo305-1 ~]$ /thinker/globe/.think/run/auntie list
[sage@limbo305-1 ~]$
The log shows:
Code:
auntie ::...- /thinker/globe/.think/run/auntie... 0x4 0x0 0x0
2021.08.18 16:33:40 (67): sage_user sage
2021.08.18 16:33:40 (67): arguments starts..- down.. 0x1 0x0 0x0
2021.08.18 16:33:40 (67): auntie arg:..- down.. 0x4 0x1 0x1
2021.08.18 16:33:40 (67): vb - 21 0 0 0
2021.08.18 16:33:40 (67): verb - 21 20 63 1
2021.08.18 16:33:40 (67): act_down: ..- 11.. 0x0 0x0 0x0
2021.08.18 16:33:40 (67): auntie prelude()
2021.08.18 16:33:40 (67): ERROR: auntie - unable to open down channel /thinker/local/sage/atpddown
2021.08.18 16:33:40 (67): auntie: quit - channels not open - 61 0 0 0
2021.08.18 16:34:0 (67):
auntie ::...- /thinker/globe/.think/run/auntie... 0x2 0x0 0x0
2021.08.18 16:34:0 (67): sage_user sage
2021.08.18 16:34:0 (67): arguments starts..- list.. 0x1 0x0 0x0
2021.08.18 16:34:0 (67): auntie arg:..- list.. 0x2 0x1 0x1
2021.08.18 16:34:0 (67): vb - 25 0 0 0
2021.08.18 16:34:0 (67): verb - 25 20 63 1
2021.08.18 16:34:0 (67): act_list enter - 6e 0 0 0
2021.08.18 16:34:0 (67): to list ..- .. 0x0 0x0 0x0
2021.08.18 16:34:0 (67): auntie prelude()
2021.08.18 16:34:0 (67): ERROR: auntie - unable to open down channel /thinker/local/sage/atpddown
2021.08.18 16:34:0 (67): auntie: quit - channels not open - 61 0 0 0
2021.08.18 16:34:13 (67):
auntie ::...- /thinker/globe/.think/run/auntie... 0x2 0x0 0x0
2021.08.18 16:34:13 (67): sage_user sage
2021.08.18 16:34:13 (67): arguments starts..- list.. 0x1 0x0 0x0
2021.08.18 16:34:13 (67): auntie arg:..- list.. 0x2 0x1 0x1
2021.08.18 16:34:13 (67): vb - 25 0 0 0
2021.08.18 16:34:13 (67): verb - 25 20 63 1
2021.08.18 16:34:13 (67): act_list enter - 6e 0 0 0
2021.08.18 16:34:13 (67): to list ..- .. 0x0 0x0 0x0
2021.08.18 16:34:13 (67): auntie prelude()
2021.08.18 16:34:13 (67): ERROR: auntie - unable to open down channel /thinker/local/sage/atpddown
2021.08.18 16:34:13 (67): auntie: quit - channels not open - 61 0 0 0
(END)
I can find no atpddown program:
Code:
[sage@limbo305-1 root]$ ls /thinker/local/sage/
auntie_letter_test auntie_list_test pcf236893299.sh pcf799866753.sh sgl.log-2043 xfer.log xfer.log--1 xfer.log-2043
[sage@limbo305-1 root]$
[root@limbo305-1 ~]#
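The log calls /thinker/local/sage/atpddown a "down channel" rather than a program. Assuming such a channel is a named pipe created by a running daemon (an assumption; the thread does not say how it is created), a quick check could look like this sketch. `describe_channel` is a hypothetical helper.

```python
import os
import stat


def describe_channel(path):
    """Report whether path exists and whether it is a named pipe (FIFO)."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    return "fifo" if stat.S_ISFIFO(mode) else "exists, not a fifo"
```

If `describe_channel("/thinker/local/sage/atpddown")` returns "missing", the next thing to check would be whether the process expected to create the channel is running on that node.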