Rigorous and Reliable (RAR)
BIBO on limbo305



BIBO on limbo305 - zhihao - 08-08-2021 12:35 PM

BIBO is deployed on limbo305


RE: BIBO on limbo305 - zhihao - 08-08-2021 12:38 PM

I started bibo on limbo305-1 and sage on limbo305-3, then tested delete, but it failed.
Code:
[root@limbo305-1 test]# ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]# ./test_delete_zeng.sh
    sending delete request /thinker/local/soft/bibo/app/test/processed/del_zengxingliang.json to localhost:62818
delete FAILED
[root@limbo305-1 test]#

The log reports:
Code:
[root@limbo305-1 sage]# less procone-49.log
[2021-08-08 13:50:33] procone starts
  tycano: 49
  more args
  keys in /thinker/globe/soft/bibo/procuratorate/tests/typical/49/insert.json
There is error when opening file /thinker/globe/soft/bibo/procuratorate/tests/typical/49/insert.json
procone-49.log (END)

Then I updated toneroot in procone.py to /thinker/globe/soft/bibo/procuratorate/cases. The delete test then reported that it could not open a file:
Code:
[2021-08-08 14:11:31] procone starts
  tycano: 51
  more args
  keys in /thinker/globe/soft/bibo/procuratorate/cases/typical/51/content
    keys ready
    tyca json loaded.
    opening intermediate file /thinker/globe/soft/bibo/procuratorate/cases/res2/res_51.txt-tmp

There was no res2 dir. After I created it manually, case 52 could be deleted.
Code:
[sage@limbo305-1 test]$ /thinker/local/soft/bibo/plug/procone.py --tycano 52 --reqtype delete --reqcaseid "67be261c9e6311eabceb005056c00001"
[sage@limbo305-1 test]$ /
Code:
[root@limbo305-1 test]# cat /thinker/local/soft/bibo/app/test/processed/del_zengxingliang.json; sleep 18 | ncat localhost 62818
{"type":"delete","caseid":"67be261c9e6311eabceb005056c00001"
}

[root@limbo305-1 test]# echo $?
0

Later I added the lines below to procone.py:
Code:
toneroot = "/thinker/globe/soft/bibo/procuratorate/cases"
bibofast = "/thinker/fastdata/bibo"
bibe = bibofast + "/e"
reslocal = bibe + "/res2"
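
A minimal sketch of how the missing res2 dir could be avoided, assuming procone.py is allowed to create its result directory at startup and runs under Python 3. The helpers build_paths and ensure_dir are hypothetical and not part of procone.py; the demo uses a temporary directory as a stand-in for /thinker/fastdata/bibo.

```python
import os
import tempfile


def build_paths(bibofast):
    """Derive the e/ and res2/ locations the same way the lines above do."""
    bibe = bibofast + "/e"
    reslocal = bibe + "/res2"
    return bibe, reslocal


def ensure_dir(path):
    """Create path and any missing parents; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    # Stand-in root so the sketch runs on any machine.
    demo_root = tempfile.mkdtemp()
    bibe, reslocal = build_paths(demo_root)
    ensure_dir(reslocal)
    print(os.path.isdir(reslocal))
```

Calling ensure_dir before opening the intermediate file would remove the need to create res2 by hand on each new node.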



RE: BIBO on limbo305 - zhihao - 08-08-2021 07:56 PM

inquire gets stuck because yfs has some inactive PGs:
Code:
[root@limbo305-1 test]# ./test_inquire_zeng.sh
sending inquire request ./processed/zengxingliang-inquire.json to localhost:62818
^C
[root@limbo305-1 test]# ls /thinker/globe/soft/bibo/procuratorate/cases/sum2/
^C
[root@limbo305-1 test]#
[root@limbo305-1 test]# ceph -s
  cluster:
    id:     33d940a8-7e68-44f3-bc37-305aaaabbbbc
    health: HEALTH_ERR
            1 clients failing to respond to capability release
            1 MDSs report slow metadata IOs
            1 MDSs report slow requests
            mons limbo305-1,limbo305-2,limbo305-3 are low on available space
            1 monitors have not enabled msgr2
            2/510 objects unfound (0.392%)
            Reduced data availability: 36 pgs inactive, 33 pgs incomplete
            Possible data damage: 1 pg recovery_unfound
            Degraded data redundancy: 6/1530 objects degraded (0.392%), 1 pg degraded
            32 pgs not deep-scrubbed in time
            32 pgs not scrubbed in time
            195 slow ops, oldest one blocked for 81807 sec, daemons [osd.5,osd.7] have slow ops.
  
  services:
    mon: 3 daemons, quorum limbo305-1,limbo305-2,limbo305-3 (age 23h)
    mgr: limbo305-1(active, since 32m)
    mds: cephfs:1 {0=limbo305-3=up:active} 1 up:standby
    osd: 12 osds: 12 up (since 22h), 12 in (since 22h)
  
  task status:
    scrub status:
        mds.limbo305-3: idle
  
  data:
    pools:   4 pools, 97 pgs
    objects: 510 objects, 182 MiB
    usage:   13 GiB used, 1.1 TiB / 1.1 TiB avail
    pgs:     3.093% pgs unknown
             34.021% pgs not active
             6/1530 objects degraded (0.392%)
             2/510 objects unfound (0.392%)
             60 active+clean
             32 creating+incomplete
             3  unknown
             1  active+recovery_unfound+degraded
             1  incomplete

The 3 unknown PGs are in cephfs_metadata. I then reinstalled yotta on limbo305-1.
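
In the session above both the inquire test and a plain ls hung until Ctrl-C. A hedged sketch of a fail-fast probe that a test script could run first, assuming Python 3 is available on the node; probe_path is a hypothetical helper, not an existing bibo or yfs tool.

```python
import subprocess
import sys


def probe_path(path, timeout_s=5.0):
    """Return 'ok', 'error', or 'timeout' for listing path in a child process."""
    # The listing runs in a child so that a hang cannot block this process.
    child_code = "import os, sys; os.listdir(sys.argv[1])"
    child = subprocess.Popen(
        [sys.executable, "-c", child_code, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        status = child.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Request a kill but do not wait for it: a process blocked on a hung
        # mount may not exit until the I/O completes.
        child.kill()
        return "timeout"
    return "ok" if status == 0 else "error"


if __name__ == "__main__":
    print(probe_path("."))
```

A "timeout" result for the cases dir would point at the filesystem rather than at bibo, which is when ceph -s is worth checking.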


RE: BIBO on limbo305 - zhihao - 08-10-2021 12:50 AM

Installing sage got stuck:
Code:
[root@limbo305-3 ~]# decent_init=True /thinker/local/forest/util/utilib/installx sage
/thinker/local/shed/installation/limbo305-tc                                        
Installing /thinker/local/shed/installation/limbo305-tc/sage.tar.gz ...

The log shows that some files were not found:
Code:
...
10.36.3.51:3124 (slot 19)
Start to run the program with 72 VPCs.
Program execution completed.
protocol error: filename does not match request
protocol error: filename does not match request
protocol error: filename does not match request
cat: '/home/sage/think/run/results/result-72-sage/stdout-*': No such file or directory
cat: '/home/sage/think/run/results/result-72-sage/stdout-*': No such file or directory
...
=== Begin Sage Service Test Set ===
./sage_service/sage-service-start.sh
cat: /home/sage/sage/run/service.pid: No such file or directory
cat: /home/sage/sage/run/aide.pid: No such file or directory
Sage is stopped.
          lockstep mark /thinker/globe/.think/lockstep//limbo305-3/sage/tested
          lockstep mark /thinker/globe/.think/lockstep//limbo305-3/sage/decent/workdone

sage start reports that Sage needs to be configured first.
Code:
[sage@limbo305-3 ~]$ ./sage/bin/sage start
ERROR: Sage has not been configured. Please run /home/sage/sage/bin/configure at sage_portal first.
[sage@limbo305-3 ~]$
[sage@limbo305-3 correctness]$ /home/sage/sage/bin/configure
  
====== Generating config.pcf ======
    inferred and exported:
      sage_base: /home/sage/sage
      sage_stdout: /home/sage/sage/stdout/sage.out
      sage_debug: 0
      helper_portal: 10.36.1.49
      sage_ipx: /thinker/etc/ips.cfg
      sage_atp_cnt: 60
  generated /home/sage/sage/config.pcf

====== Generating config.node, sage-svc.inc & hosts.ips ======

-- generating config.node & sage-svc.inc by gen_sage_config

10.36.1.49
10.36.2.50
10.36.3.51
    HELPER_CNT: 6
    HEAVEN_CNT: 1
    ATP_CNT: 60
    backup old config.node: /home/sage/sage/config.node.105847
    backup old sage-svc.inc: /home/sage/sage/sage-svc.inc.105847

Generating walkers from /thinker/etc/ips.cfg
New node: 10.36.1.49
New node: 10.36.2.50
New node: 10.36.3.51
Add fillers
10.36.1.49:1
10.36.1.49:51
10.36.1.49:2
10.36.1.49:51
10.36.1.49:3
10.36.1.49:51
10.36.1.49:4
...
10.36.1.49:51
Adding sage_atp_staff_space_cnt (3), sage_atp_staff_space_start (20) & sage_atp_head_space (14) to config.pcf

Arguments (Part 2):
    WALKER_CNT: 3
    FILLER_CNT: -1
    sage_atp_staff_space_start: 20
    sage_atp_staff_space_cnt: 3
    sage_atp_head_space: 14
  
    config.node is generated in /home/sage/sage/config.node successfully.
    The old config.node is backuped in /home/sage/sage/config.node.105847.
    sage-svc.inc is generated in /home/sage/sage/sage-svc.inc successfully.
    The old svc_inc is backuped in /home/sage/sage/sage-svc.inc.105847.

-- generating hosts.ips from config.node

====== Begin to do the original load.sh ======

Parsing config.node
Configuring for 72 containers
  10.36.3.51 slots: 52
  10.36.1.49 slots: 54 55 56 57 58 59 60 52
  10.36.2.50 slots: 52
  10.36.3.51 slots: 53
  10.36.1.49 slots: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
  10.36.2.50 slots: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
  10.36.3.51 slots: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
  10.36.1.49 slots: 51 50Obtaining config info
Generating config
Sage is going to format the persistent memory of DT after 10 seconds. Please type Ctrl-C if you do not want to do it.

WARNING: Seems the thinker is running. Refused to format the persistent memory.

This may happen because:
  a) Someone else is using the thinker.
  b) Your program fails or you canceled you program by 'Ctrl-C'.

If you are the one who reserve the thinker for the current time range,
or you are sure that the thinker is running your program, you can
kill the thinker by:

$ dt slay

[sage@limbo305-3 correctness]$ dt slay
...
[sage@limbo305-3 correctness]$ /home/sage/sage/bin/configure
...
10.36.3.51:3116 (slot 17)
10.36.3.51:3120 (slot 18)
10.36.3.51:3124 (slot 19)
Start to run the program with 72 VPCs.
Program execution completed.
protocol error: filename does not match request
protocol error: filename does not match request
protocol error: filename does not match request
cat: '/home/sage/think/run/results/result-72-sage/stdout-*': No such file or directory
cat: '/home/sage/think/run/results/result-72-sage/stdout-*': No such file or directory

====== configure is done ====== [0810-11:17:45]
[sage@limbo305-3 correctness]$ echo $?
0
[sage@limbo305-3 correctness]$

sage start then reports that Sage is irresponsive:
Code:
[sage@limbo305-3 correctness]$ ~/sage/bin/sage start
Sage is irresponsive. Trying to stop it before start it again.
cat: /home/sage/sage/run/aide.pid: No such file or directory
The prior service instance is perhaps 267779
Sage is stopped.

Starting Sage ..............

The log shows:
Code:
10.36.1.49 "killall wake_sage detect_listen auntie atpd atpa 2> /dev/null":
10.36.2.50 "killall wake_sage detect_listen auntie atpd atpa 2> /dev/null":
10.36.3.51 "killall wake_sage detect_listen auntie atpd atpa 2> /dev/null":


Sage is stopped.
    sage is to run nohup bash -c 'cd /home/sage/sage/bin; ./start_sage 2>&1 | tee -a /home/sage/sage/stdout/sage.out'
stty: 'standard input': Inappropriate ioctl for device
      store sage_service pid 271612
    start_sage: Tue Aug 10 11:21:10 CST 2021: Sage starts
detect_listen.sh: no process found
[Tue Aug 10 11:21:13 CST 2021] status sage 10.36.1.49 7
        sage state:Sage is stopped or irresponsive. vs. started state:Sage is started.
[Tue Aug 10 11:21:20 CST 2021] status sage 10.36.1.49 7
        sage state:Sage is stopped or irresponsive. vs. started state:Sage is started.



RE: BIBO on limbo305 - lingu - 08-13-2021 09:30 AM

(08-08-2021 12:38 PM)zhihao Wrote:  I started bibo on limbo305-1 and sage on limbo305-3, then tested delete, but it failed.

we should use the standardized way -- 'make test'.

i added a cases var so that we can specify which test to run.
Code:
[sage@limbo305-1 test]$ make test cases=delete_zeng
Testing delete_zeng
    run test  ./test_delete_zeng.sh
    sending delete request /thinker/local/soft/bibo/app/test/processed/del_zengxingliang.json to localhost:62818

Quote: Then I updated toneroot in procone.py to /thinker/globe/soft/bibo/procuratorate/cases. The delete test then reported that it could not open a file.

You should update the module if you do this on limbo305. File copying is okay on wp289, but it is generally not recommended.


RE: BIBO on limbo305 - lingu - 08-13-2021 09:37 AM

insert is not very stable.
Code:
[sage@limbo305-1 test]$ make test cases=insert_zeng
Testing insert_zeng
    run test  ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 416
insert failed
make: *** [Makefile:22: test] Error 93
[sage@limbo305-1 test]$ make test cases=insert_zeng
Testing insert_zeng
    run test  ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[sage@limbo305-1 test]$

zhihao pls investigate this issue.


RE: BIBO on limbo305 - lingu - 08-13-2021 09:39 AM

delete fails
Code:
[sage@limbo305-1 test]$ make test cases=delete_zeng
Testing delete_zeng
    run test  ./test_delete_zeng.sh
    sending delete request /thinker/local/soft/bibo/app/test/processed/del_zengxingliang.json to localhost:62818
delete FAILED
make: *** [Makefile:22: test] Error 255
[sage@limbo305-1 test]$

and sage crashes.
Code:
10.36.3.51:3112 (slot 16)
10.36.3.51:3116 (slot 17)
10.36.3.51:3120 (slot 18)
10.36.3.51:3124 (slot 19)
10.36.3.51:3128 (slot 20)
Start to run the program with 82 VPCs.
Greppy service is running, You can use client tools for searching now

ERROR: VPC reports ABORT_ERROR


ERROR: VPC reports ABORT_ERROR


ERROR: VPC 0x24023204 (10.36.2.50:3064) reports ABORT_ERROR:
Panic: addr out of bound.


ERROR: VPC 0x2402320e (10.36.2.50:3104) reports ABORT_ERROR:
Panic: addr out of bound.

[1628815250.887261s]
ERROR: VPC 0x2402320e (10.36.2.50:3104) reports ABORT_ERROR:
Panic: addr out of bound.

[1628815250.887309s] ERROR: VPC reports ABORT_ERROR


ERROR: VPC reports ABORT_ERROR


ERROR: VPC 0x24023208 (10.36.2.50:3080) reports ABORT_ERROR:
Panic: addr out of bound.

[1628815250.929026s]
ERROR: VPC 0x24023208 (10.36.2.50:3080) reports ABORT_ERROR:
Panic: addr out of bound.

[1628815250.929104s] ERROR: VPC reports ABORT_ERROR


ERROR: VPC reports ABORT_ERROR


ERROR: VPC 0x2402320d (10.36.2.50:3100) reports ABORT_ERROR:
Panic: addr out of bound.

[1628815250.935846s]
ERROR: VPC 0x2402320d (10.36.2.50:3100) reports ABORT_ERROR:
Panic: addr out of bound.

[1628815250.935908s] ERROR: VPC reports ABORT_ERROR


ERROR: VPC reports ABORT_ERROR


ERROR: VPC 0x24023201 (10.36.2.50:3052) reports ABORT_ERROR:
Panic: addr out of bound.

I changed sshd_config, rebooted the VM, and mounted yfs manually. After that sage no longer crashes, but delete still fails.
Code:
[sage@limbo305-1 test]$ make test cases=delete_zeng
Testing delete_zeng
    run test  ./test_delete_zeng.sh
    sending delete request /thinker/local/soft/bibo/app/test/processed/del_zengxingliang.json to localhost:62818
delete FAILED
make: *** [Makefile:22: test] Error 255
[sage@limbo305-1 test]$

The atpa log shows:
Code:
[root@limbo305-2 sage]# pwd
/thinker/local/today/users/sage
[root@limbo305-2 sage]# tail atpa.log-10
2021.08.13 9:0:1 (ffffffa2):     atpa.runx creating ..- /thinker/fastdata/bibo/e/res2//0.. 0x0 0xa 0x1
2021.08.13 9:0:1 (ffffffa2):     atpa.runx creating ..- /thinker/fastdata/bibo/e/res2//0/10.. 0x0 0xa 0x1
2021.08.13 9:0:1 (ffffffa2):   atpa::runx() kissing : 10
2021.08.13 9:0:1 (ffffffa2): atpd::KissFromSage: k i len, st_size: 10 6693
2021.08.13 9:0:1 (ffffffa2): atpd::KissFromSage: Get ret : 10 1
2021.08.13 9:0:1 (ffffffa2): KissFromSage: varda5 format, len..- {"typ.. 0x1 0x1a25 0x0
2021.08.13 9:0:1 (ffffffa2):   creating dir /thinker/fastdata/bibo/e/res2//0
2021.08.13 9:0:1 (ffffffa2):   mkdir ..- /thinker/fastdata/bibo/e/res2//0.. 0xffffffffffffffff 0x0 0x0
2021.08.13 9:0:1 (ffffffa2):   creating dir /thinker/fastdata/bibo/e/res2//0/10
2021.08.13 9:0:1 (ffffffa2):     mkdir returns - ffffffffffffffff 0 0 0
[root@limbo305-2 sage]#

The efs dir is not installed on limbo305-2:
Code:
[root@limbo305-2 sage]# dexer 'ls /thinker/bin/ephemeral/'
10.36.1.49: ls /thinker/bin/ephemeral/
bibo

10.36.2.50: ls /thinker/bin/ephemeral/

10.36.3.51: ls /thinker/bin/ephemeral/
bibo

[root@limbo305-2 sage]#

I re-ran startbibo, and the dir was created.
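
The atpa log only records that mkdir returned ffffffffffffffff (that is, -1), without the reason. A small sketch of the kind of reporting that would have made the cause visible sooner, written in Python for illustration only (atpa itself is not Python); mkdir_reason is a hypothetical helper.

```python
import os
import tempfile


def mkdir_reason(path):
    """Try to create path and say why it failed instead of returning only -1."""
    try:
        os.mkdir(path)
    except FileExistsError:
        return "exists"
    except FileNotFoundError:
        return "parent missing"
    except OSError as exc:
        # Any other failure, for example a permission problem.
        return exc.strerror
    return "created"


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    print(mkdir_reason(os.path.join(root, "0")))
    print(mkdir_reason(os.path.join(root, "missing", "10")))
```

Distinguishing "exists" from "parent missing" in the log would separate a harmless repeat mkdir from a node where the directory tree was never installed.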


RE: BIBO on limbo305 - zhihao - 08-13-2021 11:30 AM

(08-13-2021 09:37 AM)lingu Wrote:  insert is not very stable.
zhihao pls investigate this issue.

OK, but I have not gotten a wrong return code yet:
Code:
[root@limbo305-1 test]# make test cases=insert_zeng
Testing insert_zeng
    run test  ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]# make test cases=insert_zeng
Testing insert_zeng
    run test  ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]# ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]# ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]# ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]# ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]#
[root@limbo305-1 test]# ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]#



RE: BIBO on limbo305 - lingu - 08-13-2021 11:33 AM

(08-13-2021 11:30 AM)zhihao Wrote:  OK, but I have not gotten a wrong return code yet

You need to run a stress test to reproduce such issues.

Don't do it now, as we are making it correct first. Do it later.

a1. reproduce the problem
a1.1 make a stress test for insert
a1.2 run the stress test to reproduce the problem

a2. fix the problem.
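
Steps a1.1 and a1.2 could be sketched as below, assuming the test script exits non-zero on failure (as the make error above suggests) and that Python 3 is available. The stress helper is hypothetical; the demo command is a stand-in, and on limbo305 it would be something like ["./test_insert_zeng.sh"].

```python
import collections
import subprocess
import sys


def stress(cmd, runs):
    """Run cmd `runs` times and count how often each exit status occurs."""
    tally = collections.Counter()
    for _ in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        tally[result.returncode] += 1
    return tally


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere.
    demo_cmd = [sys.executable, "-c", "import sys; sys.exit(0)"]
    print(stress(demo_cmd, 5))
```

Any key other than 0 in the tally means the intermittent failure was reproduced, and its count gives a rough failure rate to compare before and after a fix.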


RE: BIBO on limbo305 - zhihao - 08-13-2021 11:42 AM

make test failed
Code:
[root@limbo305-1 test]# make test cases=delete_testcase
Testing delete_testcase
    run test  ./test_delete_testcase.sh
    sending delete request /thinker/local/soft/bibo/app/test/processed/del_test_case.json to localhost:62818
delete FAILED
make: *** [Makefile:22: test] Error 255
[root@limbo305-1 test]#

I found a utilib error, triggered because it can't find the file "/thinker/etc/soft/sites/limbo305-tc/bibo.pcf":
Code:
[root@limbo305-2 ~]# su sage
[sage@limbo305-2 root]$ /thinker/local/soft/bibo/plug/procone2.py --qid 0 --tycano 19
Traceback (most recent call last):
  File "/thinker/local/soft/bibo/plug/procone2.py", line 23, in <module>
    rb, gmic = worksite.learnSiteMic(modname="bibo")
  File "/thinker/local/forest/util/utilib/worksite.py", line 88, in learnSiteMic
    print >> sys.stderr, "Unable to locate configuration file " + micpfn
NameError: global name 'micpfn' is not defined
[sage@limbo305-2 root]$



RE: BIBO on limbo305 - lingu - 08-13-2021 11:43 AM

(08-13-2021 09:39 AM)lingu Wrote:  I re-ran startbibo, and the dir was created.

another problem
Code:
[sage@limbo305-1 test]$ make test cases=delete_zeng
Testing delete_zeng
    run test  ./test_delete_zeng.sh
    sending delete request /thinker/local/soft/bibo/app/test/processed/del_zengxingliang.json to localhost:62818
./test_delete_zeng.sh: line 22: /thinker/local/soft/bibo/app/test/processed/response-2021-08-13.txt: Permission denied
rm: remove write-protected regular empty file '/thinker/local/soft/bibo/app/test/processed/response-2021-08-13.txt'? ^Cmake: *** [Makefile:22: test] Interrupt

[sage@limbo305-1 test]$ ll /thinker/local/soft/bibo/app/test/processed/response-2021-08-13.txt
ls: cannot access '/thinker/local/soft/bibo/app/test/processed/response-2021-08-13.txt': No such file or directory
[sage@limbo305-1 test]$

But it doesn't happen any more.


RE: BIBO on limbo305 - zhihao - 08-13-2021 12:33 PM

I can't find the procone2.py log for cases such as 27 on limbo305:
Code:
[root@limbo305-1 ~]# grep -nir "opmode"  /thinker/bin/ephemeral/
[root@limbo305-1 ~]# grep -nir "opmode"  /thinker/local/today/users | tail
/thinker/local/today/users/sage/atpash.log-15:4:  opmode: dedicate
/thinker/local/today/users/sage/atpash.log-15:25:  opmode: dedicate
/thinker/local/today/users/sage/atpash.log-15:46:  opmode: dedicate
/thinker/local/today/users/sage/atpash.log-15:69:  opmode: dedicate
/thinker/local/today/users/sage/atpash.log-15:90:  opmode: dedicate
/thinker/local/today/users/sage/atpash.log-15:111:  opmode: dedicate
/thinker/local/today/users/sage/atpash.log-15:132:  opmode: dedicate
/thinker/local/today/users/sage/atpash.log-15:153:  opmode: dedicate
/thinker/local/today/users/sage/atpash.log-15:181:  opmode: dedicate
/thinker/local/today/users/root/procone2-19.log:3:  opmode: dedicate
[root@limbo305-1 ~]#
[root@limbo305-1 ~]# dexer 'ls -al /thinker/local/today/users/sage/| grep atpash'
10.36.1.49: ls -al /thinker/local/today/users/sage/| grep atpash
-rw-rw-rw-  1 sage sage    7959 Aug 13 11:19 atpash.log
-rwxrwxrwx  1 sage sage   10403 Aug 13 11:19 atpash.log-12
-rwxrwxrwx  1 sage sage   10403 Aug 13 11:19 atpash.log-15
-rwxrwxrwx  1 sage sage   10403 Aug 13 11:19 atpash.log-18
-rw-------  1 root root   12288 Aug 13 11:28 .atpash.log-21.swp
-rwxrwxrwx  1 sage sage    3351 Aug 13 11:19 atpash.log-24
-rwxrwxrwx  1 sage sage   18067 Aug 13 11:19 atpash.log-3
-rwxrwxrwx  1 sage sage   18067 Aug 13 11:19 atpash.log-6
-rwxrwxrwx  1 sage sage   15491 Aug 13 11:19 atpash.log-9

10.36.2.50: ls -al /thinker/local/today/users/sage/| grep atpash
-rw-rw-rw-  1 sage sage    6565 Aug 13 11:19 atpash.log
-rwxrwxrwx  1 sage sage    7637 Aug 13 11:19 atpash.log-10
-rwxrwxrwx  1 sage sage    7430 Aug 13 11:19 atpash.log-13
-rwxrwxrwx  1 sage sage    7430 Aug 13 11:19 atpash.log-16
-rwxrwxrwx  1 sage sage    8668 Aug 13 11:19 atpash.log-19
-rwxrwxrwx  1 sage sage    3979 Aug 13 11:19 atpash.log-25
-rwxrwxrwx  1 sage sage    7591 Aug 13 11:19 atpash.log-4
-rwxrwxrwx  1 sage sage    7591 Aug 13 11:19 atpash.log-7

10.36.3.51: ls -al /thinker/local/today/users/sage/| grep atpash
-rw-rw-rw-   1 sage sage     19862 Aug 13 11:19 atpash.log
-rwxrwxrwx   1 sage sage      4454 Aug 13 11:19 atpash.log-11
-rwxrwxrwx   1 sage sage      4454 Aug 13 11:19 atpash.log-14
-rwxrwxrwx   1 sage sage       513 May 21 18:51 atpash.log-15
-rwxrwxrwx   1 sage sage      4454 Aug 13 11:19 atpash.log-17
-rwxrwxrwx   1 sage sage      3128 May 22 19:24 atpash.log-18
-rwxrwxrwx   1 sage sage      3128 May 22 19:24 atpash.log-21
-rwxrwxrwx   1 sage sage      1488 Aug 13 11:19 atpash.log-26
-rwxrwxrwx   1 sage sage     25746 Aug  8 18:35 atpash.log-32
-rwxrwxrwx   1 sage sage     24491 Aug  8 18:35 atpash.log-35
-rwxrwxrwx   1 sage sage     21974 Aug  8 18:35 atpash.log-38
-rwxrwxrwx   1 sage sage     21974 Aug  8 18:35 atpash.log-41
-rwxrwxrwx   1 sage sage      7306 Aug  8 18:35 atpash.log-44
-rwxrwxrwx   1 sage sage      6264 Aug  8 18:35 atpash.log-47
-rwxrwxrwx   1 sage sage      7325 Aug 13 11:19 atpash.log-5
-rwxrwxrwx   1 sage sage      5222 Aug  8 18:35 atpash.log-50
-rwxrwxrwx   1 sage sage      4176 Aug  8 18:35 atpash.log-53
-rwxrwxrwx   1 sage sage      2084 Aug  8 18:35 atpash.log-56
-rwxrwxrwx   1 sage sage      2084 Aug  8 18:35 atpash.log-59
-rwxrwxrwx   1 sage sage      1042 Aug  8 18:35 atpash.log-62

[root@limbo305-1 ~]#



RE: BIBO on limbo305 - lingu - 08-13-2021 01:15 PM

(08-13-2021 11:42 AM)zhihao Wrote:  I found a utilib error, triggered because it can't find the file "/thinker/etc/soft/sites/limbo305-tc/bibo.pcf"

that's strange. i believe i installed bibo on limbo305-2.

The file should perhaps be /thinker/etc/soft/sites/limbo305-tc/bibo/mic.pcf

but that error should not kill the process.
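
In the traceback, the crash comes from the error-reporting line itself: micpfn is not defined when the message is built, so a missing config file turns into a NameError. A hedged sketch of a lookup that warns and carries on, assuming Python 3; learn_site_mic and its return value are hypothetical and do not reproduce the real worksite.learnSiteMic.

```python
import os
import sys


def learn_site_mic(candidates):
    """Return the first existing config path, or None after a warning on stderr."""
    for pfn in candidates:
        if os.path.isfile(pfn):
            return pfn
    # The message uses only names defined in this scope, so the error path
    # itself cannot raise NameError.
    print(
        "Unable to locate configuration file; tried: " + ", ".join(candidates),
        file=sys.stderr,
    )
    return None


if __name__ == "__main__":
    found = learn_site_mic(["/nonexistent/bibo.pcf", "/nonexistent/bibo/mic.pcf"])
    print(found)
```

Passing both candidate file names covers the case where the expected location differs between sites, and the caller can decide whether None is fatal.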

utilib is perhaps old on limbo305. i reinstalled utilib to update it on all 3 nodes.
Code:
[root@limbo305-2 root]# dexer ' /thinker/local/forest/util/utilib/installx utilib'
10.36.1.49:  /thinker/local/forest/util/utilib/installx utilib
/thinker/globe/udata/root/Limbo/Packages/limbo305-tc
Installing /thinker/globe/udata/root/Limbo/Packages/limbo305-tc/utilib.tar.gz ... success

10.36.2.50:  /thinker/local/forest/util/utilib/installx utilib
/thinker/globe/udata/root/Limbo/Packages/limbo305-tc
Installing /thinker/globe/udata/root/Limbo/Packages/limbo305-tc/utilib.tar.gz ... success

10.36.3.51:  /thinker/local/forest/util/utilib/installx utilib
/thinker/globe/udata/root/Limbo/Packages/limbo305-tc
Installing /thinker/globe/udata/root/Limbo/Packages/limbo305-tc/utilib.tar.gz ... success

[root@limbo305-2 root]#

thanks for spotting the problem, but pls work on wp289 so that we can make progress on both systems.


RE: BIBO on limbo305 - zhihao - 08-13-2021 01:22 PM

I tried to update bibo on limbo305, but installx fails on all 3 nodes:
Code:
[root@limbo305-1 ~]# dexer "/thinker/local/forest/util/utilib/installx bibo"
10.36.1.49: /thinker/local/forest/util/utilib/installx bibo
/thinker/globe/udata/root/Limbo/Packages/limbo305-tc
Installing /thinker/globe/udata/root/Limbo/Packages/limbo305-tc/bibo.tar.gz ... fail
  installx is unable to install bibo [0813-12:20:36]

10.36.2.50: /thinker/local/forest/util/utilib/installx bibo
/thinker/globe/udata/root/Limbo/Packages/limbo305-tc
Installing /thinker/globe/udata/root/Limbo/Packages/limbo305-tc/bibo.tar.gz ... fail
  installx is unable to install bibo [0813-12:20:58]

10.36.3.51: /thinker/local/forest/util/utilib/installx bibo
/thinker/globe/udata/root/Limbo/Packages/limbo305-tc
Installing /thinker/globe/udata/root/Limbo/Packages/limbo305-tc/bibo.tar.gz ... fail
  installx is unable to install bibo [0813-12:21:20]

[root@limbo305-1 ~]#



RE: BIBO on limbo305 - lingu - 08-13-2021 03:18 PM

bibo seems to be flapping.
Code:
session returns 0
      listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:03]
    session returns 0
      listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:03]
    session returns 0
  recycling at reqno 7
  simpleserve v2 for reqno 0 [0813-14:17:03]
      listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:06]
    session returns 0
      listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:06]
    session returns 0
      listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:06]
    session returns 0
      listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:06]
    session returns 0
      listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:06]
    session returns 0
      listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:06]
    session returns 0
      listen on port 62818 with /thinker/local/soft/bibo/plug/talk2thinker2.sh [0813-14:17:07]

The reason was probably another simpleserv instance running. I killed all simpleserv and ncat processes, then started bibo again. There is no more flapping.

zhihao - pls design a fix for this issue. If the port is occupied, we should report an accurate error.
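
One possible shape for such a check, as a sketch only: try to bind the port before starting the listener and report the specific reason on failure. check_port_free is a hypothetical helper, Python 3 is assumed, and the check is inherently racy because another process can still take the port between the check and the real listen.

```python
import errno
import socket


def check_port_free(port, host="127.0.0.1"):
    """Try to bind host:port; return (True, '') or (False, reason)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return False, "port %d is already in use" % port
        return False, "cannot bind port %d: %s" % (port, exc.strerror)
    finally:
        sock.close()
    return True, ""


if __name__ == "__main__":
    # 62818 is the bibo port seen in the logs above.
    print(check_port_free(62818))
```

If the check fails, the start script could print the reason and exit instead of looping on listen, which is what appeared as flapping.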


RE: BIBO on limbo305 - lingu - 08-13-2021 04:07 PM

Strangely, atpa always terminates after creating 10.seen in task 10.
Code:
2021.08.13 14:59:52 (0f):   creating dir /thinker/fastdata/bibo/e/res2//0/10
2021.08.13 14:59:52 (0f):     mkdir returns - ffffffffffffffff 0 0 0
2021.08.13 14:59:52 (0f):         atpa.runx kissed back  - 1 0 0 a
2021.08.13 14:59:52 (0f):       atpa.runx runs ..- /bin/sh /thinker/globe/.think/run/atpa.sh 0 10.. 0x0 0xa 0x1
2021.08.13 14:59:54 (0f):       atpa.runx creating ..- /thinker/fastdata/bibo/e/res2//0/10.seen.. 0x0 0xa 0x1

--- That's because static logging is not output to atpa.log-10.

i am adding logging info to http://rar.shufangkeji.com:60380/showthread.php?tid=9227


RE: BIBO on limbo305 - zhihao - 08-13-2021 04:16 PM

(08-13-2021 03:18 PM)lingu Wrote:  bibo seems to be flapping.
The reason was probably another simpleserv instance running. I killed all simpleserv and ncat processes, then started bibo again. There is no more flapping.

zhihao - pls design a fix for this issue. If the port is occupied, we should report an accurate error.

OK, RR in BIBO (D)


RE: BIBO on limbo305 - lingu - 08-13-2021 06:15 PM

(08-13-2021 12:33 PM)zhihao Wrote:  I can't find the procone2.py log for cases such as 27 on limbo305

i cannot find it, either. It should be in the normal log location /thinker/local/today/users/$USER

i am debugging it.

--- I found it: it is in atpash.log
Code:
[sage@limbo305-2 sage]$ /bin/sh /thinker/globe/.think/run/atpa.sh 32 10
blologing.....
using log_pfn from envar None
pfn /thinker/local/today/users/sage/atpash.log-10

So it is expected behavior. i will update the log info in rar9227


RE: BIBO on limbo305 - lingu - 08-16-2021 11:19 AM

Testing on limbo305.
Code:
[sage@limbo305-1 test]$ make test cases=insert_zeng
Testing insert_zeng
    run test  ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[sage@limbo305-1 test]$

Then I ran sage_restart on gm305-3 and tested inquire_zeng.
Code:
[sage@limbo305-1 test]$ make test cases=inquire_zeng
Testing inquire_zeng
    run test  ./test_inquire_zeng.sh
sending inquire request ./processed/zengxingliang-inquire.json to localhost:62818
{
"code": 404,
"msg": "not found",
"children": [
]
}



RE: BIBO on limbo305 - lingu - 08-16-2021 05:53 PM

It seems malloc() has an issue and the kissup function does not work in Filer.read()

When we read to ccc which is a local array, it is fine.
Code:
llRead = fread(ccc/*pDest*/, 1, Bize(), pfiMain_);

But reading into pDest, which is malloc'ed, hangs.

--- I am not able to pinpoint the problem. It is quite cryptic, so I filed a bug -- http://tab.d-thinker.org/showthread.php?tid=5734&pid=134800#pid134800

I worked around the problem by adding a 1MB margin.


RE: BIBO on limbo305 - zhihao - 08-18-2021 05:38 PM

I tried to download a file, but nothing was downloaded, and then list did not work either.
Code:
[sage@limbo305-1 root]$ /thinker/globe/.think/run/auntie down 11 ./
[sage@limbo305-1 root]$
[sage@limbo305-1 ~]$ /thinker/globe/.think/run/auntie list
[sage@limbo305-1 ~]$

log shows:
Code:
auntie ::...- /thinker/globe/.think/run/auntie... 0x4 0x0 0x0
2021.08.18 16:33:40 (67):   sage_user sage
2021.08.18 16:33:40 (67):   arguments starts..- down.. 0x1 0x0 0x0
2021.08.18 16:33:40 (67):   auntie arg:..- down.. 0x4 0x1 0x1
2021.08.18 16:33:40 (67):   vb - 21 0 0 0
2021.08.18 16:33:40 (67):     verb - 21 20 63 1
2021.08.18 16:33:40 (67): act_down: ..- 11.. 0x0 0x0 0x0
2021.08.18 16:33:40 (67):         auntie prelude()
2021.08.18 16:33:40 (67): ERROR: auntie - unable to open down channel /thinker/local/sage/atpddown
2021.08.18 16:33:40 (67): auntie: quit - channels not open - 61 0 0 0
2021.08.18 16:34:0 (67):

auntie ::...- /thinker/globe/.think/run/auntie... 0x2 0x0 0x0
2021.08.18 16:34:0 (67):   sage_user sage
2021.08.18 16:34:0 (67):   arguments starts..- list.. 0x1 0x0 0x0
2021.08.18 16:34:0 (67):   auntie arg:..- list.. 0x2 0x1 0x1
2021.08.18 16:34:0 (67):   vb - 25 0 0 0
2021.08.18 16:34:0 (67):     verb - 25 20 63 1
2021.08.18 16:34:0 (67): act_list enter - 6e 0 0 0
2021.08.18 16:34:0 (67):   to list ..- .. 0x0 0x0 0x0
2021.08.18 16:34:0 (67):         auntie prelude()
2021.08.18 16:34:0 (67): ERROR: auntie - unable to open down channel /thinker/local/sage/atpddown
2021.08.18 16:34:0 (67): auntie: quit - channels not open - 61 0 0 0
2021.08.18 16:34:13 (67):

auntie ::...- /thinker/globe/.think/run/auntie... 0x2 0x0 0x0
2021.08.18 16:34:13 (67):   sage_user sage
2021.08.18 16:34:13 (67):   arguments starts..- list.. 0x1 0x0 0x0
2021.08.18 16:34:13 (67):   auntie arg:..- list.. 0x2 0x1 0x1
2021.08.18 16:34:13 (67):   vb - 25 0 0 0
2021.08.18 16:34:13 (67):     verb - 25 20 63 1
2021.08.18 16:34:13 (67): act_list enter - 6e 0 0 0
2021.08.18 16:34:13 (67):   to list ..- .. 0x0 0x0 0x0
2021.08.18 16:34:13 (67):         auntie prelude()
2021.08.18 16:34:13 (67): ERROR: auntie - unable to open down channel /thinker/local/sage/atpddown
2021.08.18 16:34:13 (67): auntie: quit - channels not open - 61 0 0 0
(END)

I found no atpddown under /thinker/local/sage/:
Code:
[sage@limbo305-1 root]$ ls /thinker/local/sage/
auntie_letter_test  auntie_list_test  pcf236893299.sh  pcf799866753.sh  sgl.log-2043  xfer.log  xfer.log--1  xfer.log-2043
[sage@limbo305-1 root]$

[root@limbo305-1 ~]#
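
The log calls /thinker/local/sage/atpddown a "down channel", so it may be a named pipe or socket created by a running service rather than a program. That is an assumption, not something the thread confirms. A small sketch for checking what, if anything, exists at such a path; describe_channel is a hypothetical helper and Python 3 is assumed.

```python
import os
import stat


def describe_channel(path):
    """Return 'missing', 'fifo', 'socket', or 'other' for the given path."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if stat.S_ISFIFO(mode):
        return "fifo"
    if stat.S_ISSOCK(mode):
        return "socket"
    return "other"


if __name__ == "__main__":
    print(describe_channel("/thinker/local/sage/atpddown"))
```

A "missing" result while sage is supposed to be up would suggest checking whether the service that creates the channel is actually running on that node.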