BIBO on limbo305
08-08-2021, 12:35 PM
Post: #1
BIBO on limbo305
BIBO is deployed on limbo305
08-08-2021, 12:38 PM
Post: #2
RE: BIBO on limbo305
I started bibo on limbo305-1 and sage on limbo305-3, then tested delete, but it failed:
Code:
[root@limbo305-1 test]# ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]# ./test_delete_zeng.sh
    sending delete request /thinker/local/soft/bibo/app/test/processed/del_zengxingliang.json to localhost:62818
delete FAILED
[root@limbo305-1 test]#

The log reports:
Code:
[root@limbo305-1 sage]# less procone-49.log
[2021-08-08 13:50:33] procone starts
  tycano: 49
  more args
  keys in /thinker/globe/soft/bibo/procuratorate/tests/typical/49/insert.json
There is error when opening file /thinker/globe/soft/bibo/procuratorate/tests/typical/49/insert.json
procone-49.log (END)

Then I updated toneroot in procone.py to /thinker/globe/soft/bibo/procuratorate/cases, but the delete test reports that it can't open a file:
Code:
[2021-08-08 14:11:31] procone starts
  tycano: 51
  more args
  keys in /thinker/globe/soft/bibo/procuratorate/cases/typical/51/content
    keys ready
    tyca json loaded.
    opening intermediate file /thinker/globe/soft/bibo/procuratorate/cases/res2/res_51.txt-tmp

There was no res2 dir. After I created it manually, case 52 could be deleted:
Code:
[sage@limbo305-1 test]$ /thinker/local/soft/bibo/plug/procone.py --tycano 52 --reqtype delete --reqcaseid "67be261c9e6311eabceb005056c00001"
[sage@limbo305-1 test]$
[root@limbo305-1 test]# cat /thinker/local/soft/bibo/app/test/processed/del_zengxingliang.json; sleep 18 | ncat localhost 62818
{"type":"delete","caseid":"67be261c9e6311eabceb005056c00001"
}

[root@limbo305-1 test]# echo $?
0
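Note on the ncat command above: ";" ends the first command and the pipe applies only to "sleep 18", so the JSON from cat went to the terminal (which is why it is echoed) and ncat received nothing on stdin. Grouping it as (cat del_zengxingliang.json; sleep 18) | ncat localhost 62818 would pipe both. A minimal Python sketch of sending the same request directly, assuming the service on port 62818 accepts the raw JSON and replies before closing the connection (the reply format is not shown in this thread); build_delete_request and send_request are hypothetical helper names:

```python
import json
import socket


def build_delete_request(caseid: str) -> bytes:
    """Encode a delete request in the same JSON shape as del_zengxingliang.json."""
    return json.dumps({"type": "delete", "caseid": caseid}).encode("utf-8")


def send_request(payload: bytes, host: str = "localhost", port: int = 62818,
                 timeout: float = 18.0) -> bytes:
    """Send one request and collect the reply.

    Assumed protocol: the server answers and then closes the connection.
    Reading also stops once no data arrives for `timeout` seconds,
    playing the role of `sleep 18` in the shell version.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        try:
            while True:
                data = sock.recv(4096)
                if not data:  # server closed the connection
                    break
                chunks.append(data)
        except socket.timeout:
            pass  # no more data within the timeout; return what we have
    return b"".join(chunks)
```

Usage, under the same assumptions: print(send_request(build_delete_request("67be261c9e6311eabceb005056c00001"))).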

Later I added the following to procone.py:
Code:
toneroot = "/thinker/globe/soft/bibo/procuratorate/cases"
bibofast = "/thinker/fastdata/bibo"
bibe = bibofast + "/e"
reslocal = bibe + "/res2"
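The missing res2 directory is what blocked delete above, and the atpa log later in this thread shows the same kind of path under /thinker/fastdata/bibo/e/res2. A minimal Python sketch that derives both res2 locations from the toneroot and bibofast values above and creates whichever is missing; result_dirs and ensure_dirs are hypothetical names, and which directories procone.py really needs is an assumption based only on the paths in this thread:

```python
import os
from typing import Iterable, List


def result_dirs(toneroot: str, bibofast: str) -> List[str]:
    """res2 under the case root and under <bibofast>/e (reslocal in procone.py)."""
    bibe = os.path.join(bibofast, "e")
    return [os.path.join(toneroot, "res2"), os.path.join(bibe, "res2")]


def ensure_dirs(paths: Iterable[str]) -> List[str]:
    """Create any missing directory (with parents); return the ones created."""
    created = []
    for path in paths:
        if not os.path.isdir(path):
            os.makedirs(path, exist_ok=True)
            created.append(path)
    return created
```

For example, ensure_dirs(result_dirs("/thinker/globe/soft/bibo/procuratorate/cases", "/thinker/fastdata/bibo")) would create the two res2 directories if they are absent and do nothing otherwise.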
08-08-2021, 07:56 PM
Post: #3
RE: BIBO on limbo305
Inquire gets stuck because yfs has some inactive PGs:
Code:
[root@limbo305-1 test]# ./test_inquire_zeng.sh
sending inquire request ./processed/zengxingliang-inquire.json to localhost:62818
^C
[root@limbo305-1 test]# ls /thinker/globe/soft/bibo/procuratorate/cases/sum2/
^C
[root@limbo305-1 test]#
[root@limbo305-1 test]# ceph -s
  cluster:
    id:     33d940a8-7e68-44f3-bc37-305aaaabbbbc
    health: HEALTH_ERR
            1 clients failing to respond to capability release
            1 MDSs report slow metadata IOs
            1 MDSs report slow requests
            mons limbo305-1,limbo305-2,limbo305-3 are low on available space
            1 monitors have not enabled msgr2
            2/510 objects unfound (0.392%)
            Reduced data availability: 36 pgs inactive, 33 pgs incomplete
            Possible data damage: 1 pg recovery_unfound
            Degraded data redundancy: 6/1530 objects degraded (0.392%), 1 pg degraded
            32 pgs not deep-scrubbed in time
            32 pgs not scrubbed in time
            195 slow ops, oldest one blocked for 81807 sec, daemons [osd.5,osd.7] have slow ops.
  
  services:
    mon: 3 daemons, quorum limbo305-1,limbo305-2,limbo305-3 (age 23h)
    mgr: limbo305-1(active, since 32m)
    mds: cephfs:1 {0=limbo305-3=up:active} 1 up:standby
    osd: 12 osds: 12 up (since 22h), 12 in (since 22h)
  
  task status:
    scrub status:
        mds.limbo305-3: idle
  
  data:
    pools:   4 pools, 97 pgs
    objects: 510 objects, 182 MiB
    usage:   13 GiB used, 1.1 TiB / 1.1 TiB avail
    pgs:     3.093% pgs unknown
             34.021% pgs not active
             6/1530 objects degraded (0.392%)
             2/510 objects unfound (0.392%)
             60 active+clean
             32 creating+incomplete
             3  unknown
             1  active+recovery_unfound+degraded
             1  incomplete

The 3 unknown PGs are in cephfs_metadata, so I reinstalled yotta on limbo305-1.
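Since inactive PGs are what made inquire hang, it may help to check for them before running tests. A Python sketch that reads the "<count> <state>" lines of the pgs: section in the plain "ceph -s" output shown above and counts PGs whose state lacks "active". On this output it gives 36 of 97, matching "Reduced data availability: 36 pgs inactive"; the "34.021% pgs not active" figure corresponds to 33 of 97, which apparently excludes the 3 unknown PGs. The function names are hypothetical, and parsing the human-readable text is an assumption; ceph's JSON output (--format json) would be sturdier to parse:

```python
import re
from typing import Dict

# Matches lines such as "   60 active+clean" or "   3  unknown".
PG_LINE = re.compile(r"^\s*(\d+)\s+([a-z_+]+)\s*$")


def pg_state_counts(status_text: str) -> Dict[str, int]:
    """Collect '<count> <state>' lines from the pgs: section of `ceph -s`."""
    counts: Dict[str, int] = {}
    for line in status_text.splitlines():
        m = PG_LINE.match(line)
        if m:
            state = m.group(2)
            counts[state] = counts.get(state, 0) + int(m.group(1))
    return counts


def inactive_pgs(counts: Dict[str, int]) -> int:
    """PGs whose state does not include 'active' (unknown PGs included)."""
    return sum(n for state, n in counts.items()
               if "active" not in state.split("+"))
```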
08-10-2021, 12:50 AM
Post: #4
RE: BIBO on limbo305
Installing sage gets stuck:
Code:
[root@limbo305-3 ~]# decent_init=True /thinker/local/forest/util/utilib/installx sage
/thinker/local/shed/installation/limbo305-tc                                        
Installing /thinker/local/shed/installation/limbo305-tc/sage.tar.gz ...

The log shows that some files are not found:
Code:
...
10.36.3.51:3124 (slot 19)
Start to run the program with 72 VPCs.
Program execution completed.
protocol error: filename does not match request
protocol error: filename does not match request
protocol error: filename does not match request
cat: '/home/sage/think/run/results/result-72-sage/stdout-*': No such file or directory
cat: '/home/sage/think/run/results/result-72-sage/stdout-*': No such file or directory
...
=== Begin Sage Service Test Set ===
./sage_service/sage-service-start.sh
cat: /home/sage/sage/run/service.pid: No such file or directory
cat: /home/sage/sage/run/aide.pid: No such file or directory
Sage is stopped.
          lockstep mark /thinker/globe/.think/lockstep//limbo305-3/sage/tested
          lockstep mark /thinker/globe/.think/lockstep//limbo305-3/sage/decent/workdone

sage start reports that Sage must be configured first:
Code:
[sage@limbo305-3 ~]$ ./sage/bin/sage start
ERROR: Sage has not been configured. Please run /home/sage/sage/bin/configure at sage_portal first.
[sage@limbo305-3 ~]$
[sage@limbo305-3 correctness]$ /home/sage/sage/bin/configure
  
====== Generating config.pcf ======
    inferred and exported:
      sage_base: /home/sage/sage
      sage_stdout: /home/sage/sage/stdout/sage.out
      sage_debug: 0
      helper_portal: 10.36.1.49
      sage_ipx: /thinker/etc/ips.cfg
      sage_atp_cnt: 60
  generated /home/sage/sage/config.pcf

====== Generating config.node, sage-svc.inc & hosts.ips ======

-- generating config.node & sage-svc.inc by gen_sage_config

10.36.1.49
10.36.2.50
10.36.3.51
    HELPER_CNT: 6
    HEAVEN_CNT: 1
    ATP_CNT: 60
    backup old config.node: /home/sage/sage/config.node.105847
    backup old sage-svc.inc: /home/sage/sage/sage-svc.inc.105847

Generating walkers from /thinker/etc/ips.cfg
New node: 10.36.1.49
New node: 10.36.2.50
New node: 10.36.3.51
Add fillers
10.36.1.49:1
10.36.1.49:51
10.36.1.49:2
10.36.1.49:51
10.36.1.49:3
10.36.1.49:51
10.36.1.49:4
...
10.36.1.49:51
Adding sage_atp_staff_space_cnt (3), sage_atp_staff_space_start (20) & sage_atp_head_space (14) to config.pcf

Arguments (Part 2):
    WALKER_CNT: 3
    FILLER_CNT: -1
    sage_atp_staff_space_start: 20
    sage_atp_staff_space_cnt: 3
    sage_atp_head_space: 14
  
    config.node is generated in /home/sage/sage/config.node successfully.
    The old config.node is backuped in /home/sage/sage/config.node.105847.
    sage-svc.inc is generated in /home/sage/sage/sage-svc.inc successfully.
    The old svc_inc is backuped in /home/sage/sage/sage-svc.inc.105847.

-- generating hosts.ips from config.node

====== Begin to do the original load.sh ======

Parsing config.node
Configuring for 72 containers
  10.36.3.51 slots: 52
  10.36.1.49 slots: 54 55 56 57 58 59 60 52
  10.36.2.50 slots: 52
  10.36.3.51 slots: 53
  10.36.1.49 slots: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
  10.36.2.50 slots: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
  10.36.3.51 slots: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
  10.36.1.49 slots: 51 50
Obtaining config info
Generating config
Sage is going to format the persistent memory of DT after 10 seconds. Please type Ctrl-C if you do not want to do it.

WARNING: Seems the thinker is running. Refused to format the persistent memory.

This may happen because:
  a) Someone else is using the thinker.
  b) Your program fails or you canceled you program by 'Ctrl-C'.

If you are the one who reserve the thinker for the current time range,
or you are sure that the thinker is running your program, you can
kill the thinker by:

$ dt slay

[sage@limbo305-3 correctness]$ dt slay
...
[sage@limbo305-3 correctness]$ /home/sage/sage/bin/configure
...
10.36.3.51:3116 (slot 17)
10.36.3.51:3120 (slot 18)
10.36.3.51:3124 (slot 19)
Start to run the program with 72 VPCs.
Program execution completed.
protocol error: filename does not match request
protocol error: filename does not match request
protocol error: filename does not match request
cat: '/home/sage/think/run/results/result-72-sage/stdout-*': No such file or directory
cat: '/home/sage/think/run/results/result-72-sage/stdout-*': No such file or directory

====== configure is done ====== [0810-11:17:45]
[sage@limbo305-3 correctness]$ echo $?
0
[sage@limbo305-3 correctness]$
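The load.sh slot lists above add up to the 72 containers it reports: 30 slots on 10.36.1.49 and 21 each on 10.36.2.50 and 10.36.3.51, which also matches "Start to run the program with 72 VPCs". A small Python sketch that tallies them from the configure output, assuming the "<ip> slots: <n> <n> ..." line format shown; the regex deliberately does not anchor at end of line, so the "51 50Obtaining config info" line as printed still counts 2 slots. slots_per_host is a hypothetical name:

```python
import re
from collections import Counter

# "<ip> slots: <n> <n> ..." ; stop at the first non-numeric token.
SLOT_LINE = re.compile(r"^\s*(\d+\.\d+\.\d+\.\d+) slots:((?:\s+\d+)+)")


def slots_per_host(load_output: str) -> Counter:
    """Count slot numbers per host IP across all 'slots:' lines."""
    counts: Counter = Counter()
    for line in load_output.splitlines():
        m = SLOT_LINE.match(line)
        if m:
            counts[m.group(1)] += len(m.group(2).split())
    return counts
```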

sage start reports that Sage is irresponsive:
Code:
[sage@limbo305-3 correctness]$ ~/sage/bin/sage start
Sage is irresponsive. Trying to stop it before start it again.
cat: /home/sage/sage/run/aide.pid: No such file or directory
The prior service instance is perhaps 267779
Sage is stopped.

Starting Sage ..............

The log shows:
Code:
10.36.1.49 "killall wake_sage detect_listen auntie atpd atpa 2> /dev/null":
10.36.2.50 "killall wake_sage detect_listen auntie atpd atpa 2> /dev/null":
10.36.3.51 "killall wake_sage detect_listen auntie atpd atpa 2> /dev/null":


Sage is stopped.
    sage is to run nohup bash -c 'cd /home/sage/sage/bin; ./start_sage 2>&1 | tee -a /home/sage/sage/stdout/sage.out'
stty: 'standard input': Inappropriate ioctl for device
      store sage_service pid 271612
    start_sage: Tue Aug 10 11:21:10 CST 2021: Sage starts
detect_listen.sh: no process found
[Tue Aug 10 11:21:13 CST 2021] status sage 10.36.1.49 7
        sage state:Sage is stopped or irresponsive. vs. started state:Sage is started.
[Tue Aug 10 11:21:20 CST 2021] status sage 10.36.1.49 7
        sage state:Sage is stopped or irresponsive. vs. started state:Sage is started.
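The start log re-checks the status about every 7 seconds (11:21:13, then 11:21:20) and compares it with "Sage is started.". A minimal Python sketch of such a poll with an overall deadline, so a wrapper script fails instead of hanging on "Starting Sage ....."; get_status is a hypothetical caller-supplied callable (for example one wrapping whatever status check start_sage uses), not a known Sage API, and the 60-second default timeout is arbitrary:

```python
import time
from typing import Callable


def wait_until_started(get_status: Callable[[], str],
                       started: str = "Sage is started.",
                       timeout: float = 60.0,
                       interval: float = 7.0,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll get_status() until it equals `started` or the deadline passes.

    Returns True once started, False on timeout. `sleep` and `clock` are
    injectable so the loop can be tested without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if get_status().strip() == started:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```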
08-13-2021, 09:30 AM
Post: #5
RE: BIBO on limbo305
(08-08-2021 12:38 PM) zhihao wrote: I started bibo on limbo305-1 and sage on limbo305-3, then tested delete, but it failed:
Code:
[root@limbo305-1 test]# ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]# ./test_delete_zeng.sh
    sending delete request /thinker/local/soft/bibo/app/test/processed/del_zengxingliang.json to localhost:62818
delete FAILED
[root@limbo305-1 test]#

We should use the standardized way: 'make test'.

I added a cases variable so that we can specify which test to run:
Code:
[sage@limbo305-1 test]$ make test cases=delete_zeng
Testing delete_zeng
    run test  ./test_delete_zeng.sh
    sending delete request /thinker/local/soft/bibo/app/test/processed/del_zengxingliang.json to localhost:62818
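The Makefile itself is not shown in this thread. Judging from the output, cases=delete_zeng runs ./test_delete_zeng.sh, and a failing script makes make stop with that script's exit status (Error 255). A hypothetical Python equivalent of that selection logic, for illustration only: script_for_case and run_cases are made-up names, running cases in order and stopping at the first failure is an assumption, and run is injectable so it can be tested without the real scripts:

```python
import subprocess
from typing import Callable, List, Sequence


def script_for_case(case: str) -> str:
    """Map a case name to its script: delete_zeng -> ./test_delete_zeng.sh."""
    return f"./test_{case}.sh"


def run_cases(cases: Sequence[str],
              run: Callable[[List[str]], int] = subprocess.call) -> int:
    """Run each case in order; return the first non-zero exit code, else 0."""
    for case in cases:
        script = script_for_case(case)
        print(f"Testing {case}")
        print(f"    run test  {script}")
        code = run([script])
        if code != 0:
            return code
    return 0
```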

Quote: Then I updated toneroot in procone.py to /thinker/globe/soft/bibo/procuratorate/cases, but the delete test reports that it can't open a file

You should update the module if you do this on limbo305. Copying files is okay on wp289, but it is generally not recommended.
08-13-2021, 09:37 AM (This post was last modified: 08-13-2021 09:41 AM by lingu.)
Post: #6
RE: BIBO on limbo305
Insert is not very stable:
Code:
[sage@limbo305-1 test]$ make test cases=insert_zeng
Testing insert_zeng
    run test  ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 416
insert failed
make: *** [Makefile:22: test] Error 93
[sage@limbo305-1 test]$ make test cases=insert_zeng
Testing insert_zeng
    run test  ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[sage@limbo305-1 test]$

zhihao, please investigate this issue.
08-13-2021, 09:39 AM (This post was last modified: 08-13-2021 10:43 AM by lingu.)
Post: #7
RE: BIBO on limbo305
Delete fails:
Code:
[sage@limbo305-1 test]$ make test cases=delete_zeng
Testing delete_zeng
    run test  ./test_delete_zeng.sh
    sending delete request /thinker/local/soft/bibo/app/test/processed/del_zengxingliang.json to localhost:62818
delete FAILED
make: *** [Makefile:22: test] Error 255
[sage@limbo305-1 test]$

And sage crashes:
Code:
10.36.3.51:3112 (slot 16)
10.36.3.51:3116 (slot 17)
10.36.3.51:3120 (slot 18)
10.36.3.51:3124 (slot 19)
10.36.3.51:3128 (slot 20)
Start to run the program with 82 VPCs.
Greppy service is running, You can use client tools for searching now

ERROR: VPC reports ABORT_ERROR


ERROR: VPC reports ABORT_ERROR


ERROR: VPC 0x24023204 (10.36.2.50:3064) reports ABORT_ERROR:
Panic: addr out of bound.


ERROR: VPC 0x2402320e (10.36.2.50:3104) reports ABORT_ERROR:
Panic: addr out of bound.

[1628815250.887261s]
ERROR: VPC 0x2402320e (10.36.2.50:3104) reports ABORT_ERROR:
Panic: addr out of bound.

[1628815250.887309s] ERROR: VPC reports ABORT_ERROR


ERROR: VPC reports ABORT_ERROR


ERROR: VPC 0x24023208 (10.36.2.50:3080) reports ABORT_ERROR:
Panic: addr out of bound.

[1628815250.929026s]
ERROR: VPC 0x24023208 (10.36.2.50:3080) reports ABORT_ERROR:
Panic: addr out of bound.

[1628815250.929104s] ERROR: VPC reports ABORT_ERROR


ERROR: VPC reports ABORT_ERROR


ERROR: VPC 0x2402320d (10.36.2.50:3100) reports ABORT_ERROR:
Panic: addr out of bound.

[1628815250.935846s]
ERROR: VPC 0x2402320d (10.36.2.50:3100) reports ABORT_ERROR:
Panic: addr out of bound.

[1628815250.935908s] ERROR: VPC reports ABORT_ERROR


ERROR: VPC reports ABORT_ERROR


ERROR: VPC 0x24023201 (10.36.2.50:3052) reports ABORT_ERROR:
Panic: addr out of bound.

I changed sshd_config, rebooted the VM, and mounted yfs manually. After that, sage no longer crashes, but delete still fails:
Code:
[sage@limbo305-1 test]$ make test cases=delete_zeng
Testing delete_zeng
    run test  ./test_delete_zeng.sh
    sending delete request /thinker/local/soft/bibo/app/test/processed/del_zengxingliang.json to localhost:62818
delete FAILED
make: *** [Makefile:22: test] Error 255
[sage@limbo305-1 test]$

The atpa log shows:
Code:
[root@limbo305-2 sage]# pwd
/thinker/local/today/users/sage
[root@limbo305-2 sage]# tail atpa.log-10
2021.08.13 9:0:1 (ffffffa2):     atpa.runx creating ..- /thinker/fastdata/bibo/e/res2//0.. 0x0 0xa 0x1
2021.08.13 9:0:1 (ffffffa2):     atpa.runx creating ..- /thinker/fastdata/bibo/e/res2//0/10.. 0x0 0xa 0x1
2021.08.13 9:0:1 (ffffffa2):   atpa::runx() kissing : 10
2021.08.13 9:0:1 (ffffffa2): atpd::KissFromSage: k i len, st_size: 10 6693
2021.08.13 9:0:1 (ffffffa2): atpd::KissFromSage: Get ret : 10 1
2021.08.13 9:0:1 (ffffffa2): KissFromSage: varda5 format, len..- {"typ.. 0x1 0x1a25 0x0
2021.08.13 9:0:1 (ffffffa2):   creating dir /thinker/fastdata/bibo/e/res2//0
2021.08.13 9:0:1 (ffffffa2):   mkdir ..- /thinker/fastdata/bibo/e/res2//0.. 0xffffffffffffffff 0x0 0x0
2021.08.13 9:0:1 (ffffffa2):   creating dir /thinker/fastdata/bibo/e/res2//0/10
2021.08.13 9:0:1 (ffffffa2):     mkdir returns - ffffffffffffffff 0 0 0
[root@limbo305-2 sage]#
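The atpa log prints return values in hex. Read as a signed 64-bit integer, 0xffffffffffffffff is -1, the conventional failure return of mkdir(), so both mkdir calls above failed. The other fields decode consistently: 0x1a25 = 6693 matches st_size, and 0xa = 10 matches "kissing : 10". A small Python helper for decoding such fields, assuming they are 64-bit two's-complement values (the meaning of the "(ffffffa2)" prefix is not known here and is not decoded); the function names are hypothetical:

```python
def to_signed64(value: int) -> int:
    """Interpret the low 64 bits of `value` as a two's-complement signed integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >> 63 else value


def parse_hex_field(field: str) -> int:
    """Decode a log field such as '0xffffffffffffffff' or 'ffffffffffffffff'."""
    return to_signed64(int(field, 16))
```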

The efs dir is not installed on limbo305-2:
Code:
[root@limbo305-2 sage]# dexer 'ls /thinker/bin/ephemeral/'
10.36.1.49: ls /thinker/bin/ephemeral/
bibo

10.36.2.50: ls /thinker/bin/ephemeral/

10.36.3.51: ls /thinker/bin/ephemeral/
bibo

[root@limbo305-2 sage]#

I re-ran startbibo, and the dir was created.
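Comparing dexer output by eye across nodes is easy to get wrong, as the empty listing for 10.36.2.50 shows. A Python sketch that groups the output by its "IP: command" header lines and lists the hosts where an expected entry is missing, assuming the output format shown above; parse_dexer and hosts_missing are hypothetical helpers:

```python
import re
from typing import Dict, List, Optional

# Header lines look like "10.36.1.49: ls /thinker/bin/ephemeral/".
HEADER = re.compile(r"^(\d+\.\d+\.\d+\.\d+): (.*)$")


def parse_dexer(output: str) -> Dict[str, List[str]]:
    """Group non-empty output lines under the host header that precedes them."""
    result: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in output.splitlines():
        m = HEADER.match(line)
        if m:
            current = m.group(1)
            result[current] = []
        elif current is not None and line.strip():
            result[current].append(line.strip())
    return result


def hosts_missing(listing: Dict[str, List[str]], entry: str) -> List[str]:
    """Hosts whose output does not contain `entry`."""
    return sorted(host for host, items in listing.items() if entry not in items)
```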
08-13-2021, 11:30 AM
Post: #8
RE: BIBO on limbo305
(08-13-2021 09:37 AM) lingu wrote: Insert is not very stable:
Code:
[sage@limbo305-1 test]$ make test cases=insert_zeng
Testing insert_zeng
    run test  ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 416
insert failed
make: *** [Makefile:22: test] Error 93
[sage@limbo305-1 test]$ make test cases=insert_zeng
Testing insert_zeng
    run test  ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[sage@limbo305-1 test]$

zhihao, please investigate this issue.

OK, but I have not gotten a wrong return code yet:
Code:
[root@limbo305-1 test]# make test cases=insert_zeng
Testing insert_zeng
    run test  ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]# make test cases=insert_zeng
Testing insert_zeng
    run test  ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]# ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]# ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]# ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]# ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]#
[root@limbo305-1 test]# ./test_insert_zeng.sh
sending insert request ./processed/zengxingliang-newkey.json
return code: 200
insert success
[root@limbo305-1 test]#
08-13-2021, 11:33 AM
Post: #9
RE: BIBO on limbo305
(08-13-2021 11:30 AM) zhihao wrote: OK, but I have not gotten a wrong return code yet

You need to run a stress test to reproduce such issues.

Don't do it now, as we are making it correct first. Do it later.

a1. reproduce the problem
a1.1 make a stress test for insert
a1.2 run the stress test to reproduce the problem

a2. fix the problem.
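A minimal sketch of step a1.1, assuming test_insert_zeng.sh keeps printing "return code: N" as above. It runs the script sequentially n times and tallies the reported codes, so an intermittent 416 shows up as a count; whether concurrent requests are needed to trigger it is not known from this thread. The function names and the default n are hypothetical:

```python
import re
import subprocess
from collections import Counter
from typing import Callable, Optional

RC = re.compile(r"return code:\s*(\d+)")


def parse_return_code(output: str) -> Optional[int]:
    """Extract N from 'return code: N', or None if the line is absent."""
    m = RC.search(output)
    return int(m.group(1)) if m else None


def run_insert_once(script: str = "./test_insert_zeng.sh") -> str:
    """Run the insert test script once and return its combined output."""
    proc = subprocess.run([script], stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return proc.stdout


def stress(run_once: Callable[[], str] = run_insert_once, n: int = 1000) -> Counter:
    """Repeat the insert n times and tally the reported return codes."""
    return Counter(parse_return_code(run_once()) for _ in range(n))
```

Run from the test directory, print(stress(n=1000)) would report something like how many runs returned 200 versus 416; any None entries mean the script printed no return code at all.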
08-13-2021, 11:42 AM
Post: #10
RE: BIBO on limbo305
make test fails:
Code:
[root@limbo305-1 test]# make test cases=delete_testcase
Testing delete_testcase
    run test  ./test_delete_testcase.sh
    sending delete request /thinker/local/soft/bibo/app/test/processed/del_test_case.json to localhost:62818
delete FAILED
make: *** [Makefile:22: test] Error 255
[root@limbo305-1 test]#

I found a utilib error: it can't find the file "/thinker/etc/soft/sites/limbo305-tc/bibo.pcf", and the code that reports this then fails with a NameError because micpfn is not defined:
Code:
[root@limbo305-2 ~]# su sage
[sage@limbo305-2 root]$ /thinker/local/soft/bibo/plug/procone2.py --qid 0 --tycano 19
Traceback (most recent call last):
  File "/thinker/local/soft/bibo/plug/procone2.py", line 23, in <module>
    rb, gmic = worksite.learnSiteMic(modname="bibo")
  File "/thinker/local/forest/util/utilib/worksite.py", line 88, in learnSiteMic
    print >> sys.stderr, "Unable to locate configuration file " + micpfn
NameError: global name 'micpfn' is not defined
[sage@limbo305-2 root]$
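In worksite.py line 88, the error message uses micpfn, which is not defined at that point, so a missing config file surfaces as a NameError instead of the intended "Unable to locate configuration file" message. A minimal Python 3 sketch of the pattern that avoids this: build the path first, then check it and report it (worksite.py itself looks like Python 2, given "print >> sys.stderr"). locate_site_config is a hypothetical name, not the real learnSiteMic, and the <root>/<site>/<modname>.pcf layout is inferred from the single path mentioned above:

```python
import os
import sys
from typing import Optional

# Root inferred from /thinker/etc/soft/sites/limbo305-tc/bibo.pcf (assumption).
SITES_ROOT = "/thinker/etc/soft/sites"


def locate_site_config(site: str, modname: str,
                       root: str = SITES_ROOT) -> Optional[str]:
    """Return <root>/<site>/<modname>.pcf, or None after reporting it missing."""
    # Bind the path before any error path refers to it.
    micpfn = os.path.join(root, site, modname + ".pcf")
    if not os.path.isfile(micpfn):
        print("Unable to locate configuration file " + micpfn, file=sys.stderr)
        return None
    return micpfn
```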