[20221018]本地运行与远程运行.txt
--//链接http://blog.tanelpoder.com/2008/02/05/oracle-hidden-costs-revealed-part-1/.
--//里面提到一个问题本地运行与远程运行,oracle性能存在怎么区别,理论讲如果不考虑网络传输,两组差别不大.
--//因为Oracle是一个客户端服务器数据库系统。所有的执行都是在本地执行的,而不管客户机的位置如何,因此性能是相同的。
--//作者给出一个例子,说明一些区别:
1.环境:
SCOTT@book> @ ver1
PORT_STRING VERSION BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx 11.2.0.4.0 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
2.分别从widnows客户端以及linux服务端测试看看.
--//注:lotslios.sql 来自 tpt 里面的测试脚本.
--//测试在本地服务器.
SCOTT@book> @spid
SID SERIAL# PROCESS SERVER SPID PID P_SERIAL# C50
---------- ---------- ------------------------ --------- ------ ------- ---------- --------------------------------------------------
36 45801 14705 DEDICATED 14706 26 206 alter system kill session '36,45801' immediate;
SCOTT@book> set timing on
SCOTT@book> @lotslios 1e8
generate lots of LIOs by repeatedly full scanning through a small table...
COUNT(*)
----------
100000000
Elapsed: 00:00:08.11
--//注:@lotslios 1e5根本测试不出来.
--//测试在客户端windows:
SCOTT@78> @spid
SID SERIAL# PROCESS SERVER SPID PID P_SERIAL# C50
---------- ---------- ------------------------ --------- -------------------- ------- ---------- --------------------------------------------------
53 44367 4476:8736 DEDICATED 14714 27 144 alter system kill session '53,44367' immediate;
SCOTT@78> set timing on
SCOTT@78> @tpt/lotslios 1e8
generate lots of LIOs by repeatedly full scanning through a small table...
COUNT(*)
----------
100000000
Elapsed: 00:00:10.40
--//你可以运行多次,都是windows下测试时间大于在本地服务器的测试时间.
--//作者通过使用他自己写Snapper包以及V$SESSTAT,看不出任何差异.
--//使用strace跟踪(注:作者的服务器solaris,使用truss).
--//测试在本地服务器,使用strace跟踪服务端进程.
$ strace -cp 14706
Process 14706 attached - interrupt to quit
^CProcess 14706 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
nan 0.000000 0 2 read
nan 0.000000 0 2 write
nan 0.000000 0 27 getrusage
nan 0.000000 0 10 times
------ ----------- ----------- --------- --------- ----------------
100.00 0.000000 41 total
--//测试在客户端windows,使用strace跟踪服务端进程.
$ strace -cp 14714
Process 14714 attached - interrupt to quit
^CProcess 14714 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
96.13 0.024820 0 949593 poll
3.87 0.001000 500 2 read
0.00 0.000000 0 2 write
0.00 0.000000 0 83 getrusage
0.00 0.000000 0 10 times
------ ----------- ----------- --------- --------- ----------------
100.00 0.025820 949690 total
--//很慢!!我不得按ctrl+c停止strace,你可以发现大量调用poll.
--//可以发现测试在客户端windows,多了一个poll 系统调用,作者测试平台solaris,调用的是pollsys.
# man -a pool
POLL(2) Linux Programmer's Manual POLL(2)
NAME
poll, ppoll - wait for some event on a file descriptor
SYNOPSIS
#include <poll.h>
int poll(struct pollfd *fds, nfds_t nfds, int timeout);
#define _GNU_SOURCE
#include <poll.h>
int ppoll(struct pollfd *fds, nfds_t nfds,
const struct timespec *timeout, const sigset_t *sigmask);
--//转载:http://blog.tanelpoder.com/2008/02/05/oracle-hidden-costs-revealed-part-1/.
So, there is a big difference in number of pollsys() system calls, depending on which client was used for connecting.
The pollsys syscall is normally used for checking whether there is any data that can be read from a file descriptor (or
whether the file descriptor is ready for receiving more writes). As TCP sockets on Unix are also accessed through file
descriptors, Oracle could be polling the client TCP connection file descriptor… but (without prior knowledge) we can
not be sure.
因此,pollsys()系统调用的数量有很大的差异,这取决于用于连接的客户端。pollsys系统通常用于检查是否有可以从文件描述符读取的
数据(或者文件描述符是否准备好接收更多的写操作)。由于Unix上的TCP套接字也可以通过文件描述符访问,Oracle可以轮询客户端TCP连
接文件描述符…但是(没有事先知识)我们不能确定。
...
Oracle client server communication normally works in RPC fashion – for example a client sends a command to Oracle and
Oracle doesn't return anything until the command is completed.
Oracle客户端服务器通信通常以RPC的方式工作——例如,客户端向Oracle发送一个命令,而Oracle在该命令完成之前不会返回任何东西
。
Now if a user tries to cancel their query (using CTRL+C in sqlplus or calling OCIBreak in non-blocking OCI), a cancel
packet is sent to server over TCP. The packet will be stored in the server side receive buffer of OS TCP stack and
becomes available for reading for the server process (via a TCP socket). However if the server process is in a
long-running loop executing a query, it needs to periodically check the TCP receive socket for any outstanding packets.
And this is exactly what the pollsys() system call does.
现在,如果用户试图取消他们的查询(在sqlplus中使用CTRL+C或在非阻塞OCI中调用OCIBreak),一个取消数据包将通过TCP发送到服务器
。该数据包将存储在OS TCP堆栈的服务器端接收缓冲区中,并可为服务器进程读取(通过TCP套接字)。但是,如果服务器进程处于执行查询
的长时间运行的循环中,那么它需要定期检查TCP接收套接字中是否有任何未完成的数据包。这正是pollsys()系统所做的。
This approach for cancelling an operation is called in-band break, as the break packet is sent in-band with all other
traffic. The server process has to be programmed to periodically check for any newly arrived packets, even if it is
already busy working on something else.
这种取消操作的方法称为in-band break,因为中断包与所有其他业务一起在带内发送。服务器进程必须被编程,以定期检查任何新到达
的数据包,即使它已经在忙于处理其他事情。
There are several functions in Oracle kernel where the developers have put the check for in-band breaks. This means that
in some highly repetitive operations (like nested loop join) the same functions are hit again and again – causing
frequent polling on the TCP socket. And too frequent polling is what causes the peformance degradation.
在Oracle内核中有几个函数,开发人员可以检查in-band breaks。这意味着在一些高度重复的操作(如嵌套循环连接)中,相同的函数会被
反复命中,导致TCP套接字上频繁轮询。而过频繁的轮询是导致性能下降的原因。
However Oracle network layer has a sqlnet.ora parameter called break_poll_skip, which can help in such situations. This
parameters defines, how many times to just silently skip the TCP socket polling when the nsmore2recv() function is
called. The parameter defaults to 3 in recent versions, which means that only 1 of 3 polls are actually executed ( from
above test case it's seen that for 4 million consistent gets roughly 1/3 = 1.3 million pollsys() calls were executed ).
然而,Oracle网络层有一个名为break_poll_skip的sqlnet.ora参数,这可以在这种情况下提供帮助。此参数定义了当调用nsmore2recv()
函数时,只需无声地跳过TCP套接字轮询的次数。在最近的版本中,参数默认为3,这意味着实际3个轮询中只有1个被执行(从上面的测试
用例可以看出,400万一致得到大约1/3=130万个民调系统()调用)。
--//换成执行lotslios 1e5,再使用strace跟踪看看.
--//测试在本地服务器,使用strace跟踪服务端进程.
$ strace -cp 14706
Process 14706 attached - interrupt to quit
^CProcess 14706 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
nan 0.000000 0 2 read
nan 0.000000 0 2 write
nan 0.000000 0 19 getrusage
nan 0.000000 0 10 times
------ ----------- ----------- --------- --------- ----------------
100.00 0.000000 33 total
--//测试在客户端windows,使用strace跟踪服务端进程.
$ strace -cp 14714
Process 14714 attached - interrupt to quit
^CProcess 14714 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.000018 0 897 poll
0.00 0.000000 0 2 read
0.00 0.000000 0 2 write
0.00 0.000000 0 19 getrusage
0.00 0.000000 0 10 times
------ ----------- ----------- --------- --------- ----------------
100.00 0.000018 930 total
--//poll=897次.
SCOTT@78> set autot traceonly
SCOTT@78> @tpt/lotslios 1e5
generate lots of LIOs by repeatedly full scanning through a small table...
Elapsed: 00:00:00.06
Execution Plan
---------------------------
Plan hash value: 3691747574
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 12 | 23P (1)|999:59:59 |
| 1 | SORT AGGREGATE | | 1 | 12 | | |
|* 2 | COUNT STOPKEY | | | | | |
| 3 | NESTED LOOPS | | 2401P| 15E| 23P (1)|999:59:59 |
| 4 | NESTED LOOPS | | 79T| 650T| 769G (1)|999:59:59 |
| 5 | NESTED LOOPS | | 2631M| 14G| 25M (1)| 84:57:18 |
| 6 | TABLE ACCESS FULL| OBJ$ | 87098 | 255K| 295 (1)| 00:00:04 |
|* 7 | TABLE ACCESS FULL| OBJ$ | 30210 | 90630 | 293 (1)| 00:00:04 |
|* 8 | TABLE ACCESS FULL | OBJ$ | 30210 | 90630 | 293 (1)| 00:00:04 |
|* 9 | TABLE ACCESS FULL | OBJ$ | 30210 | 90630 | 293 (1)| 00:00:04 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(ROWNUM<=1e5)
7 - filter("A"."OWNER#"="B"."OWNER#")
8 - filter("B"."OWNER#"="C"."OWNER#")
9 - filter("C"."OWNER#"="D"."OWNER#")
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2706 consistent gets
0 physical reads
0 redo size
346 bytes sent via SQL*Net to client
471 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
--//按照作者介绍break_poll_skip缺省3, consistent gets/3 = 2706/3= 902,与跟踪看到的897接近.
--//顺便提一下,不知道作者如何测试的,@lotslios 10000,consistent gets达到了4089670.或许11.2.0.4执行计划发生了变化.疑问??
--//因为返回是count(*),仅仅1行.即使设置arraysize=2 ,逻辑读我的测试也是2706
--//可以通过改变break_poll_skip的sqlnet.ora参数,减少poll调用.
3.改变break_poll_skip参数在sqlnet.ora文件中.
--//修改break_poll_skip=10,注意测试时要重新登录才生效!!
SCOTT@78> @ spid
SID SERIAL# PROCESS SERVER SPID PID P_SERIAL# C50
---------- ---------- ------------------------ --------- -------------------- ------- ---------- --------------------------------------------------
53 44369 5380:8500 DEDICATED 15041 27 145 alter system kill session '53,44369' immediate;
SCOTT@78> set timing on
SCOTT@78> @tpt/lotslios 1e5
generate lots of LIOs by repeatedly full scanning through a small table...
COUNT(*)
----------
100000
Elapsed: 00:00:00.04
$ strace -cp 15041
Process 15041 attached - interrupt to quit
^CProcess 15041 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
nan 0.000000 0 2 read
nan 0.000000 0 2 write
nan 0.000000 0 269 poll
nan 0.000000 0 19 getrusage
nan 0.000000 0 10 times
------ ----------- ----------- --------- --------- ----------------
100.00 0.000000 302 total
--//2706/10 = 270.6,poll调用269,已经非常接近.
SCOTT@78> @tpt/lotslios 1e8
generate lots of LIOs by repeatedly full scanning through a small table...
COUNT(*)
----------
100000000
Elapsed: 00:00:09.99
--//比前面10.40快了一点点.
--//修改break_poll_skip=1000
$ grep break sqlnet.ora
break_poll_skip=1000
SCOTT@78> @ spid
SID SERIAL# PROCESS SERVER SPID PID P_SERIAL# C50
---------- ---------- ------------------------ --------- -------------------- ------- ---------- --------------------------------------------------
53 44371 4544:8592 DEDICATED 15081 27 146 alter system kill session '53,44371' immediate;
SCOTT@78> set timing on
SCOTT@78> @tpt/lotslios 1e8
generate lots of LIOs by repeatedly full scanning through a small table...
COUNT(*)
----------
100000000
Elapsed: 00:00:09.65
SCOTT@78> @tpt/lotslios 1e5
generate lots of LIOs by repeatedly full scanning through a small table...
COUNT(*)
----------
100000
Elapsed: 00:00:00.04
$ strace -cp 15081
Process 15081 attached - interrupt to quit
^CProcess 15081 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
nan 0.000000 0 2 read
nan 0.000000 0 2 write
nan 0.000000 0 3 poll
nan 0.000000 0 19 getrusage
nan 0.000000 0 10 times
------ ----------- ----------- --------- --------- ----------------
100.00 0.000000 36 total
--//补充测试@tpt/lotslios 1e8:
SCOTT@78> @tpt/lotslios 1e8
generate lots of LIOs by repeatedly full scanning through a small table...
COUNT(*)
----------
100000000
Elapsed: 00:00:09.98
$ strace -cp 15081
Process 15081 attached - interrupt to quit
^CProcess 15081 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.000066 0 2849 poll
0.00 0.000000 0 2 read
0.00 0.000000 0 2 write
0.00 0.000000 0 29 getrusage
0.00 0.000000 0 10 times
------ ----------- ----------- --------- --------- ----------------
100.00 0.000066 2892 total
SCOTT@78> set autot traceonly
SCOTT@78> @tpt/lotslios 1e8
SCOTT@78> set autot traceonly
SCOTT@78> @tpt/lotslios 1e8
generate lots of LIOs by repeatedly full scanning through a small table...
Elapsed: 00:00:09.68
Execution Plan
----------------------------------------------------------
Plan hash value: 3691747574
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 12 | 23P (1)|999:59:59 |
| 1 | SORT AGGREGATE | | 1 | 12 | | |
|* 2 | COUNT STOPKEY | | | | | |
| 3 | NESTED LOOPS | | 2401P| 15E| 23P (1)|999:59:59 |
| 4 | NESTED LOOPS | | 79T| 650T| 769G (1)|999:59:59 |
| 5 | NESTED LOOPS | | 2631M| 14G| 25M (1)| 84:57:18 |
| 6 | TABLE ACCESS FULL| OBJ$ | 87098 | 255K| 295 (1)| 00:00:04 |
|* 7 | TABLE ACCESS FULL| OBJ$ | 30210 | 90630 | 293 (1)| 00:00:04 |
|* 8 | TABLE ACCESS FULL | OBJ$ | 30210 | 90630 | 293 (1)| 00:00:04 |
|* 9 | TABLE ACCESS FULL | OBJ$ | 30210 | 90630 | 293 (1)| 00:00:04 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(ROWNUM<=1e8)
7 - filter("A"."OWNER#"="B"."OWNER#")
8 - filter("B"."OWNER#"="C"."OWNER#")
9 - filter("C"."OWNER#"="D"."OWNER#")
Statistics
----------------------------------------------------------
12 recursive calls
0 db block gets
2859366 consistent gets
0 physical reads
0 redo size
346 bytes sent via SQL*Net to client
471 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
1 rows processed
--//2859366/1000= 2859.366
3.收尾:
--//取消break_poll_skip设置.
$ grep break sqlnet.ora
#break_poll_skip=1000
--//补充说明如果break_poll_skip设置10,100,使用strace跟踪很慢.
--//break_poll_skip=100
SCOTT@78> @tpt/lotslios 1e8
generate lots of LIOs by repeatedly full scanning through a small table...
COUNT(*)
----------
100000000
Elapsed: 00:00:16.32
--//2859358 consistent gets
$ strace -cp 15165
Process 15165 attached - interrupt to quit
^CProcess 15165 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.000052 0 28487 poll
0.00 0.000000 0 5 read
0.00 0.000000 0 5 write
0.00 0.000000 0 1 lseek
0.00 0.000000 0 46 getrusage
0.00 0.000000 0 26 times
------ ----------- ----------- --------- --------- ----------------
100.00 0.000052 28570 total