Hi all,
I am trying to get the SQL Server 2000 to use the indices implemented by my
custom OLE DB provider to optimize the joins. Basically, here's what I want
to achieve:
1. My provider exposes a table indexed by 1 column ($REF) via IOpenRowset.
Essentially, the rowset it provides is a unique clustered (thus integrated)
index.
2. I pull this column into a table (call it Local) in an SQL Server database
along with some other column(s).
3. I index this table on those other column(s).
4. I want to do a query of the following type:
SELECT Linked.* FROM Local JOIN LinkedSvr...LinkedTbl Linked ON Local.REF = Linked.[$REF]
WHERE <some condition on Local's columns other than REF>
The expected result is that the SQL server will form an execution plan where
Local is scanned (or even directly seeked if condition is fully satisfiable
by an index seek) and then Linked is seeked using the obtained $REF value.
What happens is that the SQL Server does indeed request an index rowset
(setting both table and index IDs in a call to IOpenRowset::OpenRowset() )
but afterwards makes no attempt to seek on it and just scans it. Not sure
what I am doing wrong...
Checklist:
- IDBSchemaRowset: supports TABLES, COLUMNS and INDEXES
- IOpenRowset::OpenRowset() supports table + index
- Index rowset implements IRowset, IRowsetIndex, IAccessor, IColumnsInfo,
IRowsetInfo. Since it is essentially the same object as the table itseff, it
also supports IRowsetLocate and IRowsetBookmark.
- The rowset provides boolmark column(s): I tried using just self-bookmark
and both self-bookmark and external bookmark at once.
- Index As Access Path is set
- compatible collation is set (though $REF is an integer column)
I wonder if there exists a detailed description of the logic that query
optimizer follows when a linked data source is present in the query?
Whatever I was able to find on the Internet is exceedingly schematic and I
seem to be satisfying those requirements...
Regards,
...Max...Max,
What happens if you use the OPENQUERY FUNCTION? Or possibly create an
additional NC index for the column used in the WHERE clause i.e, the local
table column?
HTH
Jerry
"Max Motovilov" <max@.yymap.com> wrote in message
news:ecvd1uN2FHA.612@.TK2MSFTNGP10.phx.gbl...
> Hi all,
> I am trying to get the SQL Server 2000 to use the indices implemented by
> my
> custom OLE DB provider to optimize the joins. Basically, here's what I
> want
> to achieve:
> 1. My provider exposes a table indexed by 1 column ($REF) via IOpenRowset.
> Essentially, the rowset it provides is a unique clustered (thus
> integrated)
> index.
> 2. I pull this column into a table (call it Local) in an SQL Server
> database
> along with some other column(s).
> 3. I index this table on those other column(s).
> 4. I want to do a query of the following type:
> SELECT Linked.* FROM Local JOIN LinkedSvr...LinkedTbl Linked ON Local.REF
> => Linked.[$REF]
> WHERE <some condition on Local's columns other than REF>
> The expected result is that the SQL server will form an execution plan
> where
> Local is scanned (or even directly seeked if condition is fully
> satisfiable
> by an index seek) and then Linked is seeked using the obtained $REF value.
> What happens is that the SQL Server does indeed request an index rowset
> (setting both table and index IDs in a call to IOpenRowset::OpenRowset() )
> but afterwards makes no attempt to seek on it and just scans it. Not sure
> what I am doing wrong...
> Checklist:
> - IDBSchemaRowset: supports TABLES, COLUMNS and INDEXES
> - IOpenRowset::OpenRowset() supports table + index
> - Index rowset implements IRowset, IRowsetIndex, IAccessor, IColumnsInfo,
> IRowsetInfo. Since it is essentially the same object as the table itseff,
> it
> also supports IRowsetLocate and IRowsetBookmark.
> - The rowset provides boolmark column(s): I tried using just self-bookmark
> and both self-bookmark and external bookmark at once.
> - Index As Access Path is set
> - compatible collation is set (though $REF is an integer column)
> I wonder if there exists a detailed description of the logic that query
> optimizer follows when a linked data source is present in the query?
> Whatever I was able to find on the Internet is exceedingly schematic and I
> seem to be satisfying those requirements...
> Regards,
> ...Max...
>|||> What happens if you use the OPENQUERY FUNCTION?
Can't really do an OPENQUERY as my provider is IOpenRowset-based. The
problem I originally had with OPENROWSET() was that it didn't instantiate
the provider in-process and nothing worked. Now that I configured the
provider as in-process I wonder if OPENROWSET() started working... will
try.
> Or possibly create an
> additional NC index for the column used in the WHERE clause i.e, the local
> table column?
Hmm... not sure what you mean here. I can't easily create an index for an
arbitrary field IN MY PROVIDER -- that's why I want to use SQL Server (MSDE,
really) indexing to start off with! My data are in large binary files with
proprietary format and I just expose them as a set of tables. As a matter of
fact, I have 2 indices -- the other one looks like ( ROW, COLUMN, REF ) but
when I exposed both through COLUMNS schema rowset, the SQL Server kept
picking the wrong one! I decided to take this one problem at a time and
hidden the other index. I guess I COULD show it as a pretend nonclustered
index and see what happens... not that it will ultimately get me where I
want to be.
Regards,
...Max...|||Max Motovilov (max@.yymap.com) writes:
> I am trying to get the SQL Server 2000 to use the indices implemented by
> my custom OLE DB provider to optimize the joins. Basically, here's what
> I want to achieve:
Now, that's definitely an advanced topic!
Since I don't know whether this is doable at all, I've posted your question
into to our internal MVP forum. Whether that actually leads to something
I don't know.
--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp|||Hi Max,
Have you tried implementing the statistics rowsets (special IDBSchemaRowset
rowsets) in your provider as they're implemented in SQL Server provider?
These are spec'd (I think) in one of the later OLE DB spec updates. I'm
remembering this from a while back, and there may even be a whitepaper
(although I believe it just indicates what interfaces are used rather than
when). If you're wanting SQL Server to use the index in your provider in
your database, it may have to know "if its worth it".
Cheers,
Bob Beauchemin
http://www.SQLskills.com/blogs/bobb
"Max Motovilov" <max@.yymap.com> wrote in message
news:eBNr3NO2FHA.2364@.TK2MSFTNGP12.phx.gbl...
>> What happens if you use the OPENQUERY FUNCTION?
> Can't really do an OPENQUERY as my provider is IOpenRowset-based. The
> problem I originally had with OPENROWSET() was that it didn't instantiate
> the provider in-process and nothing worked. Now that I configured the
> provider as in-process I wonder if OPENROWSET() started working... will
> try.
>> Or possibly create an
>> additional NC index for the column used in the WHERE clause i.e, the
>> local
>> table column?
> Hmm... not sure what you mean here. I can't easily create an index for an
> arbitrary field IN MY PROVIDER -- that's why I want to use SQL Server
> (MSDE,
> really) indexing to start off with! My data are in large binary files with
> proprietary format and I just expose them as a set of tables. As a matter
> of
> fact, I have 2 indices -- the other one looks like ( ROW, COLUMN, REF )
> but
> when I exposed both through COLUMNS schema rowset, the SQL Server kept
> picking the wrong one! I decided to take this one problem at a time and
> hidden the other index. I guess I COULD show it as a pretend nonclustered
> index and see what happens... not that it will ultimately get me where I
> want to be.
> Regards,
> ...Max...
>|||Bob,
Thanks for the suggestion! I was thinking along the same lines too but since
implementing TABLE_STATISTICS is a nontrivial amount of work I was waiting
for someone to push me in that direction :) Whitepaper would be something
I'd really love to see... MSDN documentation is sketchy at best :(
Regards,
...Max...
"Bob Beauchemin" <no_bobb_spam@.sqlskills.com> wrote in message
news:O2qunjP2FHA.400@.TK2MSFTNGP09.phx.gbl...
> Hi Max,
> Have you tried implementing the statistics rowsets (special
IDBSchemaRowset
> rowsets) in your provider as they're implemented in SQL Server provider?
> These are spec'd (I think) in one of the later OLE DB spec updates. I'm
> remembering this from a while back, and there may even be a whitepaper
> (although I believe it just indicates what interfaces are used rather than
> when). If you're wanting SQL Server to use the index in your provider in
> your database, it may have to know "if its worth it".
> Cheers,
> Bob Beauchemin
> http://www.SQLskills.com/blogs/bobb
>
> "Max Motovilov" <max@.yymap.com> wrote in message
> news:eBNr3NO2FHA.2364@.TK2MSFTNGP12.phx.gbl...
> >> What happens if you use the OPENQUERY FUNCTION?
> >
> > Can't really do an OPENQUERY as my provider is IOpenRowset-based. The
> > problem I originally had with OPENROWSET() was that it didn't
instantiate
> > the provider in-process and nothing worked. Now that I configured the
> > provider as in-process I wonder if OPENROWSET() started working... will
> > try.
> >
> >> Or possibly create an
> >> additional NC index for the column used in the WHERE clause i.e, the
> >> local
> >> table column?
> >
> > Hmm... not sure what you mean here. I can't easily create an index for
an
> > arbitrary field IN MY PROVIDER -- that's why I want to use SQL Server
> > (MSDE,
> > really) indexing to start off with! My data are in large binary files
with
> > proprietary format and I just expose them as a set of tables. As a
matter
> > of
> > fact, I have 2 indices -- the other one looks like ( ROW, COLUMN, REF )
> > but
> > when I exposed both through COLUMNS schema rowset, the SQL Server kept
> > picking the wrong one! I decided to take this one problem at a time and
> > hidden the other index. I guess I COULD show it as a pretend
nonclustered
> > index and see what happens... not that it will ultimately get me where I
> > want to be.
> >
> > Regards,
> > ...Max...
> >
> >
>|||OK. Found the whitepaper. Is the email address for you below valid (I don't
think I can post news attachments). The whitepaper does say some interesting
things around index access, like "setting the Index as Access Path" provider
option". If this isn't a valid address for you, send me email at the address
below (after stripping the "no spam") with a valid one.
Cheers,
Bob Beauchemin
http://www.SQLskills.com/blogs/bobb
"Max Motovilov" <max@.yymap.com> wrote in message
news:epbP$xP2FHA.3592@.TK2MSFTNGP12.phx.gbl...
> Bob,
> Thanks for the suggestion! I was thinking along the same lines too but
> since
> implementing TABLE_STATISTICS is a nontrivial amount of work I was waiting
> for someone to push me in that direction :) Whitepaper would be something
> I'd really love to see... MSDN documentation is sketchy at best :(
> Regards,
> ...Max...
> "Bob Beauchemin" <no_bobb_spam@.sqlskills.com> wrote in message
> news:O2qunjP2FHA.400@.TK2MSFTNGP09.phx.gbl...
>> Hi Max,
>> Have you tried implementing the statistics rowsets (special
> IDBSchemaRowset
>> rowsets) in your provider as they're implemented in SQL Server provider?
>> These are spec'd (I think) in one of the later OLE DB spec updates. I'm
>> remembering this from a while back, and there may even be a whitepaper
>> (although I believe it just indicates what interfaces are used rather
>> than
>> when). If you're wanting SQL Server to use the index in your provider in
>> your database, it may have to know "if its worth it".
>> Cheers,
>> Bob Beauchemin
>> http://www.SQLskills.com/blogs/bobb
>>
>> "Max Motovilov" <max@.yymap.com> wrote in message
>> news:eBNr3NO2FHA.2364@.TK2MSFTNGP12.phx.gbl...
>> >> What happens if you use the OPENQUERY FUNCTION?
>> >
>> > Can't really do an OPENQUERY as my provider is IOpenRowset-based. The
>> > problem I originally had with OPENROWSET() was that it didn't
> instantiate
>> > the provider in-process and nothing worked. Now that I configured the
>> > provider as in-process I wonder if OPENROWSET() started working...
>> > will
>> > try.
>> >
>> >> Or possibly create an
>> >> additional NC index for the column used in the WHERE clause i.e, the
>> >> local
>> >> table column?
>> >
>> > Hmm... not sure what you mean here. I can't easily create an index for
> an
>> > arbitrary field IN MY PROVIDER -- that's why I want to use SQL Server
>> > (MSDE,
>> > really) indexing to start off with! My data are in large binary files
> with
>> > proprietary format and I just expose them as a set of tables. As a
> matter
>> > of
>> > fact, I have 2 indices -- the other one looks like ( ROW, COLUMN, REF )
>> > but
>> > when I exposed both through COLUMNS schema rowset, the SQL Server kept
>> > picking the wrong one! I decided to take this one problem at a time and
>> > hidden the other index. I guess I COULD show it as a pretend
> nonclustered
>> > index and see what happens... not that it will ultimately get me where
>> > I
>> > want to be.
>> >
>> > Regards,
>> > ...Max...
>> >
>> >
>>
>|||I got this reply from one of the MS devs:
Try select _col_ (not *) from ...
If the index is not covering, we may choose a scan instead due to costing
differences.
He also remarked that they have seen this done, but it takes an amout of
expertise. I hope that the white-paper that Bob found can help you out.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp|||Erland,
> I got this reply from one of the MS devs:
> Try select _col_ (not *) from ...
Tried, doing SELECT <column list> makes no difference. Still a scan.
> He also remarked that they have seen this done, but it takes an amout of
> expertise. I hope that the white-paper that Bob found can help you out.
Bob's suggestion to implement TABLE_STATISTICS makes a lot of sense; I'm
gonna try and fake it (probably will just show some huge cardinality for the
index to make it especially tasty for a seek) and see what happens. I wonder
if temerity may serve as a substitute to expertise :-D
Regards,
...Max...|||Bob, Erland:
> Have you tried implementing the statistics rowsets (special
IDBSchemaRowset
> rowsets) in your provider as they're implemented in SQL Server provider?
Tried this now -- no change. I have not REALLY implemented gathering of
statistics, just return a failrly large cardinality (1,000,000) for both the
table and the [$REF] column. SQL Server does indeed request this schema
rowset (checked with debugger) and the number 1,000,000 makes it into the
estimated execution plan. Remote index scan is estimated at 98% of all
effort so it is not likely that query optimizer underestimates the cost of
the scan, but it still does not want to consider seek. I am thinking that
for some reason it doesn't believe it CAN seek on this index, but how do I
disabuse him of this?
Checklist of what I've done:
- DBPROP_TABLESTATISTICS = DBPROPVAL_TS_CARDINALITY
- Implemented TABLE_STATISTICS. Non-null columns: TABLE_NAME,
STATISTICS_NAME, STATISTICS_TYPE = (DBSTAT_COLUMN_CARDINALITY |
DBSTAT_TUPLE_CARDINALITY),
ORDINAL_POSITION, COLUMN_CARDINALITY, TUPLE_CARDINALITY, TABLE_CARDINALITY
(all cardinalities at 1,000,000)
The whitepaper had a bunch of interesting stuff (some of it I ran into head
on when debugging my code) but it did not really expound upon the process of
query optimization in presence of an index provider. I seem to be satisfying
whatever requirements it imposes. Looks like I am missing SOMETHING
(property, interface, some trait of the returned rowset?), but what? FWIW,
I am including the trace of properties SQL server requests when I estimate
execution plan in Query Analyzer.
Regards,
...Max...
================================MfbSource::SetProperties() returned 0x00040eda
Returned:
{C8B522BC-5CF3-11CE-ADE5-00AA0044773D}
INIT_TIMEOUT 0x00000001 0x00000001 20
INIT_GENERALTIMEOUT 0x00000001 0x00000001 600
INIT_DATASOURCE 0 0 mfoledb.ini
INIT_LOCATION 0 0 E:\MapFrame\ugi\data_big
MfbSource::GetProperties() returned 0x00040eda
Requested:
{C8B522BB-5CF3-11CE-ADE5-00AA0044773D}
DBMSNAME
DBMSVER
PROVIDERNAME
PROVIDEROLEDBVER
CATALOGLOCATION
IDENTIFIERCASE
QUOTEDIDENTIFIERCASE
MULTIPLESTORAGEOBJECTS
OLEOBJECTS
STRUCTUREDSTORAGE
BYREFACCESSORS
SQLSUPPORT
CONCATNULLBEHAVIOR
NULLCOLLATION
TABLESTATISTICS
{2344480C-33A7-11D1-9B1A-006008268B9E}
ASYNCTXNCOMMIT
AUTH_CACHE_AUTHINFO
AUTH_ENCRYPT_PASSWORD
AUTH_INTEGRATED
AUTH_MASK_PASSWORD
AUTH_PASSWORD
BOOKMARKTYPE
{DF10CB94-35F6-11D2-9C54-00C04F7971D3}
ABORTPRESERVE
ACTIVESESSIONS
ASYNCTXNCOMMIT
AUTH_CACHE_AUTHINFO
AUTH_ENCRYPT_PASSWORD
AUTH_INTEGRATED
Returned:
{C8B522BB-5CF3-11CE-ADE5-00AA0044773D}
DBMSNAME 0 0 FieldSmart MFB
DBMSVER 0 0 01.00.0201
PROVIDERNAME 0 0 MFOLEDB.MFB
PROVIDEROLEDBVER 0 0 02.60
CATALOGLOCATION 0 0x00000001
IDENTIFIERCASE 0 0x00000001
QUOTEDIDENTIFIERCASE 0 0x00000001
MULTIPLESTORAGEOBJECTS 0 0x00000001
OLEOBJECTS 0 0x00000001
STRUCTUREDSTORAGE 0 0x00000001
BYREFACCESSORS 0 0 0
SQLSUPPORT 0 0 0
CONCATNULLBEHAVIOR 0 0x00000001
NULLCOLLATION 0 0 4
TABLESTATISTICS 0 0 1
{2344480C-33A7-11D1-9B1A-006008268B9E}
ASYNCTXNCOMMIT 0 0x00000001
AUTH_CACHE_AUTHINFO 0 0x00000001
AUTH_ENCRYPT_PASSWORD 0 0x00000001
AUTH_INTEGRATED 0 0x00000001
AUTH_MASK_PASSWORD 0 0x00000001
AUTH_PASSWORD 0 0x00000001
BOOKMARKTYPE 0 0x00000001
{DF10CB94-35F6-11D2-9C54-00C04F7971D3}
ABORTPRESERVE 0 0x00000001
ACTIVESESSIONS 0 0x00000001
ASYNCTXNCOMMIT 0 0x00000001
AUTH_CACHE_AUTHINFO 0 0x00000001
AUTH_ENCRYPT_PASSWORD 0 0x00000001
AUTH_INTEGRATED 0 0x00000001
MfbRowset::GetProperties() returned 0
Requested:
{C8B522BE-5CF3-11CE-ADE5-00AA0044773D}
BOOKMARKTYPE
Returned:
{C8B522BE-5CF3-11CE-ADE5-00AA0044773D}
BOOKMARKTYPE 0 0 1
MfbSource::SetProperties() returned 0x00040eda
Returned:
{C8B522BC-5CF3-11CE-ADE5-00AA0044773D}
INIT_TIMEOUT 0x00000001 0x00000001 20
INIT_GENERALTIMEOUT 0x00000001 0x00000001 600
INIT_DATASOURCE 0 0 mfoledb.ini
INIT_LOCATION 0 0 E:\MapFrame\ugi\data_big
MfbSource::SetProperties() returned 0x00040eda
Returned:
{C8B522BC-5CF3-11CE-ADE5-00AA0044773D}
INIT_TIMEOUT 0x00000001 0x00000001 20
INIT_GENERALTIMEOUT 0x00000001 0x00000001 600
INIT_DATASOURCE 0 0 mfoledb.ini
INIT_LOCATION 0 0 E:\MapFrame\ugi\data_big
MfbSource::GetProperties() returned 0x00040eda
Requested:
{C8B522BB-5CF3-11CE-ADE5-00AA0044773D}
DBMSNAME
DBMSVER
PROVIDERNAME
PROVIDEROLEDBVER
CATALOGLOCATION
IDENTIFIERCASE
QUOTEDIDENTIFIERCASE
MULTIPLESTORAGEOBJECTS
OLEOBJECTS
STRUCTUREDSTORAGE
BYREFACCESSORS
SQLSUPPORT
CONCATNULLBEHAVIOR
NULLCOLLATION
TABLESTATISTICS
{2344480C-33A7-11D1-9B1A-006008268B9E}
ASYNCTXNCOMMIT
AUTH_CACHE_AUTHINFO
AUTH_ENCRYPT_PASSWORD
AUTH_INTEGRATED
AUTH_MASK_PASSWORD
AUTH_PASSWORD
BOOKMARKTYPE
{DF10CB94-35F6-11D2-9C54-00C04F7971D3}
ABORTPRESERVE
ACTIVESESSIONS
ASYNCTXNCOMMIT
AUTH_CACHE_AUTHINFO
AUTH_ENCRYPT_PASSWORD
AUTH_INTEGRATED
Returned:
{C8B522BB-5CF3-11CE-ADE5-00AA0044773D}
DBMSNAME 0 0 FieldSmart MFB
DBMSVER 0 0 01.00.0201
PROVIDERNAME 0 0 MFOLEDB.MFB
PROVIDEROLEDBVER 0 0 02.60
CATALOGLOCATION 0 0x00000001
IDENTIFIERCASE 0 0x00000001
QUOTEDIDENTIFIERCASE 0 0x00000001
MULTIPLESTORAGEOBJECTS 0 0x00000001
OLEOBJECTS 0 0x00000001
STRUCTUREDSTORAGE 0 0x00000001
BYREFACCESSORS 0 0 0
SQLSUPPORT 0 0 0
CONCATNULLBEHAVIOR 0 0x00000001
NULLCOLLATION 0 0 4
TABLESTATISTICS 0 0 1
{2344480C-33A7-11D1-9B1A-006008268B9E}
ASYNCTXNCOMMIT 0 0x00000001
AUTH_CACHE_AUTHINFO 0 0x00000001
AUTH_ENCRYPT_PASSWORD 0 0x00000001
AUTH_INTEGRATED 0 0x00000001
AUTH_MASK_PASSWORD 0 0x00000001
AUTH_PASSWORD 0 0x00000001
BOOKMARKTYPE 0 0x00000001
{DF10CB94-35F6-11D2-9C54-00C04F7971D3}
ABORTPRESERVE 0 0x00000001
ACTIVESESSIONS 0 0x00000001
ASYNCTXNCOMMIT 0 0x00000001
AUTH_CACHE_AUTHINFO 0 0x00000001
AUTH_ENCRYPT_PASSWORD 0 0x00000001
AUTH_INTEGRATED 0 0x00000001
MfbRowset::GetProperties() returned 0
Requested:
{C8B522BE-5CF3-11CE-ADE5-00AA0044773D}
BOOKMARKTYPE
Returned:
{C8B522BE-5CF3-11CE-ADE5-00AA0044773D}
BOOKMARKTYPE 0 0 1
MfbSource::SetProperties() returned 0x00040eda
Returned:
{C8B522BC-5CF3-11CE-ADE5-00AA0044773D}
INIT_TIMEOUT 0x00000001 0x00000001 20
INIT_GENERALTIMEOUT 0x00000001 0x00000001 600
INIT_DATASOURCE 0 0 mfoledb.ini
INIT_LOCATION 0 0 E:\MapFrame\ugi\data_big
================================|||This is a bummer, Max. I'm almost certain that the Jet OLE DB provider, when
used with modern versions of Access MDBs, implements IRowsetIndex in such a
way that linked servers use the index. Conor? What's the magic?
When OLE DB providers were the major direction/implementation, there was a
cottage industry in figuring out what the "secret sauce" was that made these
things work with [ProductA], [ProductB], etc, just like SQLOLEDB and Jet OLE
DB did. Most often it was a special (or "non-intuitive") interface
implementation or special property/property set. Sometimes it was registry
entries (for client products). Tracing QI calls (which it seems you are
doing) was extremely helpful. As was tracing any call to any
I[XYZ]::GetProperties type interface.
In any case, if it isn't straightforward, it could be a major time sink,
suggest you open a case with MS support. The time savings alone will be
worth it.
Cheers,
Bob Beauchemin
http://www.SQLskills.com/blogs/bobb
"Max Motovilov" <max@.yymap.com> wrote in message
news:eq5hDCW2FHA.3592@.TK2MSFTNGP12.phx.gbl...
> Erland,
>> I got this reply from one of the MS devs:
>> Try select _col_ (not *) from ...
> Tried, doing SELECT <column list> makes no difference. Still a scan.
>> He also remarked that they have seen this done, but it takes an amout of
>> expertise. I hope that the white-paper that Bob found can help you out.
> Bob's suggestion to implement TABLE_STATISTICS makes a lot of sense; I'm
> gonna try and fake it (probably will just show some huge cardinality for
> the
> index to make it especially tasty for a seek) and see what happens. I
> wonder
> if temerity may serve as a substitute to expertise :-D
> Regards,
> ...Max...
>|||Max Motovilov (max@.yymap.com) writes:
> Tried this now -- no change. I have not REALLY implemented gathering of
> statistics, just return a failrly large cardinality (1,000,000) for both
> the table and the [$REF] column. SQL Server does indeed request this
> schema rowset (checked with debugger) and the number 1,000,000 makes it
> into the estimated execution plan. Remote index scan is estimated at 98%
> of all effort so it is not likely that query optimizer underestimates
> the cost of the scan, but it still does not want to consider seek. I am
> thinking that for some reason it doesn't believe it CAN seek on this
> index, but how do I disabuse him of this?
Here's another thing to check - datatype compatibility. What is the type
of the SQL Server column? What is the exposed datatype in your provider.
If there is a mismatch, so that the linked source needs to be converted
to the datatype of the SQL Server column, the index cannot be used.
What happens if you try an index hint? (If you now can use index hints
on remote sources.)
--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp|||Max Motovilov (max@.yymap.com) writes:
> Tried this now -- no change. I have not REALLY implemented gathering of
> statistics, just return a failrly large cardinality (1,000,000) for both
> the table and the [$REF] column. SQL Server does indeed request this
> schema rowset (checked with debugger) and the number 1,000,000 makes it
> into the estimated execution plan. Remote index scan is estimated at 98%
> of all effort so it is not likely that query optimizer underestimates
> the cost of the scan, but it still does not want to consider seek. I am
> thinking that for some reason it doesn't believe it CAN seek on this
> index, but how do I disabuse him of this?
Also try
SELECT ... FROM linkedsrv...tbl WHERE indexdcol = value
to see whether that uses the index.
--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp|||> Here's another thing to check - datatype compatibility. What is the type
> of the SQL Server column? What is the exposed datatype in your provider.
DBTYPE_UI4 in the provider, tried both INT and NUMERIC(10) on the SQL server
side (the latter involuntarily, as this is what the SQL Server mapped
DBTYPE_UI4 to by default when I did SELECT...INTO from the provider's
table).
> If there is a mismatch, so that the linked source needs to be converted
> to the datatype of the SQL Server column, the index cannot be used.
Well, doesn't look likely -- even when SQL Server has a chance to set up the
"equivalent" type on the local table, it still doesn't like the index later.
As a matter of fact, I just went and changed MY type to DBTYPE_I4 so it
matches up with INT. Same-o... Then I noticed that the SQL column is INT
NULL; ALTER TABLE-d it to NOT NULL. Still same-o...
> What happens if you try an index hint? (If you now can use index hints
> on remote sources.)
Not in SQL Server 2000 I can't... That indeed was the first thing I thought
about, but no dice. Can do the REMOTE hint, but this is for remote query
processors only, if I understand it correctly...
I did notice that SQL Server requests an undocumented propset
(2344480C-33A7-11D1-9B1A-006008268B9E) from my data source object. Can't be
that I've GOT TO HAVE something as obscure as that, can it?
Oh, and your MS contact said "If the index is not covering, we may choose a
scan instead due to costing differences." Well, clustered (or integrated)
index is covering by definition, isn't it?
Regards,
...Max...|||> Also try
> SELECT ... FROM linkedsrv...tbl WHERE indexdcol = value
> to see whether that uses the index.
Eeeeeeeeeeeeeeeeeee-YESSSSSSS!
I should'a thought about that myself, shouldn't I :( Well, now the next
thing is to figure out why is it good for the goose, but not the gander.
VERY BIG THANKS!
...Max...|||Yet, no form of join I can think of works:
- remote JOIN local ON remote.Key=local.Key
- remote, local WHERE remote.Key=local.Key
- remote WHERE Key in (SELECT Key FROM remote)
They all form some sort of a scan-based plan. SELECT * FROM remote WHERE Key
in ( value1, ... , valueN ) works just fine. So maybe it is a cost-related
issue after all, but with 98-99% of estimated cost assigned to the index
scan, you'd think the optimizer would consider a seek instead. After all, it
doesn't expect a huge number of matches in the local table (estimated rows:
1).
...Max...|||Max Motovilov (max@.yymap.com) writes:
>> Also try
>> SELECT ... FROM linkedsrv...tbl WHERE indexdcol = value
>> to see whether that uses the index.
> Eeeeeeeeeeeeeeeeeee-YESSSSSSS!
> I should'a thought about that myself, shouldn't I :( Well, now the next
> thing is to figure out why is it good for the goose, but not the gander.
> VERY BIG THANKS!
I interpret that exclamation as with this query, it used the index?
Going back to your other post:
> DBTYPE_UI4 in the provider, tried both INT and NUMERIC(10) on the SQL
> server side (the latter involuntarily, as this is what the SQL Server
> mapped DBTYPE_UI4 to by default when I did SELECT...INTO from the
> provider's table).
UI4 definitely sounds wrong. There is no type in SQL Server that
matches this (hm, bigint maybe). Although a varchar on the SQL Server
side would probably be converted to unsignet int, to avoid the problem.
Then again, you've already changed to I4, so that point is moot.
> Oh, and your MS contact said "If the index is not covering, we may
> choose a scan instead due to costing differences." Well, clustered (or
> integrated) index is covering by definition, isn't it?
What he had mind here, is the case of a non-clustered index, where a
hit in the index requires a second lookup. For a high hit rate, this
may be too expensive, why scanning the table is better.
So, if for some reason, SQL Server does not see your index as clustered,
then this could happen.
But there is one more possibility. What join operation do you get? Nested
loops, Merge of Hash? With the latter two, you do get a scan, but this
is a one time pass. With Merge SQL Server simply scans the two input
sources in parallell, and picks up matching rows. The two sources must
then be sorted in the same way. In a hash join, SQL Server builds hash
keys, and then uses that for matching. Since your indexes should be in
sync, at least after the change to I4, I would guess on a merge join.
The trade off here is that the loop join is good if there are a few
hits, but bad if there are many. The Merge join has more predictable
performance. Here the statistics matter a lot. On both table sources.
Hm, hints on the remote table may not work, but what about a join hint?
INNER LOOP JOIN?
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp|||> I interpret that exclamation as with this query, it used the index?
Oh yes indeed! If match is against a constant, it picks seek over scan.
> UI4 definitely sounds wrong. There is no type in SQL Server that
> matches this (hm, bigint maybe).
NUMERIC(10) according to the whitepaper that Bob sent me.
> So, if for some reason, SQL Server does not see your index as clustered,
> then this could happen.
Checked SYSREMOTE_INDEXES -- all columns are in good shape. For a moment, I
thought the problem is in CARDINALITY column in that table: I had it
NULL-ed. Changed it to 1,000,000 -- didn't help. I have no idea if SQL
Server expects an exact value there or an estimated one (same as in
TABLE_STATISTICS). But in any case, my index is marked as UNIQUE, so the
cardinality has got to be the same as the one of the table.
> But there is one more possibility. What join operation do you get? Nested
> loops, Merge of Hash?
Yeah, I thought about that even before writing to Usenet... Explicit join
hints are accepted and produce different plans, but they all involve a full
index scan on the remote table. Besides, SQL Server pretty much knows he's
got a single row coming from one of the tables (I even tried SELECT-ing that
row INTO a temporary table and using the latter in the join -- same thing).
Regards,
...Max...|||Max Motovilov (max@.yymap.com) writes:
> Yeah, I thought about that even before writing to Usenet... Explicit
> join hints are accepted and produce different plans, but they all
> involve a full index scan on the remote table. Besides, SQL Server
> pretty much knows he's got a single row coming from one of the tables (I
> even tried SELECT-ing that row INTO a temporary table and using the
> latter in the join -- same thing).
As I said, Merge and Hash should result in a scan - one single. Nested
loops should preferrably seek the index.
Could you post:
1) CREATE TABLE for the local table (with indexes and keys).
2) Some pseudo declaration for your provider table (use the provider
types.)
3) a) One query which is the plain select.
b) The other that is the join (without hints).
4) The output from SET STATISTICS PROFILE ON for the two queries.
By the way, have you looked at estimated query plan and actual query plan
to see if happen to be different.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp|||Erland,
> Could you post:
Here comes:
> 1) CREATE TABLE for the local table (with indexes and keys).
I used to do SELECT INTO from provider, but here it is using CREATE TABLE:
create table L (
REF int not null,
SERVICE_ID varchar(12) not null
)
create unique clustered index BY_ID on L( SERVICE_ID )
insert L select [$REF], SERVICE_ID from
UGIMFB...GAS_SERVICE_LOCATION_BAR_CODE_ID_JOIN
> 2) Some pseudo declaration for your provider table (use the provider
> types.)
select COLUMN_NAME, ORDINAL_POSITION, COLUMN_FLAGS, DATA_TYPE,
CHARACTER_MAXIMUM_LENGTH,
CHARACTER_OCTET_LENGTH, NUMERIC_PRECISION, NUMERIC_SCALE
from master.dbo.SYSREMOTE_COLUMNS ('UGIMFB') where
TABLE_NAME='GAS_SERVICE_LOCATION_BAR_CODE_ID_JOIN'
===================$R 1 16 18 2 2 5 255
$C 2 16 18 2 2 5 255
$REF 3 16 19 4 4 10 255
$CC 4 16 2 2 2 5 255
$CU 5 16 2 2 2 5 255
$X 6 16 3 4 4 10 255
$Y 7 16 3 4 4 10 255
$ANGLE 8 16 2 2 2 5 255
ID 9 16 3 4 4 0 0
SERVICE_ID 10 0 129 12 12 0 0
SERV_SERVICE_COCK_TYPE_CDE$X 11 16 2 2 2 0 0
SERV_FROM_DIR_CDE 12 0 129 2 2 0 0
SERV_BRANCH_FROM_SERVICE_ID 13 0 129 12 12 0 0
SERV_INSULATED_SERVICE_IND 14 0 129 1 1 0 0
SERV_TAP_SIZE_CDE$X 15 16 2 2 2 0 0
SERV_MAIN_ID 16 0 129 10 10 0 0
SERV_TAKEN_FROM_TXT 17 0 129 15 15 0 0
SERV_FROM_HOUSE_NUM 18 0 129 8 8 0 0
SERV_FROM_STREET_TXT 19 0 129 15 15 0 0
SERV_FROM_TYPE_CDE 20 0 129 4 4 0 0
ADDR_HOUSE_TXT 21 0 129 8 8 0 0
ADDR_APARTMENT_TXT 22 0 129 8 8 0 0
ADDR_LOT_TXT 23 0 129 10 10 0 0
ADDR_BOX_TXT 24 0 129 8 8 0 0
ADDR_STREET_DIR_PREFIX_CDE$X 25 16 17 1 1 0 0
ADDR_STREET_NME 26 0 129 24 24 0 0
SERV_BAR_CODE_ID 27 0 129 10 10 0 0
ADDR_STREET_TYPE_CDE$X 28 16 17 1 1 0 0
ADDR_STREET_DIR_SUFFIX_CDE$X 29 16 17 1 1 0 0
ADDR_ROUTE_TXT 30 0 129 5 5 0 0
PRES_PRESSURE_CDE 31 0 129 12 12 0 0
PRES_PRESSURE_CDE$X 32 16 17 1 1 0 0
PLSB_POLITICAL_SUB_NME 33 0 129 19 19 0 0
PLSB_POLITICAL_SUB_NME$X 34 16 2 2 2 0 0
SERV_TAP_FITTING_CDE 35 0 129 15 15 0 0
SERV_COMMENTS_TXT 36 0 129 19 19 0 0
===================
I am only using $REF and SERVICE_ID in test queries, so the abbreviated form
of CREATE TABLE would look like:
create table GAS_SERVICE_LOCATION_BAR_CODE_ID_JOIN (
-- skipped
[$REF] numeric(10) not null, -- tried with INT (DBTYPE_I4) as well
-- skipped
SERVICE_ID varchar(12) not null,
-- skipped
)
create unique clustered index BY_REF on
GAS_SERVICE_LOCATION_BAR_CODE_ID_JOIN( [$REF] )
there's also another index, which I tried showing to SQL Server and hiding
from it to no effect:
create unique clustered index BY_RC on
GAS_SERVICE_LOCATION_BAR_CODE_ID_JOIN( [$R],[$C],[$REF] )
[I know, I know, but in MY provider there CAN be two clustered indices :) I
also tried calling BY_RC integrated but not clustered in INDEXES to no
effect whatsoever]
> 3) a) One query which is the plain select.
select * from UGIMFB...GAS_SERVICE_LOCATION_BAR_CODE_ID_JOIN where
[$REF]=151399424
Application Profile Statistics
Timer resolution (milliseconds) 0 0
Number of INSERT, UPDATE, DELETE statements 0 0
Rows effected by INSERT, UPDATE, DELETE statements 0 0
Number of SELECT statements 2 2
Rows effected by SELECT statements 3 3
Number of user transactions 5 5
Average fetch time 0 0
Cumulative fetch time 0 0
Number of fetches 0 0
Number of open statement handles 0 0
Max number of opened statement handles 0 0
Cumulative number of statement handles 0 0
Network Statistics
Number of server roundtrips 3 3
Number of TDS packets sent 3 3
Number of TDS packets received 4 4
Number of bytes sent 308 308
Number of bytes received 7936 7936
Time Statistics
Cumulative client processing time 1 1
Cumulative wait time on server replies 0 0
SET STATISTICS PROFILE ON SQL:StmtCompleted 0 0 0 0
select * from UGIMFB...GAS_SERVICE_LOCATION_BAR_CODE_ID_JOIN where
[$REF]=151399424 SQL:StmtCompleted 125 0 6 0
SET STATISTICS PROFILE OFF SQL:StmtCompleted 0 0 0 0
> b) The other that is the join (without hints).
Application Profile Statistics
Timer resolution (milliseconds) 0 0
Number of INSERT, UPDATE, DELETE statements 0 0
Rows effected by INSERT, UPDATE, DELETE statements 0 0
Number of SELECT statements 2 2
Rows effected by SELECT statements 5 3.66667
Number of user transactions 5 5
Average fetch time 0 0
Cumulative fetch time 0 0
Number of fetches 0 0
Number of open statement handles 0 0
Max number of opened statement handles 0 0
Cumulative number of statement handles 0 0
Network Statistics
Number of server roundtrips 3 3
Number of TDS packets sent 3 3
Number of TDS packets received 9 6.66667
Number of bytes sent 390 335.333
Number of bytes received 15715 10987.7
Time Statistics
Cumulative client processing time 75 26
Cumulative wait time on server replies 3.61856e+008 1.20619e+008
SET STATISTICS PROFILE ON SQL:StmtCompleted 0 0 0 0
select r.* from UGIMFB...GAS_SERVICE_LOCATION_BAR_CODE_ID_JOIN r join L on
r.[$REF]=l.REF where l.SERVICE_ID='220092228006' SQL:StmtCompleted 62 166
23 0
SET STATISTICS PROFILE OFF SQL:StmtCompleted 0 0 0 0
> 4) The output from SET STATISTICS PROFILE ON for the two queries.
Included above. I copied it from Query Analyzer but it doesn't look like the
ASCII version from OSQL is any more useful...
> By the way, have you looked at estimated query plan and actual query plan
> to see if happen to be different.
The same, other than the actual number of rows, cost etc. Magnitudes are
comparable though. Picks nested loop with index scan -- I've seen it picking
a hash join before. Don't know why.
Regards,
...Max...|||> > b) The other that is the join (without hints).
Oops, missed it the first time:
select r.* from UGIMFB...GAS_SERVICE_LOCATION_BAR_CODE_ID_JOIN r join L on
r.[$REF]=L.REF
where L.SERVICE_ID='220092228006'|||Max Motovilov (max@.yymap.com) writes:
> Included above. I copied it from Query Analyzer but it doesn't look like
> the ASCII version from OSQL is any more useful...
No, what is above is Client Statistics (I think). The data I'm looking
for has has query plan with esimates, as in the sample below.
You can get that data in Query Analyzer by running text mode.
By the way, if you can put the output in an attachment, that would be good,
as else lines will be wrapped.
Rows Executes StmtText StmtId NodeId Parent PhysicalOp LogicalOp Argument DefinedValues EstimateRows EstimateIO EstimateCPU AvgRowSize TotalSubtreeCost OutputList Warnings Type Parallel EstimateExecutions
-- -- ----- -- -- -- -- -- --- ------ -- -- -- -- -- ------ -- -- -- --
1 1 select top 1 * from pingtest..pingtest 3 1 0 NULL NULL NULL NULL 1.0 NULL NULL NULL 6.4082998E-3 NULL NULL SELECT 0 NULL
1 1 |--Top(1) 3 2 1 Top Top NULL NULL 1.0 0.0 0.0000001 23 6.4082998E-3 [pingtest].[avgms], [pingtest].[maxms], [pingtest].[minms], [pingtest].[lost], [pingtest].[rec], [pingtest].[sent], [pingtest].[time] NULL PLAN_ROW 0 1.0
1 1 |--Clustered Index Scan(OBJECT:([pingtest].[dbo].[pingtest].[PK__pingtest__76CBA758])) 3 4 2 Clustered Index Scan Clustered Index Scan OBJECT:([pingtest].[dbo].[pingtest].[PK__pingtest__76CBA758]) [pingtest].[avgms], [pingtest].[maxms], [pingtest].[minms], [pingtest].[lost], [pingtest].[rec], [pingtest].[sent], [pingtest].[time] 1.0 0.00320405 4.7485001E-4 23 0.0064081 [pingtest].[avgms], [pingtest].[maxms], [pingtest].[minms], [pingtest].[lost], [pingtest].[rec], [pingtest].[sent], [pingtest].[time] NULL PLAN_ROW 0 1.0
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp|||Max Motovilov (max@.yymap.com) writes:
> ASCII output is attached to the message.
Thanks!
I'm curious, what happens if you try:
select r.* from L inner loop join
UGIMFB...GAS_SERVICE_LOCATION_BAR_CODE_ID_JOIN r on r.[$REF]=l.REF
where l.SERVICE_ID='220092228006'
I failed to mention that when you use a join hint, this also forces the
join order, so it is important to have L first. The problem with the plan
here, is that the tables are accessed in the wrong order.
Another one to try is:
SELECT r.*
FROM UGIMFB...GAS_SERVICE_LOCATION_BAR_CODE_ID_JOIN r
WHERE EXISTS (SELECT *
FROM L
WHERE r.[$REF] = L.REF
AND L.SERVICE_ID='220092228006'
Mainly because this query better expresses the result from a logical point
of view.
--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp|||Max Motovilov (max@.yymap.com) writes:
> Statistics are attached. Same thing, basically.
I'm afraid I'm starting to run out of ideas, and I have not heard anything
from MS. (The dev people might have taken a day off, in celebration of
that SQL 2005 has shipped!).
I see a convert in the query plan. That convert should not matter, because
it's on the L side. But just in case, change the data type in the provider,
so that the convert disappears from the plan.
Of course, if nothing help you will have to descend to things like:
SELECT @.ref = ref FROM L WHERE ...
SELECT * FROM linked...tbl WHERE [$REF] = @.ref
(Actually try this, to see whether it uses the index if you feed it a
variable.)
--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp|||Max Motovilov (max@.yymap.com) writes:
> I have attached the statistics for the query with a join hint. Looks
> similar to what it was before, except for the convert.
I did not expect any improvement, but it's good to have the information,
if someone asks for it.
> Thought about that... in my case, doing the same thing in the client
> code is almost as good (get key from SQL Server then go back to the
> provider). An ugly approach anyway.
But as long it works...
> Otherwise... well, the thought of reverse-engineering Jet provider has
> crossed my mind :-)
So a similar query with Jet works? (I'm illiterate with Jet myself, so
I'm not trying.)
Anyway, this is mainly a acknowledgement note. I don't have any real
new input at this point. Let's see if something comes up during the
week.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp|||> So a similar query with Jet works? (I'm illiterate with Jet myself, so
> I'm not trying.)
I tried it :( As a matter of fact, it behaves in almost exactly the same way
as the query that uses my provider, except the optimizer picks merge joins
rather than loop joins :( A simple WHERE [$REF]=IndexValue query is
executed remotely on Jet (thus, instantaneously), join of a local and a
remote table always involves the full scan of the latter. I believe it was
Bob Beauchemin who mentioned that remote Jet indices are definitely used in
SQL Server plans... well... either they are subject to the same weird
limitation, or my copy of the MSDE is nuts (despite the fact that it's at
the latest sp level) or I dunno...
Regards,
...Max...|||Probably subject to the same weird limitation. I just know that Jet is held
up as the example of IRowsetIndex/linked server support, rather than how
good/complete that support is. Would the SQL Server 2005 hint about where
the bulk of the query should be executed (for linked servers) help in your
case, is this an orthogonal problem?
Cheers,
Bob Beauchemin
http://www.SQLskills.com/blogs/bobb
"Max Motovilov" <max@.yymap.com> wrote in message
news:%23Qnd5bj3FHA.3600@.TK2MSFTNGP12.phx.gbl...
>> So a similar query with Jet works? (I'm illiterate with Jet myself, so
>> I'm not trying.)
> I tried it :( As a matter of fact, it behaves in almost exactly the same
> way
> as the query that uses my provider, except the optimizer picks merge joins
> rather than loop joins :( A simple WHERE [$REF]=IndexValue query is
> executed remotely on Jet (thus, instantaneously), join of a local and a
> remote table always involves the full scan of the latter. I believe it was
> Bob Beauchemin who mentioned that remote Jet indices are definitely used
> in
> SQL Server plans... well... either they are subject to the same weird
> limitation, or my copy of the MSDE is nuts (despite the fact that it's at
> the latest sp level) or I dunno...
> Regards,
> ...Max...
>|||> Probably subject to the same weird limitation. I just know that Jet is
held
> up as the example of IRowsetIndex/linked server support, rather than how
> good/complete that support is. Would the SQL Server 2005 hint about where
> the bulk of the query should be executed (for linked servers) help in
your
> case, is this an orthogonal problem?
Everything I was doing so far was on SQL Server 2000. Haven't tried 2005 yet
(as I cannot possibly hope to get that accepted by the company), but
probably will just out of curiousity. Which hint are you talking about, btw?
REMOTE is presumably supported by SQL Server 2000, but when I tried that
with both Jet and my provider, it was outright ignored.
...Max...
> Cheers,
> Bob Beauchemin
> http://www.SQLskills.com/blogs/bobb
>
> "Max Motovilov" <max@.yymap.com> wrote in message
> news:%23Qnd5bj3FHA.3600@.TK2MSFTNGP12.phx.gbl...
> >> So a similar query with Jet works? (I'm illiterate with Jet myself, so
> >> I'm not trying.)
> >
> > I tried it :( As a matter of fact, it behaves in almost exactly the same
> > way
> > as the query that uses my provider, except the optimizer picks merge
joins
> > rather than loop joins :( A simple WHERE [$REF]=IndexValue query is
> > executed remotely on Jet (thus, instantaneously), join of a local and a
> > remote table always involves the full scan of the latter. I believe it
was
> > Bob Beauchemin who mentioned that remote Jet indices are definitely used
> > in
> > SQL Server plans... well... either they are subject to the same weird
> > limitation, or my copy of the MSDE is nuts (despite the fact that it's
at
> > the latest sp level) or I dunno...
> >
> > Regards,
> > ...Max...
> >
> >
>|||Sorry, I was referring to EXECUTE ... AT [linked server]... which, after
looking it up, is more like an extension to the dynamic query execution
facility than a query hint for joins as such.
Cheers,
Bob
"Max Motovilov" <max@.yymap.com> wrote in message
news:%234VoKbk3FHA.744@.TK2MSFTNGP10.phx.gbl...
>> Probably subject to the same weird limitation. I just know that Jet is
> held
>> up as the example of IRowsetIndex/linked server support, rather than how
>> good/complete that support is. Would the SQL Server 2005 hint about where
>> the bulk of the query should be executed (for linked servers) help in
> your
>> case, is this an orthogonal problem?
> Everything I was doing so far was on SQL Server 2000. Haven't tried 2005
> yet
> (as I cannot possibly hope to get that accepted by the company), but
> probably will just out of curiousity. Which hint are you talking about,
> btw?
> REMOTE is presumably supported by SQL Server 2000, but when I tried that
> with both Jet and my provider, it was outright ignored.
> ...Max...
>> Cheers,
>> Bob Beauchemin
>> http://www.SQLskills.com/blogs/bobb
>>
>> "Max Motovilov" <max@.yymap.com> wrote in message
>> news:%23Qnd5bj3FHA.3600@.TK2MSFTNGP12.phx.gbl...
>> >> So a similar query with Jet works? (I'm illiterate with Jet myself, so
>> >> I'm not trying.)
>> >
>> > I tried it :( As a matter of fact, it behaves in almost exactly the
>> > same
>> > way
>> > as the query that uses my provider, except the optimizer picks merge
> joins
>> > rather than loop joins :( A simple WHERE [$REF]=IndexValue query is
>> > executed remotely on Jet (thus, instantaneously), join of a local and a
>> > remote table always involves the full scan of the latter. I believe it
> was
>> > Bob Beauchemin who mentioned that remote Jet indices are definitely
>> > used
>> > in
>> > SQL Server plans... well... either they are subject to the same weird
>> > limitation, or my copy of the MSDE is nuts (despite the fact that it's
> at
>> > the latest sp level) or I dunno...
>> >
>> > Regards,
>> > ...Max...
>> >
>> >
>>
>|||Max Motovilov (max@.yymap.com) writes:
> I tried it :( As a matter of fact, it behaves in almost exactly the same
> way as the query that uses my provider, except the optimizer picks merge
> joins rather than loop joins :(
Yeah, but that's quite a difference. OK, not when you are only matching
one row in the local table, but if you would match 10 rows, it would,
as the merge join is one scan, where as the loop join is ten scans. And
if you get enough hits, the merge join is the better option over loop
join + index seek.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp|||> Yeah, but that's quite a difference...
Agreed. Realized this when going through the specifics to see if it applied.
Bob
"Erland Sommarskog" <esquel@.sommarskog.se> wrote in message
news:Xns9700DEE4B5D02Yazorman@.127.0.0.1...
> Max Motovilov (max@.yymap.com) writes:
>> I tried it :( As a matter of fact, it behaves in almost exactly the same
>> way as the query that uses my provider, except the optimizer picks merge
>> joins rather than loop joins :(
> Yeah, but that's quite a difference. OK, not when you are only matching
> one row in the local table, but if you would match 10 rows, it would,
> as the merge join is one scan, where as the loop join is ten scans. And
> if you get enough hits, the merge join is the better option over loop
> join + index seek.
>
>
> --
> Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
> Books Online for SQL Server SP3 at
> http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp
>|||> Yeah, but that's quite a difference. OK, not when you are only matching
> one row in the local table, but if you would match 10 rows, it would,
Sure, but since the index on the local table is unique, the optimizer has
got to know that "WHERE key=const" means at most 1 row.
...Max...|||Max,
I am also looking at writing an OLEDB provider and was wondering if you
could point me toward any other documentation on this process that you have
come across outside of the MDAC SDK?
--
Best regards
Mark
"Max Motovilov" <max@.yymap.com> wrote in message
news:ecvd1uN2FHA.612@.TK2MSFTNGP10.phx.gbl...
> Hi all,
> I am trying to get the SQL Server 2000 to use the indices implemented by
> my
> custom OLE DB provider to optimize the joins. Basically, here's what I
> want
> to achieve:
> 1. My provider exposes a table indexed by 1 column ($REF) via IOpenRowset.
> Essentially, the rowset it provides is a unique clustered (thus
> integrated)
> index.
> 2. I pull this column into a table (call it Local) in an SQL Server
> database
> along with some other column(s).
> 3. I index this table on those other column(s).
> 4. I want to do a query of the following type:
> SELECT Linked.* FROM Local JOIN LinkedSvr...LinkedTbl Linked ON Local.REF
> => Linked.[$REF]
> WHERE <some condition on Local's columns other than REF>
> The expected result is that the SQL server will form an execution plan
> where
> Local is scanned (or even directly seeked if condition is fully
> satisfiable
> by an index seek) and then Linked is seeked using the obtained $REF value.
> What happens is that the SQL Server does indeed request an index rowset
> (setting both table and index IDs in a call to IOpenRowset::OpenRowset() )
> but afterwards makes no attempt to seek on it and just scans it. Not sure
> what I am doing wrong...
> Checklist:
> - IDBSchemaRowset: supports TABLES, COLUMNS and INDEXES
> - IOpenRowset::OpenRowset() supports table + index
> - Index rowset implements IRowset, IRowsetIndex, IAccessor, IColumnsInfo,
> IRowsetInfo. Since it is essentially the same object as the table itseff,
> it
> also supports IRowsetLocate and IRowsetBookmark.
> - The rowset provides boolmark column(s): I tried using just self-bookmark
> and both self-bookmark and external bookmark at once.
> - Index As Access Path is set
> - compatible collation is set (though $REF is an integer column)
> I wonder if there exists a detailed description of the logic that query
> optimizer follows when a linked data source is present in the query?
> Whatever I was able to find on the Internet is exceedingly schematic and I
> seem to be satisfying those requirements...
> Regards,
> ...Max...
>|||> I am also looking at writing an OLEDB provider and was wondering if you
> could point me toward any other documentation on this process that you
have
> come across outside of the MDAC SDK?
In my case, the only source of documentation was MSDN and whatever I could
find on the Internet (not much). I have also perused the source code of ATL
OLE DB provider templates at length -- but directly used only some of them
as I found myself increasingly at odds with that design.
I'm loathe to assert too much in the forum where many participants are
significantly more experienced in the subject matter than I am, but IMHO
much depends on the intended use of your provider. In my experience,
providing basic data access capabilities (IOpenRowset based) to ADO clients
was straightforward and reasonably easy -- took me about a week from
scratch. Getting the provider to supply data to Visual Data Tools (in
VStudio 2003) was a nightmare: I had to spend a few days with assembly-level
debugger trying to figure out where various internal errors that VStudio
displays are coming from and what pieces of functionality I may be missing.
The results are briefly enumerated here
(http://swsw.motovilov.net/weblogs/swsw/archives/2005/10/getting_your_da.htm
l). And of course I still can't get the SQL Server to use my indices and it
looks like no amount of documentation short of the one contained in the
minds of original developers would help here :)
...Max...|||Max Motovilov (max@.yymap.com) writes:
>> Yeah, but that's quite a difference. OK, not when you are only matching
>> one row in the local table, but if you would match 10 rows, it would,
> Sure, but since the index on the local table is unique, the optimizer has
> got to know that "WHERE key=const" means at most 1 row.
In this case, yes. I was more thinking in general terms.
I still have no news, sorry.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp|||OK, so I did get answer, and it's essentially negative. The guy had contact
with suspected that the seek was costed very high, because often they are.
Notice SQL Server have no information about the cost of various operations
against a certain OLE DB provider.
A scan can be a single-pass operation, if done rightly. Each call to he
OLE DB provider has a cost in itself, which is as I said, unknown to SQL
Server. It matters here whether the provider is in process or out-of-
process. The suspicion is that the cost for a call is very high, which
thus gives favours to singe scans over many seeks.
Yes, I know that in this case you are only accessing one row, but the
optimizer may not have the shortcut for this situation. As you found out,
you did not get much better result with Jet.
Since you can get the seek with a single, I think you will have to be
content with that workaround, as ugly as you may find it.
If you really want to beat this horse to death, opening a case with MS
is about the only way. It sounds to me as that could be quite an expensive
affair, though. (Although if it turns out to be a bug, then you should
be refunded.)
--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp|||Erland,
> Server. It matters here whether the provider is in process or out-of-
> process. The suspicion is that the cost for a call is very high, which
> thus gives favours to singe scans over many seeks.
FWIW, it is in-process.
> Since you can get the seek with a single, I think you will have to be
> content with that workaround, as ugly as you may find it.
Well yes; either that or do it in two steps in the client (more than likely,
ADO-based) code. Frankly, I wish there was a way to just submit a handcoded
execution plan to the server -- kind of like an assembly language for the
query engine. As a matter of fact, this is something I wished for long
before starting this project with an OLE DB provider... but I doubt I'll
ever get something like that, anywhere.
> If you really want to beat this horse to death, opening a case with MS
> is about the only way. It sounds to me as that could be quite an expensive
> affair, though.
Yeah, considering that the documentation most likely doesn't say that what I
am trying to achieve is indeed possible. I think I (and my management) can
be content with what works as of right now. That doesn't mean _I_ _will_be_
content -- the "right" solution looks so teasingly close -- one just has to
somehow educate the optimizer about the true costs of each operation. If I
saw it working with any other provider I might have been motivated enough to
continue; as it is I'll probably set it aside.
Thanks again for your, and everybody else's, help!
Regards,
...Max...|||Max Motovilov (max@.yymap.com) writes:
> Well yes; either that or do it in two steps in the client (more than
> likely, ADO-based) code. Frankly, I wish there was a way to just submit
> a handcoded execution plan to the server -- kind of like an assembly
> language for the query engine. As a matter of fact, this is something I
> wished for long before starting this project with an OLE DB provider...
> but I doubt I'll ever get something like that, anywhere.
You will. There is a feature in SQL 2005 called plan guides which
essentially permits you define the query plan for a query (within some
reasonable limits; it must be a valid plan for the query). Whether it works
for a remote query, I don't know.
I believe other RDBMS products has this as well, for instance DB2.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp|||> Haven't tried 2005 yet
> (as I cannot possibly hope to get that accepted by the company), but
> probably will just out of curiousity.
Tried that now. Same results as SQL Server 2000.
...Max...|||Bob is it possible you could send me this whitepaper since I am looking at
implementing a provider and would like to try and avoid the same problems
that Max has encountered?
--
Best regards
Mark
email: swozz_@.hotmail.com.nospam
"Bob Beauchemin" <no_bobb_spam@.sqlskills.com> wrote in message
news:%23EVYdZQ2FHA.2312@.tk2msftngp13.phx.gbl...
> OK. Found the whitepaper. Is the email address for you below valid (I
> don't think I can post news attachments). The whitepaper does say some
> interesting things around index access, like "setting the Index as Access
> Path" provider option". If this isn't a valid address for you, send me
> email at the address below (after stripping the "no spam") with a valid
> one.
> Cheers,
> Bob Beauchemin
> http://www.SQLskills.com/blogs/bobb
>
> "Max Motovilov" <max@.yymap.com> wrote in message
> news:epbP$xP2FHA.3592@.TK2MSFTNGP12.phx.gbl...
>> Bob,
>> Thanks for the suggestion! I was thinking along the same lines too but
>> since
>> implementing TABLE_STATISTICS is a nontrivial amount of work I was
>> waiting
>> for someone to push me in that direction :) Whitepaper would be
>> something
>> I'd really love to see... MSDN documentation is sketchy at best :(
>> Regards,
>> ...Max...
>> "Bob Beauchemin" <no_bobb_spam@.sqlskills.com> wrote in message
>> news:O2qunjP2FHA.400@.TK2MSFTNGP09.phx.gbl...
>> Hi Max,
>> Have you tried implementing the statistics rowsets (special
>> IDBSchemaRowset
>> rowsets) in your provider as they're implemented in SQL Server provider?
>> These are spec'd (I think) in one of the later OLE DB spec updates. I'm
>> remembering this from a while back, and there may even be a whitepaper
>> (although I believe it just indicates what interfaces are used rather
>> than
>> when). If you're wanting SQL Server to use the index in your provider in
>> your database, it may have to know "if its worth it".
>> Cheers,
>> Bob Beauchemin
>> http://www.SQLskills.com/blogs/bobb
>>
>> "Max Motovilov" <max@.yymap.com> wrote in message
>> news:eBNr3NO2FHA.2364@.TK2MSFTNGP12.phx.gbl...
>> >> What happens if you use the OPENQUERY FUNCTION?
>> >
>> > Can't really do an OPENQUERY as my provider is IOpenRowset-based. The
>> > problem I originally had with OPENROWSET() was that it didn't
>> instantiate
>> > the provider in-process and nothing worked. Now that I configured the
>> > provider as in-process I wonder if OPENROWSET() started working...
>> > will
>> > try.
>> >
>> >> Or possibly create an
>> >> additional NC index for the column used in the WHERE clause i.e, the
>> >> local
>> >> table column?
>> >
>> > Hmm... not sure what you mean here. I can't easily create an index for
>> an
>> > arbitrary field IN MY PROVIDER -- that's why I want to use SQL Server
>> > (MSDE,
>> > really) indexing to start off with! My data are in large binary files
>> with
>> > proprietary format and I just expose them as a set of tables. As a
>> matter
>> > of
>> > fact, I have 2 indices -- the other one looks like ( ROW, COLUMN,
>> > REF )
>> > but
>> > when I exposed both through COLUMNS schema rowset, the SQL Server kept
>> > picking the wrong one! I decided to take this one problem at a time
>> > and
>> > hidden the other index. I guess I COULD show it as a pretend
>> nonclustered
>> > index and see what happens... not that it will ultimately get me where
>> > I
>> > want to be.
>> >
>> > Regards,
>> > ...Max...
>> >
>> >
>>
>>
>