dex-services Only use pricegraph as price estimation source for solver

The solver needs price estimates. Historically we have used external sources such as Kraken, dexag, and 1inch to provide these.

In #842 we implemented the pricegraph as a source of price estimates. I would like to get rid of the other price sources now. Is there any advantage in having them? The pricegraph should give us prices for every token connected to the fee token, and for other tokens we don't need a price. It can't fail, and it is easy to get historical price estimates because the only input data is the orderbook. This has an impact on #970.

Jun 29 '20 11:06 e00E

The reason for using external price sources is to keep well-known liquid tokens from deviating too far from the market price on other exchanges. E.g. even if there is an order selling ETH at $1, we would want to take a price that is as close as possible to a reasonable market price on other exchanges.

What is your concern with using the exact same PriceOracle component as currently used in the driver in the price-estimation service?

Jun 29 '20 20:06 fleupold

It is still the conceptual point that prices should be determined by our exchange only. For the price-estimator it also makes historical price estimates easier to understand.

> The reason for using external price sources is to keep well-known liquid tokens from deviating too far from the market price on other exchanges.

But why? In the end the price is determined by the orders we have. If these orders show a different price then that is the true price. The goal of the price estimates is to give these prices to the solver.

@marcovc wrote

> But in general, we need the prices for all tokens to compute the rounding buffers. I guess whatever approximation will do, we just need some.

This does not sound like there is a requirement that the price estimates match other exchanges. Really, the estimates from pricegraph seem more accurate to me even if they are different from other exchanges. If I understand it correctly, even if we estimated a price from another exchange the solver uses this only as a starting point anyway and it will end up at a different price which is presumably closer to what pricegraph would also predict.

Jun 30 '20 04:06 e00E

Two important things to note here.

1. If we could derive the final prices in the price estimator with 100% accuracy, then the job of the solver would be much, much easier. I imagine there are a number of approximations made in the price estimator, most notably the fact that it doesn't care about what the objective function is, that can make these estimates go sideways.
2. There is a chicken-and-egg problem in the solver. It needs good price estimates to be able to compute 'meaningful' prices. When there are a lot of orders, our price estimator usually does a good job, but when there aren't, or when there are crazy orders in the system, the external prices can help to drive the price estimator to a 'sane' estimate.

Jun 30 '20 06:06 marcovc

> In the end the price is determined by the orders we have. If these orders show a different price then that is the true price

While I personally share this libertarian view, we made the deliberate decision to try and prevent gross price discrepancies from other markets until we are a prominent enough player in the market and can claim we make the price (especially for highly liquid tokens such as DAI, ETH, etc.).

The original solver would not deviate more than some percentage (I believe 20%) from these estimates. However, I also believe the best ring solver and open solver might no longer respect these estimates, in which case the original "protective argument" would no longer be valid.

Also looping in @josojo and @twalth3r, who might have thoughts about deprecating the external price estimates from other exchanges altogether and relying only on the intrinsic order-book method. Marco's point that the price estimation might fail for thin order-books with weird orders is something we did experience in the beginning of the exchange's life.

Jun 30 '20 11:06 fleupold

> The original solver would not deviate more than some percentage (I believe 20%) from these estimates.

That's a good point I wasn't aware of.

Jun 30 '20 13:06 e00E

> The original solver would not deviate more than some percentage (I believe 20%) from these estimates.

We're not doing this anymore at the moment. When we were still optimizing for volume, this was a helpful restriction to keep prices meaningful for the overall solution (because volume tends to "draw prices away from the center"). With disregarded utility as objective, we no longer have this issue, because it "draws prices to the center". For modelling reasons, we still need some bounds on the prices which are currently set to [0.01 * estimated_price, 100 * estimated_price].
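As a rough illustration of these bounds, here is a minimal Python sketch. The function names and the example estimate of `10**18` are made up for illustration; this is not the solver's actual code.

```python
from fractions import Fraction

# Factors taken from the bounds quoted above: [0.01 * estimate, 100 * estimate].
LOWER_FACTOR = Fraction(1, 100)
UPPER_FACTOR = Fraction(100)


def price_bounds(estimated_price):
    """Return the (lower, upper) range a price may be picked from."""
    estimate = Fraction(estimated_price)
    return LOWER_FACTOR * estimate, UPPER_FACTOR * estimate


def within_bounds(candidate_price, estimated_price):
    """Check whether a candidate price respects the modelling bounds."""
    lower, upper = price_bounds(estimated_price)
    return lower <= Fraction(candidate_price) <= upper


estimate = 10**18  # hypothetical estimate, not a value from the exchange
print(within_bounds(20 * estimate, estimate))   # 20x the estimate is still allowed
print(within_bounds(101 * estimate, estimate))  # just above 100x is rejected
```

Exact fractions are used so that the very large integer prices do not lose precision in the comparison.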

Jun 30 '20 13:06 twalth3r

> The reason for using external price sources is to keep well-known liquid tokens from deviating too far from the market price on other exchanges. E.g. even if there is an order selling ETH at $1, we would want to take a price that is as close as possible to a reasonable market price on other exchanges.

> We made the deliberate decision to try and prevent gross price discrepancies from other markets until we are a prominent enough player in the market and can claim we make the price (especially for highly liquid tokens such as DAI, ETH, etc.).

Taking Tom's response into account, I don't think this works. Even if we didn't have the pricegraph, we would not be preventing gross price discrepancies in the solutions, because the solver can still pick prices within a factor of 100 of the estimates. Now that we do have the pricegraph and take the average of a few sources, the estimated price can still be heavily influenced by one source being significantly wrong. And even if the restriction worked, solvers not run by us could still create solutions with any valid price, while we would be leaving out possible trades because the price doesn't match external sources.
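A small Python sketch of the averaging point, using the two SNX values from the comparison further down in this comment. It makes no claim about which of the two values is closer to the real price. The function name is made up, and the plain arithmetic mean is an assumption about how the sources are combined.

```python
from fractions import Fraction

# SNX values from the comparison in this comment.
SNX_DEXAG = 2020531733527791872
SNX_PRICEGRAPH = 46462866658022309888


def average_estimate(prices):
    """Plain arithmetic mean of the source prices (assumed combination rule)."""
    return Fraction(sum(prices), len(prices))


mean = average_estimate([SNX_DEXAG, SNX_PRICEGRAPH])
# How far the mean lands from each individual source.
print(float(mean / SNX_DEXAG))
print(float(mean / SNX_PRICEGRAPH))
# The upper price bound of 100x the estimate, relative to the lower source.
print(float(100 * mean / SNX_DEXAG))
```

The mean ends up roughly 12 times the lower source, so with the 100x bound a price more than 1000 times that source would still be allowed.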

The quote above gives no reason why. I'm assuming the reason is to protect users from unexpectedly bad trades (or unexpectedly good ones for the other side of the trade). The arguments above show that this is not working.

Finally, I ran a test to compare the current pricegraph estimates to the external sources:

prices for token id 8 alias SETH
dexag     :                223348369797602082816
pricegraph:                232783725046747660288

prices for token id 1 alias WETH
kraken    :                224813539999999983616
dexag     :                226734380590400274432
pricegraph:                232082018956821463040

prices for token id 15 alias SNX
dexag     :                  2020531733527791872
pricegraph:                 46462866658022309888

prices for token id 13 alias CDAI
dexag     :          204852203652734136103206912
pricegraph:          214344914473043663396012032

prices for token id 18 alias GNO
kraken    :                 22862602000000000000
dexag     :                 22833523239725715456
pricegraph:                 26227909380263489536

prices for token id 16 alias CHAI
dexag     :                  1018009467371560064
pricegraph:                  1054807989230949632

prices for token id 2 alias USDT
kraken    :       999524540000000015992352669696
dexag     :       983865701570588146769204346880
pricegraph:      1052002953904856066398240112640

prices for token id 11 alias WBTC
dexag     :     89801134105001279163526696402944
pricegraph:     97568641965694133663449128894464

prices for token id 5 alias PAX
dexag     :                   977304597542508544
pricegraph:                  1053108662715266816

prices for token id 3 alias TUSD
dexag     :                  1019994249347627264
pricegraph:                  1053108662715266816

prices for token id 7 alias DAI
kraken    :                  1007906710000000128
dexag     :                  1000000000000000000
pricegraph:                  1053108662715266816

prices for token id 4 alias USDC
kraken    :      1000046009999999988627459276800
dexag     :       986793847201200239315494371328
pricegraph:      1053161326047375433821176463360

prices for token id 9 alias SUSD
dexag     :                   999337321923043456
pricegraph:                  1053108662715266816
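
One way to read these numbers is the relative deviation of the pricegraph estimate from each external source. The Python sketch below does this for a subset of the tokens above; the helper names and the 50% threshold are arbitrary choices for illustration.

```python
from fractions import Fraction

# Subset of the values listed above: token -> {source: price}.
PRICES = {
    "WETH": {
        "kraken": 224813539999999983616,
        "dexag": 226734380590400274432,
        "pricegraph": 232082018956821463040,
    },
    "SNX": {
        "dexag": 2020531733527791872,
        "pricegraph": 46462866658022309888,
    },
    "GNO": {
        "kraken": 22862602000000000000,
        "dexag": 22833523239725715456,
        "pricegraph": 26227909380263489536,
    },
    "DAI": {
        "kraken": 1007906710000000128,
        "dexag": 1000000000000000000,
        "pricegraph": 1053108662715266816,
    },
}


def max_relative_deviation(sources):
    """Largest |pricegraph / external - 1| over all external sources."""
    pricegraph = sources["pricegraph"]
    return max(
        abs(Fraction(pricegraph, external) - 1)
        for name, external in sources.items()
        if name != "pricegraph"
    )


def tokens_out_of_line(prices, threshold):
    """Tokens whose pricegraph estimate deviates by more than `threshold`."""
    return [
        token
        for token, sources in prices.items()
        if max_relative_deviation(sources) > threshold
    ]


for token, sources in PRICES.items():
    print(token, f"{float(max_relative_deviation(sources)):.1%}")
print(tokens_out_of_line(PRICES, Fraction(1, 2)))
```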

Jul 01 '20 09:07 e00E

Agreed, the user protection argument is no longer valid with such big margins.

@marcovc do I understand your points correctly that the concern is an attacker could either manipulate the price estimate with crazy orders (this likely comes at a cost since orders would have to be matchable) or trick the price estimator into giving an estimate which is not feasible and way off the "right" price due to the approximations we do?

> Finally, I ran a test to compare the current pricegraph estimates to the external sources:

Other than for SNX, prices seem in line (though it's only a single data point in a likely benign setting). Was this more for completeness, or is there something to take away from the data?

Jul 01 '20 13:07 fleupold

For completeness and to show that the prices are close together.

Jul 02 '20 05:07 e00E

> @marcovc do I understand your points correctly that the concern is an attacker could either manipulate the price estimate with crazy orders (this likely comes at a cost since orders would have to be matchable) or trick the price estimator into giving an estimate which is not feasible and way off the "right" price due to the approximations we do?

My concerns are purely technical - I haven't considered possible exploits.

In the standard solver, these price estimates are important for two tasks: scaling, and doing the linear approximation of the objective function. While I guess we don't need very accurate estimates for the former, the latter has been shown to be very sensitive to them (especially in those pathological instances). The linear approximation of the objective function is the main culprit for the infamous "negative utility" problems, so it is an important thing.

These price estimates should not be thought of as "initial points" or something like that; rather, they determine the slope of the line that replaces the non-linear curve that measures how good a solution is. The current implementation, which uses this regression, is something that we iterated on for quite some time, and even now we believe there is room for improvement.

Personally I think it would be great to get a better algorithm for it, and if that algorithm runs outside of the solver, even better :) But I think that to evaluate such an algorithm we would need to do a more in-depth comparison, that includes covering these weird instances.

cc/ @twalth3r

Jul 02 '20 07:07 marcovc

So, recently we started collecting some historic data on price estimates and solutions. With this data I generated a graph of prices for the WETH-DAI market (that is, the WETH-DAI exchange rate) together with the estimated best-bid and best-ask spread. The idea is to see how often the solver's exchange rates fall between the pricegraph-estimated best bid and best ask.

Here is the historic graph:

Some other fun facts: the solver exchange rate is on average 0.7% away from the middle of the bid/ask spread, with a standard deviation of 1.74 (excluding some outlier points at the beginning of the exchange, when there were very few orders). If we take only more recent results (starting from batch 5290000), this drops even further to 0.42% with a σ of 0.66. So all in all, pricegraph estimates seem pretty accurate.
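
A statistic of this kind can be computed along the following lines. This is a minimal Python sketch with made-up sample data (not the actual historic data). It assumes the "middle" is the plain average of best bid and best ask and that the deviation is taken as an absolute percentage, and it uses the population standard deviation, which may differ from what was used for the numbers above.

```python
import statistics


def deviation_from_mid_percent(best_bid, best_ask, solver_rate):
    """Absolute deviation of the solver rate from the spread's midpoint, in %."""
    mid = (best_bid + best_ask) / 2
    return abs(solver_rate - mid) / mid * 100


# Hypothetical (best_bid, best_ask, solver_rate) samples, one per batch.
samples = [
    (228.0, 232.0, 231.0),
    (229.0, 233.0, 230.5),
    (227.0, 231.0, 229.0),
]

deviations = [deviation_from_mid_percent(*sample) for sample in samples]
print(statistics.mean(deviations))    # average deviation in %
print(statistics.pstdev(deviations))  # spread of the deviations
```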

edit: We could get a better estimate of the "center" of the bid-ask spread by instead taking an average weighted by liquidity, in order to push the "center" to the side with more liquidity. I still need to confirm whether this is correct.

edit: The pricegraph estimates definitely struggle with thin orderbooks, as previously brought up.
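
The liquidity-weighted center from the first edit note above could look like the following Python sketch. The function name and the sample numbers are hypothetical, and the weighting (each side's price weighted by that side's own liquidity) is just one reading of the idea, which is itself still unconfirmed.

```python
def liquidity_weighted_center(best_bid, bid_liquidity, best_ask, ask_liquidity):
    """Average of best bid and best ask, weighted by each side's liquidity.

    More liquidity on one side pulls the center toward that side's price.
    """
    total = bid_liquidity + ask_liquidity
    if total <= 0:
        raise ValueError("need some liquidity on at least one side")
    return (best_bid * bid_liquidity + best_ask * ask_liquidity) / total


# Equal liquidity gives the plain midpoint.
print(liquidity_weighted_center(99.0, 1.0, 101.0, 1.0))
# More bid liquidity moves the center closer to the bid.
print(liquidity_weighted_center(99.0, 3.0, 101.0, 1.0))
```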

I imagine that for approximating the objective function, the closer the exchange rate is to the real one the better (so the slope is more accurate), which makes pricegraph nicely suited for this. More so than external prices in some cases, specifically price sources like Kraken and Dexag that do not list OWL prices and assume that 1USD/DAI <=> 1OWL, which is not entirely correct (although recently because of large OWL orders at ~95c, the price has become closer).

Also, a HUGE fallacy in my argument: just because the pricegraph estimates were correct in the past does not mean they will continue to be. There also seems to be an issue when removing overlapping ring trades from the graph (to compute the reduced bid-ask spread) that causes the estimates to be off, which I am currently looking into.

Also, this analysis is done for the WETH-DAI market, I will look into *-OWL markets to get a better idea of how pricegraph estimates work on exchange prices (instead of exchange rates).

edit: I also share some concerns about being able to manipulate the solver with orders. However, in practice this seems quite difficult to me, as it would require an order with balance to be part of the batch, meaning that there would be a cost to manipulating the prices. That being said, there might be some unforeseen way to manipulate the solver that we haven't thought about.

Jul 02 '20 07:07 nlordell

@nlordell The values above look impressively close! I think the main issue with this price estimation is more about robustness than precision though.

We have a set of exotic instances that we keep accumulating. Perhaps I could send them to you (or @e00E?), you would run the price estimation and insert the estimated prices as "external prices" in the jsons, and send them back to me. Then I would run the solver here using all these estimates, thus bypassing our own price estimator, and compare the results. This would give us some confidence on how this solution would work robustness-wise.

What do you think?

Jul 02 '20 14:07 marcovc

> What do you think?

Sounds great! For accuracy, can these instances include batch numbers so I can get price estimates at those batches?

> I think the main issue with this price estimation is more about robustness

I agree. Specifically, it struggles with illiquid tokens. In the data I got, especially near the beginning of the exchange (when it was deployed), there seem to have been cases where the only orders for a token pair were overlapping orders, which means there were no price estimates left once the negative cycles were reduced.
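
Why overlapping orders leave nothing to estimate from can be seen in a small Python sketch. It assumes each order is an edge with a limit exchange rate (the minimum amount of buy token wanted per unit of sell token) and ignores fees for simplicity. Under those assumptions, a cycle of orders overlaps when the product of its limit rates is below 1, that is, when the sum of their logarithms is negative. The function name and the numbers are hypothetical, not the pricegraph API.

```python
import math


def is_negative_cycle(limit_rates):
    """True if the orders around a cycle overlap and can be matched.

    Each rate is the minimum amount of buy token an order wants per unit of
    sell token. The cycle overlaps when the product of the rates is below 1,
    which is the same as the sum of their logarithms being negative.
    """
    return math.fsum(math.log(rate) for rate in limit_rates) < 0


# Hypothetical orders on a WETH-DAI pair:
# Order A sells WETH and wants at least 220 DAI per WETH.
# Order B sells DAI and wants at least 1/225 WETH per DAI (pays up to 225 DAI).
print(is_negative_cycle([220.0, 1 / 225.0]))  # the two orders overlap
# If order A wanted 230 DAI per WETH instead, a spread would remain.
print(is_negative_cycle([230.0, 1 / 225.0]))
```

If every order on a pair is part of such a cycle, reducing the cycles consumes all of them and no bid or ask remains to derive an estimate from.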

As for other sources of unreliability: I also found issues with how it reduces the orderbook, which may cause orders that should be completely reduced not to be, effectively skewing the price. I'm currently trying to figure out exactly when and how this happens and to come up with an idea for fixing it. I will tag you when I create an issue.

Besides those robustness concerns, do you have any others?

Jul 02 '20 15:07 nlordell

I think we can start with these ones attached. negative_utility.zip

> Besides those robustness concerns, do you have any others?

No, I guess that's really my main concern: to have something able to handle even the most esoteric cases.

Jul 03 '20 09:07 marcovc

This is blocked on getting more reliable estimates from the internal estimation (#1148)

Jul 28 '20 12:07 fleupold
