prioritise recent inferred market prices over old declared ones

Open aragaer opened this issue 4 years ago • 20 comments

Suppose I have two different non-default commodities, and some time ago I specified the price of one in terms of the other. Later the price changed, but I recorded the new price in terms of the default commodity:

P 2021-05-06 A 1B
P 2021-05-06 B 10

2021-05-06 * buy
 assets:A  5 A @ 10
 assets

P 2021-05-08 A 20
P 2021-05-08 B 10

2021-05-10 *
 assets:A  0 = 5 A

If I want to find the total value of my 'assets:A' account in terms of the default commodity, I expect to see 100, and that is shown correctly. However, when I want to see the value in terms of commodity B, hledger finds the old price and shows that my A is worth 5 B, while I expect to see 10.

$ hledger -f /tmp/test.journal bal assets:A -X 'B' --infer-value --debug=2
seeking A to B price using forward prices
valuation date: 2021-05-10
trying: A>
trying: A>B
shortest path from A to B: A>B
price chain:
[ MarketPrice
  { mpdate = 2021 - 05 - 06, mpfrom = "A", mpto = "B", mprate = 1 }
]
                  5B  assets:A
--------------------
                  5B
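
The expected figure can be checked by hand. The following is only an arithmetic sketch of the journal above, not hledger code; the variable names are made up for illustration.

```python
from fractions import Fraction

# Latest declared prices from the journal, in the no-symbol commodity.
price_a = Fraction(20)   # P 2021-05-08 A 20
price_b = Fraction(10)   # P 2021-05-08 B 10
quantity_a = 5           # balance of assets:A

# Expected: value A in the no-symbol commodity, then convert that to B.
value_no_symbol = quantity_a * price_a           # 5 * 20 = 100
value_in_b_chained = value_no_symbol / price_b   # 100 / 10 = 10

# Reported: the direct A>B price declared on 2021-05-06 is used instead.
old_direct_a_to_b = Fraction(1)                  # P 2021-05-06 A 1B
value_in_b_direct = quantity_a * old_direct_a_to_b  # 5 * 1 = 5

print(value_no_symbol, value_in_b_chained, value_in_b_direct)  # 100 10 5
```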

aragaer avatar May 12 '21 19:05 aragaer

Thanks for the report @aragaer.

When you say "default commodity", I think of D and get confused; I am mentally translating to "no-symbol commodity".

When someone uses the no-symbol commodity, especially in valuation examples, I get confused.

But never mind, I can follow your example. :) However, I am not reproducing it exactly, so I could easily get confused and should avoid saying anything until I can reproduce. Here's what I get (note the three "seeking" attempts):

$ cat a.j
P 2021-05-06 A 1B
P 2021-05-06 B 10

2021-05-06 * buy
 assets:A  5 A @ 10
 assets

P 2021-05-08 A 20
P 2021-05-08 B 10

2021-05-10 *
 assets:A  0 = 5 A

$ hledger -f a.j bal assets:A -X 'B' --infer-value --debug=2
seeking  to B price using forward prices
valuation date: 2021-05-10
shortest path from  to B: none found
seeking  to B price using forward and reverse prices
trying: >A
trying: >B
shortest path from  to B: >B
price chain:
[ MarketPrice
  { mpdate = 2021 - 05 - 08, mpfrom = "", mpto = "B", mprate = 0.1 }
]
seeking A to B price using forward prices
valuation date: 2021-05-10
trying: A>
trying: A>B
shortest path from A to B: A>B
price chain:
[ MarketPrice
  { mpdate = 2021 - 05 - 06, mpfrom = "A", mpto = "B", mprate = 1 }
]
                  5B  assets:A
--------------------
                  5B  
$ hledger --version
hledger 1.21.99

simonmichael avatar May 12 '21 22:05 simonmichael

And with hledger 1.21, I can reproduce. (A change in valuation in master seems.. unexpected and ominous..)

$ hledger-1.21 -f a.j bal assets:A -X 'B' --infer-value --debug=2
seeking A to B price using forward prices
valuation date: 2021-05-10
trying: A>
trying: A>B
shortest path from A to B: A>B
price chain:
[ MarketPrice
  { mpdate = 2021 - 05 - 06, mpfrom = "A", mpto = "B", mprate = 1 }
]
                  5B  assets:A
--------------------
                  5B  

simonmichael avatar May 12 '21 22:05 simonmichael

https://hledger.org/hledger.html#market-prices is the relevant doc. I note that it is quite hard to understand, and it isn't clear enough about how prices are prioritised. (Nor is the code in Hledger.Data.Valuation). Also is it --infer-market-price or --infer-value ?

I would guess it prioritises 1. forward over reverse prices, 2. short over long paths, then 3. recent over old, in that order.
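
To make that guess concrete, here is a minimal Python sketch of such an ordering as a sort key. It only illustrates the guessed priorities; it is not taken from Hledger.Data.Valuation, and the Candidate type and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Candidate:
    path: tuple           # commodities visited, e.g. ("A", "B")
    uses_reverse: bool    # True if any hop is a reversed price
    price_date: date      # date of the price (the oldest one, for a chain)

def guessed_priority(c: Candidate):
    # Lower tuples sort first: forward before reverse, short before long,
    # recent before old.
    return (c.uses_reverse, len(c.path) - 1, -c.price_date.toordinal())

# The two candidates from the report: the old direct A>B price, and the
# newer chain through the no-symbol commodity (which needs B's price reversed).
candidates = [
    Candidate(("A", "B"), False, date(2021, 5, 6)),
    Candidate(("A", "", "B"), True, date(2021, 5, 8)),
]
best = min(candidates, key=guessed_priority)
print(best.path)  # ('A', 'B')
```

Under this ordering the date is only a tie-breaker, so the old direct price wins, which matches the output above.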

simonmichael avatar May 12 '21 22:05 simonmichael

PS when that's not what you want, the simple workaround is to add more P declarations. I imagine that heavy users of valuation would want daily market prices for all commodities.

simonmichael avatar May 12 '21 22:05 simonmichael

Assuming I have daily "no symbol to A" and "no symbol to B" prices, but long ago I once recorded an "A to B" price, I'll get unexpected results until I either add more "A to B" prices or remove that one old record.

ilya-konovalov avatar May 13 '21 04:05 ilya-konovalov

That's right - because we think that explicit declarations are more trustworthy / should be respected more than inferred price chains - which are after all a kind of convenience hack and don't necessarily correspond to real world market prices.

I can see the inconvenience, though I haven't yet felt it in my own use. In effect it means that once you declare an A to B price (in the current files) then you need to keep declaring them, right ?

But say you have a lot of currencies. Would you want an arbitrarily long, not-visible-to-humans, possibly quite unrealistic (especially with --infer-market-price) price chain found by hledger today to override an accurate direct market price declaration dated yesterday ?

simonmichael avatar May 13 '21 05:05 simonmichael

Let's say I'm investing in several commodities, some priced in US dollars and others in Russian rubles. I want an overview of my current investments in a single commodity -- usually dollars. However, some of these could be bought for either dollars or rubles. Two years ago I priced one of these in dollars. A couple of days ago that ETF was split 1 to 10. Suddenly my dollar valuation increased significantly.

Once I found out the reason for that valuation, I reviewed my records and simply removed all the prices where I had written a dollar price for a "should be priced in rubles" commodity. The total was now a bit different from what I had earlier. If not for that split, I might not even have noticed that I was still using some really old prices for certain commodities.

ilya-konovalov avatar May 13 '21 05:05 ilya-konovalov

Basically my idea is this -- for any price chain we should calculate "max age" of all the prices in that chain. Then prefer the chain which has the youngest of all the oldest prices. That is if I have daily A -> B and C -> D prices and weekly C -> B prices and one month ago I've written a single A -> C price, when computing A -> D price the longer "A -> B -> C -> D" chain should be preferred over shorter (and forward) "A -> C -> D" chain since it's more up-to-date.
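
A rough Python sketch of that rule, using made-up dates for the example above. None of this is hledger code: the price table, the function names and the tie-break on path length are assumptions for illustration, and reversed prices are allowed so that C -> B can be used as B -> C.

```python
from datetime import date

# Hypothetical date of the latest price for each declared pair, (from, to) -> date.
latest = {
    ("A", "B"): date(2021, 5, 13),   # daily
    ("C", "D"): date(2021, 5, 13),   # daily
    ("C", "B"): date(2021, 5, 10),   # weekly
    ("A", "C"): date(2021, 4, 13),   # written once, a month ago
}

def neighbours(node):
    """Commodities reachable in one hop, using prices in either direction."""
    for (src, dst), when in latest.items():
        if src == node:
            yield dst, when
        elif dst == node:
            yield src, when          # reversed price

def chains(start, goal):
    """Yield (path, date of its oldest price) for every simple path."""
    stack = [((start,), None)]
    while stack:
        path, oldest = stack.pop()
        if path[-1] == goal:
            yield path, oldest
            continue
        for nxt, when in neighbours(path[-1]):
            if nxt not in path:
                new_oldest = when if oldest is None else min(oldest, when)
                stack.append((path + (nxt,), new_oldest))

def freshest_chain(start, goal):
    # Prefer the chain whose oldest price is youngest; shorter path breaks ties.
    # Raises ValueError if no chain exists.
    return max(chains(start, goal), key=lambda c: (c[1], -len(c[0])))

path, oldest = freshest_chain("A", "D")
print(" -> ".join(path), oldest)  # A -> B -> C -> D 2021-05-10
```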

Also maybe a kind of warning like "Using a price from 6 months ago to perform the valuation".

ilya-konovalov avatar May 13 '21 05:05 ilya-konovalov

> Let's say I'm investing in several commodities, some priced in US dollars and others in Russian rubles. I want an overview of my current investments in a single commodity -- usually dollars. However, some of these could be bought for either dollars or rubles. Two years ago I priced one of these in dollars. A couple of days ago that ETF was split 1 to 10. Suddenly my dollar valuation increased significantly.
>
> Once I found out the reason for that valuation, I reviewed my records and simply removed all the prices where I had written a dollar price for a "should be priced in rubles" commodity. The total was now a bit different from what I had earlier. If not for that split, I might not even have noticed that I was still using some really old prices for certain commodities.

Thanks for this discussion, and the real-world example.

I think it illustrates that relying on inferred price chains can be problematic. They are less reality-based than declared prices, less stable, less transparent to a human. As the graph of prices evolves over time, unwanted valuations may happen and that could be hard to notice, as you experienced. (--debug=2 valuation output was a step to improve transparency, maybe there's more we could do ?)

More thoughts (excuse this long reply):

We all want market prices which are as accurate as possible, while also being stable (repeatable, not changing unexpectedly) and transparent (understandable, predictable).

What does "accurate" mean ? I think we can view it as a combination of

  • declaredness - how explicitly was this price declared by the user. "Most declared" would be a direct P declaration from A to B, "least declared" would be a long chain of declared, reversed, and/or inferred-from-transaction prices. When all else is equal, we would prefer to use human-declared prices over hledger-inferred ones.

  • freshness - how recent is the price [chain] compared to the valuation date. When all else is equal, we would prefer to use prices which are on or as close before the valuation date as possible.

When we don't have complete price data, there is a tradeoff between declaredness and freshness.

I have been advocating for user 1 who prioritises declaredness, ie "figure out the price using declared prices where possible, even if they're old, and use inferred prices only as a last resort".

You are speaking for user 2 who prioritises freshness above all else, ie "figure out the price today using the very latest data, I don't care how".

You are convincing me that the freshness-first use case is important and needs more support. I think this approach works well if you assume an efficient market, where the prices are all competitive and so any chain of prices is similar to the direct price.

I still feel the declaredness-first use case also needs to be supported. I think it provides more accurate/realistic/stable prices if you are dealing with an inefficient market (or other kind of network - hledger isn't only for tracking money). Ie prices are not evenly distributed, so price chains are more likely to be different from direct prices. Or perhaps certain conversions aren't possible at all. Or there is some kind of imposed/institutional constraint like "we always use market prices from the first of the month for all calculations".

Current hledger provides declaredness-first, using inferred prices as a fallback. As you point out, this doesn't work well for a person who wants freshness but has only occasional P records; the too-old P records have too much priority and interfere with freshness.

If you give it daily P records, though, it will provide freshness (since a recent P record always wins). This is how I use it, and my current recommendation for anyone doing a lot of valuation: I systematically download prices and add daily P records for all the commodities I care about. Is there any reason you/@aragaer can't do this, by the way ?

> Basically my idea is this -- for any price chain we should calculate "max age" of all the prices in that chain. Then prefer the chain which has the youngest of all the oldest prices.

It's a good point: the freshness of a price chain should consider the freshness of all the prices involved.

> Also maybe a kind of warning like "Using a price from 6 months ago to perform the valuation".

I thought of various thresholds like this too, but I think they will always be wrong for certain situations, and they cause confusion when near that boundary. "It's working differently here and there/then and now. Why on earth is it doing that ?"

Wow, I need to wrap this up. I hope the above makes sense. Looking forward: I think one price lookup strategy is not sufficient for all use cases, so we need to provide some choice. And I think we need a simpler, higher-level UX than we currently have. Here's one idea:

--infer-market-price=no|min|yes|max
How eagerly should we infer market prices (by reversing, chaining, and/or inferring from transactions) ?
no - use only market prices declared with P; never infer.
min - prefer P prices, even if old; use reversed and/or chained prices only as a last resort.
yes - infer the freshest prices, combining P, reversed and/or chained prices freely.
max - like yes, but also use transacted prices as additional market prices.

"no" is a stricter setting to suit use case 1 (declared-only). "min" is current hledger behaviour; it needs to be included for backward compatibility but could possibly be deprecated later. "yes" is for use case 2 (freshness-first). "max" is "yes" plus inferring from transaction prices, like current hledger's --infer-market-price flag. --infer-market-price with no argument would be equivalent to --infer-market-price=max, for backward compatibility.

simonmichael avatar May 13 '21 20:05 simonmichael

I will think about all that and write a more detailed answer later (tomorrow probably), but I must point out that even daily P entries wouldn't help me once I had an "incorrect" P entry. That is, I might have daily A -> B and B -> C prices, but once I've recorded a single A -> C entry years ago, that is the price that will be used for A -> C valuations from then on. Now my choice is either to write an additional "up-to-date" entry for A -> C, or to find and delete that old entry, track which currency I use for which commodity, and never ever write an "incorrect" price.

ilya-konovalov avatar May 13 '21 21:05 ilya-konovalov

I'm suggesting you would need daily P records for A->C as well.

Perhaps this is hard to do in some cases ?

simonmichael avatar May 13 '21 21:05 simonmichael

I track about a dozen commodities, half of which are priced in Russian rubles. Basically this means I have to do some manual multiplication (or actually division) to also provide dollar values for those. I create these records twice a month, a bunch at once, so it seems natural that everything should be calculable using only this set of records.

And I guess I'm lucky I don't have any EUR-priced commodities yet.

ilya-konovalov avatar May 13 '21 21:05 ilya-konovalov

I've been thinking about accuracy in commodity rates and realized that this might be a place where we could be a bit more specific (as opposed to being more generic). Basically I don't usually need to see an exchange rate between any two commodities. I am usually interested in seeing a price of given commodity being converted to a certain currency.

And this is where specifics come -- if we can separate "currencies" from "commodities" and assume that currency exchange rates belong to a more or less "efficient market" we can give higher priority to any "currency to currency" prices. That would actually solve the problem I had -- I don't need hledger to cleverly compute prices of XAU in terms of XAG, but if I have an up-to-date USD to RUB price I want to be able to evaluate any commodity in either USD or RUB and expect a reasonable result.

The idea is to have a single graph of currency rates (and if there are commodities of type "currency" that are not linked to that graph "hledger check" should say so) and when comparing different price chains any paths within that graph are considered of length 0. That is XAU -> RUB -> USD should be considered of same length as just XAU -> USD. And then use the age of the oldest link as a sorting mechanism.

By default all commodities should be the way they are now -- just commodities. That means current behavior where we prefer shorter chains over younger chains. Currencies that should be considered "freely inter-exchangeable" should be explicitly declared as such using commodity directive.
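
A small Python sketch of that comparison, assuming USD and RUB have been marked as currencies (however that ends up being declared) and using made-up price dates. The names here are hypothetical and nothing like this exists in hledger today.

```python
from datetime import date

# Hypothetical: commodities the user has marked as freely inter-exchangeable.
CURRENCIES = {"USD", "RUB"}

def effective_length(path):
    """Count hops, treating currency-to-currency hops as length 0."""
    return sum(
        1
        for a, b in zip(path, path[1:])
        if not (a in CURRENCIES and b in CURRENCIES)
    )

# Candidate chains for XAU -> USD, each with the date of its oldest price.
candidates = [
    (("XAU", "USD"), date(2019, 5, 1)),          # one stale direct price
    (("XAU", "RUB", "USD"), date(2021, 5, 13)),  # recent, via the currency graph
]

def rank(candidate):
    path, oldest = candidate
    # Shorter effective length first, then the chain whose oldest link is newest.
    return (effective_length(path), -oldest.toordinal())

best_path, best_oldest = min(candidates, key=rank)
print(best_path)  # ('XAU', 'RUB', 'USD')
```

Both chains have effective length 1 here, so the age of the oldest link decides, and the stale direct price loses.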

ilya-konovalov avatar May 14 '21 09:05 ilya-konovalov

I can second that. This is really inconvenient when you routinely use an account in one currency to perform operations in other currencies: say, a bank account in EUR when spending in the U.S. In this case the purchase is in USD but the charge is in EUR, so it is only natural to infer the rate. Whereas if you add just one P directive for the EUR/USD pair, for whatever reason, hledger will use it instead of your newer postings, even if the directive is a year old, which is really counterintuitive. Notably, there is no workaround except writing P directives daily, which as noted above is clearly inconvenient in this case (because the @@ notation is basically your natural rate here, not a declared one).

pvzhelnov avatar Nov 18 '23 06:11 pvzhelnov

I reread the discussion, and still agree improvement is desirable here. I think I understand @ilya-konovalov's suggestion above, and I'd be happy to test it out if it existed. But it seems to be adding more complexity while still not eliminating surprises and obscure behaviour for users. How could we minimise those ?

simonmichael avatar Nov 19 '23 03:11 simonmichael