tls
[Question] How to efficiently find long period signals in data with long gaps?
Can you suggest a strategy to efficiently find transits in light curves with a long gap, e.g. 1 year, as in the case of EPICs observed in K2 Campaign 16 and again in C18? I could perhaps limit the period search space using `period_min` and `period_max`, but I wonder if you have a better solution.
Do you have a good example to test? Code + data
Hi. I was coming here to ask something similar. The point is that `grid.py` uses `time_span` to determine the sampling of the period grid. When long time gaps are present (e.g. TICs observed in sectors 1 and 27), the period grid becomes very dense (around 400,000 trial periods) and the computational effort increases significantly.
Do you think it would make sense to calculate `time_span` in a different way? I know that we can reduce the density of the period grid by lowering the oversampling, but the minimum value for this input is 1, which could still be insufficient to prevent the increase in computational cost.
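For intuition about why the grid gets so dense, here is a standalone sketch of the Ofir (2014)-style optimal frequency sampling that TLS's `grid.py` follows. The function name `ofir_grid_size` and the hard-coded solar constants are my own illustrative choices, not TLS API; treat the counts as rough estimates, but they show the grid growing roughly linearly with the time baseline and with `oversampling_factor`:

```python
import numpy as np

# Approximate physical constants (SI), for illustration only
G = 6.674e-11      # gravitational constant [m^3 kg^-1 s^-2]
R_SUN = 6.957e8    # solar radius [m]
M_SUN = 1.989e30   # solar mass [kg]
DAY = 86400.0      # seconds per day

def ofir_grid_size(time_span_days, oversampling_factor=3.0,
                   R_star=1.0, M_star=1.0, n_transits_min=2):
    """Approximate number of trial periods in an Ofir (2014)-style
    optimal frequency grid (the sampling strategy TLS uses in grid.py)."""
    R = R_star * R_SUN
    M = M_star * M_SUN
    time_span = time_span_days * DAY
    # lowest frequency: require at least n_transits_min transits in the baseline
    f_min = n_transits_min / time_span
    # highest frequency: shortest stable orbit near the Roche limit (~0.6 d for the Sun)
    f_max = 1.0 / (2 * np.pi) * np.sqrt(G * M / (3 * R) ** 3)
    # optimal frequency step parameter; note the 1/time_span dependence,
    # which is what makes long baselines (and long gaps) expensive
    A = ((2 * np.pi) ** (2 / 3) / np.pi * R / (G * M) ** (1 / 3)
         / (time_span * oversampling_factor))
    n_opt = (f_max ** (1 / 3) - f_min ** (1 / 3) + A / 3) * 3 / A
    return int(n_opt)

# One TESS sector (~27 d) vs. a two-sector baseline spanning ~700 d:
print(ofir_grid_size(27))                            # a few thousand periods
print(ofir_grid_size(700))                           # tens of thousands
print(ofir_grid_size(700, oversampling_factor=1.0))  # cheaper, less sensitive
```

Reducing `oversampling_factor` scales the grid size down proportionally, which is exactly the trade-off discussed below.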
> Can you suggest a strategy to efficiently find transits in light curves with a long gap, e.g. 1 year, as in the case of EPICs observed in K2 Campaign 16 and again in C18? I could perhaps limit the period search space using `period_min` and `period_max`, but I wonder if you have a better solution.
Just a suggestion, and maybe @hippke has a better idea or can refute it: you could set the `oversampling_factor` parameter to 1 (it defaults to 3 in `tls_constants.py`). That would reduce the density of your period grid and make your computations lighter.
Ah, I see. At the time I tested TLS, I used a lot of Kepler and early TESS data with only minor gaps, so that's a use case I had not considered. Unfortunately, there is an inevitable computational burden when you go to longer time baselines (with a corresponding gain in sensitivity to longer-period planets). So there's no easy solution, i.e. no free lunch: making TLS run faster will make it less sensitive. However, that may still be OK or even desired, depending on the use case. Can you provide a full working example that I can check out myself?
I think it would be feasible, without much effort, to add an option for the user to provide the period grid. What do you think, @hippke? If not provided, TLS would fall back to the default strategy used nowadays.
Sure, but what would it be good for? The user can already request a grid with any [from, to] range in any log-spacing they want.
Do you mean by using the oversampling factor? I thought the lower limit for that value was 1 because numpy gave me an error saying that only integers were allowed. I will try to reproduce that situation, but if true, the grid density would be restricted.
Yes, you should be able to use any floating point value for `oversampling_factor`, e.g. 0.5.
> Do you mean by using the oversampling factor? I thought the lower limit for that value was 1 because numpy gave me an error saying that only integers were allowed. I will try to reproduce that situation, but if true, the grid density would be restricted.
I'm unable to reproduce it. Therefore I agree with you that `period_min`, `period_max` and the oversampling factor (given that it can be lower than 1) are enough for us users to tune the period grid. I hope it meets the OP's needs.
Hi. I reproduced it. See the error traceback below, for `oversampling_factor = 3.75` as an example:

```
Traceback (most recent call last):
    results = model.power(**power_args)
  File "/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/main.py", line 263, in power
    SR, power_raw, power, SDE_raw, SDE = spectra(chi2, self.oversampling_factor)
  File "/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/stats.py", line 119, in spectra
    my_median = running_median(power_raw, kernel)
  File "/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/helpers.py", line 96, in running_median
    med = numpy.median(data[idx], axis=1)
IndexError: arrays used as indices must be of integer (or boolean) type

Process finished with exit code 0
```
With this in mind, we cannot go below `oversampling_factor = 1`, and we'd be restricted.
Do you think that the error found above is a bug or is it expected?
Looks like a bug. Can you share the code to reproduce? I will look into it and make a bugfix if possible.
Sure. Here it is for `transitleastsquares` version 1.0.25, using `lightkurve` v2:
```python
import lightkurve
import transitleastsquares as tls
import multiprocessing

lcf = lightkurve.search_lightcurve("TIC 305048087", mission="TESS", cadence="short",
                                   sector=[2], author="SPOC").download_all()
lc = lcf.stitch().remove_nans()
model = tls.transitleastsquares(lc.time.value, lc.flux.value)
results = model.power(period_min=0.5, period_max=20,
                      use_threads=multiprocessing.cpu_count(), oversampling_factor=3.5)
```
And the entire output:

```
Transit Least Squares TLS 1.0.25 (04 June 2020)
Creating model cache for 38 durations
/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/transit.py:157: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  lc_arr = numpy.array(lc_arr)
Searching 18317 data points, 2919 periods from 0.601 to 13.702 days
Using all 8 CPU threads
100%|██████████| 2919/2919 periods | 01:14<00:00
Traceback (most recent call last):
  File "/home/martin/git_repositories/sherlockpipe/test.py", line 9, in <module>
    results = model.power(period_min=0.5, period_max=20, use_threads=multiprocessing.cpu_count(), oversampling_factor=3.5)
  File "/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/main.py", line 263, in power
    SR, power_raw, power, SDE_raw, SDE = spectra(chi2, self.oversampling_factor)
  File "/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/stats.py", line 119, in spectra
    my_median = running_median(power_raw, kernel)
  File "/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/helpers.py", line 96, in running_median
    med = numpy.median(data[idx], axis=1)
IndexError: arrays used as indices must be of integer (or boolean) type
```
Kind regards.
Fixed in version 1.0.26, which is also available via pip. `oversampling_factor` now accepts any floating point value.
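For context, the failure mode in the traceback is the generic one where a float-valued window size leaks into NumPy fancy indexing. The sketch below is a minimal illustration of that error class and of the integer cast that avoids it; it is a hypothetical `running_median`, not the actual code in TLS's `helpers.py`:

```python
import numpy as np

def running_median(data, kernel):
    """Sliding-window median over `data`.

    `kernel` may arrive as a float (e.g. derived from a non-integer
    oversampling factor). Without the cast below, `idx` would be a float
    array and NumPy would raise:
    IndexError: arrays used as indices must be of integer (or boolean) type
    """
    kernel = int(round(kernel))  # the essential fix: force an integer window size
    first = np.arange(kernel)
    # one row of window indices per output sample
    idx = first[None, :] + np.arange(len(data) - kernel + 1)[:, None]
    med = np.median(data[idx], axis=1)
    # pad the edges so the output has the same length as the input
    pad = len(data) - len(med)
    left, right = pad // 2, pad - pad // 2
    return np.concatenate([np.full(left, med[0]), med, np.full(right, med[-1])])

smooth = running_median(np.arange(10.0), kernel=3.75)  # works despite a float kernel
```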
Great! I would say that we can now explore data with long gaps at a computational cost similar to that of consecutive sectors. To me this is clear enough and answers the question; however, I am not the OP.
Hi. I've already tried the new version. Sorry for bringing this up again. It works sometimes but I found a case where it fails with a new error. My code is:
```python
import lightkurve
import transitleastsquares as tls
import multiprocessing

lcf = lightkurve.search_lightcurve("TIC 352315023", mission="TESS", cadence="short",
                                   sector=[13, 27], author="SPOC").download_all()
lc = lcf.stitch().remove_nans()
lc = lc.remove_outliers(sigma_lower=float('inf'), sigma_upper=3)
model = tls.transitleastsquares(lc.time.value, lc.flux.value)
results = model.power(period_min=0.45, period_max=40, use_threads=multiprocessing.cpu_count() - 1,
                      oversampling_factor=1.1119355997446583, T0_fit_margin=0.05, duration_grid_step=1.1)
```
And the error:

```
Transit Least Squares TLS 1.0.26 (20 March 2021)
Creating model cache for 45 durations
/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/transit.py:157: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  lc_arr = numpy.array(lc_arr)
Searching 33168 data points, 15929 periods from 0.602 to 39.985 days
Using all 8 CPU threads
100%|██████████| 15929/15929 periods | 13:26<00:00
Traceback (most recent call last):
  File "/home/martin/git_repositories/sherlockpipe/test.py", line 10, in <module>
    results = model.power(period_min=0.45, period_max=40, use_threads=multiprocessing.cpu_count(),
  File "/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/main.py", line 263, in power
    SR, power_raw, power, SDE_raw, SDE = spectra(chi2, self.oversampling_factor)
  File "/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/stats.py", line 119, in spectra
    my_median = running_median(power_raw, kernel)
  File "/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/helpers.py", line 97, in running_median
    med = numpy.median(data[idx], axis=1)
IndexError: index 15929 is out of bounds for axis 0 with size 15929
Process finished with exit code 1
```
Hi. Sorry to bother you again; I just wanted to keep the issue alive so we can get a fix as soon as you find it possible. Kind regards.
Sorry you are having trouble. Looking into it right now.
The runtime of your script is ~1 minute on my Core-i7 7700k using 7 threads (and considerable background load). Your runtime shows as 13 minutes. Are you using a slow machine, or is there something else going on?
Why not use `oversampling_factor=1`?
I have made a change to the code which fixes the issue, but it raises some nasty warnings. Before I release a new version, can you test the fix? You need to replace the file `helpers.py` in your folder `/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/` with the version attached.

helpers.zip
Hi. Because `oversampling_factor=1` is also still suboptimal and the period grid is still denser than desired. The test I was running simply used an automatically calculated oversampling factor. That value will probably be lower than 1 for other targets (or whenever we change the equation that calculates it), so I would like to stick to float values instead of integers.
The runtime I reported was probably so long because I paused the execution at a breakpoint for a while and did some debugging.
I will try your suggested file and come back with the output.
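As an aside on the automatically calculated oversampling factor mentioned above: since the number of trial periods grows roughly linearly with both the time baseline and the oversampling factor, one simple way to keep the cost of a long-baseline search predictable is to scale the oversampling factor down in inverse proportion to the baseline. The heuristic below is my own hypothetical sketch, not the equation the commenter refers to (which is not shown in this thread); all parameter names and the floor value are illustrative:

```python
def adaptive_oversampling(time_span_days, reference_span_days=27.0,
                          reference_oversampling=3.0, floor=0.3):
    """Hypothetical heuristic: shrink oversampling_factor in inverse
    proportion to the baseline, so the period-grid size (roughly
    proportional to time_span * oversampling_factor) stays close to
    the cost of a single ~27 d sector."""
    factor = reference_oversampling * reference_span_days / time_span_days
    # clamp: never denser than the reference, never absurdly sparse
    return max(floor, min(reference_oversampling, factor))

print(adaptive_oversampling(27))    # 3.0  (single sector: default density)
print(adaptive_oversampling(700))   # 0.3  (two distant sectors: hits the floor)
```

Values below 1 are exactly why accepting float oversampling factors matters here.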
Hi, here are the results (it seems the execution worked fine, i.e. no errors):

```
Transit Least Squares TLS 1.0.26 (20 March 2021)
Creating model cache for 45 durations
/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/transit.py:157: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  lc_arr = numpy.array(lc_arr)
Searching 33168 data points, 15929 periods from 0.602 to 39.985 days
Using all 8 CPU threads
100%|██████████| 15929/15929 periods | 03:49<00:00
/home/martin/.local/lib/python3.8/site-packages/numpy/core/_methods.py:261: RuntimeWarning: Degrees of freedom <= 0 for slice
  ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/home/martin/.local/lib/python3.8/site-packages/numpy/core/_methods.py:221: RuntimeWarning: invalid value encountered in true_divide
  arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',
/home/martin/.local/lib/python3.8/site-packages/numpy/core/_methods.py:253: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
/home/martin/.local/lib/python3.8/site-packages/transitleastsquares/main.py:404: UserWarning: 292 of 329 transits without data. The true period may be twice the given period.
  warnings.warn(text)
```
Those ugly warnings were already appearing in a fresh installation that I was running from another project that also uses TLS. I don't know what causes them, but they probably need a fix too. Thanks for working on this! Please keep me posted whenever you consider it fixed.