
Tlog is recording rsync data

Open vincentwolsink opened this issue 3 years ago • 10 comments

When using tlog to record sessions, it will actually capture rsync data. This will seriously grow the log files, since it is effectively copying all the files being transferred to the logs.

tlog     23556 23555 86 08:02 ?        00:03:46 tlog-rec-session -c rsync --server --sender -logDtpre.iLsfxC . /data/analytic_events/authorization/logdate...
-rw------- 1 root root 642G Apr  1 19:20 /var/log/tlog.log

Using a non-recorded user for rsyncing is not really a feasible solution/workaround because of file ownership and permissions.

Is there any way in which tlog can detect this is not a shell session? Can we exclude sessions from being logged when a certain command is invoked when starting the session, in this case rsync --server for example?

vincentwolsink avatar Apr 02 '21 10:04 vincentwolsink

Tlog intercepts and forwards streams of I/O data across a pseudoterminal, from an outsider's point of view. The I/O is handled in an abstracted way, by design. For this reason there is no functionality to have tlog "filter" out certain commands. Some filtering could be allowed in tlog-play when reading back the recorded messages, but that does not help in this case.

You might look into rate limiting; see this section of the tlog-rec-session.conf(5) man page:

   limit - Logging limit object
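
For illustration, here is a small sketch that prints what such a `limit` fragment might look like in the JSON configuration file. The key names (`rate`, `burst`, `action`) and the values are recalled from the man page and should be treated as assumptions; check them against the tlog-rec-session.conf(5) shipped with your version before using them.

```python
import json

# Assumption: key names and values are recalled from tlog-rec-session.conf(5).
# Verify them against the man page of your installed tlog version.
limit_fragment = {
    "limit": {
        "rate": 16384,     # bytes per second allowed into the log
        "burst": 32768,    # extra bytes tolerated in a short burst
        "action": "drop",  # what happens to data above the limit
    }
}

# Print the fragment so it can be merged into the existing config by hand.
print(json.dumps(limit_fragment, indent=4))
```

Note that dropping data above the limit only caps log growth; the session is still recorded, and whatever fits under the limit still lands in the log.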

justin-stephenson avatar Apr 12 '21 12:04 justin-stephenson

Thanks for your answer. I understand. Rate limiting does make things a bit better, but still a lot of data will end up in the logs. The thing I am suggesting is not "filtering" certain commands, but not logging the specific session at all.

In the example, the -c flag makes the user's shell run rsync and exit afterwards, so the entire session consists of rsync data and can be ignored. Since this -c rsync --server ... argument is passed to tlog as a regular argument, I assume it should be easy to use it to decide whether to log or not?

Of course the invocation arguments to exclude logging on should be chosen very carefully by the system administrator.

I might be able to craft some PR for this myself. But just discussing here to see if it makes any sense.

vincentwolsink avatar Apr 12 '21 13:04 vincentwolsink

The most common case is for tlog-rec-session to be used as the user's shell. It then starts the user's actual shell underneath tlog-rec-session in an interactive login session, in which tlog has no way to isolate file descriptor I/O by command or process (it sees only input, output, and window size changes). Sorry, I don't think this is something we can resolve on the tlog side.

justin-stephenson avatar Apr 12 '21 19:04 justin-stephenson

I agree that what you describe is the most common use case for tlog. But if you enable session recording in SSSD, tlog sits in between every session, not only the interactive ones. And since rsyncing or scping files (scp suffers from exactly the same issue) is also a very common use of a Linux system, in my opinion these cases cannot be ignored.

I am not talking about interactive sessions where you need to isolate file descriptors or filter a stream; I understand that is very difficult. The issue is with non-interactive sessions where either rsync or scp is spawned directly without any tty, and the only purpose of the input/output is to transfer file data. These sessions should not be logged at all.

Another example:

root     30256  0.0  0.0 180384  5640 ?        Ss   09:35   0:00  \_ sshd: user2481 [priv]
user2481 30276 11.9  0.0 180524  2568 ?        S    09:35   0:01      \_ sshd: user2481@notty
tlog     30277  3.4  0.0 227124  3812 ?        Ss   09:35   0:00          \_ tlog-rec-session -c scp -t /tmp/
user2481 30278  3.8  0.0 186676  2848 ?        S    09:35   0:00              \_ scp -t /tmp/

vincentwolsink avatar Apr 13 '21 07:04 vincentwolsink

Sorry for the delayed response. If you have some solution to make a configurable list of commands (read at startup from the tlog config file) which can be ignored by tlog-rec-session in the '-c' invocation case, then feel free to submit a PR for that.
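
A rough sketch of the decision such a feature would have to make is shown below. This is not tlog code and tlog has no such option today: the `EXCLUDED_COMMANDS` list, the `should_record` function, and the assumption that the command arrives as a single string after `-c` are all hypothetical, chosen only to illustrate the idea.

```python
import os
import shlex

# Hypothetical admin-maintained list of command prefixes to leave unrecorded.
# The two entries mirror the invocations shown earlier in this thread.
EXCLUDED_COMMANDS = [
    ["rsync", "--server"],
    ["scp", "-t"],
]

# Characters that would let a shell run something besides the listed command.
SHELL_METACHARS = set(";|&`$()<>\n")


def should_record(argv, excluded=EXCLUDED_COMMANDS):
    """Return False only for `-c <command>` where <command> matches the list."""
    # Assumption: argv looks like ["tlog-rec-session", "-c", "<command string>"].
    if len(argv) < 3 or argv[1] != "-c":
        return True  # interactive or unknown invocation: always record
    command = argv[2]
    if SHELL_METACHARS & set(command):
        return True  # could chain other commands: record to be safe
    try:
        words = shlex.split(command)
    except ValueError:
        return True  # unparsable command string: record
    if not words:
        return True
    # Compare on the program name so "/usr/bin/scp" matches "scp".
    words[0] = os.path.basename(words[0])
    return not any(words[:len(prefix)] == prefix for prefix in excluded)


print(should_record(["tlog-rec-session", "-c", "scp -t /tmp/"]))
print(should_record(["tlog-rec-session", "-c", "scp -t /tmp/ ; bash"]))
```

The sketch defaults to recording whenever anything is unclear, and it refuses to skip a command string containing shell metacharacters, since otherwise something like `scp -t /tmp/ ; bash` would give an unrecorded shell. A real implementation would need a more careful review of what the excluded programs themselves can be made to execute.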

justin-stephenson avatar May 11 '21 18:05 justin-stephenson

This is quite an important limitation. I do not yet know anything about the architecture, since I hit this issue while trying this system for the first time, but could it possibly be resolved at some other layer, for example sssd?

bluikko avatar Apr 09 '22 05:04 bluikko

I'm also quite interested in a solution for this, as disabling output logging is the only reasonable solution, and that sorta.. defeats the purpose of tlog. In my case this delta is quite severe. ~4MB/s with output logging on, 230MB/s with logging off.

logging on:

file
     81,100,800   0%    4.82MB/s    0:34:29

no logging:

file
    715,587,584   6%  227.63MB/s    0:00:41

NeilHanlon avatar Dec 13 '23 20:12 NeilHanlon

Does that work for both directions of an rsync? Because you can pull and push files, maybe that uses either input or output for those scenarios?

kees-closed avatar Dec 15 '23 08:12 kees-closed

Good question. It does not seem to matter if I am pushing or pulling, as long as the target machine has tlog disabled.

NeilHanlon avatar Jan 11 '24 00:01 NeilHanlon

A different trade-off is that if you log input instead of output, you log passwords as well.

https://github.com/Scribery/tlog/issues/77#issuecomment-1225503372

So maybe the sweet spot is to log only the terminal size and output, and apply congestion control. The trade-off then is that a user can saturate the session and type things without them being logged. From my experiments I conclude that no tlog setup solves all the security risks and performance issues.

kees-closed avatar Feb 05 '24 15:02 kees-closed