Discussion:
[syslog-ng] syslog-ng 3.3.7 DNS resolving Problem
Daniel Neubacher
2013-01-02 12:40:55 UTC
Hello there,
I'm having some trouble with DNS resolution in syslog-ng. Last week I patched my syslog-ng installation with the threaded DNS bugfix (https://bugzilla.balabit.com/show_bug.cgi?id=212), and most of my problems seem to be gone, but one remains.

Many times a day, messages are sorted into a folder named after the DNS name of my syslog-ng server instead of the host the log actually came from. The log line itself still contains the correct host, and most of the time the sorting works, but I have not yet found a way to reproduce the problem on demand. For debugging I've disabled all logging for the server itself, but it still happens.

My destinations are configured like this:
destination d_syslog { file("/log/syslog/${R_YEAR}/${R_MONTH}/${R_DAY}/$FULLHOST_FROM/$PROGRAM" template(t_plain)); };

And my DNS options:
use_fqdn(yes);
dns_cache(yes);
dns_cache_size(16384);
dns_cache_expire(300);
dns_cache_expire_failed(10);

I've tried disabling the syslog-ng cache, installing a local caching BIND, and after that nscd, but with no success. With 750 servers sending 30k-40k logs per second, the DNS queries are too expensive and I need syslog-ng's internal caching. With only local BIND caching, throughput drops to 2500 logs per second.

Does anybody have an idea how to fix this?

--
Daniel Neubacher, Network Administrator
***@xing.com

Gergely Nagy
2013-01-02 13:01:12 UTC
Daniel Neubacher <***@xing.com> writes:

> Many times a day messages are sorted into a folder with the DNS name
> of my syslog-ng server instead of the real host where the log is
> coming from. The log line still has the right host in the text and
> most of the time it is working but I could not find any way to
> reproduce the problem on demand yet. For debugging I've disabled any
> logging for the server itself but it still happens.

This is not the first time I've heard about this problem, but so far I
have not been able to reproduce it locally :(

Is it always the server address that gets used instead of the
originating host's name?

--
|8]
Daniel Neubacher
2013-01-02 13:25:43 UTC
Yes, but in my case it is the server's FQDN that gets used.

What I know is that syslog-ng ignores the cache when it happens. In the same second in which I find a wrongly sorted log, the server has sorted another line from the same client into the right folder. One of my first guesses was failed DNS requests, but my cache time of 10 seconds for negative answers doesn't match the timing of the log messages.

I guess I will debug some more, now that I know others have this problem too. I thought I was alone with this :)

______________________________________________________________________________
Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
FAQ: http://www.balabit.com/wiki/syslog-ng-faq
Daniel Neubacher
2013-01-02 14:49:56 UTC
To reproduce the problem, I tried generating a massive amount of logs from one client to a server running my live configuration, but it didn't work. I guess the problem lies not in the log volume but in the number of hosts, and that's hard to test.

After that I did some more live testing. My first test was whether this also happens without DNS resolving, and it doesn't. Then I disabled threading, and it seemed to work. My problem is that I need threading, because syslog-ng is now running at 100% CPU :P
It was a quick test, but after enabling threading again the problem reappeared instantly. Now I've disabled it again and will test for at least a day. But it seems threading has one more problem :(
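
For readers following along, the two live tests described above could look roughly like the global options below. This is only a sketch: the exact settings changed during the tests were not posted, and it assumes the threaded() and use_dns() global options as documented for syslog-ng 3.3. The remaining values are taken from the DNS options at the start of the thread.

```
# Sketch only; the configuration actually used in the tests was not posted.
options {
    # Test 1 (assumed form): no DNS resolution at all.
    # The misfiled messages did not occur in this mode.
    # use_dns(no);

    # Test 2 (assumed form): DNS resolution and the internal cache stay on,
    # only multithreading is switched off.
    use_dns(yes);
    use_fqdn(yes);
    dns_cache(yes);
    dns_cache_size(16384);
    dns_cache_expire(300);
    dns_cache_expire_failed(10);
    threaded(no);
};
```

With use_dns(no), $FULLHOST_FROM would be expected to carry the sender's IP address rather than a name, so the directory layout under /log/syslog changes. That is probably why this mode is useful as a test here but not as a permanent setting.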


Gergely Nagy
2013-01-02 14:58:25 UTC
Daniel Neubacher <***@xing.com> writes:

> After that I did some more live testing. My first test was if this
> actually happens without dns resolving and it didn't. After that I've
> disabled threading and it seemed to work. My problem is that I need
> threading because syslog is now running on 100% :P

This narrows it down a little, thanks!

--
|8]
Daniel Neubacher
2013-01-07 12:35:10 UTC
I've had no wrongly sorted messages since disabling threading. Do you have any idea what else I could try? The syslog service is at 100% all the time, and tweaking options like flush_lines and flush_timeout only made my server slower.

Gergely Nagy
2013-01-07 16:06:23 UTC
Daniel Neubacher <***@xing.com> writes:

> I've had no wrongly sorted messages since disabling threading. Do you
> have any idea what else I could try? The syslog service is at 100% all
> the time, and tweaking options like flush_lines and flush_timeout only
> made my server slower.

I have no further ideas yet, I've been busy with other things in the
last couple of days. This issue is the highest on my TODO list now,
though. But just by looking at the code, I couldn't find the error, so
I'm working on reproducing it locally.

--
|8]
Daniel Neubacher
2013-01-08 07:46:44 UTC
I've got none of these errors with syslog-ng 3.4 beta1 - does this make sense?
Gergely Nagy
2013-01-08 08:59:09 UTC
Daniel Neubacher <***@xing.com> writes:

> I've got none of these errors with syslog-ng 3.4 beta1 - does this
> make sense?

Interesting... all DNS-related code should be the same between 3.4beta1
and the latest 3.3 git master (compared to 3.3.7, both have a fix that
uses thread-safe lookups).

Compared to 3.3.7, 3.4 beta1 has only one patch that is relevant:

commit 11b20b28f7586b2bf10c281328f28d93f39e279c
Author: Balazs Scheidler <***@balabit.hu>
Date:   Fri Dec 14 17:54:39 2012 +0100

    resolve_sockaddr: fixed unsafe use of non-reentrant APIs to resolve IP addresses to names

    As it seems the use of the DNS cache hid the fact that we're not thread
    safe when resolving IPs to DNS names. This patch attempts to use
    getnameinfo() API if available that is thread safe and protects
    all other paths with a mutex.

    Reported-By: Brian Kroth <***@gmail.com>
    Tested-By: Gergely Nagy <***@balabit.hu>
    Signed-off-by: Balazs Scheidler <***@balabit.hu>

https://github.com/balabit/syslog-ng-3.3/commit/11b20b28f7586b2bf10c281328f28d93f39e279c.patch

Come to think of it, the lack of this patch might very well be the cause
of your issue. Can you check if the latest 3.3 git master works for you?

There's an easily buildable tarball available at:
http://packages.madhouse-project.org/syslog-ng/3.3/3.3.7/syslog-ng-3.3.7-20130105-v3.3.7-14-g45eaa.tar.gz

If this does fix the problem, then my apologies, I should've thought of
it way sooner. :|

--
|8]
Daniel Neubacher
2013-01-09 09:10:48 UTC
No errors so far. Sometimes syslog-ng has fooled me by waiting a few days before misbehaving again, but I hope that's not the case this time. Thanks for your help.
