Admin v2 log API confusion

I am trying to read logs from the /admin/v2/logs/authentication endpoint and am running into behaviour that does not seem to match the documentation. The doc says that next_offset will be empty when there are no more pages of data to read, but this is not what I see. I picked a time range with only three log records in it and set the limit to 1000. This should result in a single response from the API with three log records and no next_offset. What I actually get is three log records AND a valid next_offset. Here is the metadata I got back:

"metadata": {"next_offset": ["1630504390358", "f81cb3ad-c03d-48e5-a65a-528568291b76"], "total_objects": 3}

The total_objects value is correct, but why is there a valid next_offset? It is impossible for my code to figure out whether there is more data to retrieve.

If you then use that last next_offset, you will get a response with no value for next_offset and know you are done, like…

{
    "response": {
        "authlogs": [],
        "metadata": {
            "next_offset": null,
            "total_objects": 0
        }
    },
    "stat": "OK"
}
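In code, the terminate-on-empty-offset loop we are both describing looks something like this. This is only a sketch: `fetch_page()` is a made-up stand-in for the real signed HTTP call to /admin/v2/logs/authentication, here just replaying canned pages shaped like the responses in this thread.

```python
# Canned responses shaped like the real API's: a page of records with a
# non-null next_offset, then an empty page with next_offset = null.
PAGES = [
    {"authlogs": ["rec1", "rec2", "rec3"],
     "metadata": {"next_offset": ["1630504390358",
                                  "f81cb3ad-c03d-48e5-a65a-528568291b76"],
                  "total_objects": 3}},
    {"authlogs": [],
     "metadata": {"next_offset": None, "total_objects": 0}},
]

def fetch_page(mintime, maxtime, limit=1000, next_offset=None):
    """Hypothetical stand-in for the real HTTP request: pop the next page."""
    return PAGES.pop(0)

def fetch_all(mintime, maxtime):
    """Keep requesting pages, passing next_offset back, until it is null."""
    records = []
    next_offset = None
    while True:
        page = fetch_page(mintime, maxtime, next_offset=next_offset)
        records.extend(page["authlogs"])
        next_offset = page["metadata"]["next_offset"]
        if not next_offset:  # null (or missing) next_offset means done
            break
    return records
```

The point is simply that the client cannot stop after the first non-full page; it has to make one more call with the returned offset and stop only when next_offset comes back null.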

I have let it run for several iterations now and I keep getting the same body and the same metadata back every call. Here is the sequence:

  • 1st query:
    • Params: limit=1000&maxtime=1631101861671&mintime=1631100961671
    • Metadata: {"next_offset": ["1631100999267", "64d83e91-3f7f-4a1d-bcd0-aace228cbc2e"], "total_objects": 11}
    • Stat: OK
  • Sleep 60 seconds
  • 2nd query:
    • Params: limit=1000&maxtime=1631101861671&mintime=1631100961671&next_offset=1631100999267%2C64d83e91-3f7f-4a1d-bcd0-aace228cbc2e
    • Metadata: {"next_offset": ["1631100999267", "64d83e91-3f7f-4a1d-bcd0-aace228cbc2e"], "total_objects": 11}
    • Stat: OK
  • Sleep 60 seconds
  • 3rd query:
    • Params: limit=1000&maxtime=1631101861671&mintime=1631100961671&next_offset=1631100999267%2C64d83e91-3f7f-4a1d-bcd0-aace228cbc2e
    • Metadata: {"next_offset": ["1631100999267", "64d83e91-3f7f-4a1d-bcd0-aace228cbc2e"], "total_objects": 11}
    • Stat: OK

All subsequent loops are exactly the same as the last two. The response from the API is always the same body and metadata.
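For reference, here is how I am building the query string: the two-part next_offset is joined with a comma and then URL-encoded, which is where the %2C in the params above comes from. A quick sketch using the values from the first query:

```python
from urllib.parse import urlencode

# The two-element next_offset from the metadata is joined with a comma;
# urlencode then percent-encodes the comma as %2C, as seen in the
# queries above. Values taken from the 1st query's response.
next_offset = ["1631100999267", "64d83e91-3f7f-4a1d-bcd0-aace228cbc2e"]
params = {
    "limit": "1000",
    "maxtime": "1631101861671",
    "mintime": "1631100961671",
    "next_offset": ",".join(next_offset),
}
query = urlencode(params)
print(query)
# limit=1000&maxtime=1631101861671&mintime=1631100961671&next_offset=1631100999267%2C64d83e91-3f7f-4a1d-bcd0-aace228cbc2e
```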

Hmm, in my limited test of trying this once, I received the null metadata info when expected…

I send https://api-whatevermyduohostis/admin/v2/logs/authentication?limit=1000&maxtime=1631101861671&mintime=1630930956000 and I get the response…

{
    "response": {
        "authlogs": [
            {
                "access_device": {
                    "browser": "Chrome",
                    "browser_version": "92.0.4515.159",
                    "blah_blah": "the rest of the authlog events"
                }
            }
        ],
        "metadata": {
            "next_offset": [
                "1631028589586",
                "58370083-f90a-46aa-ac79-65c14b778b25"
            ],
            "total_objects": 2
        }
    },
    "stat": "OK"
}

Then my next request, which is the same as the first one except that it adds the offset info from the metadata, is https://api-whatevermyduohostis/admin/v2/logs/authentication?limit=1000&maxtime=1631101861671&mintime=1630930956000&next_offset=1631028589586,58370083-f90a-46aa-ac79-65c14b778b25 and I get back…

{
    "response": {
        "authlogs": [],
        "metadata": {
            "next_offset": null,
            "total_objects": 2
        }
    },
    "stat": "OK"
}

So I am not sure why your second query returns the same offset that you sent in the query. Do the next_offset values match the timestamp and txid values of the oldest event returned by the query?

In each case the last entry is the same. Here is the last part of the response packet:

"timestamp": 1631100999, "txid": "64d83e91-3f7f-4a1d-bcd0-aace228cbc2e", "user": {"groups": ["GL-SG_DUO_Remote_Access_Users (from AD sync \"DOMAIN\")"], "key": "■■■■■■■■■■■■■■■■■■■■", "name": "USERNAME"}}], "metadata": {"next_offset": ["1631100999267", "64d83e91-3f7f-4a1d-bcd0-aace228cbc2e"], "total_objects": 11}}, "stat": "OK"}

It looks to me like the timestamp and txid match, barring the milliseconds added to the timestamp in the metadata.
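The relationship is easy to check directly: the first next_offset element is a millisecond timestamp, while the event's "timestamp" field is in whole seconds. Using the values pasted above:

```python
# next_offset carries a millisecond timestamp; the authlog event's
# "timestamp" field is in whole seconds. Values taken from the
# response fragment above.
offset_ts_ms = int("1631100999267")
event_ts_s = 1631100999
offset_txid = "64d83e91-3f7f-4a1d-bcd0-aace228cbc2e"
event_txid = "64d83e91-3f7f-4a1d-bcd0-aace228cbc2e"

assert offset_ts_ms // 1000 == event_ts_s  # same second, ms precision added
assert offset_txid == event_txid           # same transaction id
print("next_offset matches the last returned event")
```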

My apologies – it was a programme bug. I just found it thanks to your example data. You can close this case.

This isn’t a case, it’s a conversation. :slight_smile: I am not support, just someone who works at Duo and likes answering questions. If you need actual technical support in the future, be sure to contact Duo Support to open a real case.

I’m glad you figured it out!

Ah! That explains a few things. I opened this question on the community site, then opened a Support case. I see now that nobody ever responded to the Support case (rather worrying, since our company relies on Duo for MFA and we pay for support), but in the meantime you responded to the community question. Sorry for confusing the two, and thanks for the help.

Oh no! I’m sorry to hear that your Support case wasn’t addressed in our usual timely fashion. I’ll see if I can find it and link to this post so the engineer who picks it up is aware of the resolution.

I have the first part of this working now but I am stuck again on the next piece. The plan is to have this program run on a schedule - let’s say hourly. Each time it runs it saves the final state. The next run should read in the state and pick up where the last run left off. Unfortunately there is not enough information in the doc to figure out how to do this.

At the end of a run the program saves the next_offset values (timestamp and txid) from the final response. I have tried to use this as the starting point for the next run. If I pass the whole next_offset in the params, I get nothing back at all. I have also tried extracting the timestamp from the saved offset and using it as the mintime value of a new query, but this often gives me duplicate log records. How did the developers intend this to be done?
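To make the save/restore part concrete, here is roughly what I am doing between runs. This is only the checkpoint mechanics, with an arbitrary file name (checkpoint.json); how the restored offset should be combined with mintime/maxtime on the next run is exactly the open question.

```python
import json
import os

CHECKPOINT = "checkpoint.json"  # arbitrary name chosen for this sketch

def save_checkpoint(next_offset):
    """Persist the two-element next_offset from the run's final response."""
    with open(CHECKPOINT, "w") as f:
        json.dump({"next_offset": next_offset}, f)

def load_checkpoint():
    """Return the saved next_offset, or None on the very first run."""
    if not os.path.exists(CHECKPOINT):
        return None
    with open(CHECKPOINT) as f:
        return json.load(f).get("next_offset")
```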

We have published a utility to pull logs, duo_log_sync. Did you explore using that? You may be able to get some ideas from it for your own script too; I think it uses a checkpoint file to resume pulling logs from a saved offset.

Good point - I will take another look at the code there. I would rather have used that utility but it is unfortunately not suitable. It sends the data to a listening port, which means that the data would be lost if the listener was down. My code sends the data to a file to be picked up asynchronously. If either end is down the data is not lost.

I am not a Python guru, but looking at the duo_log_sync module I can see a comment in authlog_producer.py: "Mintime is never used when next offset is present". If this is true, it would be a very good piece of information to put into the documentation. It appears that the code sets next_offset from the saved value and max_time to the current timestamp in milliseconds. It's hard to be sure, though, as the code is (IMHO) unnecessarily complex, but then that describes most Python code I think.
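If that comment holds, the parameter logic would reduce to something like the sketch below. This is my reading of the comment, not confirmed behaviour: when a saved offset exists, send next_offset (and no mintime); otherwise fall back to mintime; either way, maxtime is "now" in milliseconds.

```python
import time

def build_params(saved_next_offset, mintime_ms, limit=1000):
    """Build query params per the duo_log_sync comment: a saved
    next_offset replaces mintime entirely; maxtime is the current
    time in milliseconds."""
    params = {
        "limit": str(limit),
        "maxtime": str(int(time.time() * 1000)),
    }
    if saved_next_offset:
        params["next_offset"] = ",".join(saved_next_offset)
    else:
        params["mintime"] = str(mintime_ms)
    return params
```

Under this reading, the duplicate records seen when reusing the offset's timestamp as mintime would be expected, since mintime is inclusive of that second while the offset points past a specific event.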

Thanks for spotting that tidbit and raising it here. I’ll follow up with the team that owns logging to confirm.